Topic 02-strings.md
Strings
Converting To and From Strings
The tonumber function will explicitly convert a string to a number; it will return
nil
if the conversion is not possible. It can also be used to convert hexadecimal
numbers like so:
val = tonumber('FF',16) -- result is 255
How about converting numbers to strings? tostring does the general job of converting any Lua value into a string. (The print function calls tostring on its arguments.) If you want more control, then use the string.format function:
string.format("%5.2f",math.pi) == '"3.14"
These %
format specifiers will be familiar to C and Python programmers, but basic
usage is straightforward: the ‘f’ specifier has a total field with (5) and a number
of decimal places (2) and gives fixed floating-point format; the ‘e’ specifier gives
scientific notation. ’s' is a string, ‘d’ is an integer, and ‘x’ is for outputing
numbers in hex format.
print(string.format("The answer to the %s is %d", "universe", 42) ) --> The answer to the universe is 42
Concatenation and Substrings
There is a set of standard operations on strings. We saw that ‘adding’ strings would
try to treat them as numbers. To join strings together (concatenate) there is the
..
operator:
"1".."2" == "12"
Most languages use +
to mean this, so note the difference. Using a different
operator makes it clear that 1 .. 2
results in the string “12” and not
the number 3
As with arrays, #s
is the length of the string s
. (This is the number of bytes, not the
number of characters.)
The opposite operation is extracting substrings.
string.sub("hello",1,4) == "hell" string.sub("hello",4) == "lo"
The first number is the start index (starting at one, as with arrays) and the
second number is the final index; the result includes the last index, so that
sub(s,1,1)
gives the first ‘character’ in the string:
-- printing out the characters of a string for i = 1,#s do print(string.sub(s,i,i)) end
Finding and Matching
It is not possible to treat a string as an array – s[i]
is not meaningful. (It
will just silently return nil
) A Lua string is not a sequence of characters, but
a read-only lump of bytes; it is not very efficient to process a string by iterating
over its bytes and in fact Lua provides much more powerful techniques for string
manipulation.
For instance, a naive solution to the problem of finding a character in a string involves looking at one character at a time; the string.find function is faster and less trouble.
string.find('hello','e') == 2
In general, this function will return two values, the index of the start and the finish of the matched substring:
print(string.find('hello','lo')) --> 4 5
(Which are exactly the numbers you need to feed to string.sub .)
This may not seem so very useful, because we knew the length of the substring. However, string.find is much more powerful than a simple string matcher.
In general, the ‘substring’ is a Lua string pattern. If you have previously met regular expressions, then string patterns will seem familiar. For instance, the string pattern ‘l+’ means ‘one or more’ repetitions of ‘l’.
print(string.find('hello','l+')) --> 3 4
‘Character classes’ make string patterns much more powerful. The pattern ‘[a-z]+’ means ‘one or more letter in the range 'a’ to ‘z’:
print(string.find('hello','[a-z]+')) --> 1 5
That is, it matches the whole string. So we could write a function is_lower
like so:
function is_lower(s) i1,i2 = string.find(s,'[a-z]+') return i1 == 1 and i2 == #s end
But there is a neater way. The pattern ‘^[a-z]+$’ does the job, since it says that
the sequence of one or more letters must start at the begining (‘^’) and end at the
finish (‘$’). So string.find will return nil
for ‘ hello’.
Lua provides names for common character classes; ‘%a’ is short for ‘[a-zA-Z]’ and ‘%d’ is short for ‘[0-9]’. ‘%s’ stands for any whitespace, i.e. ‘[ \t\r\n]’. The capital letter versions stand for any characters not in the set, so ‘%S’ stands for anything that is not a space. So the pattern ‘^%S+$’ will match any sequence of characters that does not contain a space. (These are different from the usual regular expression syntax, which is to use a backslash. So Lua patterns tend to be easier to read than regular expressions. However, they are more limited.)
String patterns are an important part of learning Lua well, and we will return to them in this Cookbook. But you should always be aware of them, because string.find normally assumes that the match is a pattern that contains ‘magic’ characters. For instance, ‘$’ stands for ‘end of string’; if you wanted to find an actual ‘$’ in a string then you have two options:
- escape the magic character like so: ‘%$’
- use
string.find(s,sub,1,true)
; the last argument means ‘plain match’.
string.match is similar to string.find , except that it does not return the index range, but rather the match itself.
print(string.match('hello dolly','%a+')) ---> hello
Here the pattern means ‘one or more alphabetic characters’, so the match gives us the first word. You could do this with a combination of string.find and string.sub , but string.match is more general and efficient. Consider:
print(string.match('hello dolly','(%a+)%s+(%a+)')) ---> hello dolly
Here match
returns two matches, which are indicated using parentheses
in the string pattern. These are often called captures. So the pattern
would read like this ‘capture some letters, skip some space, and capture
some more letters’.
Finally, there is string.gmatch which iterates over all the matches in a string. A common task is finding all the words in a string, separated by spaces. The pattern ‘%S+’ means ‘one or more non-space character’, but string.match will only give you a fixed number of matches; string.gmatch will find them all.
local str = 'one two three' for s in string.gmatch(str,'%S+') do print('"'..s..'"') end --> "one" "two" "three"
This suggests the following useful function, which breaks up a string into a table of words:
function split(str) local t = {} for s in string.gmatch(str.'%S+') do t]#+1] = s end return t end
To split a string with other delimiters is just a matter of choosing the right
pattern. For instance, ‘[%S,]+’ matches
‘one or more characters from the set of non-space and comma’. You could
use this to split ‘one, two, three’ into {'one','two','three'}
.
The special pattern ‘.’ matches one arbitrary byte. So
for c in string.gmatch('.') do print(string.byte(c)) end
prints all the byte codes in a string.
Substituting Strings
A very powerful function for modifying strings is string.gsub (for global substitute):
string.gsub('hello help','e','a') --> hallo halp 2
It replaces each match of the pattern with a given string, and returns the resulting string and the number of substitutions. There can also be a fourth argument which lets you set the maximum number of substitutions:
gsub = string.gsub gsub('hello help','e','a',1) --> hallo help 1
There is no form that does a ‘plain match’ like string.find so you will have to be careful to escape magic characters. So if you wanted to replace ‘[’ then you would have to write it as ‘%[’. So the second argument is always a Lua string pattern:
gsub('hello help','%a+','*') --> * * 2
The third substitution argument is very flexible; it can be a string, table or function.
If it’s a string it may refer to captures in the match like %1
,
but apart from having to say ‘%%’ to mean ‘%’ it is otherwise a plain string.
gsub('hello help','%a+','[%1]') --> [hello] [help] 2
If the subtitution is a table, then each capture is looked up in the table:
gsub('hello $you','%$(%a+)',{you = 'help'}) --> hello help 1
And if the substitution is a function, the capture is passed to it; if it returns non-nil the result will be the substitution:
gsub('hello $TMP','%$(%a+)',os.getenv) --> hello C:\Users\steve\AppData\Local\Temp 1