Topic 02-arrays.md

Tables and Arrays

Python-style Lists

One of the elegant things about Lua is that tables do the job of both lists and dicts (as they are called in Python), or vectors and maps (as they are called in C++), and they do it efficiently. However, if we are dealing with 'tables with numerical indices', we may as well call them lists and look for operations which particularly make sense for lists. The Penlight List class was originally written by Nick Trout for Lua 5.0, and translated to 5.1 and extended by myself. It seemed that borrowing from Python was a good idea, and this eventually grew into Penlight.

Here is an example showing List in action; it redefines __tostring, so that it can print itself out more sensibly:

 > List = require 'pl.List'  --> automatic with require 'pl' <---
 > l = List()
 > l:append(10)
 > l:append(20)
 > = l
 {10,20}
 > = l:extend {30,40}
 {10,20,30,40}
 > = l:insert(1,5)
 {5,10,20,30,40}
 > = l:pop()
 40
 > = l
 {5,10,20,30}
 > = l:index(30)
 4
 > = l:contains(30)
 true
 > = l:reverse()  ---> note: doesn't make a copy!
 {30,20,10,5}

Although methods like sort and reverse operate in-place and change the list, they do return the original list. This makes it possible to do method chaining, like ls = ls:append(10):append(20):reverse():append(1). But (and this is an important but) no extra copy is made, so ls does not change identity. List objects (like tables) are mutable, unlike strings. If you want a copy of a list, then List(ls) will do the job, i.e. it acts like a copy constructor. However, if passed any other table, List will just set the metatable of the table and not make a copy.
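To make the identity point concrete, here is a minimal plain-Lua sketch. MiniList and new are hypothetical names, not Penlight's code: mutating methods return self so calls can be chained, and the constructor copies only when it is given an existing MiniList, otherwise adopting the table in place.

```lua
-- Hypothetical, stripped-down list class illustrating method chaining.
local MiniList = {}
MiniList.__index = MiniList

-- Constructor: copy if given a MiniList, otherwise adopt the table itself.
local function new(t)
  t = t or {}
  if getmetatable(t) == MiniList then
    local copy = {}
    for i = 1, #t do copy[i] = t[i] end
    return setmetatable(copy, MiniList)
  end
  return setmetatable(t, MiniList)   -- no copy: t itself becomes the list
end

function MiniList:append(x)
  self[#self + 1] = x
  return self                        -- returning self enables chaining
end

function MiniList:reverse()
  local n = #self
  for i = 1, math.floor(n / 2) do
    self[i], self[n - i + 1] = self[n - i + 1], self[i]
  end
  return self                        -- in place: same table, same identity
end

local ls = new {1, 2}
local same = ls:append(10):append(20):reverse():append(1)
print(same == ls)                    --> true (no copy was made)
print(table.concat(ls, ','))         --> 20,10,2,1,1
local c = new(ls)                    -- copy constructor
print(c == ls)                       --> false (a distinct table)
```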

A particular feature of Python lists is slicing. This is fully supported in this version of List, except that we use 1-based indexing. So List.slice works rather like string.sub:

 > l = List {10,20,30,40}
 > = l:slice(1,1)  ---> note: creates a new list!
 {10}
 > = l:slice(2,2)
 {20}
 > = l:slice(2,3)
 {20,30}
 > = l:slice(2,-2)
 {20,30}
 > = l:slice_assign(2,2,{21,22,23})
 {10,21,22,23,30,40}
 > = l:chop(1,1)
 {21,22,23,30,40}

Functions like slice_assign and chop modify the list; the first is roughly equivalent to Python's l[i1:i2] = seq and the second to del l[i1:i2], allowing for the fact that Lua ranges are 1-based and include both ends.
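The index rules can be sketched in plain Lua. slice below is a hypothetical helper, not Penlight's implementation; it assumes string.sub-style normalization, where -1 means the last element, and it always returns a new table:

```lua
-- Hypothetical helper: returns a NEW table holding t[i1..i2] (inclusive).
-- Negative indices count back from the end, as with string.sub.
local function slice(t, i1, i2)
  local n = #t
  i1 = i1 or 1
  i2 = i2 or n
  if i1 < 0 then i1 = n + i1 + 1 end
  if i2 < 0 then i2 = n + i2 + 1 end
  if i1 < 1 then i1 = 1 end          -- clamp to the valid range
  if i2 > n then i2 = n end
  local res = {}
  for i = i1, i2 do res[#res + 1] = t[i] end
  return res
end

local l = {10, 20, 30, 40}
print(table.concat(slice(l, 2, 3), ','))    --> 20,30
print(table.concat(slice(l, 2, -2), ','))   --> 20,30
print(table.concat(slice(l, -1), ','))      --> 40
```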

List objects are ultimately just Lua 'list-like' tables, but they have extra operations defined on them, such as equality and concatenation. For regular tables, equality is only true if the two tables are identical objects, whereas two lists are equal if they have the same contents, i.e. l1[i]==l2[i] for all elements.

 > l1 = List {1,2,3}
 > l2 = List {1,2,3}
 > = l1 == l2
 true
 > = l1..l2
 {1,2,3,1,2,3}
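List-style equality and concatenation come from metamethods. Here is a minimal sketch using a hypothetical metatable L (not Penlight's actual List metatable) that defines __eq and __concat for the array part of a table:

```lua
-- Hypothetical metatable giving plain tables List-like == and .. behaviour.
local L = {}

-- Two lists are equal if they have the same length and the same elements.
function L.__eq(a, b)
  if #a ~= #b then return false end
  for i = 1, #a do
    if a[i] ~= b[i] then return false end
  end
  return true
end

-- Concatenation builds a new list; neither operand is modified.
function L.__concat(a, b)
  local res = {}
  for i = 1, #a do res[#res + 1] = a[i] end
  for i = 1, #b do res[#res + 1] = b[i] end
  return setmetatable(res, L)
end

local l1 = setmetatable({1, 2, 3}, L)
local l2 = setmetatable({1, 2, 3}, L)
print(l1 == l2)                       --> true
print(rawequal(l1, l2))               --> false: still two objects
print(table.concat(l1 .. l2, ','))    --> 1,2,3,1,2,3
```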

The List constructor can be passed a function. If so, it's assumed that this is an iterator function that can be repeatedly called to generate a sequence. One such function is io.lines; the following short, intense little script counts the number of lines in standard input:

 -- linecount.lua
 require 'pl'
 ls = List(io.lines())
 print(#ls)

List.iterate captures what List considers a sequence. In particular, it can also iterate over all 'characters' in a string:

 > for ch in List.iterate 'help' do io.write(ch,' ') end
 h e l p >

Since the function iterate is used internally by the List constructor, strings can be made into lists of character strings very easily.

There are a number of operations that go beyond the standard Python methods. For instance, you can partition a list into a table of sublists using a function. In the simplest form, you use a predicate (a function returning a boolean value) to partition the list into two lists, one of elements matching and another of elements not matching. But you can use any function; if we use type then the keys will be the standard Lua type names.

 > ls = List{1,2,3,4}
 > ops = require 'pl.operator'
 > = ls:partition(function(x) return x > 2 end)
 {false={1,2},true={3,4}}
 > ls = List{'one',math.sin,List{1},10,20,List{1,2}}
 > = ls:partition(type)
 {function={function: 00369110},string={one},number={10,20},table={{1},{1,2}}}

This is one List method which returns a table which is not a List. Bear in mind that you can always call a List method on a plain table argument, so List.partition(t,type) works as expected. But these functions will only operate on the array part of the table.
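A plain-Lua sketch of the idea, with partition as a hypothetical helper: each element is filed under whatever key the function returns. It assumes the function never returns nil, since nil cannot be a table key; false is fine.

```lua
-- Hypothetical helper: groups the array part of t by the key f(x) returns.
local function partition(t, f)
  local res = {}
  for i = 1, #t do
    local key = f(t[i])              -- must not be nil
    local bucket = res[key]
    if bucket == nil then
      bucket = {}
      res[key] = bucket
    end
    bucket[#bucket + 1] = t[i]
  end
  return res
end

local byTest = partition({1, 2, 3, 4}, function(x) return x > 2 end)
-- byTest[false] is {1,2}, byTest[true] is {3,4}
local byType = partition({'one', 10, 20, {}}, type)
-- byType.string is {'one'}, byType.number is {10,20}, byType.table holds the {}
```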

Stacks occur everywhere in computing. List supports stack-like operations; there is already pop (remove and return the last value), and append acts like push (add a value to the end). push is provided as an alias for append, and the other stack operation (size) is simply the size operator #. Queues can also be implemented; you use pop to take values out of the queue, and put to insert a value at the beginning.
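The same stack and queue discipline can be expressed with the standard table library. This sketch uses plain tables rather than List, so table.insert and table.remove stand in for append/put and pop:

```lua
-- A plain table used as a stack.
local stack = {}
table.insert(stack, 1)               -- push: add to the end
table.insert(stack, 2)
print(table.remove(stack))           --> 2 (pop takes the last value)

-- A plain table used as a queue: put at the front, pop from the back.
local queue = {}
table.insert(queue, 1, 'a')          -- put: insert at the beginning
table.insert(queue, 1, 'b')
print(table.remove(queue))           --> a (first in, first out)
print(#queue)                        --> 1
```

Inserting at position 1 shifts every element, so put costs O(n); for long queues, a table with separate head and tail indices would be a better choice.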

Map and Set classes

The Map class exposes what Python would call a 'dict' interface, and accesses the hash part of the table. The name 'Map' is used to emphasize the interface, not the implementation; it is an object which maps keys onto values; m['alice'] or the equivalent m.alice is the access operation. This class also provides explicit set and get methods, which are trivial for regular maps but get interesting when Map is subclassed. The other operation is update, which extends a map by copying the keys and values from another table, perhaps overwriting existing keys:

 > Map = require 'pl.Map'
 > m = Map{one=1,two=2}
 > m:update {three=3,four=4,two=20}
 > = m == Map{one=1,two=20,three=3,four=4}
 true

The method values returns a list of the values, and keys returns a list of the keys; there is no guarantee of order. getvalues is given a list of keys and returns a list of values associated with these keys:

 > m = Map{one=1,two=2,three=3}
 > = m:getvalues {'one','three'}
 {1,3}
 > = m:getvalues(m:keys()) == m:values()
 true

When querying the value of a Map, it is best to use the get method:

 > print(m:get 'one', m:get 'two')
 1     2

The reason is that m[key] can be ambiguous; due to the current implementation, m["get"] will always succeed, because if a value is not present in the map, it will be looked up in the Map metatable, which contains a method get. There is currently no simple solution to this annoying restriction.
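The mechanism behind this restriction is ordinary __index lookup. The sketch below uses a hypothetical minimal Map (not Penlight's) whose get is written with rawget, which consults only the table's own entries:

```lua
-- Hypothetical minimal Map showing why m[key] is ambiguous.
local Map = {}
Map.__index = Map            -- missing keys fall back to the class table

function Map.get(self, key)
  return rawget(self, key)   -- look only at the map's own entries
end

local m = setmetatable({one = 1}, Map)
print(m.one, m:get('one'))   --> 1   1
print(type(m['get']))        --> function (found via the metatable!)
print(m:get('get'))          --> nil (no such entry in the map)
```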

A Set is a special kind of Map, where all the values are true. So get will always return either true or nil; all the values are keys, and the order is not important. So in this case values is defined to return a list of the keys. Sets can display themselves, and the basic operations like union (+) and intersection (*) are defined.

 > Set = require 'pl.Set'
 > = Set{'one','two'} == Set{'two','one'}
 true
 > fruit = Set{'apple','banana','orange'}
 > = fruit['banana']
 true
 > = fruit['hazelnut']
 nil
 > = fruit:values()
 {apple,orange,banana}
 > colours = Set{'red','orange','green','blue'}
 > = fruit,colours
 [apple,orange,banana]   [blue,green,orange,red]
 > = fruit+colours
 [blue,green,apple,red,orange,banana]
 > = fruit*colours
 [orange]

There are also the methods difference and symmetric_difference. The first answers the question 'what fruits are not colours?' and the second 'what are fruits and colours but not both?'

 > = fruit - colours
 [apple,banana]
 > = fruit ^ colours
 [blue,green,apple,red,banana]

Adding elements to a set is either done like fruit['peach'] = true or by fruit:set('peach'). Removing is either fruit['apple'] = nil or fruit:unset('apple').
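Assuming, as described above, that a set is a table whose keys are the elements and whose values are all true, the four set operations reduce to short loops. These helpers are hypothetical stand-ins for the Set operators, not Penlight's code:

```lua
-- Sets as plain tables: keys are the elements, values are true.
local function makeset(list)
  local s = {}
  for _, v in ipairs(list) do s[v] = true end
  return s
end

local function union(a, b)           -- like fruit + colours
  local res = {}
  for k in pairs(a) do res[k] = true end
  for k in pairs(b) do res[k] = true end
  return res
end

local function intersection(a, b)    -- like fruit * colours
  local res = {}
  for k in pairs(a) do
    if b[k] then res[k] = true end
  end
  return res
end

local function difference(a, b)     -- like fruit - colours
  local res = {}
  for k in pairs(a) do
    if not b[k] then res[k] = true end
  end
  return res
end

local function symmetric_difference(a, b)  -- like fruit ^ colours
  return union(difference(a, b), difference(b, a))
end

local fruit = makeset {'apple', 'banana', 'orange'}
local colours = makeset {'red', 'orange', 'green', 'blue'}
-- intersection(fruit, colours) contains only orange;
-- difference(fruit, colours) contains apple and banana.
fruit.peach = true     -- add an element
fruit.apple = nil      -- remove an element
```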

There are also some useful classes which inherit from Map. An OrderedMap behaves like a Map but keeps its keys in order if you use its set method to add keys and values. Like all the 'container' classes in Penlight, it defines an iter method for iterating over its values; this returns the keys and values in the order of insertion, and the keys and values methods likewise respect that order.
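One way to keep insertion order is to pair the hash with an array of keys. OrderedMap below is a hypothetical sketch, not Penlight's class; it stores values in a separate table so keys cannot collide with method names, and it does not support removing keys:

```lua
-- Hypothetical ordered map: a hash for lookup plus an array of keys
-- recording the order in which keys were first set.
local OrderedMap = {}
OrderedMap.__index = OrderedMap

local function newOrderedMap()
  return setmetatable({_keys = {}, _vals = {}}, OrderedMap)
end

function OrderedMap:set(key, val)
  if self._vals[key] == nil then
    table.insert(self._keys, key)    -- first time: remember its position
  end
  self._vals[key] = val
end

function OrderedMap:get(key)
  return self._vals[key]
end

-- Iterate over key/value pairs in insertion order.
function OrderedMap:iter()
  local i = 0
  return function()
    i = i + 1
    local k = self._keys[i]
    if k ~= nil then return k, self._vals[k] end
  end
end

local om = newOrderedMap()
om:set('one', 1); om:set('two', 2); om:set('three', 3)
om:set('one', 10)                    -- overwrites, keeps original position
for k, v in om:iter() do print(k, v) end
-- prints one 10, then two 2, then three 3
```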

A MultiMap allows multiple values to be associated with a given key. So set (as before) takes a key and a value, but calling it with the same key and a different value does not overwrite but adds a new value. get (or using []) will return a list of values.
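A multimap can be sketched as a table mapping each key to a list of values. mm_set is a hypothetical helper illustrating the append-instead-of-overwrite behaviour:

```lua
-- Hypothetical multimap: each key maps to a list of values.
local function mm_set(mm, key, val)
  local vals = mm[key]
  if vals == nil then
    vals = {}
    mm[key] = vals
  end
  vals[#vals + 1] = val              -- append rather than overwrite
end

local mm = {}
mm_set(mm, 'colour', 'red')
mm_set(mm, 'colour', 'blue')
mm_set(mm, 'size', 'large')
-- mm.colour is now {'red', 'blue'}; mm.size is {'large'}
```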

(See pl.Map and pl.Set)

Useful Operations on Tables

Some notes on terminology: Lua tables are usually list-like (like an array) or map-like (like an associative array or dict); they can of course have a list-like and a map-like part. Some of the table operations only make sense for list-like tables, and some only for map-like tables. (The usual Lua terminology is the array part and the hash part of the table, which reflects the actual implementation used; it is more accurate to say that a Lua table is an associative map which happens to be particularly efficient at acting like an array.)

The functions provided in table provide all the basic manipulations on Lua tables, but as we saw with the List class, it is useful to build higher-level operations on top of those functions. For instance, copying a table involves this kind of loop:

 local function copy(T)
     local res = {}
     for k,v in pairs(T) do
         res[k] = v
     end
     return res
 end

The tablex module provides this as copy, which does a shallow copy of a table. There is also deepcopy, which goes further than a simple loop in two ways: first, it gives the copy the same metatable as the original (so it can copy objects like List above), and second, any nested tables are also copied, to arbitrary depth. There is also icopy, which operates on list-like tables and lets you optionally set the start index of the source and destination as well. It ensures that any left-over elements are deleted:

 asserteq(icopy({1,2,3,4,5,6},{20,30}),{20,30})   -- start at 1
 asserteq(icopy({1,2,3,4,5,6},{20,30},2),{1,20,30}) -- start at 2
 asserteq(icopy({1,2,3,4,5,6},{20,30},2,2),{1,30}) -- start at 2, copy from 2

(This code, from the tablex test module, shows the use of pl.test.asserteq.)

By contrast, move overwrites but does not delete the rest of the destination:

 asserteq(move({1,2,3,4,5,6},{20,30}),{20,30,3,4,5,6})
 asserteq(move({1,2,3,4,5,6},{20,30},2),{1,20,30,4,5,6})
 asserteq(move({1,2,3,4,5,6},{20,30},2,2),{1,30,3,4,5,6})
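The difference between the two can be pinned down with hypothetical plain-Lua versions that reproduce the asserted results above. They omit the optional source-count argument and are sketches, not the tablex code:

```lua
-- Copy src[isrc..#src] into dest starting at idest; keep the rest of dest.
local function move(dest, src, idest, isrc)
  idest, isrc = idest or 1, isrc or 1
  local j = idest
  for i = isrc, #src do
    dest[j] = src[i]
    j = j + 1
  end
  return dest
end

-- Like move, but also deletes whatever follows the copied range.
local function icopy(dest, src, idest, isrc)
  idest, isrc = idest or 1, isrc or 1
  move(dest, src, idest, isrc)
  local last = idest + (#src - isrc)   -- index of the last copied element
  for i = #dest, last + 1, -1 do dest[i] = nil end  -- trim from the end
  return dest
end

print(table.concat(icopy({1,2,3,4,5,6}, {20,30}, 2), ','))  --> 1,20,30
print(table.concat(move({1,2,3,4,5,6}, {20,30}, 2), ','))   --> 1,20,30,4,5,6
```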

(The difference is somewhat like that between C's strcpy and memmove.)

To summarize, use copy or deepcopy to make a copy of an arbitrary table. To copy into a map-like table, use update; to copy into a list-like table, use icopy, and use move if you are updating a range in the destination.

To complete this set of operations, there is insertvalues which works like table.insert except that one provides a table of values to be inserted, and removevalues which removes a range of values.

 asserteq(insertvalues({1,2,3,4},2,{20,30}),{1,20,30,2,3,4})
 asserteq(insertvalues({1,2},{3,4}),{1,2,3,4})

Another example:

 > T = require 'pl.tablex'
 > t = {10,20,30,40}
 > = T.removevalues(t,2,3)
 {10,40}
 > = T.insertvalues(t,2,{20,30})
 {10,20,30,40}

In a similar spirit to deepcopy, deepcompare will take two tables and return true only if they have exactly the same values and structure.

 > t1 = {1,{2,3},4}
 > t2 = deepcopy(t1)
 > = t1 == t2
 false
 > = deepcompare(t1,t2)
 true
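A minimal recursive version is sketched below. This deepcompare is a hypothetical plain-Lua helper; it ignores metatables and does not guard against cyclic tables:

```lua
-- Hypothetical recursive comparison: two tables are equal if they have the
-- same keys and their values compare equal, recursively.
local function deepcompare(a, b)
  if type(a) ~= 'table' or type(b) ~= 'table' then
    return a == b                    -- non-tables: ordinary equality
  end
  for k, v in pairs(a) do
    if not deepcompare(v, b[k]) then return false end
  end
  for k in pairs(b) do               -- catch keys present only in b
    if a[k] == nil then return false end
  end
  return true
end

local t1 = {1, {2, 3}, 4}
local t2 = {1, {2, 3}, 4}
print(t1 == t2, deepcompare(t1, t2))   --> false   true
```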

find will return the index of a given value in a list-like table. Note that like string.find you can specify an index to start searching, so that all instances can be found. There is an optional fourth argument, which makes the search start at the end and go backwards, so we could define rfind like so:

 function rfind(t,val,istart)
     return tablex.find(t,val,istart,true)
 end
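The start index is what makes it possible to find every occurrence. find and find_all below are hypothetical plain-Lua helpers showing the forward case:

```lua
-- Hypothetical linear search starting at index istart (default 1).
local function find(t, val, istart)
  for i = istart or 1, #t do
    if t[i] == val then return i end
  end
  return nil
end

-- Restart the search just past each hit to collect every position.
local function find_all(t, val)
  local positions = {}
  local i = find(t, val)
  while i do
    positions[#positions + 1] = i
    i = find(t, val, i + 1)
  end
  return positions
end

print(table.concat(find_all({'a', 'b', 'a', 'c', 'a'}, 'a'), ','))  --> 1,3,5
```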

find does a linear search, so it can slow down code that depends on it. If efficiency is required for large tables, consider using an index map. index_map will return a table where the keys are the original values of the list, and the associated values are the indices. (It is almost exactly the representation needed for a set.)

 > t = {'one','two','three'}
 > = tablex.find(t,'two')
 2
 > = tablex.find(t,'four')
 nil
 > il = tablex.index_map(t)
 > = il['two']
 2
 > = il.two
 2

A version of index_map called makeset is also provided, where the values are just true. This is useful because two such sets can be compared for equality using deepcompare:

 > = deepcompare(makeset {1,2,3},makeset {2,1,3})
 true
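Both tables are built in a single pass. These are hypothetical plain-Lua versions of index_map and makeset; note that if a value occurs more than once, this index map keeps the last position:

```lua
-- Hypothetical helpers: value -> index, and value -> true.
local function index_map(t)
  local res = {}
  for i, v in ipairs(t) do res[v] = i end
  return res
end

local function makeset(t)
  local res = {}
  for _, v in ipairs(t) do res[v] = true end
  return res
end

local il = index_map {'one', 'two', 'three'}
print(il.two)                -- 2, found by a hash lookup rather than a scan
local s = makeset {1, 2, 3}
print(s[2], s[4])            --> true   nil
```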

Consider the problem of determining the new employees that have joined in a period. Assume we have two files of employee names:

 (last-month.txt)
 smith,john
 brady,maureen
 mogale,thabo

 (this-month.txt)
 smith,john
 smit,johan
 brady,maureen
 mogale,thabo
 van der Merwe,Piet

To find out differences, just make the employee lists into sets, like so:

 require 'pl'

 function read_employees(file)
   local ls = List(io.lines(file)) -- a list of employees
   return tablex.makeset(ls)
 end

 last = read_employees 'last-month.txt'
 this = read_employees 'this-month.txt'

 -- who is in this but not in last?
 diff = tablex.difference(this,last)

 -- in a set, the keys are the values...
 for e in pairs(diff) do print(e) end

 --  *output*
 -- van der Merwe,Piet
 -- smit,johan

The difference operation is easy enough to write by hand:

 for e in pairs(this) do
   if not last[e] then
     print(e)
   end
 end

The point of using difference here is not that the operation is tricky to code, but that it states your intentions clearly to other readers of your code (and naturally to your future self, in six months' time).

find_if will search a table using a function. The optional third argument is a value which will be passed as a second argument to the function. pl.operator provides the Lua operators conveniently wrapped as functions, so the basic comparison functions are available:

 > ops = require 'pl.operator'
 > = tablex.find_if({10,20,30,40},ops.gt,20)
 3       true

Note that find_if will also return the actual value returned by the function, which of course is usually just true for a boolean function, but any value which is not nil and not false can be usefully passed back.
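A hypothetical plain-Lua find_if that follows this description, returning both the index and the function's result; gt stands in for ops.gt:

```lua
-- Hypothetical find_if: returns the first index for which f(value, arg)
-- is truthy, together with whatever f returned.
local function find_if(t, f, arg)
  for i = 1, #t do
    local res = f(t[i], arg)
    if res then return i, res end
  end
  return nil
end

local function gt(a, b) return a > b end       -- stand-in for ops.gt
print(find_if({10, 20, 30, 40}, gt, 20))        --> 3   true

-- A non-boolean result is passed back too:
local idx, word = find_if({'x', 'hello', 'y'}, function(s)
  return #s > 1 and s:upper()
end)
print(idx, word)                                --> 2   HELLO
```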

deepcompare does a thorough recursive comparison, but otherwise uses the default equality operator. compare allows you to specify exactly what function to use when comparing two list-like tables, and compare_no_order is true if they contain exactly the same elements, in any order. Note that the latter does not need an explicit comparison function; in that case the implementation actually compares the two tables as sets, as above:

 > = compare_no_order({1,2,3},{2,1,3})
 true
 > = compare_no_order({1,2,3},{2,1,3},'==')
 true

(Note the special string '==' above; instead of saying ops.gt or ops.eq we can use the strings '>' or '==' respectively.)

There are several ways to merge tables in PL. If they are list-like, then see the operations defined by pl.List , like concatenation. If they are map-like, then merge provides two basic operations. If the third arg is false, then the result only contains the keys that are in common between the two tables, and if true, then the result contains all the keys of both tables. These are in fact generalized set union and intersection operations:

 > S1 = {john=27,jane=31,mary=24}
 > S2 = {jane=31,jones=50}
 > = tablex.merge(S1, S2, false)
 {jane=31}
 > = tablex.merge(S1, S2, true)
 {mary=24,jane=31,john=27,jones=50}
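A hypothetical sketch of merge in plain Lua. Which value wins for a key present in both tables is an assumption here: the intersection keeps the value from the first table, and the union takes it from the second:

```lua
-- Hypothetical merge: dup true keeps all keys (a union);
-- dup false keeps only keys present in both tables (an intersection).
local function merge(t1, t2, dup)
  local res = {}
  for k, v in pairs(t1) do
    if dup or t2[k] ~= nil then res[k] = v end
  end
  if dup then
    for k, v in pairs(t2) do res[k] = v end   -- t2 wins on shared keys
  end
  return res
end

local S1 = {john = 27, jane = 31, mary = 24}
local S2 = {jane = 31, jones = 50}
local common = merge(S1, S2, false)   -- only jane
local all = merge(S1, S2, true)       -- john, jane, mary and jones
```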

When working with tables, you will often find yourself writing loops like the one in the first example. Loops are second nature to programmers, but they are often not the most elegant and self-describing way of expressing an operation. Consider the map function, which creates a new table by applying a function to each element of the original:

 > = map(math.sin, {1,2,3,4})
 {  0.84,  0.91,  0.14, -0.76}
 > = map(function(x) return x*x end, {1,2,3,4})
 {1,4,9,16}

map saves you from writing a loop, and the resulting code is often clearer, as well as being shorter. This is not to say that 'loops are bad' (although you will hear that from some extremists), just that it's good to capture standard patterns. Then the loops you do write will stand out and acquire more significance.

pairmap is interesting, because the function works with both the key and the value.

 > t = {fred=10,bonzo=20,alice=4}
 > = pairmap(function(k,v) return v end, t)
 {4,10,20}
 > = pairmap(function(k,v) return k end, t)
 {'alice','fred','bonzo'}

(These are common enough operations that the first is defined as values and the second as keys.) If the function returns two values, then the second value is considered to be the new key:

 > = pairmap(function(k,v) return v+10, k:upper() end, t)
 {BONZO=30,FRED=20,ALICE=14}
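The key/value behaviour can be sketched in plain Lua. This pairmap is hypothetical; on a key collision it simply overwrites, and the order of array results follows pairs, which is undefined:

```lua
-- Hypothetical pairmap: f(k, v) returns a value and, optionally, a new key.
-- Without a key, results are appended to the array part.
local function pairmap(f, t)
  local res = {}
  for k, v in pairs(t) do
    local rv, rk = f(k, v)
    if rk ~= nil then
      res[rk] = rv
    else
      res[#res + 1] = rv
    end
  end
  return res
end

local t = {fred = 10, bonzo = 20, alice = 4}
local vals = pairmap(function(k, v) return v end, t)   -- order not defined
local upper = pairmap(function(k, v) return v + 10, k:upper() end, t)
print(upper.FRED, upper.ALICE)                         --> 20   14
```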

map2 applies a function to two tables:

 > = map2(ops.add,{1,2},{10,20})
 {11,22}
 > = map2('*',{1,2},{10,20})
 {10,40}

The various map operations generate tables; reduce applies a function of two arguments over a table and returns the result as a scalar:

 > = reduce ('+', {1,2,3})
 6
 > = reduce ('..', {'one','two','three'})
 'onetwothree'
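reduce is a left fold. A hypothetical plain-Lua version that takes a function (rather than the operator strings accepted above) looks like this; on an empty table it returns nil:

```lua
-- Hypothetical left fold: reduce(f, {a, b, c}) computes f(f(a, b), c).
local function reduce(f, t)
  local acc = t[1]
  for i = 2, #t do
    acc = f(acc, t[i])
  end
  return acc
end

print(reduce(function(a, b) return a + b end, {1, 2, 3}))                  --> 6
print(reduce(function(a, b) return a .. b end, {'one', 'two', 'three'}))   --> onetwothree
```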

Finally, zip sews different tables together:

 > = zip({1,2,3},{10,20,30})
 {{1,10},{2,20},{3,30}}

Browsing through the documentation, you will find that tablex and List share methods. For instance, tablex.imap and List.map are basically the same function; they both operate over the array part of the table and generate another table. This can also be expressed as a list comprehension C 'f(x) for x' (t), which makes the operation more explicit. So why are there different ways to do the same thing? The main reason is that not all tables are Lists. A List is a thin wrapper around a table, provided by the List metatable, and it makes method calls convenient: the expression ls:map('#') returns a list of the lengths of the elements of ls. But sometimes you may wish to work with ordinary Lua tables; the List interface is not a compulsory way to use Penlight table operations.

Operations on two-dimensional tables

Two-dimensional tables are of course easy to represent in Lua, for instance {{1,2},{3,4}}, where we store rows as subtables and index like so: A[row][col]. This is the common representation used by matrix libraries like LuaMatrix. pl.array2d does not provide matrix operations, since that is the job of a specialized library, but rather provides generalizations of the higher-level operations provided by pl.tablex for one-dimensional arrays.

iter is a useful generalization of ipairs . (The extra parameter determines whether you want the indices as well.)

 > a = {{1,2},{3,4}}
 > for i,j,v in array2d.iter(a,true) do print(i,j,v) end
 1       1       1
 1       2       2
 2       1       3
 2       2       4
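A hypothetical closure-based iterator (iter2d, not array2d.iter itself) that walks the rows in order, yields row index, column index and value, and skips empty rows:

```lua
-- Hypothetical 2D iterator yielding row, column and value, row by row.
local function iter2d(a)
  local i, j = 1, 0
  return function()
    j = j + 1
    while a[i] and j > #a[i] do      -- past the end of this row: next row
      i, j = i + 1, 1
    end
    if a[i] == nil then return nil end  -- no more rows: stop the loop
    return i, j, a[i][j]
  end
end

for i, j, v in iter2d {{1, 2}, {3, 4}} do
  print(i, j, v)
end
```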

Note that you can always convert an arbitrary 2D array into a 'list of lists' with List(tablex.map(List,a))

map will apply a function over all elements (notice that extra arguments can be provided, so this operation is in effect function(x) return x-1 end)

 > array2d.map('-',a,1)
 {{0,1},{2,3}}

2D arrays are stored as an array of rows, but columns can be extracted:

 > array2d.column(a,1)
 {1,3}

There are three equivalents to tablex.reduce . You can either reduce along the rows (which is the most efficient) or reduce along the columns. Either one will give you a 1D array. And reduce2 will apply two operations: the first one reduces the rows, and the second reduces the result.

 > array2d.reduce_rows('+',a)
 {3,7}
 > array2d.reduce_cols('+',a)
 {4,6}
 > -- same as tablex.reduce('*',array2d.reduce_rows('+',a))
 > array2d.reduce2('*','+',a)
 21
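In plain Lua the three reductions can be sketched as follows. reduce_rows, reduce_cols and reduce2 here are hypothetical helpers taking functions rather than operator strings, and reduce_cols assumes every row has the length of the first:

```lua
-- Left fold over a 1D array.
local function reduce(f, t)
  local acc = t[1]
  for i = 2, #t do acc = f(acc, t[i]) end
  return acc
end

-- One result per row: fold each row separately.
local function reduce_rows(f, a)
  local res = {}
  for i, row in ipairs(a) do res[i] = reduce(f, row) end
  return res
end

-- One result per column: gather each column, then fold it.
local function reduce_cols(f, a)
  local res = {}
  for j = 1, #a[1] do
    local col = {}
    for i = 1, #a do col[i] = a[i][j] end
    res[j] = reduce(f, col)
  end
  return res
end

-- Reduce the rows with rowf, then reduce those results with opf.
local function reduce2(opf, rowf, a)
  return reduce(opf, reduce_rows(rowf, a))
end

local add = function(x, y) return x + y end
local mul = function(x, y) return x * y end
local a = {{1, 2}, {3, 4}}
-- reduce_rows(add, a) is {3, 7}; reduce_cols(add, a) is {4, 6};
-- reduce2(mul, add, a) is 3 * 7 = 21
```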

tablex.map2 applies an operation to two tables, giving another table. array2d.map2 does this for 2D arrays. Note that you have to provide the rank of the arrays involved, since it's hard to always correctly deduce this from the data:

 > b = {{10,20},{30,40}}
 > a = {{1,2},{3,4}}
 > = array2d.map2('+',2,2,a,b)  -- two 2D arrays
 {{11,22},{33,44}}
 > = array2d.map2('+',1,2,{10,100},a)  -- 1D, 2D
 {{11,102},{13,104}}
 > = array2d.map2('*',2,1,a,{1,-1})  -- 2D, 1D
 {{1,-2},{3,-4}}

Of course, you are not limited to simple arithmetic. Say we have a 2D array of strings, and wish to print it out with proper right justification. The first step is to compute all the string lengths by mapping the length operator over the array; the second is to reduce this along the columns using math.max to get the maximum column widths; and the last is to apply stringx.rjust with these widths.

 maxlens = reduce_cols(math.max,map('#',lines))
 lines = map2(stringx.rjust,2,1,lines,maxlens)
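The same two steps can be written without Penlight. In this hypothetical rjust_columns, string.rep does the padding that stringx.rjust performs above:

```lua
-- Find the widest entry in each column, then left-pad every entry.
local function rjust_columns(lines)
  local widths = {}
  for _, row in ipairs(lines) do
    for j, s in ipairs(row) do
      widths[j] = math.max(widths[j] or 0, #s)   -- column maximum
    end
  end
  local out = {}
  for i, row in ipairs(lines) do
    out[i] = {}
    for j, s in ipairs(row) do
      out[i][j] = string.rep(' ', widths[j] - #s) .. s
    end
  end
  return out
end

local t = rjust_columns {{'a', 'bbb'}, {'cc', 'd'}}
for _, row in ipairs(t) do print(table.concat(row, ' | ')) end
-- prints " a | bbb" and then "cc |   d"
```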

product returns the Cartesian product of two 1D arrays. The result is a 2D array formed from applying the function to all possible pairs from the two arrays.

 > array2d.product('{}',{1,2},{'a','b'})
 {{{1,'a'},{2,'a'}},{{1,'b'},{2,'b'}}}
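A hypothetical plain-Lua product. Putting one row per element of the second array is an assumption about the layout:

```lua
-- Hypothetical Cartesian product: row i pairs every element of t1
-- with the i-th element of t2, via f(v1, v2).
local function product(f, t1, t2)
  local res = {}
  for i, v2 in ipairs(t2) do
    local row = {}
    for j, v1 in ipairs(t1) do row[j] = f(v1, v2) end
    res[i] = row
  end
  return res
end

local p = product(function(x, y) return x .. y end, {1, 2}, {'a', 'b'})
-- p is {{'1a', '2a'}, {'1b', '2b'}}
```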

There is a set of operations which work in-place on 2D arrays. You can swap_rows and swap_cols ; the first really is a simple one-liner, but the idea is to give the operation a name. remove_row and remove_col are generalizations of table.remove . Likewise, extract_rows and extract_cols are given arrays of indices and discard anything else. So, for instance, extract_cols(A,{2,4}) will leave just columns 2 and 4 in the array.

List.slice is often useful on 1D arrays; array2d.slice does the same thing for 2D arrays, and is generally given a start (row,column) and an end (row,column).

 > A = {{1,2,3},{4,5,6},{7,8,9}}
 > B = slice(A,1,1,2,2)
 > write(B)
  1 2
  4 5
 > B = slice(A,2,2)
 > write(B,nil,'%4.1f')
  5.0 6.0
  8.0 9.0

Here array2d.write is used to print out an array nicely; the second parameter is nil, which selects the default (stdout) but can be any file object, and the third parameter is an optional format (as used in string.format).
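Slicing can be sketched in plain Lua. slice2d is a hypothetical helper taking inclusive start and end positions, with the end defaulting to the last row and column. Unlike array2d.slice, which returns a 1D result for a single row or column, this sketch always returns a 2D array:

```lua
-- Hypothetical 2D slice: rows i1..i2 and columns j1..j2, inclusive.
local function slice2d(a, i1, j1, i2, j2)
  i2 = i2 or #a                      -- default: through the last row
  j2 = j2 or #a[1]                   -- default: through the last column
  local res = {}
  for i = i1, i2 do
    local row = {}
    for j = j1, j2 do row[#row + 1] = a[i][j] end
    res[#res + 1] = row
  end
  return res
end

local A = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
local B = slice2d(A, 1, 1, 2, 2)     -- {{1, 2}, {4, 5}}
local C = slice2d(A, 2, 2)           -- {{5, 6}, {8, 9}}
```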

parse_range will take a spreadsheet range like 'A1:B2' or 'R1C1:R2C2' and return the range as four numbers, which can be passed to slice . The rule is that slice will return an array of the appropriate shape depending on the range; if a range represents a row or a column, the result is 1D, otherwise 2D.
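Parsing such ranges is a small pattern-matching exercise. parse_range below is a hypothetical plain-Lua version; the letters-to-number conversion for columns (A=1, Z=26, AA=27) and the row-first order of the results are assumptions, not a description of Penlight's function:

```lua
-- Convert column letters to a number: A=1, ..., Z=26, AA=27.
local function col_number(letters)
  local n = 0
  for i = 1, #letters do
    n = n * 26 + (letters:byte(i) - ('A'):byte() + 1)
  end
  return n
end

-- Hypothetical parser for 'A1:B2' and 'R1C1:R2C2' style ranges,
-- returning row1, col1, row2, col2 (or nil and an error message).
local function parse_range(s)
  s = s:upper()
  local r1, c1, r2, c2 = s:match('^R(%d+)C(%d+):R(%d+)C(%d+)$')
  if r1 then
    return tonumber(r1), tonumber(c1), tonumber(r2), tonumber(c2)
  end
  local l1, n1, l2, n2 = s:match('^(%a+)(%d+):(%a+)(%d+)$')
  if l1 then
    return tonumber(n1), col_number(l1), tonumber(n2), col_number(l2)
  end
  return nil, 'cannot parse range ' .. s
end

print(parse_range('A1:B2'))       --> 1   1   2   2
print(parse_range('B3:AA10'))     --> 3   2   10  27
```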

This applies to iter as well, which can also optionally be given a range:

 > for i,j,v in iter(A,true,2,2) do print(i,j,v) end
 2       2       5
 2       3       6
 3       2       8
 3       3       9
generated by LDoc 1.2