Topic 07-functional.md

Functional Programming

Sequences

A Lua iterator (in its simplest form) is a function which can be repeatedly called to return a set of one or more values. The for in statement understands these iterators, and loops until the function returns nil. There are standard sequence adapters for tables in Lua (ipairs and pairs ), and io.lines returns an iterator over all the lines in a file. In the Penlight libraries, such iterators are also called sequences. A sequence of single values (say from io.lines ) is called single-valued, whereas the sequence defined by pairs is double-valued.

pl.seq provides a number of useful iterators, and some functions which operate on sequences. At first sight this example looks like an attempt to write Python in Lua, (with the sequence being inclusive):

 > for i in seq.range(1,4) do print(i) end
 1
 2
 3
 4

But range is actually equivalent to Python's xrange, since it generates a sequence, not a list. To get a list, use seq.copy(seq.range(1,10)), which takes any single-value sequence and makes a table from the result. seq.list is like ipairs except that it does not give you the index, just the value.

 > for x in seq.list {1,2,3} do print(x) end
 1
 2
 3

enum takes a sequence and turns it into a double-valued sequence consisting of a sequence number and the value, so enum(list(ls)) is actually equivalent to ipairs . A more interesting example prints out a file with line numbers:

 for i,v in seq.enum(io.lines(fname)) do print(i..' '..v) end

Sequences can be combined, either by 'zipping' them or by concatenating them.

 > for x,y in seq.zip(l1,l2) do print(x,y) end
 10      1
 20      2
 30      3
 > for x in seq.splice(l1,l2) do print(x) end
 10
 20
 30
 1
 2
 3

seq.printall is useful for printing out single-valued sequences, and provides some finer control over formating, such as a delimiter, the number of fields per line, and a format string to use (@see string.format)

 > seq.printall(seq.random(10))
 0.0012512588885159 0.56358531449324 0.19330423902097 ....
 > seq.printall(seq.random(10), ',', 4, '%4.2f')
 0.17,0.86,0.71,0.51
 0.30,0.01,0.09,0.36
 0.15,0.17,

map will apply a function to a sequence.

 > seq.printall(seq.map(string.upper, {'one','two'}))
 ONE TWO
 > seq.printall(seq.map('+', {10,20,30}, 1))
 11 21 31

filter will filter a sequence using a boolean function (often called a predicate). For instance, this code only prints lines in a file which are composed of digits:

 for l in seq.filter(io.lines(file), stringx.isdigit) do print(l) end

The following returns a table consisting of all the positive values in the original table (equivalent to tablex.filter(ls, '>', 0))

 ls = seq.copy(seq.filter(ls, '>', 0))

We're already encounted seq.sum when discussing input.numbers . This can also be expressed with seq.reduce :

 > seq.reduce(function(x,y) return x + y end, seq.list{1,2,3,4})
 10

seq.reduce applies a binary function in a recursive fashion, so that:

 reduce(op,{1,2,3}) => op(1,reduce(op,{2,3}) => op(1,op(2,3))

it's now possible to easily generate other cumulative operations; the standard operations declared in pl.operator are useful here:

 > ops = require 'pl.operator'
 > -- can also say '*' instead of ops.mul
 > = seq.reduce(ops.mul,input.numbers '1 2 3 4')
 24

There are functions to extract statistics from a sequence of numbers:

 > l1 = List {10,20,30}
 > l2 = List {1,2,3}
 > = seq.minmax(l1)
 10      30
 > = seq.sum(l1)
 60      3

It is common to get sequences where values are repeated, say the words in a file. count_map will take such a sequence and count the values, returning a table where the keys are the unique values, and the value associated with each key is the number of times they occurred:

 > t = seq.count_map {'one','fred','two','one','two','two'}
 > = t
 {one=2,fred=1,two=3}

This will also work on numerical sequences, but you cannot expect the result to be a proper list, i.e. having no 'holes'. Instead, you always need to use pairs to iterate over the result - note that there is a hole at index 5:

 > t = seq.count_map {1,2,4,2,2,3,4,2,6}
 > for k,v in pairs(t) do print(k,v) end
 1       1
 2       4
 3       1
 4       2
 6       1

unique uses count_map to return a list of the unique values, that is, just the keys of the resulting table.

last turns a single-valued sequence into a double-valued sequence with the current value and the last value:

 > for current,last in seq.last {10,20,30,40} do print (current,last) end
 20      10
 30      20
 40      30

This makes it easy to do things like identify repeated lines in a file, or construct differences between values. filter can handle double-valued sequences as well, so one could filter such a sequence to only return cases where the current value is less than the last value by using operator.lt or just '<'. This code then copies the resulting code into a table.

 > ls = {10,9,10,3}
 > = seq.copy(seq.filter(seq.last(s),'<'))
 {9,3}

Sequence Wrappers

The functions in pl.seq cover the common patterns when dealing with sequences, but chaining these functions together can lead to ugly code. Consider the last example of the previous section; seq is repeated three times and the resulting expression has to be read right-to-left. The first issue can be helped by local aliases, so that the expression becomes copy(filter(last(s),'<')) but the second issue refers to the somewhat unnatural order of functional application. We tend to prefer reading operations from left to right, which is one reason why object-oriented notation has become popular. Sequence adapters allow this expression to be written like so:

 seq(s):last():filter('<'):copy()

With this notation, the operation becomes a chain of method calls running from left to right.

'Sequence' is not a basic Lua type, they are generally functions or callable objects. The expression seq(s) wraps a sequence in a sequence wrapper, which is an object which understands all the functions in pl.seq as methods. This object then explicitly represents sequences.

As a special case, the constructor (which is when you call the table seq ) will make a wrapper for a plain list-like table. Here we apply the length operator to a sequence of strings, and print them out.

 > seq{'one','tw','t'} :map '#' :printall()
 3 2 1

As a convenience, there is a function seq.lines which behaves just like io.lines except it wraps the result as an explicit sequence type. This takes the first 10 lines from standard input, makes it uppercase, turns it into a sequence with a count and the value, glues these together with the concatenation operator, and finally prints out the sequence delimited by a newline.

 seq.lines():take(10):upper():enum():map('..'):printall '\n'

Note the method upper, which is not a seq function. if an unknown method is called, sequence wrappers apply that method to all the values in the sequence (this is implicit use of mapmethod )

It is straightforward to create custom sequences that can be used in this way. On Unix, /dev/random gives you an endless sequence of random bytes, so we use take to limit the sequence, and then map to scale the result into the desired range. The key step is to use seq to wrap the iterator function:

 -- random.lua
 local seq = require 'pl.seq'

 function dev_random()
     local f = io.open('/dev/random')
     local byte = string.byte
     return seq(function()
         -- read two bytes into a string and convert into a 16-bit number
         local s = f:read(2)
         return byte(s,1) + 256*byte(s,2)
     end)
 end

 -- print 10 random numbers from 0 to 1 !
 dev_random():take(10):map('%',100):map('/',100):printall ','

Another Linux one-liner depends on the /proc filesystem and makes a list of all the currently running processes:

 pids = seq(lfs.dir '/proc'):filter(stringx.isdigit):map(tonumber):copy()

This version of Penlight has an experimental feature which relies on the fact that all Lua types can have metatables, including functions. This makes implicit sequence wrapping possible:

 > seq.import()
 > seq.random(5):printall(',',5,'%4.1f')
  0.0, 0.1, 0.4, 0.1, 0.2

This avoids the awkward seq(seq.random(5)) construction. Or the iterator can come from somewhere else completely:

 > ('one two three'):gfind('%a+'):printall(',')
 one,two,three,

After seq.import, it is no longer necessary to explicitly wrap sequence functions.

But there is a price to pay for this convenience. Every function is affected, so that any function can be used, appropriate or not:

 > math.sin:printall()
 ..seq.lua:287: bad argument #1 to '(for generator)' (number expected, got nil)
 > a = tostring
 > = a:find(' ')
 function: 0042C920

What function is returned? It's almost certain to be something that makes no sense in the current context. So implicit sequences may make certain kinds of programming mistakes harder to catch - they are best used for interactive exploration and small scripts.

List Comprehensions

List comprehensions are a compact way to create tables by specifying their elements. In Python, you can say this:

 ls = [x for x in range(5)]  # == [0,1,2,3,4]

In Lua, using pl.comprehension :

 > C = require('pl.comprehension').new()
 > = C ('x for x=1,10') ()
 {1,2,3,4,5,6,7,8,9,10}

C is a function which compiles a list comprehension string into a function. In this case, the function has no arguments. The parentheses are redundant for a function taking a string argument, so this works as well:

 > = C 'x^2 for x=1,4' ()
 {1,4,9,16}
 > = C '{x,x^2} for x=1,4' ()
 {{1,1},{2,4},{3,9},{4,16}}

Note that the expression can be any function of the variable x!

The basic syntax so far is <expr> for <set>, where <set> can be anything that the Lua for statement understands. <set> can also just be the variable, in which case the values will come from the argument of the comprehension. Here I'm emphasizing that a comprehension is a function which can take a list argument:

 > = C '2*x for x' {1,2,3}
 {2,4,6}
 > dbl = C '2*x for x'
 > = dbl {10,20,30}
 {20,40,60}

Here is a somewhat more explicit way of saying the same thing; _1 is a placeholder refering to the first argument passed to the comprehension.

 > = C '2*x for _,x in pairs(_1)' {10,20,30}
 {20,40,60}
 > = C '_1(x) for x'(tostring,{1,2,3,4})
 {'1','2','3','4'}

This extended syntax is useful when you wish to collect the result of some iterator, such as io.lines . This comprehension creates a function which creates a table of all the lines in a file:

 > f = io.open('array.lua')
 > lines = C 'line for line in _1:lines()' (f)
 > = #lines
 118

There are a number of functions that may be applied to the result of a comprehension:

 > = C 'min(x for x)' {1,44,0}
 0
 > = C 'max(x for x)' {1,44,0}
 44
 > = C 'sum(x for x)' {1,44,0}
 45

(These are equivalent to a reduce operation on a list.)

After the for part, there may be a condition, which filters the output. This comprehension collects the even numbers from a list:

 > = C 'x for x if x % 2 == 0' {1,2,3,4,5}
 {2,4}

There may be a number of for parts:

 > = C '{x,y} for x = 1,2 for y = 1,2' ()
 {{1,1},{1,2},{2,1},{2,2}}
 > = C '{x,y} for x for y' ({1,2},{10,20})
 {{1,10},{1,20},{2,10},{2,20}}

These comprehensions are useful when dealing with functions of more than one variable, and are not so easily achieved with the other Penlight functional forms.

Creating Functions from Functions

Lua functions may be treated like any other value, although of course you cannot multiply or add them. One operation that makes sense is function composition, which chains function calls (so (f * g)(x) is f(g(x)).)

 > func = require 'pl.func'
 > printf = func.compose(io.write,string.format)
 > printf("hello %s\n",'world')
 hello world
 true

Many functions require you to pass a function as an argument, say to apply to all values of a sequence or as a callback. Often useful functions have the wrong number of arguments. So there is a need to construct a function of one argument from one of two arguments, binding the extra argument to a given value.

currying takes a function of n arguments and returns a function of n-1 arguments where the first argument is bound to some value:

 > p2 = func.curry(print,'start>')
 > p2('hello',2)
 start>  hello   2
 > ops = require 'pl.operator'
 > = tablex.filter({1,-2,10,-1,2},curry(ops.gt,0))
 {-2,-1}
 > tablex.filter({1,-2,10,-1,2},curry(ops.le,0))
 {1,10,2}

The last example unfortunately reads backwards, because curry alway binds the first argument!

Currying is a specialized form of function argument binding. Here is another way to say the print example:

 > p2 = func.bind(print,'start>',func._1,func._2)
 > p2('hello',2)
 start>  hello   2

where _1 and _2 are placeholder variables, corresponding to the first and second argument respectively.

Having func all over the place is distracting, so it's useful to pull all of pl.func into the local context. Here is the filter example, this time the right way around:

 > utils.import 'pl.func'
 > tablex.filter({1,-2,10,-1,2},bind(ops.gt, _1, 0))
 {1,10,2}

tablex.merge does a general merge of two tables. This example shows the usefulness of binding the last argument of a function.

 > S1 = {john=27, jane=31, mary=24}
 > S2 = {jane=31, jones=50}
 > intersection = bind(tablex.merge, _1, _2, false)
 > union = bind(tablex.merge, _1, _2, true)
 > = intersection(S1,S2)
 {jane=31}
 > = union(S1,S2)
 {mary=24,jane=31,john=27,jones=50}

When using bind to curry print , we got a function of precisely two arguments, whereas we really want our function to use varargs like print . This is the role of _0:

 > _DEBUG = true
 > p = bind(print,'start>', _0)
 return function (fn,_v1)
     return function(...) return fn(_v1,...) end
 end

 > p(1,2,3,4,5)
 start>  1       2       3       4       5

I've turned on the global _DEBUG flag, so that the function generated is printed out. It is actually a function which generates the required function; the first call binds the value of _v1 to 'start>'.

Placeholder Expressions

A common pattern in Penlight is a function which applies another function to all elements in a table or a sequence, such as tablex.map or seq.filter . Lua does anonymous functions well, although they can be a bit tedious to type:

 > = tablex.map(function(x) return x*x end, {1,2,3,4})
 {1,4,9,16}

pl.func allows you to define placeholder expressions, which can cut down on the typing required, and also make your intent clearer. First, we bring contents of pl.func into our context, and then supply an expression using placeholder variables, such as _1,_2,etc. (C++ programmers will recognize this from the Boost libraries.)

 > utils.import 'pl.func'
 > = tablex.map(_1*_1, {1,2,3,4})
 {1,4,9,16}

Functions of up to 5 arguments can be generated.

 > = tablex.map2(_1+_2,{1,2,3}, {10,20,30})
 {11,22,33}

These expressions can use arbitrary functions, altho they must first be registered with the functional library. func.register brings in a single function, and func.import brings in a whole table of functions, such as math .

 > sin = register(math.sin)
 > = tablex.map(sin(_1), {1,2,3,4})
 {0.8414709848079,0.90929742682568,0.14112000805987,-0.75680249530793}
 > import 'math'
 > = tablex.map(cos(2*_1),{1,2,3,4})
 {-0.41614683654714,-0.65364362086361,0.96017028665037,-0.14550003380861}

A common operation is calling a method of a set of objects:

 > = tablex.map(_1:sub(1,1), {'one','four','x'})
 {'o','f','x'}

There are some restrictions on what operators can be used in PEs. For instance, because the __len metamethod cannot be overriden by plain Lua tables, we need to define a special function to express `#_1':

 > = tablex.map(Len(_1), {'one','four','x'})
 {3,4,1}

Likewise for comparison operators, which cannot be overloaded for different types, and thus also have to be expressed as a special function:

 > = tablex.filter(Gt(_1,0), {1,-1,2,4,-3})
 {1,2,4}

It is useful to express the fact that a function returns multiple values. For instance, tablex.pairmap expects a function that will be called with the key and the value, and returns the new value and the key, in that order.

 > = pairmap(Args(_2,_1:upper()),{fred=1,alice=2})
 {ALICE=2,FRED=1}

PEs cannot contain nil values, since PE function arguments are represented as an array. Instead, a special value called Nil is provided. So say _1:f(Nil,1) instead of _1:f(nil,1).

A placeholder expression cannot be automatically used as a Lua function. The technical reason is that the call operator must be overloaded to construct function calls like _1(1). If you want to force a PE to return a function, use func.I .

 > = tablex.map(_1(10),{I(2*_1),I(_1*_1),I(_1+2)})
 {20,100,12}

Here we make a table of functions taking a single argument, and then call them all with a value of 10.

The essential idea with PEs is to 'quote' an expression so that it is not immediately evaluated, but instead turned into a function that can be applied later to some arguments. The basic mechanism is to wrap values and placeholders so that the usual Lua operators have the effect of building up an expression tree. (It turns out that you can do symbolic algebra using PEs, see symbols.lua in the examples directory, and its test runner testsym.lua, which demonstrates symbolic differentiation.)

The rule is that if any operator has a PE operand, the result will be quoted. Sometimes we need to quote things explicitly. For instance, say we want to pass a function to a filter that must return true if the element value is in a set. set[_1] is the obvious expression, but it does not give the desired result, since it evaluates directly, giving nil. Indexing works differently than a binary operation like addition (set+_1 is properly quoted) so there is a need for an explicit quoting or wrapping operation. This is the job of the _ function; the PE in this case should be _(set)[_1]. This works for functions as well, as a convenient alternative to registering functions: _(math.sin)(_1). This is equivalent to using the `lines' method:

 for line in I(_(f):read()) do print(line) end

Now this will work for any 'file-like' object which which has a read method returning the next line. If you had a LuaSocket client which was being 'pushed' by lines sent from a server, then _(s):receive '*l' would create an iterator for accepting input. These forms can be convenient for adapting your data flow so that it can be passed to the sequence functions in `pl.seq'.

Placeholder expressions can be mixed with sequence wrapper expressions. lexer.lua will give us a double-valued sequence of tokens, where the first value is a type, and the second is a value. We filter out only the values where the type is 'iden', extract the actual value using map, get the unique values and finally copy to a list.

 > str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
 > = seq(lexer.lua(str)):filter('==','iden'):map(_2):unique():copy()
 {i,print,j}

This is a particularly intense line (and I don't always suggest making everything a one-liner!); the key is the behaviour of map, which will take both values of the sequence, so _2 returns the value part. (Since filter here takes extra arguments, it only operates on the type values.)

There are some performance considerations to using placeholder expressions. Instantiating a PE requires constructing and compiling a function, which is not such a fast operation. So to get best performance, factor out PEs from loops like this;

 local fn = I(_1:f() + _2:g())
 for i = 1,n do
     res[i] = tablex.map2(fn,first[i],second[i])
 end
generated by LDoc 1.2