Saturday 31 January 2009

Permutations

This last week I happened to have the necessity of generating all the possible permutations of an array. This seems to be a fairly easy task but many programmers won't come up with a simple solution to this problem very fast. There is at least one intuitive, recursive, solution to this problem, many more complex but *slightly* more efficient, iterative, solutions and no really time-efficient solution. After all, the problem is to generate n! orderings of an array so we, at least, have to visit each one once.

The well-known solution


The most widely spread solution to this problem is the Dijkstra algorithm for generating the next permutation. It is based on the assumption that there is a total order between the elements of the array and the elements of the array are unique (there are no repetitions). If these conditions are met, the algorithm always generates the next permutation in lexicographical order.

The algorithm is very straight forward and goes as follow:

def get_next(a):
N = len(a)
i = len(a)-1

while a[i-1] >= a[i]:
i -= 1

j = N

while a[j-1] <= a[i-1]:
j -= 1

a[i-1], a[j-1] = a[j-1], a[i-1]

i += 1
j = N

while i < j:
a[i-1], a[j-1] = a[j-1], a[i-1]
i+=1
j-=1

return a



To use this algorithm to generate every possible permutation we just call it n! times:

def dijkstra(a):
import operator
fact = reduce(operator.mul, range(1,len(a)+1))
the_list = [copy(a)]
for i in range(fact-1):
a = get_next(a)
the_list.append(copy(a))

return the_list


the only possibly uncommon thing here is line 3 which is a functional-style factorial function.

This works very well for generating the permutations of n (the permutations of the array [0..n-1]) but it won't work for arrays of arbitrary elements. So we explore other options.

The known recursive solution


There's a (not-so-well-)known recursive solution to solving the permutation problem. It is known as the HeapPermute algorithm or Heap's algorithm for permuting an array.
My python implementation of the algorithm goes as follow:

def heap_aux(a, n, the_list):
if n == 1:
the_list.append(copy(a))
else:
for i in range(n):
heap_aux(a, n-1, the_list);

if n % 2 == 1:
a[0], a[n-1] = a[n-1], a[0]
else:
a[i], a[n-1] = a[n-1], a[i]


def heap(a):
the_list = []
heap_aux(a, len(a), the_list)
return the_list

This algorithm works very well and it is very efficient too but it generates the permutations in an order that I find particularly disturbing.
Also it is not as easy to understand as I would like.

Enter my recursive solution
This is probably the most straight forward solution I could come up with. It is a clasical recursive design.
Want to permute an array of n elements? How would you do it if you already had a function to solve the problem for a smaller array?
Simple. Select each possible element of the array to be "the first element of the permutation" and then permute the rest of the array:
A permutation of 1,2,3,4 would be:
1 + permutation_of(2,3,4)
2 + permutation_of(1,3,4)
3 + permutation_of(2,1,4)
4 + permutation_of(2,3,1)

Simple, right?

Let's code it:

for j in range(i, len(a)):
a[i], a[j] = a[j], a[i]
permutation(a, i+1, the_list)
a[i], a[j] = a[j], a[i]

This loop captures the essence of the procedure described above. We have to remember to put the elements back where they were after the recursive call to allow next iteration to work. Not doing so would result in swapping the wrong element in the next iteration.

Puting it all together (and adding a small optimization):

def perm_aux(a, i, the_list):
if i == len(a):
the_list.append(copy(a))
else:
perm_aux(a, i+1, the_list)
for j in range(i+1, len(a)):
a[i], a[j] = a[j], a[i]
perm_aux(a, i+1, the_list)
a[i], a[j] = a[j], a[i]


def permutate(a):
the_list = []
perm_aux(a, 0, the_list)
return the_list

This algorithm generates the list in a nicer order (though it's not lexicographical). And as its predecessor is quite efficient (taking into account the complexity of the problem).

Performance

Normally recursive algorithms are put down for performance reasons: They keep all local variables stored in the stack while the other recursive calls are made and have to incur in function calling overhead. In the case of these two algorithms the good thing is that the recurtion depth is not proportional to the size of the problem solution but to the size of the array (actually it is exactly as deep as the array's length). Also, if the list variables are passed by reference (this is programming language specific), only the reference to the variable needs to be stored in the stack and there is no overhead of coping the variables on every call.
It will probably never be as efficient as their iterative counterparts but for me readability (with small performance penalties) definitely pays off.

To measure the times I used the following python decorator:

def timing(func):
def inside(*args):
t1 = time.time()
res = func(*args)
t2 = time.time()
print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
return res
return inside




Given the complexity of the problem, the graph grows rapidly out of scale. So a graph in logarithmic scale is in order



As you can see the algorithms behave very similarly. The largest difference (for an array of size 9) between the fastest (heap) and the slowest (permute) is 0.6 seconds. This of course will grow with the array size, but in a linear instead of exponential manner.


Have another solution to this problem? comment on! =)


Friday 30 January 2009

sed one-liners

Sed never ceases to amaze me. But I was specially amazed by the fact that "sed is turing complete". Check out this useful sed commands that'll make your life a lot easier!.


Friday 16 January 2009

Django gets aggregation support!

As for Changeset 9742 Django gets aggregation support. For those of you that don't know, I worked in this project as part of Google Summer of Code and I'm very glad to see this finally coming true!.

I also wanted to thank Russell Keith-Magee for mentoring me throughout the project and working in the code after GSoC was finished, Malcolm Tredinnick for giving amazing code reviews during the whole SoC and the many others that worked in getting this done.

I encourage you to check it out. The documentation is a great start! =)


Monday 5 January 2009

Emulating Ruby blocks in Python

There are some Ruby features I like a lot. A handful of them are borrowed from other languages (like lisp or perl), but the Matz found a way of making the language a very nice and powerful combination of them. I've been using Ruby lately and got thinking on how could I implement one of Ruby's most beloved features, blocks, in Python (remember, "imitation is the sincerest form of flattery"). As it turns out, it is not all that hard. Here's what I came up with:

But before that...

On Ruby blocks


Blocks in ruby provide a way of creating functions that act on a code block defined later on.
An example of blocks in ruby is the following:

array = [1, 2, 3, 4]
array.each { |n| puts n ** 2 }
1
4
9
16

According to their site: "A block is like an anonymous function or lambda [and] the variable between pipe characters is the parameter for this block". What's missing from this description is that ruby also provides the syntactic sugar to create functions that receive blocks using the "yield" statement. In other words, it is a way of creating closures and attaching them to other methods. Using closures in ruby comes very easily even for people that donsn't know what a closure is. If we were to implement the previous behaviour ourselves we would do something like this:


class Array
def each2
for i in self
yield(i)
end
end
end

array = [1, 2, 3, 4]
array.each2 { |n| puts n ** 2 }
1
4
9
16


I won't go into details because there is a lot of documentation on ruby blocks out there.

So, onto Python...

Design
The Python "builtin" that resembles the most to blocks is that of PEP343, the with statement; but I wanted something that immitetad the ruby syntax as much as possible. The with statement is nice, but it doesn't cover all the cases.

So I decided to use a decorator to convert the function that uses the "block" into something that receives the block, inserts it into the namespace, and execute the original function with the block as a corutine.

The idea was something like this:

@receive_block
def simple_iterate():
for i in [1,2,3]:
print block()

@simple_iterate
def _():
return "a"

This copies the Ruby syntax except for the receive_block decorator, but I considered it a reasonable sacrifice.
Using "def _()" leaves the function as anonymous and allows you to specify parametrs for the block.

Implementing blocks in Python

So, to implement the syntax I just need to write the receive_block decorator.
The objective of this decorator is to convert the block receiving function, A, into another decorator that receives a function, B, introduces B into A's scope and subsequently calls A.
The key step is to add the block function to the scope. To do this we use Python's builtin types module. It includes the FunctionType method which creates a function object.

types.FunctionType(func.__code__, scope)

There is more to this method than what we use here, but I won't go into details about the method since we only need the simplest use of it.
Once we know this, the decorator is pretty simple:

import types

def receive_block(func):
def decorator(block):

# Add block to globals
scope = func.func_globals #globals()
scope.update({'block':block})

#create the function with the new scope
new_func = types.FunctionType(func.__code__, scope)

return new_func()

return decorator


Lets see how it works:

@receive_block
def external():
for i in [1,2,3]:
print block()

print "External"
@external
def _():
return "a"

This will print

External
a
a
a

And if we add a parameter:

@receive_block
def param_external():
for i in [1,2,3]:
print block(i)

print "External with param"
@param_external
def _(i):
return "a " + unicode(i)

It, as expected, prints:

External with param
a 1
a 2
a 3

But what if we wanted to implement something like ruby's Array class? Lets create an Array class that extends the builtin list type and see what happens.

class Array(list):
@receive_block
def each(self):
for i in self:
print block(i)

When calling

a = Array([1,2,3,4])

print "Each Square"
@a.each
def _(x):
return x**2

we get the exception:

decorator() takes exactly 1 argument (2 given)

because the instance method takes self as the first argument.
So we modify our initial decorator to work with instance methods:

def receive_block(func):
def decorator(*args):
if len(args) == 1:
block, = args
instance = None
elif len(args) == 2:
instance, block = args

# Add block to globals
scope = func.func_globals #globals()
scope.update({'block':block})

#create the function with the new scope
new_func = types.FunctionType(func.__code__, scope)

if instance:
return new_func(instance)
else:
return new_func()

return decorator

This modification is pretty straight forward. So I won't explain it because it speaks for itself.

Lets write some functions:

class Array(list):
@receive_block
def each(self):
for i in self:
print block(i)

@receive_block
def collect(self):
for (i, value) in enumerate(self):
self[i] = block(value)

@receive_block
def handled(self):
for i in self:
try:
block(i)
except:
print "This raised an exception"

and pass them some blocks:

a = Array([1,2,3,4])

print "Each Square"
@a.each
def _(x):
return x**2

print "Each"
@a.each
def _(x):
return x

print "Collect"
@a.collect
def _(x):
return x**2

print a # a is changed

print "Handled"
@a.handled
def _(x):
if x != 9:
raise Exception("this won't work")
else:
print "this works"

We, then, obtain the desired output:

Each Square
1
4
9
16
Each
1
2
3
4
Collect
[1, 4, 9, 16]
Handled
This raised an exception
This raised an exception
this works
This raised an exception


It works. =)

Another interesting way to do this would have been to add the block variable as a free variable of the function and have the code object reference it. In Python, when a closure is created the free variables are stored in an attribute of the function's code object and it's values are stored in the function itself using the cell type. Take this closure as example:

def test():
a = 10
b = lambda x: x+2
def inner():
print a, b(a)
return inner



>>> i = test()
>>> i.func_code.co_freevars
('a', 'b')
>>> i.func_closure
(< cell int object at 0x802400 >,
< cell function object at 0xf68848 >)

Adding the closure to the new function should be easy, since FunctionType accepts a closure keyword argument to do so. Unfortunately, the code's co_freevars attribute is read only:

>>> i.func_code.co_freevars += ('block',)
TypeError: readonly attribute

If anyone, who knows better than I do, cares to provide an implementation using closures I'd love to hear your solutions.

So this is how we implement Ruby-style blocks in Python using decorators. Hope you enjoyed it.


Disclaimer

This is by no means meant to be used in a production environment. Not even in a semi-serious environment. It is just a hack to demonstrate the how this could be done and it hasn't been tested.


Other notes

* The code for this project is hosted in http://github.com/nicolaslara/blocks/tree/master
* I wanted to implement some of the builtin Ruby functions that use blocks but I didn't have the time. If somebody is up to the task I'd love to see what interesting things could be done with this.
* Also, if somebody is willing to improve the code you are more than welcome.
* Some of the problems of this code:
-It clutters the global namespace.
-The word "block" is introduced as a global reserved word.
* For some reason blogger doesn't let me edit this site's template in 2009, so I couldn't add code syntax highlighting.