Monday 5 January 2009

Emulating Ruby blocks in Python

There are some Ruby features I like a lot. A handful of them are borrowed from other languages (like Lisp or Perl), but Matz found a way of making the language a very nice and powerful combination of them. I've been using Ruby lately and started thinking about how I could implement one of Ruby's most beloved features, blocks, in Python (remember, "imitation is the sincerest form of flattery"). As it turns out, it is not all that hard. Here's what I came up with:

But before that...

On Ruby blocks


Blocks in Ruby let you write methods that act on a chunk of code supplied later, at the call site.
An example of blocks in Ruby is the following:

array = [1, 2, 3, 4]
array.each { |n| puts n ** 2 }
1
4
9
16

According to the Ruby site: "A block is like an anonymous function or lambda [and] the variable between pipe characters is the parameter for this block". What's missing from this description is that Ruby also provides syntactic sugar for writing methods that receive blocks, using the "yield" keyword. In other words, it is a way of creating closures and attaching them to other methods. Using closures in Ruby comes very easily, even for people who don't know what a closure is. If we were to implement the previous behaviour ourselves we would do something like this:


class Array
  def each2
    for i in self
      yield(i)
    end
  end
end

array = [1, 2, 3, 4]
array.each2 { |n| puts n ** 2 }
1
4
9
16


I won't go into details because there is a lot of documentation on Ruby blocks out there.
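For comparison, here is a minimal Python 3 sketch of what each2 does when the block is passed explicitly as an ordinary callable. The function name each2 mirrors the Ruby example; the `block` parameter and the `squares` list are my own illustration, not part of the original code.

```python
def each2(items, block):
    """Call block once per element, like Ruby's yield(i) inside each2."""
    for item in items:
        block(item)

# Collect the squares instead of printing them, so the result can be inspected.
squares = []
each2([1, 2, 3, 4], lambda n: squares.append(n ** 2))
print(squares)  # [1, 4, 9, 16]
```

This works, but the block has to fit in a lambda or be defined before the call, which is exactly the part of the Ruby syntax the rest of this post tries to imitate.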

So, onto Python...

Design
The Python "builtin" that most resembles blocks is the with statement from PEP 343, but I wanted something that imitated the Ruby syntax as much as possible. The with statement is nice, but it doesn't cover all the cases.

So I decided to use a decorator to convert the function that uses the "block" into something that receives the block, inserts it into the function's namespace, and executes the original function, which can then call the block.

The idea was something like this:

@receive_block
def simple_iterate():
    for i in [1,2,3]:
        print block()

@simple_iterate
def _():
    return "a"

This copies the Ruby syntax except for the receive_block decorator, but I considered it a reasonable sacrifice.
Using "def _()" leaves the function effectively anonymous and allows you to specify parameters for the block.

Implementing blocks in Python

So, to implement the syntax I just need to write the receive_block decorator.
The objective of this decorator is to convert the block receiving function, A, into another decorator that receives a function, B, introduces B into A's scope and subsequently calls A.
The key step is to add the block function to the scope. To do this we use Python's standard types module. It includes FunctionType, the type of function objects, which can be called to create a new function from a code object and a globals dictionary.

types.FunctionType(func.__code__, scope)
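
To see what this call does in isolation, here is a small sketch written for Python 3. The names `greet`, `who` and `rebound` are made up for illustration. The point is only that a name the function body does not define is looked up in whatever globals dictionary the function was built with.

```python
import types

def greet():
    # `who` is not defined in this function; it is resolved through the
    # function's globals dictionary at call time.
    return "hello " + who

# Rebuild a function around the same code object, with a different globals dict.
scope = {"who": "block"}
rebound = types.FunctionType(greet.__code__, scope)

print(rebound())  # hello block
```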

There is more to this constructor than what we use here, but I won't go into details since we only need its simplest form.
Once we know this, the decorator is pretty simple:

import types

def receive_block(func):
    def decorator(block):

        # Add block to globals
        scope = func.func_globals #globals()
        scope.update({'block':block})

        #create the function with the new scope
        new_func = types.FunctionType(func.__code__, scope)

        return new_func()

    return decorator


Let's see how it works:

@receive_block
def external():
    for i in [1,2,3]:
        print block()

print "External"
@external
def _():
    return "a"

This will print

External
a
a
a

And if we add a parameter:

@receive_block
def param_external():
    for i in [1,2,3]:
        print block(i)

print "External with param"
@param_external
def _(i):
    return "a " + unicode(i)

It, as expected, prints:

External with param
a 1
a 2
a 3

But what if we wanted to implement something like Ruby's Array class? Let's create an Array class that extends the builtin list type and see what happens.

class Array(list):
    @receive_block
    def each(self):
        for i in self:
            print block(i)

When calling

a = Array([1,2,3,4])

print "Each Square"
@a.each
def _(x):
    return x**2

we get the exception:

decorator() takes exactly 1 argument (2 given)

because the instance method takes self as the first argument.
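
The extra argument can be seen with a small Python 3 sketch. The stand-in `decorator` below only counts its arguments and is not the real one from this post; the point is that any plain function stored on a class gets the instance prepended when it is looked up through an instance.

```python
def decorator(*args):
    # Report how many positional arguments actually arrived.
    return len(args)

class Array(list):
    each = decorator  # stored on the class, so it binds like a method

a = Array([1, 2, 3, 4])
block = lambda x: x ** 2

print(decorator(block))  # 1
print(a.each(block))     # 2
```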
So we modify our initial decorator to work with instance methods:

def receive_block(func):
    def decorator(*args):
        if len(args) == 1:
            block, = args
            instance = None
        elif len(args) == 2:
            instance, block = args
        else:
            raise TypeError("expected a block, or an instance and a block")

        # Add block to globals
        scope = func.func_globals #globals()
        scope.update({'block':block})

        #create the function with the new scope
        new_func = types.FunctionType(func.__code__, scope)

        # Compare against None: an empty Array is falsy but is still an instance
        if instance is not None:
            return new_func(instance)
        else:
            return new_func()

    return decorator

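This modification is pretty straightforward: when the decorator is reached through an instance it receives the instance as well as the block, so it accepts either one or two positional arguments and passes the instance on to the rebuilt function as self.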
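
The listings in this post are Python 2. As a rough sketch only, the same decorator might look like this in Python 3, where `func.__globals__` replaces `func_globals`. The `*leading, block = args` unpacking is my own simplification of the branching above, not the original code, and it still writes `block` into the module globals.

```python
import types

def receive_block(func):
    def decorator(*args):
        # Called as decorator(block) for plain functions, or as
        # decorator(instance, block) when reached through an instance.
        *leading, block = args
        scope = func.__globals__   # Python 3 spelling of func_globals
        scope["block"] = block     # still mutates the module globals
        new_func = types.FunctionType(func.__code__, scope)
        return new_func(*leading)
    return decorator

class Array(list):
    @receive_block
    def collect(self):
        for i, value in enumerate(self):
            self[i] = block(value)

a = Array([1, 2, 3, 4])

@a.collect
def _(x):
    return x ** 2

print(a)  # [1, 4, 9, 16]
```

Because the instance is forwarded without testing its truth value, an empty Array is handled the same way as a non-empty one.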

Let's write some methods:

class Array(list):
    @receive_block
    def each(self):
        for i in self:
            print block(i)

    @receive_block
    def collect(self):
        for (i, value) in enumerate(self):
            self[i] = block(value)

    @receive_block
    def handled(self):
        for i in self:
            try:
                block(i)
            except:
                print "This raised an exception"

and pass them some blocks:

a = Array([1,2,3,4])

print "Each Square"
@a.each
def _(x):
    return x**2

print "Each"
@a.each
def _(x):
    return x

print "Collect"
@a.collect
def _(x):
    return x**2

print a # a is changed

print "Handled"
@a.handled
def _(x):
    if x != 9:
        raise Exception("this won't work")
    else:
        print "this works"

We then obtain the desired output:

Each Square
1
4
9
16
Each
1
2
3
4
Collect
[1, 4, 9, 16]
Handled
This raised an exception
This raised an exception
this works
This raised an exception


It works. =)

Another interesting way to do this would have been to add the block variable as a free variable of the function and have the code object reference it. In Python, when a closure is created, the names of the free variables are stored in an attribute of the function's code object (co_freevars) and their values are stored in the function itself (func_closure) as cell objects. Take this closure as an example:

def test():
    a = 10
    b = lambda x: x+2
    def inner():
        print a, b(a)
    return inner



>>> i = test()
>>> i.func_code.co_freevars
('a', 'b')
>>> i.func_closure
(< cell int object at 0x802400 >,
< cell function object at 0xf68848 >)
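
For readers on Python 3, the same inspection can be sketched as follows. The attributes are spelled `__code__` and `__closure__` there, and each cell exposes its value as `cell_contents`. The `captured` dictionary is just a convenience I added to pair names with values.

```python
def test():
    a = 10
    b = lambda x: x + 2
    def inner():
        return a, b(a)
    return inner

i = test()

# Names live on the code object, values live in cells on the function.
names = i.__code__.co_freevars
values = [cell.cell_contents for cell in i.__closure__]
captured = dict(zip(names, values))

print(sorted(captured))  # ['a', 'b']
print(captured["a"])     # 10
print(i())               # (10, 12)
```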

Adding the closure to the new function should be easy, since FunctionType accepts a closure argument to do so. Unfortunately, the code object's co_freevars attribute is read-only:

>>> i.func_code.co_freevars += ('block',)
TypeError: readonly attribute

If anyone who knows better than I do cares to provide an implementation using closures, I'd love to hear your solution.

So this is how we implement Ruby-style blocks in Python using decorators. Hope you enjoyed it.


Disclaimer

This is by no means meant to be used in a production environment. Not even in a semi-serious environment. It is just a hack to demonstrate how this could be done, and it hasn't been tested.


Other notes

* The code for this project is hosted at http://github.com/nicolaslara/blocks/tree/master
* I wanted to implement some of the builtin Ruby functions that use blocks but I didn't have the time. If somebody is up to the task I'd love to see what interesting things could be done with this.
* Also, if somebody is willing to improve the code you are more than welcome.
* Some of the problems of this code:
  - It clutters the global namespace.
  - The word "block" is introduced as a global reserved word.
* For some reason blogger doesn't let me edit this site's template in 2009, so I couldn't add code syntax highlighting.
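
On the namespace problems listed above, one possible mitigation, sketched here for Python 3, is to build the new function over a copy of the globals rather than the real dictionary. This is my own variation, not the code in the repository, and the name `receive_block_isolated` is hypothetical. The trade-off is that the function sees a snapshot: globals rebound after the copy is made, and `global` assignments inside the function, do not propagate.

```python
import types

def receive_block_isolated(func):
    def decorator(*args):
        *leading, block = args
        # Copy the globals so that `block` is only visible during this call.
        scope = dict(func.__globals__)
        scope["block"] = block
        new_func = types.FunctionType(func.__code__, scope)
        return new_func(*leading)
    return decorator

@receive_block_isolated
def three_times():
    results = []
    for i in (1, 2, 3):
        results.append(block(i))
    return results

@three_times
def labelled(i):
    return "a " + str(i)

print(labelled)  # ['a 1', 'a 2', 'a 3']
```

When this runs as a standalone script, the module's own globals never gain a `block` entry.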


8 comments:

  1. Nicolas: FYI the indentation in some of your code blocks is a bit messed up, this is caused by using <code> instead of <pre> (or vice versa, I forget which :P ).

  2. Thanks Alex! I was using <pre> because that's what my syntax highlighter uses but I couldn't add the highlighter 'cuz blogger wouldn't let me edit the template.
    It's fixed now thanks to a cool widget by FaziBear

  3. Beautiful post!

    Did you see this: http://www.voidspace.org.uk/python/articles/code_blocks.shtml ?

    I've combined your way to define anonymous code blocks and technique from this article to inject `block' variable to closures.

    Result is little bit ugly, "but it works" :)

    http://pastie.org/402041

  4. Nice implementation. I found it quite clean, actually. I don't love the dependence on byteplay or the @@ syntax, but it definitely gets the job done. Nice work!

  5. Just a thought - using _ as a function name can interfere with gettext.

  6. Why not just?:
    def simple_iterate(block):
        for i in [1,2,3]:
            print block()

    @simple_iterate
    def _():
        return "a"

  7. No particular reason. I just wanted to avoid passing the argument explicitly. Note also that this breaks when the block-handling function takes other parameters. We would need some use of partial to bind those params.

  8. I was playing with similar idea the other day

    ..def _(e):
    ....print(e)
    ..map(_, [1,2,3,4,5])

    It's a bit backwards, but once you accept that def _ will be followed by a mapping function or some such, it would be quite readable methinks.
