Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

Wednesday, 4 November 2009

My Pycon talk (Python Metaprogramming)

If you followed the link on my previous post and looked at #64 you might have noticed I'll be giving a talk on Python Metaprogramming. Here's a sneak peak at what the talk will be like along with some comments.

Outline


  • Metaprogramming:
    A short explanation of what metaprogramming is, why it is needed and how it can save you time and effort.

  • Metaprogramming Languages:

    What languages are considered "metaprogramming languages" and what constructs do they provide to be considered as such. A 30000 ft view exploration of the metaprogramming features of languages like Lisp, Ruby, Perl, Javascript, Haskell and OCaml.

  • Cool uses/examples of metaprogramming in other languages:

    Some interesting examples of metaprogramming regardless of the language. These examples will be ported to Python at the end of the talk. I will go through this very fast just to show what we want to achieve in each example and what techniques are used by other programming languages to achieve this effects. A more detailed explanation will be done at the end of the talk by porting the examples to Python.


  • Tools for metaprogramming in Python:

    Each of the subsections of "Tools for metaprogramming in Python" should contain the following information:

    • Explaination: short explaination

    • Short example in Python

    • An example of using the technique to port some othe language metaprogramming features

    • Patterns: Common or useful patterns to incorporate that metaprogramming tool into your code

    • Advantages, disadvantages and perils.
The subsections I'll be covering are the following:
  1. Code generation: eval() and exec():
    The basics. Code Generation as strings and getting them to execute.

  2. Python (A.K.A.: Python's data model):
    Exploring the flexibility of Python's data model: Duck-typing, special methods, attribute access, callable emulation, behaviour on operators.

  3. Decorators:
    An introduction to decoratos and what can be achieved with them. Emphasis on decorator patterns.

  4. Metaclasses:
    A basic introduction to metaclasses: What they are and when are they needed. Includes an idea of when can metaclasses be replaced by decorators.

  5. types. Specially types.FunctionType:
    Exploring type objcts. Specially function objects, code attributes, locals, globals defaults and closures

  6. Others: import hooks, ByteCode, etc:
    A seccion on other useful metaprogramming tools that might fit into the time frame of the talk. (this seccion might be included depending on the time or if there are in depth talks in pycon about somme of the above techniques that suggest cutting down on them in favour of other options)


  • Putting it all together:
    Just a pointer to a resourse that shows example code using most of the techniques shown throughout the talk. The example will probably be creating a DSL: possibly language generation (HTML, CSS, SQL or other), a generic problem solver or a small framework for distributed computation. I might take a lightning talk to go through this example in more detail.


  • The Good, the Bad and the Ugly:
    A short conclusion about metaprogramming in Python. Why use it, when to use it, when to avoid it and what is python missing regarding metaprogramming that other languages have.



  • Alternative titles


    There are some alternative titles I'm considering for the talk. Any favourites?
    • Baddass Metaprogramming with Python (Thanks, Ahsanul)
    • The swiss army knife for metaprogramming in Python
    • Metaprogramming tricks for Python programmers
    • A zoo of Python metaprogramming tools
    • Tools and techniques for Python metaprogramming
    • Enhancing your code through metaprogramming
    • Python metaprogramming tricks for your toolbox


    Pycon 2010

    The list of scheduled talks for Pycon 2010 is out and they are looking very good. Thanks to all the people involved in making Pycon happen!

    I am specially fond of #64. Go check it out!

    If you missed Pycon 2009 (I did), you can take a look at what went on here.

    As for the Blog Posting Month not all of my posts during the month will be like this. A real one is on the way. Promise!


    Thursday, 19 February 2009

    Acronyms

    In the last post I mentioned the book Programming Pearls. In the second chapter, Jon Bentley presents a simple, but interesting problem: Given a list of words, group those that are acronyms together.

    If you don't know what acronyms are, it is simply a words formed from the initial letters of others. For instance, the words "pot" and "top" are acronyms; so are the words "slow" and "owls".

    When I read it, I thought "This could be done with unix tools and pipes". Actually, I didn't find a way of doing it efficiently with native unix tools. The reason being that there's no horizontal sort implementation in unix. You could write something like this:


    for i in `cat dict | sed 'p' | sed 'N;s/\n/*-/g'`; do echo $i |
    cut -d "*" -f 2 | sed "s/\(.\)/\1-/g" | tr "-" "\n" | sort |
    tr -d "\n"; echo "$i" | cut -d "*" -f 2 ; done | sort


    But it won't be very efficient and it's very ugly. I didn't spend too much time thinking of a better way to do it unix-style. If you have one, please post it on the comments =)

    Instead, I wrote a small python script


    #!/usr/bin/env python
    import sys

    if __name__ == '__main__':
    f = open(sys.argv[1])
    for line in f.xreadlines():
    line = line.strip()
    print '%s-%s' % (''.join(sorted(line)), line)


    and I executed:


    ./expand.py dict2 | sort


    which runs in about 0.47 seconds. Then, we could add another step to pretty print the acronyms, but I wasn't interested in that.

    Unfortunately, the only thing that remains 'unix' about this solution is the use of sort and one pipe. This post is not so much to show my findings but to look for a better solution. Does anyone have a suggestion?


    Saturday, 31 January 2009

    Permutations

    This last week I happened to have the necessity of generating all the possible permutations of an array. This seems to be a fairly easy task but many programmers won't come up with a simple solution to this problem very fast. There is at least one intuitive, recursive, solution to this problem, many more complex but *slightly* more efficient, iterative, solutions and no really time-efficient solution. After all, the problem is to generate n! orderings of an array so we, at least, have to visit each one once.

    The well-known solution


    The most widely spread solution to this problem is the Dijkstra algorithm for generating the next permutation. It is based on the assumption that there is a total order between the elements of the array and the elements of the array are unique (there are no repetitions). If these conditions are met, the algorithm always generates the next permutation in lexicographical order.

    The algorithm is very straight forward and goes as follow:

    def get_next(a):
    N = len(a)
    i = len(a)-1

    while a[i-1] >= a[i]:
    i -= 1

    j = N

    while a[j-1] <= a[i-1]:
    j -= 1

    a[i-1], a[j-1] = a[j-1], a[i-1]

    i += 1
    j = N

    while i < j:
    a[i-1], a[j-1] = a[j-1], a[i-1]
    i+=1
    j-=1

    return a



    To use this algorithm to generate every possible permutation we just call it n! times:

    def dijkstra(a):
    import operator
    fact = reduce(operator.mul, range(1,len(a)+1))
    the_list = [copy(a)]
    for i in range(fact-1):
    a = get_next(a)
    the_list.append(copy(a))

    return the_list


    the only possibly uncommon thing here is line 3 which is a functional-style factorial function.

    This works very well for generating the permutations of n (the permutations of the array [0..n-1]) but it won't work for arrays of arbitrary elements. So we explore other options.

    The known recursive solution


    There's a (not-so-well-)known recursive solution to solving the permutation problem. It is known as the HeapPermute algorithm or Heap's algorithm for permuting an array.
    My python implementation of the algorithm goes as follow:

    def heap_aux(a, n, the_list):
    if n == 1:
    the_list.append(copy(a))
    else:
    for i in range(n):
    heap_aux(a, n-1, the_list);

    if n % 2 == 1:
    a[0], a[n-1] = a[n-1], a[0]
    else:
    a[i], a[n-1] = a[n-1], a[i]


    def heap(a):
    the_list = []
    heap_aux(a, len(a), the_list)
    return the_list

    This algorithm works very well and it is very efficient too but it generates the permutations in an order that I find particularly disturbing.
    Also it is not as easy to understand as I would like.

    Enter my recursive solution
    This is probably the most straight forward solution I could come up with. It is a clasical recursive design.
    Want to permute an array of n elements? How would you do it if you already had a function to solve the problem for a smaller array?
    Simple. Select each possible element of the array to be "the first element of the permutation" and then permute the rest of the array:
    A permutation of 1,2,3,4 would be:
    1 + permutation_of(2,3,4)
    2 + permutation_of(1,3,4)
    3 + permutation_of(2,1,4)
    4 + permutation_of(2,3,1)

    Simple, right?

    Let's code it:

    for j in range(i, len(a)):
    a[i], a[j] = a[j], a[i]
    permutation(a, i+1, the_list)
    a[i], a[j] = a[j], a[i]

    This loop captures the essence of the procedure described above. We have to remember to put the elements back where they were after the recursive call to allow next iteration to work. Not doing so would result in swapping the wrong element in the next iteration.

    Puting it all together (and adding a small optimization):

    def perm_aux(a, i, the_list):
    if i == len(a):
    the_list.append(copy(a))
    else:
    perm_aux(a, i+1, the_list)
    for j in range(i+1, len(a)):
    a[i], a[j] = a[j], a[i]
    perm_aux(a, i+1, the_list)
    a[i], a[j] = a[j], a[i]


    def permutate(a):
    the_list = []
    perm_aux(a, 0, the_list)
    return the_list

    This algorithm generates the list in a nicer order (though it's not lexicographical). And as its predecessor is quite efficient (taking into account the complexity of the problem).

    Performance

    Normally recursive algorithms are put down for performance reasons: They keep all local variables stored in the stack while the other recursive calls are made and have to incur in function calling overhead. In the case of these two algorithms the good thing is that the recurtion depth is not proportional to the size of the problem solution but to the size of the array (actually it is exactly as deep as the array's length). Also, if the list variables are passed by reference (this is programming language specific), only the reference to the variable needs to be stored in the stack and there is no overhead of coping the variables on every call.
    It will probably never be as efficient as their iterative counterparts but for me readability (with small performance penalties) definitely pays off.

    To measure the times I used the following python decorator:

    def timing(func):
    def inside(*args):
    t1 = time.time()
    res = func(*args)
    t2 = time.time()
    print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
    return res
    return inside




    Given the complexity of the problem, the graph grows rapidly out of scale. So a graph in logarithmic scale is in order



    As you can see the algorithms behave very similarly. The largest difference (for an array of size 9) between the fastest (heap) and the slowest (permute) is 0.6 seconds. This of course will grow with the array size, but in a linear instead of exponential manner.


    Have another solution to this problem? comment on! =)


    Monday, 5 January 2009

    Emulating Ruby blocks in Python

    There are some Ruby features I like a lot. A handful of them are borrowed from other languages (like lisp or perl), but the Matz found a way of making the language a very nice and powerful combination of them. I've been using Ruby lately and got thinking on how could I implement one of Ruby's most beloved features, blocks, in Python (remember, "imitation is the sincerest form of flattery"). As it turns out, it is not all that hard. Here's what I came up with:

    But before that...

    On Ruby blocks


    Blocks in ruby provide a way of creating functions that act on a code block defined later on.
    An example of blocks in ruby is the following:

    array = [1, 2, 3, 4]
    array.each { |n| puts n ** 2 }
    1
    4
    9
    16

    According to their site: "A block is like an anonymous function or lambda [and] the variable between pipe characters is the parameter for this block". What's missing from this description is that ruby also provides the syntactic sugar to create functions that receive blocks using the "yield" statement. In other words, it is a way of creating closures and attaching them to other methods. Using closures in ruby comes very easily even for people that donsn't know what a closure is. If we were to implement the previous behaviour ourselves we would do something like this:


    class Array
    def each2
    for i in self
    yield(i)
    end
    end
    end

    array = [1, 2, 3, 4]
    array.each2 { |n| puts n ** 2 }
    1
    4
    9
    16


    I won't go into details because there is a lot of documentation on ruby blocks out there.

    So, onto Python...

    Design
    The Python "builtin" that resembles the most to blocks is that of PEP343, the with statement; but I wanted something that immitetad the ruby syntax as much as possible. The with statement is nice, but it doesn't cover all the cases.

    So I decided to use a decorator to convert the function that uses the "block" into something that receives the block, inserts it into the namespace, and execute the original function with the block as a corutine.

    The idea was something like this:

    @receive_block
    def simple_iterate():
    for i in [1,2,3]:
    print block()

    @simple_iterate
    def _():
    return "a"

    This copies the Ruby syntax except for the receive_block decorator, but I considered it a reasonable sacrifice.
    Using "def _()" leaves the function as anonymous and allows you to specify parametrs for the block.

    Implementing blocks in Python

    So, to implement the syntax I just need to write the receive_block decorator.
    The objective of this decorator is to convert the block receiving function, A, into another decorator that receives a function, B, introduces B into A's scope and subsequently calls A.
    The key step is to add the block function to the scope. To do this we use Python's builtin types module. It includes the FunctionType method which creates a function object.

    types.FunctionType(func.__code__, scope)

    There is more to this method than what we use here, but I won't go into details about the method since we only need the simplest use of it.
    Once we know this, the decorator is pretty simple:

    import types

    def receive_block(func):
    def decorator(block):

    # Add block to globals
    scope = func.func_globals #globals()
    scope.update({'block':block})

    #create the function with the new scope
    new_func = types.FunctionType(func.__code__, scope)

    return new_func()

    return decorator


    Lets see how it works:

    @receive_block
    def external():
    for i in [1,2,3]:
    print block()

    print "External"
    @external
    def _():
    return "a"

    This will print

    External
    a
    a
    a

    And if we add a parameter:

    @receive_block
    def param_external():
    for i in [1,2,3]:
    print block(i)

    print "External with param"
    @param_external
    def _(i):
    return "a " + unicode(i)

    It, as expected, prints:

    External with param
    a 1
    a 2
    a 3

    But what if we wanted to implement something like ruby's Array class? Lets create an Array class that extends the builtin list type and see what happens.

    class Array(list):
    @receive_block
    def each(self):
    for i in self:
    print block(i)

    When calling

    a = Array([1,2,3,4])

    print "Each Square"
    @a.each
    def _(x):
    return x**2

    we get the exception:

    decorator() takes exactly 1 argument (2 given)

    because the instance method takes self as the first argument.
    So we modify our initial decorator to work with instance methods:

    def receive_block(func):
    def decorator(*args):
    if len(args) == 1:
    block, = args
    instance = None
    elif len(args) == 2:
    instance, block = args

    # Add block to globals
    scope = func.func_globals #globals()
    scope.update({'block':block})

    #create the function with the new scope
    new_func = types.FunctionType(func.__code__, scope)

    if instance:
    return new_func(instance)
    else:
    return new_func()

    return decorator

    This modification is pretty straight forward. So I won't explain it because it speaks for itself.

    Lets write some functions:

    class Array(list):
    @receive_block
    def each(self):
    for i in self:
    print block(i)

    @receive_block
    def collect(self):
    for (i, value) in enumerate(self):
    self[i] = block(value)

    @receive_block
    def handled(self):
    for i in self:
    try:
    block(i)
    except:
    print "This raised an exception"

    and pass them some blocks:

    a = Array([1,2,3,4])

    print "Each Square"
    @a.each
    def _(x):
    return x**2

    print "Each"
    @a.each
    def _(x):
    return x

    print "Collect"
    @a.collect
    def _(x):
    return x**2

    print a # a is changed

    print "Handled"
    @a.handled
    def _(x):
    if x != 9:
    raise Exception("this won't work")
    else:
    print "this works"

    We, then, obtain the desired output:

    Each Square
    1
    4
    9
    16
    Each
    1
    2
    3
    4
    Collect
    [1, 4, 9, 16]
    Handled
    This raised an exception
    This raised an exception
    this works
    This raised an exception


    It works. =)

    Another interesting way to do this would have been to add the block variable as a free variable of the function and have the code object reference it. In Python, when a closure is created the free variables are stored in an attribute of the function's code object and it's values are stored in the function itself using the cell type. Take this closure as example:

    def test():
    a = 10
    b = lambda x: x+2
    def inner():
    print a, b(a)
    return inner



    >>> i = test()
    >>> i.func_code.co_freevars
    ('a', 'b')
    >>> i.func_closure
    (< cell int object at 0x802400 >,
    < cell function object at 0xf68848 >)

    Adding the closure to the new function should be easy, since FunctionType accepts a closure keyword argument to do so. Unfortunately, the code's co_freevars attribute is read only:

    >>> i.func_code.co_freevars += ('block',)
    TypeError: readonly attribute

    If anyone, who knows better than I do, cares to provide an implementation using closures I'd love to hear your solutions.

    So this is how we implement Ruby-style blocks in Python using decorators. Hope you enjoyed it.


    Disclaimer

    This is by no means meant to be used in a production environment. Not even in a semi-serious environment. It is just a hack to demonstrate the how this could be done and it hasn't been tested.


    Other notes

    * The code for this project is hosted in http://github.com/nicolaslara/blocks/tree/master
    * I wanted to implement some of the builtin Ruby functions that use blocks but I didn't have the time. If somebody is up to the task I'd love to see what interesting things could be done with this.
    * Also, if somebody is willing to improve the code you are more than welcome.
    * Some of the problems of this code:
    -It clutters the global namespace.
    -The word "block" is introduced as a global reserved word.
    * For some reason blogger doesn't let me edit this site's template in 2009, so I couldn't add code syntax highlighting.