Sunday, 8 November 2009


Books are great! Some say the internet holds all the answers. I'd say it holds most of the answers and it will hold them all as soon as publishers, authors and hardware companies get off their own way and create serious mechanisms to manage (publish, buy, sell, read, own, share) digital books. If they're as stubborn as the music industry, this will not happen soon; but I think that, with the lessons learned from music, we're moving in the right direction. Let's see what the future holds (or let's ourselves make the future happen). For now, here's a list of technical books I've read lately and I'd like to recommend:

  • Beautiful Code

    This book is a must read. It is also a very easy read. It's comprised of series of write-ups by different authors of the likes of Brian Kernighan and Simon Peyton Jones where the authors have chosen a problem they've faced and solved "Beautifully". Each author has their own definition of beautiful code (great design, speed and reliability, testability, versatility and more), but all of them teach you valuable ways to think about problem solving, design and development. With languages from Haskell to C (through Python, Ruby, Perl Lisp and others) and topics from quicksort to software transactional memory (through Python's dictionaries, testing, debugging, mapreduce and the original implementation of Paint among others) I can assure you that this book will be of interest to any good computer scientist or software engineer. My recommendation: go buy this book and put it on your night stand; it is very easy to just read a chapter every day.

  • Real World Haskell

    This book is just amazing. It covers Haskell from zero-knowledge to advanced and has something to offer to programmers of most backgrounds. Those who have never programmed in Haskell (or even functionally) will definitely benefit from the paradigm switch and seeing the problems from another perspective. Those that already know Haskell will probably find it a little bit slow as it explores a lot of the basics, but there are, for certain, hidden pearls to be found on every section; plus, you can skim fast through the "Haskell" parts and get to the "Real World" parts to get great examples of the uses of this powerful, different, language.

  • Programming pearls

    This is a book of the like of Beautiful Code but written by only one author (Jon Bentley, who also writes a great article in Beautiful Code) and slightly older. It showcases a series of problems and different ways to approach and solve them. It is not so much focused on the solutions themselves but the way of thinking for arriving to those solutions. The problems in themselves are not overly complex, but the detail in which the different options are explored and the trade-offs and solutions presented is very appealing. All the problems are accompanied with the C code used for solving it but avoiding any unnecessary boilerplate. It's simply a great book to keep you sharp on your computer-problem solving abilities.

  • The algorithm design manual

    While I think this book is a little hard to read at some points it provides an amazingly complete set of algorithms, their implementations and example problems. One thing I liked a lot about this book is that it's not solely based in simple explanatory examples but also shows how the different techniques showcased in the book can be used to solve real life problems (that actually happened). But the really amazing thing of this book is that it contains a chapter with a comprehensive catalogue of algorithms categorized by the type of problem they solve. It is literally half of the book, they all have an illustration to represent what the algorithm does and it serves as a great reference for any future algorithmic problems you might encounter.

Honorable mentions go to:

  • Coders at work: a series of interviews with great programmers in which they expose their thoughts on programming along with their personal experience. As of today I've only read a few chapters, but I go so far as to recommend the whole book as it has left a very good impression.

  • Introduction to the theory of computation: a great introduction, from regular expressions, finite and push-down automatons to reductions and computability. Very well explained from an expert in the field. It is also well written and very short. Loved it.

  • Artificial Intelligence: a modern approach, this book contains at least introductory material to most fields in artificial intelligence. It has in-depth material about a lot of topics and instead of trying to deeply explore all of the areas of AI in one book (which would be impossible) it focuses on the most important areas and gives you a great introduction on the rest for you to go and read the papers or a more specific book.

Later this month I'll make a list of non-technical books I have read or am reading and I'd like to recommend.

Any technical books you can recommend me?
I am particularly interested in advanced data structures, modern concurrency (Programming Erlang is in my to-read list), compilers (Compilers by Aho et. al. is in my to-read list), optimization and computational geometry. But that's just a list of stuff that popped in my head this instant, so if you have any recommendations on another subject please be sure to let me know!

Saturday, 7 November 2009

Emacs mark-stack

There's no worst feeling than being lost in your code. You know, when you need to look at a bunch of code fragments buried in other files or in the same, very large, file; or having to trace back a bunch of function calls spanning multiple emacs buffers and not knowing where the previous stuff was.
Okey, there are probably a lot of feelings that are worst. But you gotta give me it's annoying.

If only there was a way to push positions to a stack and then come back to them later on! I know about C-@ and C-U C-@, but if you use delete-selection-mode or anything else in which you do a lot of marks that's not an option. Plus, the default key bindings suck and it's only buffer local. So I decided to write my own mark system in emacs lisp.

Before people come rushing down to the comments to tell me there's already a better solution for this problem, take into account that I didn't look much into it. I did this not only to solve the problem but to practice my elisp. Nevertheless, if there are alternatives I would love to hear about them; so comment on, just don't troll =)

If you don't know emacs-lisp there's a guide by Xah Lee I can't recommend enough.


So I want two things: (i) To be able to push and pop marks locally to a buffer and (ii) to be able to pop marks globally. The later means that if I push marks in two different buffers and I do a global pop, the current buffer will change to the one with the latest mark.

To do this, I used two datastructures: A hash table that maps a buffer name to its stack, and a global stack that keeps all the pushed positions.

The first one looks like this:

"marks.el" -> (1622 1042 1283)

((1622 "marks.el") (1042 "marks.el") (194 "*Messages*") (1283 "marks.el"))

In the global stack we keep the name of the buffer so we can move to it and remove the mark from the right stack.

So, onto the code.


First we need a helper function to add the list to the hash if it doesn't exist already or push to the stack on an existing list

(defun add-or-push (key value hash)
(if (gethash key hash)
(puthash key (cons value (gethash key hash)) hash))
(puthash key (list value) hash))))

And some functions to clear the local and global stack should things go wrong.

(defun clear-push-mark-for-buffer ()
"Resets the buffer's stack"
(puthash (buffer-name) () local-mark-stack))

(defun clear-global-push-mark ()
"Resets the buffer's stack"
(setq global-mark-stack '())
(maphash (lambda (kk vv) (puthash kk () local-mark-stack)) local-mark-stack)

Now, the real code:

For pushing, we just need one function that adds the current possition to both datastructures:

(defun local-push-mark ()
"Pushes a the current point to a stack"
(if (boundp 'local-mark-stack)
(let (buffer point)
(setq buffer (buffer-name))
(setq point (point))
(add-or-push buffer point local-mark-stack)
(message "Pushed %d on %s" point buffer)
(if (boundp 'global-mark-stack)
(setq global-mark-stack (cons (list point buffer) global-mark-stack))
(setq global-mark-stack (list point buffer)))))
(setq local-mark-stack (make-hash-table))

For popping, however, we need a function for the local stack and another for the global stack

(defun local-pop-mark ()
"Pops the a mark from the current buffer's stack"
(let (stack)
(setq stack (gethash (buffer-name) local-mark-stack))
(if (and (boundp 'local-mark-stack) stack)
(goto-char (pop stack))
(puthash (buffer-name) stack local-mark-stack)
(setq global-mark-stack
(remove (list (point) (buffer-name)) global-mark-stack)))
(message "No marks to pop"))))

(defun global-pop-mark ()
"Pops a mark from any buffer"
(let (pos buffer)
(setq pos (car global-mark-stack))
(setq buffer (nth 1 (car global-mark-stack)))
(setq stack (gethash buffer local-mark-stack))
(if (and (boundp 'global-mark-stack) stack)
(switch-to-buffer buffer)
(goto-char (pop stack))
(puthash buffer stack local-mark-stack)
(setq global-mark-stack (remove (list pos buffer) global-mark-stack)))
(message "No marks to pop"))))

the only important detail to notice is that the global pop removes both from the local and global stacks. The rest should be self-explanatory.

And we add a statement to provide the marks module

(provide 'marks)

So now, we just need to load the module and add some key bindings:

(require 'marks)
(global-set-key (kbd "C-x ") 'local-push-mark)
(global-set-key (kbd "C-x ") 'local-pop-mark)
(global-set-key (kbd "C-x S-") 'global-pop-mark)

Using it.

If your using my .emacs you just need to ckeck out the code and compile it.

If not just copy everything up to " (provide "marks") " into a file called marks.el and load it in your .emacs.


Edit: Apparently there's no lisp syntax support for the highlighter I'm using. sorry for that.

Wednesday, 4 November 2009

My Pycon talk (Python Metaprogramming)

If you followed the link on my previous post and looked at #64 you might have noticed I'll be giving a talk on Python Metaprogramming. Here's a sneak peak at what the talk will be like along with some comments.


  • Metaprogramming:
    A short explanation of what metaprogramming is, why it is needed and how it can save you time and effort.

  • Metaprogramming Languages:

    What languages are considered "metaprogramming languages" and what constructs do they provide to be considered as such. A 30000 ft view exploration of the metaprogramming features of languages like Lisp, Ruby, Perl, Javascript, Haskell and OCaml.

  • Cool uses/examples of metaprogramming in other languages:

    Some interesting examples of metaprogramming regardless of the language. These examples will be ported to Python at the end of the talk. I will go through this very fast just to show what we want to achieve in each example and what techniques are used by other programming languages to achieve this effects. A more detailed explanation will be done at the end of the talk by porting the examples to Python.

  • Tools for metaprogramming in Python:

    Each of the subsections of "Tools for metaprogramming in Python" should contain the following information:

    • Explaination: short explaination

    • Short example in Python

    • An example of using the technique to port some othe language metaprogramming features

    • Patterns: Common or useful patterns to incorporate that metaprogramming tool into your code

    • Advantages, disadvantages and perils.
The subsections I'll be covering are the following:
  1. Code generation: eval() and exec():
    The basics. Code Generation as strings and getting them to execute.

  2. Python (A.K.A.: Python's data model):
    Exploring the flexibility of Python's data model: Duck-typing, special methods, attribute access, callable emulation, behaviour on operators.

  3. Decorators:
    An introduction to decoratos and what can be achieved with them. Emphasis on decorator patterns.

  4. Metaclasses:
    A basic introduction to metaclasses: What they are and when are they needed. Includes an idea of when can metaclasses be replaced by decorators.

  5. types. Specially types.FunctionType:
    Exploring type objcts. Specially function objects, code attributes, locals, globals defaults and closures

  6. Others: import hooks, ByteCode, etc:
    A seccion on other useful metaprogramming tools that might fit into the time frame of the talk. (this seccion might be included depending on the time or if there are in depth talks in pycon about somme of the above techniques that suggest cutting down on them in favour of other options)

  • Putting it all together:
    Just a pointer to a resourse that shows example code using most of the techniques shown throughout the talk. The example will probably be creating a DSL: possibly language generation (HTML, CSS, SQL or other), a generic problem solver or a small framework for distributed computation. I might take a lightning talk to go through this example in more detail.

  • The Good, the Bad and the Ugly:
    A short conclusion about metaprogramming in Python. Why use it, when to use it, when to avoid it and what is python missing regarding metaprogramming that other languages have.

  • Alternative titles

    There are some alternative titles I'm considering for the talk. Any favourites?
    • Baddass Metaprogramming with Python (Thanks, Ahsanul)
    • The swiss army knife for metaprogramming in Python
    • Metaprogramming tricks for Python programmers
    • A zoo of Python metaprogramming tools
    • Tools and techniques for Python metaprogramming
    • Enhancing your code through metaprogramming
    • Python metaprogramming tricks for your toolbox

    Pycon 2010

    The list of scheduled talks for Pycon 2010 is out and they are looking very good. Thanks to all the people involved in making Pycon happen!

    I am specially fond of #64. Go check it out!

    If you missed Pycon 2009 (I did), you can take a look at what went on here.

    As for the Blog Posting Month not all of my posts during the month will be like this. A real one is on the way. Promise!

    Tuesday, 3 November 2009

    Blog posting month

    November is said to be the (Inter)National Blog Posting Month. The idea is simple: 30 days - 30 posts.

    People that read this blog regularly would know that it hasn't been very active lately (More than 6 months without posting). That's just wrong. Let's change it.

    Keeping up with the Blog Posting Month should be a good motivator to start writing again. I give you that I am already a bit late as today is November 3rd, so I have some catching up to do. From the technical perspective expect stuff in the lines of: concurrency, new-ish programming languages, emacs, linux, comet, python, algorithms and optimization. However, technical posts take some time to prepare and though I have some half-cooked posts already that should be coming out this month do expect to read some lighter content during this month.

    That's it for now. Till tomorrow.

    Monday, 20 April 2009

    GSoC projects I'm excited about

    Today the list of accepted projects for Google Summer of Code was made public. Here's the list of the top 25 projects that I'm personally excited about. Im adding an asterisk to those that either I'm really looking forward to use or that really blew my mind.

    • Graph Partitioning in The Boost Graph Library
    • Relations data type *
    • Boost.Python py3k support
    • Multiple Database Support in Django *
    • Model aware validation
    • UI improvements for the admin interface
    • Large dataset manager
    • Automatic parallelization in Graphite *
    • Portage backend for PackageKit
    • Universal Select Tool *
    • Tree-wide collision checking and provided files database
    • Advanced GUI for brush dynamics (GIMP)
    Google open source programs
    • Build a distributed object-capability system on the Caja platform.
    • Improving space profiling experience
    GNU Project
    • Improve "sort" on multi-core systems
    • GDB - Python Scripting API Enhancements
    • Emacs GDB/MI migration
    • Improving Search and Virtual Folders in KDE4 *
    • Plasma Media Center Components
    • New Widget Explorer
    • PlasMate Editor - An editor for Plasmoids, DataEngines and Themes
    • Web pages over rsync **
    • Python Interfaces For OpenCog Framework API
    • Natural Language Generation using RelEx and the Link Parser
    Python Software Foundation
    • Apache Axis2 extension for Jython

    I know that there are a lot more great projects but I had to settle for 25. Which projects are you excited about and why?

    Spiral matrix

    Today I found a very interesting set of 'simple' problems. It is a list of Berkeley University's "hardcore tech-interview style riddles and mathematical puzzles". I was specially interested in the Computer Science section (as you might have guessed) and, in particular, one problem caught my attention (probably because of the large matrix).

    The problem

    The problem reads as follows:

    Write a program that will display a "spiral" of NxN numbers, using constant space (no arrays allowed). For example, here's what the spiral looks like for N=10:

    99 98 97 96 95 94 93 92 91 90
    64 63 62 61 60 59 58 57 56 89
    65 36 35 34 33 32 31 30 55 88
    66 37 16 15 14 13 12 29 54 87
    67 38 17 4 3 2 11 28 53 86
    68 39 18 5 0 1 10 27 52 85
    69 40 19 6 7 8 9 26 51 84
    70 41 20 21 22 23 24 25 50 83
    71 42 43 44 45 46 47 48 49 82
    72 73 74 75 76 77 78 79 80 81

    Sounds simple enough.

    The solution

    As I've been reading a lot of "Real World Haskell" lately (and considering the language is great for the task at hand) I decided to code this program in Haskell. If you don't already know Haskell I recommend you to go and learn it on your free time. It is an excellent language with a very unorthodox view of programming (not only functional, but pure!) that will open your mind to a very new and interesting programming style. A good resource for learning Haskell (besides the book, which I highly recommend and will do so with further details in a future post about books) is the classical Gentle introduction to Haskell. But if you prefer a nice colourful website with cool drawings and good info to match the cool design (I know I do!) you can check Learn you a Haskell for great good.

    Enough with the language. Into the problem.

    To analyse the solution I enumerated some of the matrices, from which matrices of size 4 and 5 were the most useful

    15, 14, 13, 12,
    4, 3, 2, 11,
    5, 0, 1, 10,
    6, 7, 8, 9

    24, 23, 22, 21, 20,
    9, 8, 7, 6, 19,
    10, 1, 0, 5, 18,
    11, 2, 3, 4, 17,
    12, 13, 14, 15, 16

    A matrix of size n contains the numbers from n*n-1 to 0. So the next matrix will contain 2n-1 new elements and the rest will be the same elements in a different order. This gives us the idea that the algorithm can be solved recursively.
    If we look with more detail into the two matrices outlined above, we can notice that the top row and rightmost column contains all the new elements and that, due to the spiral ordering described in the problem, they will always be located in these positions (there are 2n-1 slots there).

    What's left is to figure out how the elements in the previous matrix map to those in the next one and we'll have ourselves a recursive algorithm. We can notice that the first element in the matrix of size 4 is the last element in the sub-matrix formed by removing the new elements from the matrix of size 5. Because of the spiral order the next element to the right in the old matrix becomes the next to the left in the new one and the element that was beneath a number is transposed above it. We can see it as a 90 degree rotation on the old matrix.

    The code

    Knowing this all that's left is to write the code.
    We will need a function that given the position in the n-sized matrix will return the element
    that belongs to that position.

    spiral :: Int -> Int -> Int -> Int
    spiral 0 j n = n*n - 1 - j
    spiral i j n
    | j == (n-1) = n*n - n - i
    |otherwise = spiral (n-1-i) (n-j-2) (n-1)

    The first line of this function (after the signature) is one of our base cases: The top row of the matrix is formed by the numbers from (n*n)-1 to n*(n-1). The second definition our second base case: the rightmost column. In this case each element descends from the element in the top right corner (i.e.: n*(n-1) ) one number by row. The rest of the function (the otherwise clause) is our recursive case. On the matrix of size n-1, we are interested in the element that occupies the row n2-i where n2 is the size of the smaller matrix (i.e.: n-1) and the column n2-j-1. Note that the difference between the row and the column is that in the column case we need to substract an extra 1 which refers to the extra columns that appears in the larger matrix. If we don't do this, the transformation would be transposed to the right.


    This code is enough to solve the problem. But to better calculate the results (without having to calculate each index at a time) I provided some extra functions.

    -- claculates an index based on the previous matrix
    spiral :: Int -> Int -> Int -> Int
    spiral 0 j n = n*n - 1 - j
    spiral i j n
    | j == (n-1) = n*n - n - i
    |otherwise = spiral (n-1-i) (n-j-2) (n-1)

    -- returns the numbers from 0 to n-1
    range :: Int -> [Int]
    range n = take n (iterate (1+) 0)

    -- returns every possible index of an NxN matrix
    matrix :: Int -> [(Int,Int)]
    matrix n = [(i,j) | i<-range n, j<-range n] -- a helper function for the map in printSpiral revSp :: Int -> (Int, Int) -> Int
    revSp n (i,j) = spiral i j n

    -- returns the result for each index
    printSpiral :: Int -> [Int]
    printSpiral n = map (revSp n) (matrix n)

    -- calculates all spiral matrices from 0 to n
    through :: Int -> [[Int]]
    through n = map printSpiral (range (n+1))

    Does anyone have a better (or different) solution?

    Saturday, 28 February 2009

    Publishing an emacs buffer

    Yes, another emacs post, in row, but I found this so amazingly cool that I couldn't just keep quiet.

    I told myself today: "It would be cool if I could convert what is rendered in an emacs buffer to html so I can show it to others". Of course, some people had already beaten me to it.

    Making it happen

    I used an emacs add-on by Hrvoje Niksic called htmlize. It does all the hard work, and it does it very well.

    Still, I wanted a one key solution to publish it. So I just coded these small emacs-lisp functions:

    (defun publish-buffer-to (file)
    "Converts buffer to html and writes it to file"
    (interactive "Ffile: ")
    (require 'htmlize)
    (with-current-buffer (htmlize-buffer (current-buffer))
    (write-file file)
    (kill-buffer (current-buffer))))
    (message (concat "current buffer contents published to " file)))

    (defun publish-buffer ()
    "Converts buffer to html and writes it to ~/public_html/emacs.html"
    (publish-buffer-to "~/public_html/emacs.html"))

    (snnipets provided by htmlize! =) )

    Now all that was left was to provide a simple key shortcut.

    (global-set-key [f7] 'publish-buffer)

    And thanks to Ruslan Spivak's code I could also add an easy way to create code snippets out of regions.

    So now I can easily share my emacs buffers not only on my local server but on my blog =).

    Apparently there's another module to do this that might be added to emacs soon, but for now I'm very happy with this solution.

    In the future, I'd like to integrate this with tramp and add better file management support so I can seamlessly post to a remote server. I would also like to make it autoreload so it can track live changes, but we'll see.

    If you liked this, you can check out my .emacs

    Sunday, 22 February 2009

    My .emacs

    I have been delaying this post for quite a while. I've been waiting until my .emacs file is "complete". Today, I realised it is never going to be complete. So here's a few notes regarding my emacs configuration and how you can use it.


    I must confess, a few weeks back my emacs configuration consisted of one HUGE file and a few emacs lisp files scattered in a directory. Also, half of the tools and modes I used were installed in a system wide manner via the OS's packaging system. ...A mess.
    I decided to put some order (beauty?) to it. So now my .emacs file is 9 non-comment non-whitespace lines and the rest of the configuration is well distributed several different in appropriately named files and directories. With all tools included it is 535149 LOC.

    Mobility (sort of)

    I love working in emacs so much that I consider most other editing options a pain. I've even considered implementing emacs-lisp in javascript so I can turn every textarea into a fully functional emacs buffer (this is way down in my list of personal-projects-i-will-do-someday due to its complexity and my lack of free time)[1].

    Being a little more realistic, I decided just to put my .emacs in a git repo that I could access everywhere and which included all the things I normally use when developing (or writting in general). This way I could have all my tools everywhere customized the way I want them.
    With this approach installing new tools (like JDEE) becomes a little more complicated because they are commonly packaged to be installed in a system wide manner. Also, I cannot use my OS packaging system. But the upside of having everything in one places pays off.

    So, why does it say "sort of" in the title? Well, some of the tools (the ones that need non-emacs-lisp components) are compiled for my system (A 64-bit Arch Linux box). So you might have problems with them if you just clone my emacs repo in a different system.
    What can go wrong? Mostly there might be problems with JDEE and AucTex. But besides that I don't see much that could go wrong in a unix system.


    There are not many new and amazing things. So here's a non-exhaustive list of stuff I use every day:
    * Color themes
    * git support
    * JDEE
    * Auctex
    * ECB
    * word-to-emacs (open .doc files in emacs)
    * some modes: (java, haskell, erlang, javascript, etc)
    * some key-bindings:
    * C-x p: go to matching paren
    * f9: cross highlight (line and column)
    * f11: fullscreen
    * f10: remove scrollbar (combined with f11 it makes a great fullscreen experience =) )
    * M-/ M-: move line up/down
    * f8: make frame on display
    * f5: go-to-line
    * enhanced clipboard interaction
    * in place annotations (I'd like to make this one better)
    * automatic backups
    * tramp
    * everything else that I forgot

    What's missing

    Currently I am missing:
    * Slime (I haven't used Lisp in quite a while, but I'd like to start using it again. Wait for my Clojure exploration)
    * Better Python support. The default mode is very good. But I'd love to have more advanced stuff like Ropemacs.
    * Bookmarks. A personal project I might add soon.
    * Probably some stuff I'm forgetting


    I very often get asked by firends for my .emacs file (or updates from older ones). So here's a way anyone can go and use my emacs configuration:
    Go to github and get a copy of my emacs repo:

    > git clone git:// ~/elisp
    > ln -s ~/elisp/.emacs ~/.emacs

    It doesn't go without a disclaimer: This is, and will always be, an incomplete project. Though it probably is portable, it is made for my personal use and there is no guarantee it will work in different non-linux setups.

    Now, I'd love to make this even more portable and add different tools to it. So if anyone have a problem or suggestion I'd love to hear it. I would, of course, like it better if the suggestions come with a link to a fork of the project where the problems are fixed and I can just pull it ;)


    So, Whats the emacs feature you like the most? Which one amazes you the most? anything you couldn't live without? What do you miss from other text editors when working in emacs?


    [1] The folks at Mozilla labs have done a very nice contribution to the online editing world with Bespin. I hope they keep on the good work and turn to project into something emacs-like for the web.

    Thursday, 19 February 2009


    In the last post I mentioned the book Programming Pearls. In the second chapter, Jon Bentley presents a simple, but interesting problem: Given a list of words, group those that are acronyms together.

    If you don't know what acronyms are, it is simply a words formed from the initial letters of others. For instance, the words "pot" and "top" are acronyms; so are the words "slow" and "owls".

    When I read it, I thought "This could be done with unix tools and pipes". Actually, I didn't find a way of doing it efficiently with native unix tools. The reason being that there's no horizontal sort implementation in unix. You could write something like this:

    for i in `cat dict | sed 'p' | sed 'N;s/\n/*-/g'`; do echo $i |
    cut -d "*" -f 2 | sed "s/\(.\)/\1-/g" | tr "-" "\n" | sort |
    tr -d "\n"; echo "$i" | cut -d "*" -f 2 ; done | sort

    But it won't be very efficient and it's very ugly. I didn't spend too much time thinking of a better way to do it unix-style. If you have one, please post it on the comments =)

    Instead, I wrote a small python script

    #!/usr/bin/env python
    import sys

    if __name__ == '__main__':
    f = open(sys.argv[1])
    for line in f.xreadlines():
    line = line.strip()
    print '%s-%s' % (''.join(sorted(line)), line)

    and I executed:

    ./ dict2 | sort

    which runs in about 0.47 seconds. Then, we could add another step to pretty print the acronyms, but I wasn't interested in that.

    Unfortunately, the only thing that remains 'unix' about this solution is the use of sort and one pipe. This post is not so much to show my findings but to look for a better solution. Does anyone have a suggestion?

    Saturday, 31 January 2009


    This last week I happened to have the necessity of generating all the possible permutations of an array. This seems to be a fairly easy task but many programmers won't come up with a simple solution to this problem very fast. There is at least one intuitive, recursive, solution to this problem, many more complex but *slightly* more efficient, iterative, solutions and no really time-efficient solution. After all, the problem is to generate n! orderings of an array so we, at least, have to visit each one once.

    The well-known solution

    The most widely spread solution to this problem is the Dijkstra algorithm for generating the next permutation. It is based on the assumption that there is a total order between the elements of the array and the elements of the array are unique (there are no repetitions). If these conditions are met, the algorithm always generates the next permutation in lexicographical order.

    The algorithm is very straight forward and goes as follow:

    def get_next(a):
    N = len(a)
    i = len(a)-1

    while a[i-1] >= a[i]:
    i -= 1

    j = N

    while a[j-1] <= a[i-1]:
    j -= 1

    a[i-1], a[j-1] = a[j-1], a[i-1]

    i += 1
    j = N

    while i < j:
    a[i-1], a[j-1] = a[j-1], a[i-1]

    return a

    To use this algorithm to generate every possible permutation we just call it n! times:

    def dijkstra(a):
    import operator
    fact = reduce(operator.mul, range(1,len(a)+1))
    the_list = [copy(a)]
    for i in range(fact-1):
    a = get_next(a)

    return the_list

    the only possibly uncommon thing here is line 3 which is a functional-style factorial function.

    This works very well for generating the permutations of n (the permutations of the array [0..n-1]) but it won't work for arrays of arbitrary elements. So we explore other options.

    The known recursive solution

    There's a (not-so-well-)known recursive solution to solving the permutation problem. It is known as the HeapPermute algorithm or Heap's algorithm for permuting an array.
    My python implementation of the algorithm goes as follow:

    def heap_aux(a, n, the_list):
    if n == 1:
    for i in range(n):
    heap_aux(a, n-1, the_list);

    if n % 2 == 1:
    a[0], a[n-1] = a[n-1], a[0]
    a[i], a[n-1] = a[n-1], a[i]

    def heap(a):
    the_list = []
    heap_aux(a, len(a), the_list)
    return the_list

    This algorithm works very well and it is very efficient too but it generates the permutations in an order that I find particularly disturbing.
    Also it is not as easy to understand as I would like.

    Enter my recursive solution
    This is probably the most straight forward solution I could come up with. It is a clasical recursive design.
    Want to permute an array of n elements? How would you do it if you already had a function to solve the problem for a smaller array?
    Simple. Select each possible element of the array to be "the first element of the permutation" and then permute the rest of the array:
    A permutation of 1,2,3,4 would be:
    1 + permutation_of(2,3,4)
    2 + permutation_of(1,3,4)
    3 + permutation_of(2,1,4)
    4 + permutation_of(2,3,1)

    Simple, right?

    Let's code it:

    for j in range(i, len(a)):
    a[i], a[j] = a[j], a[i]
    permutation(a, i+1, the_list)
    a[i], a[j] = a[j], a[i]

    This loop captures the essence of the procedure described above. We have to remember to put the elements back where they were after the recursive call to allow next iteration to work. Not doing so would result in swapping the wrong element in the next iteration.

    Puting it all together (and adding a small optimization):

    def perm_aux(a, i, the_list):
    if i == len(a):
    perm_aux(a, i+1, the_list)
    for j in range(i+1, len(a)):
    a[i], a[j] = a[j], a[i]
    perm_aux(a, i+1, the_list)
    a[i], a[j] = a[j], a[i]

    def permutate(a):
    the_list = []
    perm_aux(a, 0, the_list)
    return the_list

    This algorithm generates the list in a nicer order (though it's not lexicographical). And as its predecessor is quite efficient (taking into account the complexity of the problem).


    Normally recursive algorithms are put down for performance reasons: They keep all local variables stored in the stack while the other recursive calls are made and have to incur in function calling overhead. In the case of these two algorithms the good thing is that the recurtion depth is not proportional to the size of the problem solution but to the size of the array (actually it is exactly as deep as the array's length). Also, if the list variables are passed by reference (this is programming language specific), only the reference to the variable needs to be stored in the stack and there is no overhead of coping the variables on every call.
    It will probably never be as efficient as their iterative counterparts but for me readability (with small performance penalties) definitely pays off.

    To measure the times I used the following python decorator:

    def timing(func):
    def inside(*args):
    t1 = time.time()
    res = func(*args)
    t2 = time.time()
    print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
    return res
    return inside

    Given the complexity of the problem, the graph grows rapidly out of scale. So a graph in logarithmic scale is in order

    As you can see the algorithms behave very similarly. The largest difference (for an array of size 9) between the fastest (heap) and the slowest (permute) is 0.6 seconds. This of course will grow with the array size, but in a linear instead of exponential manner.

    Have another solution to this problem? comment on! =)

    Friday, 30 January 2009

    sed one-liners

    Sed never ceases to amaze me. But I was specially amazed by the fact that "sed is turing complete". Check out this useful sed commands that'll make your life a lot easier!.

    Friday, 16 January 2009

    Django gets aggregation support!

    As for Changeset 9742 Django gets aggregation support. For those of you that don't know, I worked in this project as part of Google Summer of Code and I'm very glad to see this finally coming true!.

    I also wanted to thank Russell Keith-Magee for mentoring me throughout the project and working in the code after GSoC was finished, Malcolm Tredinnick for giving amazing code reviews during the whole SoC and the many others that worked in getting this done.

    I encourage you to check it out. The documentation is a great start! =)

    Monday, 5 January 2009

    Emulating Ruby blocks in Python

    There are some Ruby features I like a lot. A handful of them are borrowed from other languages (like lisp or perl), but the Matz found a way of making the language a very nice and powerful combination of them. I've been using Ruby lately and got thinking on how could I implement one of Ruby's most beloved features, blocks, in Python (remember, "imitation is the sincerest form of flattery"). As it turns out, it is not all that hard. Here's what I came up with:

    But before that...

    On Ruby blocks

    Blocks in ruby provide a way of creating functions that act on a code block defined later on.
    An example of blocks in ruby is the following:

    array = [1, 2, 3, 4]
    array.each { |n| puts n ** 2 }

    According to their site: "A block is like an anonymous function or lambda [and] the variable between pipe characters is the parameter for this block". What's missing from this description is that ruby also provides the syntactic sugar to create functions that receive blocks using the "yield" statement. In other words, it is a way of creating closures and attaching them to other methods. Using closures in ruby comes very easily even for people that donsn't know what a closure is. If we were to implement the previous behaviour ourselves we would do something like this:

    class Array
    def each2
    for i in self

    array = [1, 2, 3, 4]
    array.each2 { |n| puts n ** 2 }

    I won't go into details because there is a lot of documentation on ruby blocks out there.

    So, onto Python...

    The Python "builtin" that resembles the most to blocks is that of PEP343, the with statement; but I wanted something that immitetad the ruby syntax as much as possible. The with statement is nice, but it doesn't cover all the cases.

    So I decided to use a decorator to convert the function that uses the "block" into something that receives the block, inserts it into the namespace, and execute the original function with the block as a corutine.

    The idea was something like this:

    def simple_iterate():
    for i in [1,2,3]:
    print block()

    def _():
    return "a"

    This copies the Ruby syntax except for the receive_block decorator, but I considered it a reasonable sacrifice.
    Using "def _()" leaves the function as anonymous and allows you to specify parametrs for the block.

    Implementing blocks in Python

    So, to implement the syntax I just need to write the receive_block decorator.
    The objective of this decorator is to convert the block receiving function, A, into another decorator that receives a function, B, introduces B into A's scope and subsequently calls A.
    The key step is to add the block function to the scope. To do this we use Python's builtin types module. It includes the FunctionType method which creates a function object.

    types.FunctionType(func.__code__, scope)

    There is more to this method than what we use here, but I won't go into details about the method since we only need the simplest use of it.
    Once we know this, the decorator is pretty simple:

    import types

    def receive_block(func):
    def decorator(block):

    # Add block to globals
    scope = func.func_globals #globals()

    #create the function with the new scope
    new_func = types.FunctionType(func.__code__, scope)

    return new_func()

    return decorator

    Lets see how it works:

    def external():
    for i in [1,2,3]:
    print block()

    print "External"
    def _():
    return "a"

    This will print


    And if we add a parameter:

    def param_external():
    for i in [1,2,3]:
    print block(i)

    print "External with param"
    def _(i):
    return "a " + unicode(i)

    It, as expected, prints:

    External with param
    a 1
    a 2
    a 3

    But what if we wanted to implement something like ruby's Array class? Lets create an Array class that extends the builtin list type and see what happens.

    class Array(list):
    def each(self):
    for i in self:
    print block(i)

    When calling

    a = Array([1,2,3,4])

    print "Each Square"
    def _(x):
    return x**2

    we get the exception:

    decorator() takes exactly 1 argument (2 given)

    because the instance method takes self as the first argument.
    So we modify our initial decorator to work with instance methods:

    def receive_block(func):
    def decorator(*args):
    if len(args) == 1:
    block, = args
    instance = None
    elif len(args) == 2:
    instance, block = args

    # Add block to globals
    scope = func.func_globals #globals()

    #create the function with the new scope
    new_func = types.FunctionType(func.__code__, scope)

    if instance:
    return new_func(instance)
    return new_func()

    return decorator

    This modification is pretty straight forward. So I won't explain it because it speaks for itself.

    Lets write some functions:

    class Array(list):
    def each(self):
    for i in self:
    print block(i)

    def collect(self):
    for (i, value) in enumerate(self):
    self[i] = block(value)

    def handled(self):
    for i in self:
    print "This raised an exception"

    and pass them some blocks:

    a = Array([1,2,3,4])

    print "Each Square"
    def _(x):
    return x**2

    print "Each"
    def _(x):
    return x

    print "Collect"
    def _(x):
    return x**2

    print a # a is changed

    print "Handled"
    def _(x):
    if x != 9:
    raise Exception("this won't work")
    print "this works"

    We, then, obtain the desired output:

    Each Square
    [1, 4, 9, 16]
    This raised an exception
    This raised an exception
    this works
    This raised an exception

    It works. =)

    Another interesting way to do this would have been to add the block variable as a free variable of the function and have the code object reference it. In Python, when a closure is created the free variables are stored in an attribute of the function's code object and it's values are stored in the function itself using the cell type. Take this closure as example:

    def test():
    a = 10
    b = lambda x: x+2
    def inner():
    print a, b(a)
    return inner

    >>> i = test()
    >>> i.func_code.co_freevars
    ('a', 'b')
    >>> i.func_closure
    (< cell int object at 0x802400 >,
    < cell function object at 0xf68848 >)

    Adding the closure to the new function should be easy, since FunctionType accepts a closure keyword argument to do so. Unfortunately, the code's co_freevars attribute is read only:

    >>> i.func_code.co_freevars += ('block',)
    TypeError: readonly attribute

    If anyone, who knows better than I do, cares to provide an implementation using closures I'd love to hear your solutions.

    So this is how we implement Ruby-style blocks in Python using decorators. Hope you enjoyed it.


    This is by no means meant to be used in a production environment. Not even in a semi-serious environment. It is just a hack to demonstrate the how this could be done and it hasn't been tested.

    Other notes

    * The code for this project is hosted in
    * I wanted to implement some of the builtin Ruby functions that use blocks but I didn't have the time. If somebody is up to the task I'd love to see what interesting things could be done with this.
    * Also, if somebody is willing to improve the code you are more than welcome.
    * Some of the problems of this code:
    -It clutters the global namespace.
    -The word "block" is introduced as a global reserved word.
    * For some reason blogger doesn't let me edit this site's template in 2009, so I couldn't add code syntax highlighting.