Tuesday 30 December 2008

Doing it right!

(or "my primer on studying computer science the right way")

For those of you who don't know it, I'm a CS major and am currently finishing the last three months of my degree. I've been looking back at the whole five years of CS curriculum and noticing what I did right, what I should have done right and, especially, what I started to do right late and wish someone had told me to do right from the beginning. So here is my advice to those starting to study for a computer science degree. I will not categorize all of these ideas according to how I did or didn't do them. Instead, I'll elaborate on what they actually mean, why I think they're important and how to do them effectively.

Motivation
One would expect that after completing a degree in computer science people would be very well acquainted with the different aspects of the discipline. Moreover, they should be fluent in many programming languages, understand the inner workings of a computer and of operating systems, know the advantages and disadvantages of different implementation decisions in programming language design, understand (and apply) algorithm complexity analysis, know several data structures and abstract data types along with their uses and implementations, and understand networks and network software, just to name a few. This list could go on forever, but besides knowing subjects related to the CS curriculum one would expect a new computer science graduate to be aware of different tools to improve productivity (i.e. not to code in notepad), to be able to work their way through unix, to strive for software quality, to be curious about different/new approaches to problem solving (like alternatives to sockets that make distributed programming suck less), to do testing, to be able to design correct and efficient algorithms, and to use version control; again, just to name a few.
Unfortunately, many people can get through their degree without actually knowing more than shallow information on all of those subjects (and I mean those explicitly in the degree program; many people have probably never even heard of subversion or grep or emacs). Some might claim the problem is in the education system. I personally believe that, even if the education system does have many problems, the causes are lack of curiosity, lack of motivation or lack of information.
"The information is out there!" you might say, and you would be right, but it is very hard to actually find it if you don't know what you're looking for. This is the case for freshmen who, with curiosity or motivation, could actually find the information to start doing computer science "right" but who might also just need to discover this information first to start building their curiosity and getting motivated. It could very well be a problem in the education system in general but I will not address that issue here (at least not in this post). Today I'll focus on what I believe could help you obtain the right information to start doing it right.

But I just want to learn to program, do I also need to do/learn all this stuff?
Short answer: yes.
Long answer: yes, but go get a CS degree. I am not saying that you *need* a CS degree to be a programmer. I'm not even saying you need the degree to excel at it. But certainly most people do. What you *do* need to be a programmer is training in discrete mathematics, algorithms, data structures, group theory, machine architecture, probability, set theory and logic, amongst many others which, again, I will not list. Sure, some people could teach themselves all these things, but most people can't. And even if you can, why not get the degree while you're at it?
Anyway, I phrase this article in terms of computer science majors, but it is just as valid for anyone wanting to take programming, technology or computation in general seriously.

For whom is this intended
These tips are not necessarily for you. This article is intended for those interested in tackling the art of programming or designing programs. If you want to work as a manager, a company spokesman or at selling software (not making it), you probably won't need any of this. If you, on the other hand, are interested in creating good software, whether you are a CS major or not, then read on for what, to my understanding, is a good set of advice for becoming better at it.

Advice

* Learn (many) different languages
This is probably one of the most important items in this list. If I hear somebody telling me "I only know C and Java", I read between the lines "You don't actually have any real interest in programming. You probably were never interested in learning things deeper than what they showed you in your courses". If they add "And, well, I once used Scheme (or Haskell) for my programming languages course, but nobody uses those kinds of languages in the real world, so I never really learned how to use it" I read: "You probably care more about getting some job than actually making great software" and "You probably can't think of new, different solutions to problems besides the straightforward procedural or OO solutions that come in a cereal box".
Learning new programming languages is not only about using them to solve problems in your job (though they might come in very handy. If you don't believe me, check out this implementation of the unix find command: http://www.dabeaz.com/generators/genfind.py; it is not only very short but also a different way of thinking from what you would do in Java or C). Even if you might not use some programming language in your job (though you might), learning to program in a different paradigm and with a different set of constructs and restrictions will definitely make you a better programmer. Learning Erlang, for example, will give you a new way to think of network programming. Learning Haskell, or functional programming in general, will help you better understand recursion, its advantages, its disadvantages and how to overcome them (not to mention pattern matching, the importance of side effects and the usefulness of first-class functions). Python or Ruby will help you understand the importance of writing beautiful code (and will become great addictive tools to solve problems very elegantly and fast). Languages from the lisp family will make you question why C or Fortran became so popular back when there were not many other programming languages. Lisp will also make you understand the usefulness of building domain specific languages and the possibilities of expanding the language that many others neglect. I just named a few languages I know, but there is something to learn from every possible language since each has its own way of solving problems.
Even learning a "worse" programming language will be of great use. It imposes constraints that were not there before and you need to think of a way to overcome them.
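
To give a flavour of the generator style mentioned above for the find command, here is a minimal Python sketch. It is written in the same spirit as the linked script but is not a copy of it; the names gen_find, pattern and top are picked here for illustration.

```python
import fnmatch
import os


def gen_find(pattern, top):
    """Lazily yield the path of every file under `top` whose name matches `pattern`."""
    for dirpath, _dirnames, filenames in os.walk(top):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Nothing is collected up front; each path is produced as the loop asks for it.
    for path in gen_find("*.py", "."):
        print(path)
```

Because the function is a generator, the caller decides how much of the directory tree actually gets walked, which is quite different from building a full list first as one typically would in Java or C.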

* Hold maths in high esteem
Edsger Dijkstra once said: "Computer science is no more about computers than astronomy is about telescopes."
Computer science is about computation. Whether we use a von Neumann machine or not, the way to reason about computation is with mathematics. If you just want to program very simple websites you will probably not need much complex math, but for any slightly more interesting task you should learn (and love, if possible) math or you might end up doing things like solving a problem by iteration when you could just calculate a closed form (resulting in bad efficiency). This article talks about some interesting problems and how math is used, along with computing, to solve them.
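
A small illustration of the iteration versus closed form point, using the sum of the first n integers as an example of my choosing (the function names are made up for this sketch):

```python
def sum_by_iteration(n):
    """Add 1 + 2 + ... + n one term at a time: n additions."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_by_closed_form(n):
    """Use the identity 1 + 2 + ... + n = n(n + 1)/2: constant work."""
    return n * (n + 1) // 2
```

Both return the same value, but the first does work proportional to n while the second does a fixed amount of arithmetic no matter how large n is.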

* Learn English
Knowing English is essential for this field of study. If you don't know English (or not enough English) just go and learn it. Seriously, go now! English is not my first language and I am always trying to improve my English (writing this blog in English is a part of it). I also often see people complaining about having to use English for programming; I even once saw somebody claiming that programming language constructs (like IF-THEN-ELSE, FOR, etc) should be translated into multiple languages. Excel does this for its macro system and I consider it plain STUPID. Many professors in my university even _force_ you to code using variable names and code comments in Spanish. This is not necessarily wrong. I personally prefer to do all the coding and documentation in English since it gives my code more reach, but if I _know_ the code and documentation will only be read by X-speaking people, I believe it is ok to write it in X.
I also find it easier to read code that has variable names in English because you can read some code expressions out loud and they make perfectly well formed English expressions. This is especially true for languages like Python or Ruby; C, Java, Haskell or others with a more obscure syntax probably don't have this advantage.
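
As a tiny illustration of code that reads out loud, here is a Python sketch; the functions and variable names are invented for the example.

```python
def can_post(user, moderators, banned_users):
    """Reads as: user in moderators, or user not in banned users."""
    return user in moderators or user not in banned_users


def shout_if_long(words):
    """Reads as: word upper, for word in words, if length of word is over four."""
    return [word.upper() for word in words if len(word) > 4]
```

With names in another language the keywords (in, or, not, for, if) would still be English, so the expressions would switch languages mid-sentence.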

* Read
A few days back, reading about public speaking, I found these tips:

- Listen to great speakers: Attend as many programs of great speakers as possible. Subject spoken is immaterial here, what you are learning is the “Master's” way of doing it.
- Read about doing presentations: There are now plenty of books on doing effective presentations and Internet has numerous pages on this. Read them.

It seems very obvious that doing this stuff will make you a better public speaker. The analogue of this for coding could be:

- Read code written by great coders
- Read about writing code

It should be very obvious that doing these two will also make you a better programmer, but let's expand a little bit on why (and how to do it).
The first one, read code written by great coders, might seem hard (at least it seems a little harder than listening to a great speaker). You can start by reading small chunks of code, the kind of snippets you find in a book, and move into larger projects as you get the hang of it. You could of course just jump directly to reading sections of a project and then the whole project (and while you're at it, why not contribute a bit!). It is very easy to find great code, written by great coders, to read: just go and download the source of your favorite open source project (some suggestions: django, emacs, git, your favorite linux distro, kde, mootools/jquery, blender; they go from operating systems to 3D tools, there's something for everyone).
The second one is a lot easier to do, and maybe even more educational. There's a lot of material on the internet that aims to teach you how to program well, and then there are books! Many authors will contradict each other because the "right way" of programming is not the same for everyone. There are many different thoughts on how to do it right, but most of them share some basic grounds, so you'll have to decide whether what they suggest makes sense or not. It is here where you'll start having your own opinions on programming, forged from the thoughts of many.
So, pick a language or an area of interest (for example: "OCaml" or "Image generation/manipulation" or "artificial intelligence" or, simpler, "sorting algorithms") and go find information on how they're done. Read, read, read, read. And then, if possible, implement.

There is another type of reading that I believe might be useful but with which you should be careful:
- Read bad code
Read bad code and try to understand why it is bad code and which "anti-patterns" present in that code make it suck. If possible, try to make it not suck anymore.

Note: If you're trying to tell somebody that their code sucks: don't be an ass, be constructive and helpful.

* Read blogs
There are a lot of smart people out there. Many of them have blogs and they blog about interesting, cool stuff. It doesn't need to be about technology or science. I would avoid things like "the paris hilton blog"; but, besides that, anything that promotes your intellect, creativity or recreation is fine. I am not going to tell you which blogs to read (but read mine! ;) ), you'll build your own list as you start finding people that write interesting articles. There are a lot of sites like digg, reddit and swik where you can find tons of articles every day. They are also classified by subject. Just read the ones you like, and if someone writes interesting stuff add their feed to your aggregator. (If you don't know what an aggregator is, just check google reader out.)

* Read books
The internet is nice but it sure doesn't have all the answers. There are a few problems with obtaining all of your knowledge from the internet: The information doesn't have an explicit order, you need to decide in which order you learn stuff and doing so can sometimes make it harder to learn; The internet is "Breadth first", finding topics in depth can be hard and, even if the information is there, it is not as comprehensive as it would be in a book; There is no formal review process, though some sites, like wikipedia and some blogs, have a very good process for socially reviewing information, not all of the information gets reviewed or, if it does get reviewed, it might not be updated to reflect the improvements of the review. Books are very good at treating complex (or simple) topics in depth, their narrative is designed to be appealing and to ensure a good learning path and they're reviewed many times by different people to ensure their quality. Not all books are great, but certainly there are a lot that are.
It's simple. Search for books on what interests you: learning a language (or better, theory of programming languages), learning algorithms, learning a specific topic (AI, Operating Systems, Testing, Graphics, Security...), whatever you fancy the most. Read reviews of the books, choose one, buy it, read it. I might come up with a list of book recommendations sometime in the future, but for the time being you're on your own (but with a little help from amazon and the internet).

* Use Open Source
There are a lot of open source tools available out there. You should always try to look them up and see for yourself what they're all about. Sometimes they're better than their closed counterparts, some other times they're not. It pretty much depends on what you're trying to do. I am not telling you to use openoffice here. I am talking about development or deployment tools or frameworks that can help you do a better job. You need a web server? Why not check apache out? You need a database management system? There's more than oracle out there (try postgres or mysql). Need a mapreduce framework? Let's try out hadoop and see how it goes. Compilers? gcc, gcj, etc. There are also great open source programs you can use as an end user: Amarok, Inkscape, GIMP, blender (really check this out, blender is amazing!)..., but it is more important that you know which tools are available to help you do a better job.
As I said before, open source programs are many times better than the closed source ones, but this is not always the case. In cases where there are good closed alternatives it is important that you know in which ways the open source options are better and whether they are suitable for the task you wish to accomplish. In the near future there is probably going to be an open source alternative to every closed tool out there and, if things continue to be as they are now, these tools will be better than the closed alternatives (or at least _very_ competitive).

* Do Open Source
I had never worked on open source until I participated in Google's Summer of Code earlier this year. I had the opportunity to work with the Django community, which is just great. Lately I haven't had the time to continue working on open source projects but I definitely wish to continue doing so as soon as possible.
Doing open source teaches you many things. On the technical side, it gives you the possibility to work with many great programmers and learn a lot from them: by reading their code, by having them review your code, by finding and fixing their errors, by discussing the best approaches to solving a problem. On the human side, it allows you to work with a fairly large number of people scattered around the globe with different mindsets, different cultures, different languages, different interests; having to work with so many different people encourages you to take your work seriously and to develop a good working and communication methodology.
Open source development requires dedication, but it is very well rewarded when you know you've done something good and other people are actually using and improving your code. It is also rewarded with the amount of knowledge and expertise that you earn. You could practice programming by doing toy programs at home. But you could also practice programming by contributing to real world programs and state of the art technologies. It is better for both you and the world.

* Learn Unix
("unix" in this section refers to any *nix, especially linux)
Unix is great. Once you learn it you probably won't be able to live without it. It is pretty much like Firefox for an IE user. Most IE users will tell you it has everything they need. They are just unable to imagine anything better. They might even resist using Firefox because they don't see the advantages and they just don't want to change their browser. You, then, convince them to change to FF just for a month or two. Some time later they will probably come thanking you for showing them FF and rant about how they could never use IE anymore. It is one of those things you don't actually get until you've tasted them.
Unix is probably the same for most people: "the thing geeks use that seems to be pretty much the same but looks kind of odd". Just taste it for a while...
But I am not saying you should use Unix on a daily basis. I'm not even saying you should ever use it as *your* operating system. I am just saying you should "know" it. Why? Many reasons: First, the shallow reason, it looks nice on your resume; Second, you'll probably need to use it some time during your professional life and you don't want to look stupid asking how to read a file; Third, it will help you solve problems really easily ("get me the number of html files that contain the word X", done: find . -iname '*html' -exec grep -l X {} + | wc -l); Fourth, just learn it! ok? This article is not for preaching about unix, there are enough of those on the net; this is for recommending that you learn how to use it.
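
For comparison, here is roughly what the same task ("get me the number of html files that contain the word X") looks like when written out by hand. This is only a sketch in Python; the function name, its parameters and the decision to ignore unreadable files are assumptions made for the example.

```python
import os


def count_html_files_containing(word, top="."):
    """Count files under `top` whose name ends in 'html' and that contain `word`."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # Same idea as a case-insensitive match on names ending in "html"
            if not name.lower().endswith("html"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    # Stop reading a file as soon as one line contains the word
                    if any(word in line for line in handle):
                        count += 1
            except OSError:
                # Assumption for this sketch: skip files that cannot be read
                continue
    return count
```

It works, but it is a small program rather than one line typed at a prompt, which is the point being made about the shell.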
If you can also learn how the internals work, I recommend you do so. It is not hard. Just set yourself the goal of learning a different part of the OS every week (month?). Ask yourself a question like: What (exactly) happens when I turn the computer on, from pushing the button to browsing the web? (this question might take a while but it is very educational; you could consider breaking it up into pieces) What are services? Daemons? What are kernel modules? How do I compile a kernel? How do I load "drivers"? How does authentication work in my system? How do I make packages for my distro? What (exactly) happens when I install a package? How does the networking work? And, of course, what cool tricks can I do with my OS?
These questions should strike you as quite interesting if you don't know the answers, and they will probably provide you with tons of new things to learn. You could go and install some distro that doesn't try to do everything for you (the way ubuntu does). This will also give you more freedom to do anything with your system. I am currently using Arch and loving it (except for a few issues that I'll blog about later). Before, I was a strong Gentoo user (and I still have a place for gentoo in my heart, I'm just tired of compiling everything for my personal computer).

* Use Unix
I lied. I *am* going to tell you to use it on a daily basis. If you really want to learn it, what better way to practice than by using it every day as your primary OS? And, as I said, once you get the hang of it you won't be able to understand why anybody would use anything else.

* Text Editors
This is probably the most important point in the series. Mastering a good text editor will make you tens of times more productive, so learn a good text editor and learn it well.
There are a handful of text editors out there. I use emacs, and consider it to be the best, but if you use vi/vim it is probably ok. If you own a mac, I've heard that TextMate is also very good.
Now, it is not enough just to learn how to get in and out of your editor and maybe some copy/paste shortcuts. Remember that the idea is to make you more productive. These text editors have a very large number of cool features that will never cease to amaze you, so learn yours well. It will not only make you more productive but will give you bragging rights: "you built a program to convert that huge data file into a different format (re-ordering columns, removing irrelevant information, substituting expressions, etc)? I can do that with emacs in less than a minute". And it will be true!
Some (inevitably incomplete list of) cool features in emacs that will help you very often (at least they help me almost every day): cut and paste a rectangle, move a line up or down, highlight a line or a column, search and replace regular expressions, define macros, repeat an action several times, split the screen, open a shell and copy/paste from/to it, match parens, move to a specific line, display the name of the function you're in, display the line and column number, display code syntax errors, swap two words/characters, move by word... These are just examples. Start learning new tricks every day and put them to practice.
Some editors can also be extended. Use an editor that allows this. That way you can change any behaviour you want and adapt the editor to your way of working. You could, then, share your extensions with the world.
So, is a good text editor better than an IDE? For me: yes. Some text editors even behave like IDEs for some languages. But even if they didn't, I believe that having good editing capabilities makes you a lot more productive than giving you language hints and compile/debug buttons. If you still prefer the IDEs sometimes, you can always use them for some specific programming languages or tasks. Eventually, you'll find some simple but tedious task that your IDE just can't perform and go back to emacs (sorry, your text editor of choice). Even if this never happens, have some text editor knowledge handy just in case you would need it (while on ssh, for instance).

* Get trendy
Read about the latest trends, if possible try them out and see what they're all about. Trends are not necessarily good, but they're getting a lot of attention for a reason, and many people will know about these trends and love/hate them. You should decide for yourself whether they are good or bad and why. I personally don't like the whole J2EE and Design Patterns buzzwords. Instead I'd go for TDD, functional-style programming and dynamic typing. I am not saying these are all the latest trends or that you should follow them, it is just an example of my likes and dislikes. I am not saying either that you should abandon your path and go chasing buzzwords as they appear. I like dynamic typing, but I still understand the advantages of static types and won't abandon the C/C++ style of programming. I believe J2EE is bullshit, but I don't think Java should be completely knocked out of the way. I also in no way believe design patterns are useless, I just think they get way more attention than they should. Staying on top of the trends helps you absorb the good stuff about each one of them and understand their limitations. You will be able to tell people what paradigms you love or hate and why they are good or bad. You can also understand other programmers' points of view and consider in which cases different approaches should be used.
How do you stay on top of the trends? Read blogs.

* Use Version Control
I am amazed at how many people with a computer science or computer engineering degree don't know or use version control systems. There are many advantages to using version control: recovering previous work, seamlessly integrating your work with others', undoing undesired changes, keeping different "copies" (branches) of your project for experimental development, viewing the history of changes for your project along with who made each change, and many others you'll find when you start using version control on your projects.
I am currently using git as my version control system and loving it. If you're using another version control system I highly recommend you give git a try (check this out for reasons). If you are new to version control, many people would recommend you go for something like mercurial or bazaar (because they are apparently simpler; I haven't used them), but if you are fierce I would also tell you to check git out. It is not _that_ hard and there are a lot of resources to learn it.

* Test
This is not a must, but it would be good if you got used to testing your software. It will help you build better programs and know many testing tools in the future.
If you use Java the most common testing tool is JUnit; for C++ I recommend googletest and cxxtest, but for a more detailed comparison you can check this article; for Python I'd recommend py.test, or the standard python testing modules: doctest and unittest.
Testing might be a little time consuming, but it will certainly make you think about your software design and create better code.
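
To make the standard Python modules mentioned above concrete, here is a minimal sketch that tests one small function with both doctest and unittest. The median function and the test names are made up for this example.

```python
import doctest
import unittest


def median(values):
    """Return the median of a non-empty list of numbers.

    The examples below double as doctests:

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 3, 2])
    2.5
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd number of items: the middle one is the median
        return ordered[mid]
    # Even number of items: average the two middle ones
    return (ordered[mid - 1] + ordered[mid]) / 2


class MedianTest(unittest.TestCase):
    def test_odd_length(self):
        self.assertEqual(median([3, 1, 2]), 2)

    def test_even_length(self):
        self.assertEqual(median([4, 1, 3, 2]), 2.5)

    def test_single_value(self):
        self.assertEqual(median([7]), 7)
```

Saved as a file (say median.py, a name assumed here), the unit tests run with "python -m unittest median.py" and the docstring examples with "python -m doctest median.py". Writing the even-length case first is the kind of thing that makes you think about the design before the code.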

* Try and make your courses better
This is a short one. Think about what you would like to learn in every course and compare it with what you'll actually learn in it. If there are things missing, try to learn them yourself. If you think of a better way to work for a course (especially in software development courses), propose it or try to do it by yourself.

* A note on grades
Grades don't necessarily represent how well you know a given subject. Instead they represent how much dedication you have put into the course and into achieving the grade. Personally, I have always taken learning to be more important than achieving a grade. In courses where you choose a project to develop you could choose a simple project that will be easier to complete but will provide little to no knowledge (for example choosing a problem that is isomorphic to the class examples, or just using a library to solve the problems instead of implementing the algorithms), or you could choose a more complex project that will provide more learning but that is, also, more error prone (besides, you might not be able to finish it in time). My recommendation is to always take learning as your primary objective and leave the grades to be a consequence of your good learning, but take this one with a grain of salt and don't go blaming me for your bad grades (though I believe if you follow this advice you'll probably thank me instead for the amount of learning you got for not fearing the grades).


Wednesday 17 December 2008

If programming Languages were religions

Very funny/interesting article =)


Tuesday 16 December 2008

Blogging

I finally decided to start a blog! It had actually been decided for quite a while, but I hadn't had time to start writing. That, and I was waiting to build my own (perfect) blog website. Anyway, I gave up and decided to give blogger a try, at least for the time being.

Why blogging?
Well, there are many reasons. In the first place there are an awful lot of (smart) people telling you "you should write blogs" (the link is just one example, check google for more) and their reasons seem to be quite convincing. But thinking about it, I didn't need them to tell me to write a blog. There's a much more compelling reason to do so: iThink.
No, it's not a new apple product. I just tend to think (and imagine) stuff all the time (seriously, *all* the time). But the problem with thinking all the time is that thought is ephemeral. I've always wanted to carry around one of those old style recording devices that journalists used to carry with them and use it to record the interesting ideas I had. But even if I did, I wouldn't get much value out of it cuz the recorded ideas have two main problems:
First, they are messy. Even if I could have them organized it would be just endless rants that won't make much sense (I'm not claiming my blog won't be like that, it probably will =p ).
And second, they would only reach me. I am not expecting to have many readers, none actually. But who knows.
I could also do a vlog (video log) using something like seesmic but, seriously, who wants to spend their time looking at somebody's face while he talks? Also, to make a good video I'd have to write it first. I wouldn't be able to fix my mistakes and it would be hard to skip to a specific part of the post.
Writing, on the other hand, has many advantages. Sure, it's not as dynamic as talking and it's a lot slower, but it helps you (forces you to) organize your ideas to create a (semi) readable text. It is also persistent and can evolve over time. I would love it if blogs could have version control almost as much as I would love it if textareas could have emacs bindings (these are some of the things that would exist in my perfect blogging website), but for now I'm happy with the ability to edit the things I wrote in the past.

But the most important advantage of writing is, for me, that it helps me get things done (tm). One of the problems of this "iThink" property of mine that I mentioned earlier is that I think of so many different things every day, and add so many things to my in-brain to-do list, that I hardly ever have time to do any of them. So this blog is for me a way of organizing what I think and forcing myself to actually do or express it. Think of it as a place for my finished projects, essays or ideas I'd like to get feedback on (plus any random stuff I come up with).

So, What will I write about?
Pretty much anything. I believe I'll mostly write about computer science related stuff, but you could expect anything from sorting algorithms through compiler techniques through movies to politics.

Why in English?
English is a nice language. A great part of the world can speak it, or at least understand it to some extent. Also, I could use some practice writing in English. I might do a post in Spanish every so often, but don't expect it to be the rule.

Why /random/thoughts ?
I know, it is probably the most unoriginal blog name out there. Google shows 28,100 pages with the phrase "random thoughts" and the word blog in the title. But as it turns out it is not that easy to pick a blog name. I thought of a few different names but this blog is basically about me writing down random thoughts. Also it has just the right amount of geekiness; it is easy enough for anyone to read and slightly interesting for the *nix minded. Plus, "cat /dev/c3po > /mnt/blog" was already taken. Just kidding, I'm not actually *that* geeky =)

Official Comment:
After writing my first post I've decided that blogger's wysiwyg editor sucks-a-lot. From now on I'm posting from Emacs.