Category Archives: Programming

Ruby in scientific computing

I had a dream, which was not all a dream.
The bright sun was extinguish’d, and the stars
Did wander darkling in the eternal space,
Rayless, and pathless, and the icy earth
Swung blind and blackening in the moonless air;
Darkness, George Gordon Byron

My dream wasn’t as scary as the one Lord Byron had in the 19th century; I simply imagined Ruby being used more in scientific computing.

SciRuby was accepted as a mentoring organization for Google Summer of Code 2013. This is an opportunity for Rubyists all over the world to see that big players are interested in this subject (remember last year’s grant from the Ruby Association).

Our mailing list and IRC channel have been receiving lots of attention from people with project ideas and other suggestions, which is great! I’ll wait a bit longer to write about what I envision for the future, but for now I want to talk about what I’d like to work on over the next couple of months.

Sciruby::Dataframe

Many Ruby gems have ad-hoc implementations of the data frame concept found in the R language. Some examples are:

This situation is obviously inefficient. This was already discussed in SciRuby’s mailing list: we need to create a library to be used in data-heavy projects with NMatrix at its core. Pandas is a great Python example of what I want to build.

One of the GSoC project ideas is to design and implement this but, unfortunately, no one has shown interest in it yet. As it’s very important (imo), I’ll probably start it anyway.
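To make the data frame concept concrete, here is a deliberately tiny column-oriented sketch. TinyDataFrame is a hypothetical name and not a SciRuby API. It stores each column as a plain Ruby Array, whereas the library described above would keep numeric columns in NMatrix.

```ruby
# Hypothetical, minimal column-oriented data frame: a Hash of equal-length
# column Arrays (a real library would back numeric columns with NMatrix).
class TinyDataFrame
  attr_reader :columns

  def initialize(columns)
    lengths = columns.values.map(&:length).uniq
    raise ArgumentError, "columns must have the same length" if lengths.size > 1
    @columns = columns
  end

  def nrows
    @columns.values.first&.length || 0
  end

  # Fetch a whole column by name.
  def [](name)
    @columns.fetch(name)
  end

  # Return row i as a Hash of column name => value.
  def row(i)
    @columns.transform_values { |col| col[i] }
  end

  # Keep only the rows for which the block returns true.
  def select_rows
    keep = (0...nrows).select { |i| yield row(i) }
    TinyDataFrame.new(@columns.transform_values { |col| col.values_at(*keep) })
  end
end

df = TinyDataFrame.new(name: %w[a b c], value: [1.5, 2.0, 3.5])
df[:value]                                     # => [1.5, 2.0, 3.5]
df.select_rows { |r| r[:value] > 1.8 }[:name]  # => ["b", "c"]
```

Column storage makes whole-column numeric operations cheap. That is the main reason to build this on a matrix library rather than on arrays of row hashes.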

NMatrix

There are various points that need improvement in NMatrix. Documentation, rational operations, better algorithms for non-BLAS dtypes, some bugfixes, an easier installation procedure, &c.

I did a lot of work on documentation during my fellowship and some rational operations (determinants and matrix inversion) are partly working. There are some students already asking me about it, so I expect to see lots of progress on it during GSoC.
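Here is one example of what a non-BLAS algorithm for a rational dtype can look like: a determinant by Gaussian elimination over Ruby's built-in Rational, so that every step is exact. The rational_det helper is hypothetical and is not NMatrix's implementation.

```ruby
# Determinant via Gaussian elimination with exact Rational arithmetic.
# Illustrative only; NMatrix's own code may use a different algorithm.
def rational_det(rows)
  a = rows.map { |r| r.map { |x| Rational(x) } } # copy, converting entries
  n = a.size
  det = Rational(1)
  n.times do |col|
    pivot = (col...n).find { |r| a[r][col] != 0 }
    return Rational(0) if pivot.nil?              # singular matrix
    if pivot != col
      a[col], a[pivot] = a[pivot], a[col]         # a row swap flips the sign
      det = -det
    end
    det *= a[col][col]
    ((col + 1)...n).each do |r|
      factor = a[r][col] / a[col][col]
      (col...n).each { |c| a[r][c] -= factor * a[col][c] }
    end
  end
  det
end

rational_det([[1, 2], [3, 4]])                                   # => (-2/1)
rational_det([[Rational(1, 2), Rational(1, 3)], [Rational(1, 4), Rational(1, 5)]]) # => (1/60)
```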

General documentation and user guides

In my opinion, the nastiest problem in the Ruby community is the idea that “code is documentation”. This is pure bullshit. Thanks to the language’s elegance, some developers say that “you should read the code” or simply write a wiki page showing how to get started.

Of course, if you’re maintaining a library, it makes sense to say that code is documentation (to some extent). Not so if you’re a user pulling your hair out trying to understand why something obvious is failing.

Thus, I’ll continue to improve NMatrix documentation. Two of my goals are to create a good RDoc template for SciRuby in general, probably based on Rails’, and “SciRuby Guides”, inspired by RailsGuides.

Remember: code isn’t documentation.

By the way, if you’re interested, check SciRuby’s project ideas page or #sciruby on freenode. We need mentors for GSoC, but any help is appreciated: documentation, filing tickets, writing user stories.

Let’s hope that my dream was in fact a premonition.

Creating a new NMatrix

As I’ve been working with NMatrix documentation lately, I thought that a good explanation of how to create a new NMatrix object and all the options would be useful.

Let’s start with the simplest thing possible: creating an NMatrix from an array of values, without any options:


The Vector Field Histogram algorithm

I’m taking a course on real-time systems at my college. In one of the projects, we have to build a system that uses “concurrent programming” design approaches, with various threads acting separately and communicating through queues.

For this, I remembered some papers I’ve read in Brazil about path planning algorithms and one of them – the Vector Field Histogram (VFH) – sounded sufficiently simple to be implemented in two weeks. Of course, I had to implement the whole system: communication with sensors, some kind of output (in my case, via UART), the interthread communication mechanism, etc.
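The “threads communicating through queues” design can be sketched with Ruby's thread-safe Queue. In this hypothetical two-stage pipeline, a sensor thread feeds readings to a planner thread, and the planner emits commands. The 10 cm threshold and the :done sentinel are illustrative assumptions, not part of VFH.

```ruby
# Hypothetical sensor -> planner pipeline: the threads share no variables,
# only Queue objects; :done marks the end of each stream.
def run_pipeline(distances_cm)
  readings = Queue.new
  commands = Queue.new

  sensor = Thread.new do
    distances_cm.each { |d| readings << d } # stand-in for real sensor reads
    readings << :done
  end

  planner = Thread.new do
    while (d = readings.pop) != :done       # pop blocks until data arrives
      commands << (d < 10 ? :turn : :forward)
    end
    commands << :done
  end

  [sensor, planner].each(&:join)

  output = []
  while (c = commands.pop) != :done
    output << c
  end
  output
end

run_pipeline([30, 12, 5]) # => [:forward, :forward, :turn]
```

Because each stage only blocks on its input queue, a stage (for example a UART output thread) can be added by chaining one more queue.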


What is yield self in Ruby

A question that arises almost every week on #ruby-lang is this:

I know how to create code blocks for my methods, but what is this yield self?

After trying to explain it every time, I started to use this gist. And now I’m writing a blog post about it. Yeah, it’s that recurrent.

I’m going to assume you know what a code block is: a piece of code that receives an |argument| as a parameter and does some computation, like in Array#each or String#each_line. But iterators like these aren’t the only use for blocks.

In my post about IO in Ruby, I showed how File::open accepts a code block:
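A minimal sketch of that block form is below. The file name is arbitrary, and `handle` is kept only to check afterwards that the file was closed.

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "yield_example.txt")
File.open(path, "w") { |file| file.puts "hello" }

handle = nil
first_line = File.open(path, "r") do |file|
  handle = file    # keep a reference only to inspect it afterwards
  file.gets.chomp  # the block's value is what File.open returns
end

first_line     # => "hello"
handle.closed? # => true: File.open closed the file after the block
```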

As you can see, the file variable is the opened file object inside the block. After the code block runs, file.close is called before control passes back to your code, avoiding “Oh, I forgot to close this file” kinds of errors (operating systems limit how many files a process can have open at once, so leaked handles eventually make new opens fail).

Let’s create the Pokemon class:
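The original class isn't preserved here, so this is a hypothetical version. The battle method yields self, so the block receives the Pokemon itself and can call use_move on it. The log lines after the block also show the housekeeping use mentioned below.

```ruby
# Hypothetical Pokemon class: battle passes the object itself to the block.
class Pokemon
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def use_move(move)
    @log << "#{@name} used #{move}!"
  end

  def battle
    @log << "#{@name} enters the battle"
    yield self                           # the block's parameter is this object
    @log << "#{@name} leaves the battle" # housekeeping after the block
    self
  end
end

pikachu = Pokemon.new("Pikachu")
pikachu.battle do |pokemon|
  pokemon.use_move("Thunderbolt") # pokemon is the same object as pikachu
end
pikachu.log
# => ["Pikachu enters the battle", "Pikachu used Thunderbolt!", "Pikachu leaves the battle"]
```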

As you can see, inside the code block you can call the method use_move on the argument passed to it. This means that when you call yield self inside your method, you’re passing the object itself as the block’s parameter.

It’s a parameter like any other object. Why is this useful? Two common uses are housekeeping, such as running code after the block but before control returns to the program (logging is a nice example), and configuration, as in ActiveRecord.

In this migration, look at the create_table method, receiving a code block in which t is a parameter – it’s the table being created. And all the methods called on it – string, text, timestamps – are used to generate columns.
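To show the configuration style without Rails, here is a plain-Ruby imitation. It is NOT ActiveRecord's code: TableDefinition and create_table are hypothetical stand-ins. TableDefinition yields self from initialize, and every method called on `t` records a column.

```ruby
# Hypothetical imitation of a migration-style DSL, not ActiveRecord's code.
class TableDefinition
  attr_reader :name, :columns

  def initialize(name)
    @name = name
    @columns = []
    yield self if block_given? # the block receives the table being defined
  end

  def string(column)
    @columns << [column, :string]
  end

  def text(column)
    @columns << [column, :text]
  end

  def timestamps
    @columns << [:created_at, :datetime] << [:updated_at, :datetime]
  end
end

# Hypothetical stand-in for a migration's create_table method.
def create_table(name, &block)
  TableDefinition.new(name, &block)
end

posts = create_table(:posts) do |t|
  t.string :title
  t.text :body
  t.timestamps
end
posts.columns.map(&:first) # => [:title, :body, :created_at, :updated_at]
```

A real implementation would turn the recorded columns into SQL; this sketch only collects them.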

I hope that this blog post helps people understand what yield self means. Truly.

Further reading

How to write a method that uses code blocks

How Jekyll works

Last Saturday, I decided to build a Jekyll plugin to get the quotes from my Tumblr account (through a quotes.json file conveniently saved using a Rake task, inspired by [1]) and generate a “quotes index”.

So… I needed a generator. (d’oh)

There’s an example in the GitHub wiki:

After some hacking, I started to think it wasn’t as easy as it appeared. Ok, coding while being sleep-deprived isn’t very smart.

I started again after some hours of sleep and sufficient caffeine, but this time desiring to understand how Jekyll works. I imagined some kind of data structure holding all the generators, some other holding the converters and they were called during the “compilation” of the site. But something was missing: when are the files read from the filesystem? How to generate a file without an initial template, effectively creating the file during “compile-time”?

I started with the Jekyll executable. It’s pretty simple: the command-line parameters, the defaults and the _config.yml (through the Jekyll::configuration method) are used to create an options hash, and then a new site is instantiated:
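The layering of defaults, _config.yml and command-line options can be sketched with Hash#merge, where later hashes win on key clashes. This is an illustration in the spirit of Jekyll's executable, not its actual code. The DEFAULTS keys and build_options are hypothetical.

```ruby
require "yaml"

# Hypothetical defaults; Jekyll's real defaults contain many more keys.
DEFAULTS = { "source" => ".", "destination" => "_site", "auto" => false, "server" => false }

# defaults < _config.yml < command line
def build_options(config_yaml, cli_options)
  config = YAML.safe_load(config_yaml) || {} # an empty file yields no settings
  DEFAULTS.merge(config).merge(cli_options)
end

build_options("destination: public\n", { "server" => true })
# => {"source"=>".", "destination"=>"public", "auto"=>false, "server"=>true}
```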

After that, it starts to watch the necessary directories if the --auto option was used.

The site is built through a call to site.process, the main method in the Jekyll::Site class. Finally, it runs the local server if --server was specified.

I was happy, but didn’t yet understand how my blog was created. Well, the site.process call was the obvious answer, so I opened lib/jekyll/site.rb in TextMate:

And it turned out to be very similar to my initial guess. For the sake of completeness: site.reset and site.setup are called during the site’s initialization, respectively to initialize its data structures and to load libraries, plugins, generators and converters. Ok, let’s see what each of these methods does:

  • reset: initialize the layouts, categories and tags hashes and the posts, pages and static_files arrays.
  • read: get site data from the filesystem and store it in internal data structures.
  • generate: call each of the generators’ generate method.
  • render: call the render method for each post and page.
  • cleanup: the destination paths of all pages, posts and static_files are stored in a Set, and everything else in the destination (unused files, empty directories) is deleted.
  • write: call the write method of each post, page and static_file, copying them to the destination folder.

So if I wanted to create a generator, I just needed a generate method. Pretty easy. But some doubts remained: how could I specify a layout for my page in code, without first creating a file on the filesystem?

The Jekyll::Site.read method stores the contents from the filesystem in internal data structures. This is the stage where the files in _layouts, _posts and possibly other directories are read and their respective objects (Layout, Post, Page and StaticFile) are stored in the site’s hashes and arrays.

So, my page should be created dynamically by my generator. Nice. I did it, but after some tests, the quotes/ directory wasn’t being created.

The catch is the Jekyll::Site.cleanup method: it’ll remove from the destination everything that isn’t inside the internal data structures, including empty directories. So the solution was easy:

Just add the page to the site.pages array before returning. To understand what I’m saying, take a look at the plugin (after all this code archeology, it’s amazing how simple it turned out to be). The current code for the generator can be found in this gist.
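The catch can be reproduced with a hypothetical miniature of the reset/read/generate/render/cleanup/write order listed earlier. MiniSite and both generator classes are illustrative, not Jekyll's code, and pages are plain path strings. A generator that writes output without registering it is wiped by cleanup. One that adds its page to site.pages survives.

```ruby
require "set"

# Hypothetical miniature of the Site#process order (not Jekyll's code).
class MiniSite
  attr_reader :pages, :destination_files

  def initialize(generators, existing_files = [])
    @generators = generators
    @destination_files = existing_files.dup # what is already in _site
  end

  def process
    reset
    read
    generate
    render
    cleanup
    write
  end

  def reset
    @pages = []
  end

  def read
    @pages << "index.html" # stand-in for pages read from the filesystem
  end

  def generate
    @generators.each { |generator| generator.generate(self) }
  end

  def render
    # rendering is omitted in this model
  end

  # Delete everything in the destination that the site does not know about.
  def cleanup
    known = Set.new(@pages)
    @destination_files.select! { |path| known.include?(path) }
  end

  def write
    @destination_files |= @pages
  end
end

# Produces output without telling the site: cleanup deletes it.
class DirectWriteGenerator
  def generate(site)
    site.destination_files << "quotes/index.html"
  end
end

# Registers the page in site.pages, so it survives cleanup and is written.
class PagesGenerator
  def generate(site)
    site.pages << "quotes/index.html"
  end
end

MiniSite.new([DirectWriteGenerator.new]).tap(&:process).destination_files
# => ["index.html"]
MiniSite.new([PagesGenerator.new]).tap(&:process).destination_files
# => ["index.html", "quotes/index.html"]
```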

I wrote a How Jekyll works? page for the wiki in Jekyll’s repository to help others interested in this.

Sources

[1] Generating Jekyll pages from data
[2] Jekyll git repository

Reading and writing files with Ruby

I use Ruby to automate a good number of daily tasks. They all involve manipulating files in some way: writing logs, creating VHDL testbenches and solving some Code Jam exercises, to cite a few.

In this post, I’m going to describe how to do basic file I/O in Ruby, for it is useful and actually necessary even for basic scripting tasks. Let’s start.

The File class

The File class is used to access the filesystem, e.g. create a file, write to it, read it. It inherits some cool methods from the IO class, like #readlines.

What you’ll probably use the most is File::open, which accepts a mode parameter that specifies the operation and, optionally, a block. One very cool thing is that if you use a block to handle the file, it is closed automatically. This doesn’t happen without a block, so if you use:

You need to close the file later:

While using a block is way easier:
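Side by side, the two styles look roughly like this (the file name is arbitrary). Without a block, File.open returns the File object and closing it is your job. With a block, the file is closed when the block finishes, even if an exception is raised inside it.

```ruby
require "tmpdir"

path = File.join(Dir.tmpdir, "file_io_example.txt")

# Without a block: you must remember to close the file.
file = File.open(path, "w")
file.puts "first line"
file.close

# With a block: closed automatically, even if the block raises.
File.open(path, "a") { |f| f.puts "second line" }

File.readlines(path, chomp: true) # => ["first line", "second line"]
```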

There are lots of useful methods in this class. You have a command-line tool and want to create a symlink in the /usr/bin folder? Use File::symlink. Need the absolute path of a file? File::realpath. Its extension? File::extname.
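A quick sketch of those three methods follows. It works in a temporary directory because creating links in /usr/bin normally needs root privileges. The file names and the file_methods_demo helper are hypothetical.

```ruby
require "tmpdir"

def file_methods_demo
  Dir.mktmpdir do |dir| # the directory is removed after the block
    target = File.join(dir, "report.csv")
    File.write(target, "a,b\n")
    link = File.join(dir, "latest.csv")
    File.symlink(target, link)                    # like `ln -s target link`
    {
      symlink: File.symlink?(link),
      extname: File.extname(link),                # ".csv"
      real_name: File.basename(File.realpath(link)) # symlink resolved: "report.csv"
    }
  end
end
```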

I’ve used these methods, but there are many, many more. Read the documentation for details.

How to read from a file

Code Jam is a coding competition hosted by Google to test the abilities of the participants in questions about algorithms (and how to implement them). There are some tricky problems, but this isn’t the reason I’m talking about it.

Suppose you’re using their problems to improve your programming skills and you decided to be a hipster who uses Ruby. Ok, the first thing you need to be able to do is to read the .in files they send you, like this one:

This is from the problem Minimum Scalar Product. The first line is the number of test cases and, after that, each case is composed of 3 lines: the dimension of the vectors, then the elements of each of the two vectors. How can we get all this info?

The solution is simple if you use Ruby’s APIs correctly.

Open the file, create an array of lines with File#readlines (actually IO#readlines), then iterate over the tests, 3 lines at a time, using Array#each_slice. The method chain that parses each vector line, which looks difficult at first, is simply a way to get an array of numbers from the string: first remove (chomp) the newline character, then split the line into an array and convert each string to a number.
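The approach above can be sketched as follows. This is not the author's exact code: read_cases is a hypothetical name, and it returns one pair of Integer arrays per test case.

```ruby
# Parse "T, then 3 lines per case" input into [[x, y], ...] pairs.
def read_cases(path)
  File.open(path, "r") do |file|
    lines = file.readlines
    lines.drop(1).each_slice(3).map do |_dimension, x_line, y_line|
      [x_line.chomp.split.map(&:to_i), y_line.chomp.split.map(&:to_i)]
    end
  end
end
```

For a small made-up input with two cases ("2", "3", "1 3 -5", "-2 4 1", "2", "1 2", "3 4"), read_cases returns [[[1, 3, -5], [-2, 4, 1]], [[1, 2], [3, 4]]].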

The second parameter to File::open is a mode used to specify what you want to do with the opened file. All the available modes can be found at the Ruby IO class documentation. As you might guess, 'r' means we’re opening the file to read it.

How to write to a file

One of my ideas when I started this blog was to use Jekyll, a Ruby program that generates static sites. No more wrestling with databases, plus super easy deployment, sounded good enough.

Time passed, and when I did the last redesign and wrote some posts, creating each new post file by hand, named like 2012-month-day-title.md, turned out to be boring and very error-prone. To solve this, I made a simple script that lives in blog/bin/newpost, and I use it like this:

It’s very simple and straightforward, without most of the Cool Stuff that’d be nice in a “true” command-line program:

The complete script is available as a gist. I want to improve it to handle tags as command-line arguments, so I’ll probably write a post about the OptionParser class someday.

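File.open creates a file in the format specified, with the name taken from the first command-line argument (ARGV[0]). The 'w+' identifier is a mode, like 'r': it opens the file for reading and writing, creating it if it doesn’t exist and truncating it if it does.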
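The gist itself isn't reproduced here, so this is a hypothetical sketch of a newpost-style script. The new_post helper, the slug rules and the front matter fields are assumptions. It builds a dated file name, writes Jekyll front matter with mode 'w+', and takes the title from ARGV[0].

```ruby
#!/usr/bin/env ruby
require "date"

# Hypothetical helper: create _posts/YYYY-MM-DD-slug.md with front matter.
def new_post(title, dir = "_posts", date = Date.today)
  slug = title.downcase.strip.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
  path = File.join(dir, "#{date.strftime('%Y-%m-%d')}-#{slug}.md")
  File.open(path, "w+") do |file| # "w+": read/write, create or truncate
    file.puts "---"
    file.puts "layout: post"
    file.puts "title: \"#{title}\""
    file.puts "---"
  end
  path
end

# Only runs when executed as a script with a title argument.
puts new_post(ARGV[0]) if __FILE__ == $PROGRAM_NAME && ARGV[0]
```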

Further reading

Be careful when using Array.prototype.reduce

I was working on Mojambo’s team evaluator (typeCalc) and started using map(), reduce() and friends a lot to simplify the code.

After some tests, I ran into a problem that took me a few minutes to understand and fix, but that seems very easy to make. First, the syntax of Array.prototype.reduce is:

The callback takes two more parameters (the index of the element and the array being traversed), but they aren’t needed now. The accumulator’s value is initially initialValue. The idea is to use reduce() to compute a value from all the elements of the array.

What I was doing involved a callback like this:

The problem is that the accumulator’s value is whatever the previous iteration returned: it starts as initialValue, the first element is processed, and, since the callback returns nothing, it becomes undefined. The solution is trivial:

I was so annoyed at having let this slip that I felt I had to put a post about it on the Internet. I hope it helps someone.
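Ruby's Enumerable#reduce has the same trap, since the block's value becomes the next accumulator. In this hypothetical pair of helpers, the buggy version's block ends with a Hash assignment. That expression returns the assigned value, not the Hash, so the second iteration calls []= on an Integer.

```ruby
# Buggy: `acc[w] = w.length` evaluates to w.length, which becomes the
# next accumulator, so the second iteration raises NoMethodError.
def word_lengths_buggy(words)
  words.reduce({}) { |acc, w| acc[w] = w.length }
end

# Fixed: return the accumulator explicitly at the end of the block.
def word_lengths(words)
  words.reduce({}) do |acc, w|
    acc[w] = w.length
    acc # the block's value becomes the next accumulator
  end
end

word_lengths(%w[apple avocado])["avocado"] # => 7
```

Enumerable#each_with_object({}) sidesteps the issue, because it always passes the same object along, whatever the block returns.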

Learning to use the asset pipeline in Rails 3.1+

After having a few problems with the asset pipeline, I finally got my act together and read the guide on RailsGuides. Now that I understand better how it works, I can talk a bit more about it.

My first problem with it was “What the hell is this for if I don’t use CoffeeScript or SASS?” After a while I found out that the premise was wrong. The asset pipeline was made to take CSS, JS and image files, organize them and keep these files at their respective levels of abstraction. The creator of Rails, David Heinemeier Hansson (DHH), talked about all of this (and more) at RailsConf 2011.

There are three basic places to put assets: app/assets, lib/assets and vendor/assets. This lets us naturally separate code written by us from code written by others. It creates a “pushback” that keeps us organized, as DHH said in his talk. This way, you know where to put code without thinking too much, and you don’t dump everything into a single folder.

Frameworks

Another nice thing is that you can swap Rails’ default JavaScript library by changing just one line in the Gemfile. By default, we have the following line:

If you want to use Prototype, just change it to this:

And we can add several frameworks this way. For example, Bootstrap:

The gem can be downloaded here, and its README has usage instructions.

Caching

One last feature I found interesting is related to the caching of the application.js and application.css files. Now, after precompilation (rake assets:precompile), an MD5 hash is added to the file name, which prevents problems with files that were changed while old versions are still cached. An example:
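The idea behind those fingerprinted names can be sketched with Digest::MD5. The name changes whenever the content changes, so a browser never reuses a stale cached copy. fingerprinted_name is a hypothetical helper, and Sprockets' real digest may take more than the file content into account.

```ruby
require "digest/md5"

# Illustration of content fingerprinting, not Sprockets' implementation.
def fingerprinted_name(filename, content)
  ext = File.extname(filename)
  base = File.basename(filename, ext)
  "#{base}-#{Digest::MD5.hexdigest(content)}#{ext}"
end

fingerprinted_name("application.css", "body { color: red }")
# => "application-" followed by 32 hex characters and ".css"
```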

If you want to understand it better, the section in the RailsGuides guide is good. Read more here.

I think that’s it. I wrote this post to organize all of this in my head and, who knows, help someone. See you.