Saturday, November 2, 2013

Travel in the country of Programming

(In French: Voyage au pays des programmeurs)

I have a bad habit: when I do something, half of my attention is spent watching the action in progress. This has earned me a poor ranking on those tests that measure intellect by the speed of response.

Enjoying a little free time in August, I went back to programming, and it allowed me to make a few observations about myself. It is good to remember the episodes where one encounters obstacles (see "Mon apprentissage de LaTeX", on my learning of LaTeX): it helps to avoid making the same mistake repeatedly.

Experienced readers will find me ridiculous because I am neither a professional programmer nor even a good one, but I don't mind. Those who believe that programming is an ancillary activity will stop reading me, if they ever did, but I don't mind.

*     *

I confess: I love programming. I am a clumsy, inexperienced beginner, but this gives me the joy of exploration. When it works, I am delighted to have been able to bend the computer to do what I wanted it to do - it's much more satisfying than using a program written by someone else.

So I decided to program the methods of data analysis that I taught at ENSAE in the 1970s. At the time I did not know how to program, so I used programs written by others, and it annoyed me. I had to take revenge on this ignorance.

I know that there is excellent software for data analysis and that what I do will bring nothing new, but my goal was not to launch a new product on the market.

Here, highly condensed, is the story of this adventure.

*     *

I need algorithms to invert a matrix, extract its eigenvalues and eigenvectors, project onto a plane the image of a cloud of points immersed in a space of a few tens of dimensions, find and aggregate the points that are closest according to an ad hoc "distance", and so on.

These are math problems. I have lost the virtuosity of my twenties and would be a poor candidate for the exams, but no problem can withstand a night of sleep: solutions arrive in the morning, I just have to wait for them.

I begin with correspondence analysis. I hastily program the algorithms, fearing that they will not work. Miraculously, they do: I now know how to compute the first eigenvector of a matrix of inertia.

But I need the second eigenvector, the third, the fourth; I even need the general formula that provides the n-th. I enrich the original algorithm, whose length begins to look serious. I complete the calculation of the "factors", as they say, and that of the "interpretive aids" (those who know these things will see what I mean).

At this stage the mathematical problem is solved. I am pleased with myself, because my knowledge of the properties of factor analysis helps me to conceive formulas that would be a headache for a non-mathematician programmer. But I have not finished, far from it.

*     *

I print the factors on the first four axes. For the first axis, it goes fast. The second is slower. For the third, the computer stalls. For the fourth, several very long seconds pass between two successive displays of numbers - and there are a lot of numbers to display!

It appears that solving the math problem is not enough, because mathematics ignores duration: the world of thought is expressed in purely logical terms, and an exact formula is exact for all eternity. It is not the same with the processor, because its performance is subject to physical constraints. So it is not enough to be a mathematician if one wants to write an efficient program: one must also be a physicist, able to leave the world of pure thought and comply with the constraints of the natural world.

My algorithm, I realize, calculates the same thing many times: to extract the n-th eigenvector it recalculates the first, the second, the third, and so on up to the (n - 1)-th. To calculate the first n eigenvectors, it must actually compute n (n + 1) / 2 of them... Horror: my algorithm is of order n²! After a night's sleep I see how to reduce that to order n: at each step I simply have to remove from the inertia matrix the contribution of the eigenvector I have just calculated.
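For readers who want to see the idea in code, here is a minimal sketch in Python (not the Racket program described in this post, which is not reproduced here). It assumes that each eigenvector is found by power iteration, a method the text does not name, and that the inertia matrix is symmetric and positive semi-definite. The names `mat_vec`, `dominant_eigenpair` and `leading_eigenpairs` are made up for the illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and a unit eigenvector.

    Assumption: the matrix is symmetric positive semi-definite,
    as an inertia matrix is.
    """
    # Arbitrary start (1, 2, 3, ...); it must not be orthogonal to the target.
    vector = [float(i + 1) for i in range(len(matrix))]
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        image = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in image))
        if norm == 0.0:  # the matrix has nothing left to extract
            break
        vector = [x / norm for x in image]
    # Rayleigh quotient of a unit vector gives the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


def leading_eigenpairs(matrix, count):
    """Extract `count` eigenpairs, one power iteration per eigenvector."""
    work = [row[:] for row in matrix]  # leave the caller's matrix untouched
    size = len(work)
    pairs = []
    for _ in range(count):
        eigenvalue, vector = dominant_eigenpair(work)
        pairs.append((eigenvalue, vector))
        # Deflation: subtract eigenvalue * v * v^T, so that the next
        # power iteration converges to the following eigenvector
        # instead of recomputing all the previous ones.
        for i in range(size):
            for j in range(size):
                work[i][j] -= eigenvalue * vector[i] * vector[j]
    return pairs
```

Called as `leading_eigenpairs([[2.0, 1.0], [1.0, 2.0]], 2)`, this returns eigenvalues close to 3 and 1. Each eigenvector costs one run of the power iteration, whatever its rank, which is where the saving comes from.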

The algorithm is now much shorter: I have deleted a lot of lines. I thought I was a real programmer, able to write impressive code, and now my code is reduced to something very simple. I console myself by thinking that this simplicity is elegance. But wouldn't I have reached this result at the outset, if only I had been smart?

*     *

It is not enough to have the results on the screen, in the lower half of the Racket interface: I must also produce the plot that will make visible the projection of the cloud of points on the plane formed by a pair of eigenvectors (also called "factorial axes").

Now begins a haggard search through the documentation. While one can write algorithms using a logical, understandable vocabulary and syntax, the interfaces, the interaction with screen and keyboard and the relations between the various modules obey a hodgepodge of conventions that pure logic will never guess and that must be learned by heart.

To absorb the tricks and conventions of a programming language into one's gut, one has to program full time. Whoever programs, like me, a few weeks a year is irretrievably a beginner, and the documentation treats him accordingly: it is made for professionals, busy and learned people who know how to read conventional shorthand notations.

I am unable to understand how I managed to extract, after surfing so many Web pages, a piece of code into which I had to introduce parameters whose format I had to look up, and which I submitted to a series of tests until I got a suitable plot. I believe that it took me half a day, but I'm afraid my memory is unfaithful: this passage in the dark probably lasted two or three days...

*     *

At this point, the correspondence analysis is programmed. Then I see with horror that my haste has once again played a trick on me.

The various methods of factor analysis - correspondence analysis, principal component analysis, discriminant analysis (and also spherical analysis, my little invention) - all use the same algorithms, changing only the way the cloud of points is defined. If I had thought a little, I would first have written a general factor analysis, and then called it from small individual programs specific to each of these methods.
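The structure I should have adopted can be sketched as follows, again in Python and only as an illustration under stated assumptions, not as the program I actually wrote. The general routine takes a centered, weighted cloud of points; each method only builds its own cloud. All the names here are hypothetical, the eigenvectors are obtained by power iteration with deflation, the principal component analysis shown is the non-normalized one, and the correspondence analysis assumes that no row or column of the table is empty. Discriminant and spherical analysis are left out.

```python
import math


def leading_axes(matrix, count, iterations=500):
    """Power iteration with deflation on a symmetric PSD matrix."""
    work = [row[:] for row in matrix]
    size = len(work)
    axes = []
    for _ in range(count):
        start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(size)))
        v = [(i + 1) / start_norm for i in range(size)]  # arbitrary start
        for _step in range(iterations):
            w = [sum(a * x for a, x in zip(row, v)) for row in work]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * sum(a * y for a, y in zip(row, v))
                  for x, row in zip(v, work))
        axes.append((lam, v))
        for i in range(size):  # deflation: remove the axis just found
            for j in range(size):
                work[i][j] -= lam * v[i] * v[j]
    return axes


def factor_analysis(points, weights, count):
    """General part: inertia matrix of a centered weighted cloud,
    its leading axes, and the coordinates of the points on them."""
    dim = len(points[0])
    inertia = [[sum(w * p[i] * p[j] for p, w in zip(points, weights))
                for j in range(dim)] for i in range(dim)]
    axes = leading_axes(inertia, count)
    coords = [[sum(x * a for x, a in zip(p, axis)) for _, axis in axes]
              for p in points]
    return axes, coords


def pca_cloud(table):
    """PCA: rows are points with equal weights, centered on column means."""
    n = len(table)
    means = [sum(col) / n for col in zip(*table)]
    points = [[x - m for x, m in zip(row, means)] for row in table]
    return points, [1.0 / n] * n


def correspondence_cloud(table):
    """Correspondence analysis: centered row profiles weighted by row
    masses, with the chi-square metric folded into the coordinates."""
    total = sum(map(sum, table))
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(col) / total for col in zip(*table)]
    points = [[(x / total / r - c) / math.sqrt(c)
               for x, c in zip(row, col_mass)]
              for row, r in zip(table, row_mass)]
    return points, row_mass


def principal_component_analysis(table, count):
    return factor_analysis(*pca_cloud(table), count)


def correspondence_analysis(table, count):
    return factor_analysis(*correspondence_cloud(table), count)
```

With this layout, adding a method means writing one more small function that builds a cloud; the eigenvector extraction is written, and debugged, only once.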

I am discouraged. I will have to redo everything from the ground up, because in the correspondence analysis program I inextricably intertwined the notations for this method with the more general ones, which would be suitable for all methods...

*     *

Laurent Bloch, who is not a beginner, made a profound observation about my program. To link the program to the file that contains the data, I used the notation meant for gluing two programs together. So, said Laurent, I have put the data into the program, and this is not clean: I will have to master the syntax of "imports" and "exports".

Besides, I have used only the interpreter. I will have to compile my program to produce an executable, and before that I will have to add the user interface: asking the user for the location of the data file he wants to "import" for analysis, the method he wants to use, the number of factorial axes to extract, the kind of "interpretive aids" he desires, and who knows what else! And I will also have to publish the results in a readable form: a PDF file, for example, laid out with LaTeX. Then I must write some documentation, even if I am the only user, because I need a reminder.

Laurent told me: the algorithmic part of the program, which is the only one that gives pleasure to a mathematician (and the only one on which textbooks abound), takes a few days; programming the GUI and the dialogue with the user demands weeks or months of thankless work for something that will look natural and easy...

*     *

I understand why many programmers prefer to complicate the life of the user! First, it is easier, because developing a convenient interface is a heavy job. Then, isn't it immoral that the user only has to settle into his chair to see everything happen "naturally", while the programmer has to bend under the yoke of notations and conventions? Shouldn't the user have to sweat a little too?

Once the strength of this temptation is measured, we can only admire the programmers who provide convenient programs. They are heroes! But only those who have written some programs can perceive this heroism.

That is why it is important that education does not focus solely on the use of computers: for great artists to emerge, it takes a knowledgeable audience. We cannot have good programs if we do not know enough about programming to appreciate beautiful work.
