My talk @ EuroPython 2016

I had the great experience of presenting at EuroPython 2016. My talk, entitled “Clean code in Python”, was about good development practices for Python, down to low-level design (with code examples). The idea of the talk was to present the “pythonic” approach to writing code, and how general concepts of clean code apply to Python.

These examples might be useful for beginners, for developers experienced in other languages who are coming to Python, and for people using Python for scientific applications. The examples could also be helpful for senior developers, because they reflect real situations that might come up in pull requests during a code review.

Here is the video on YouTube:

And the slides (which I also made available, along with the source code, shortly after the presentation finished).

The presentation was well received: some attendees told me they liked it (some even asked for the slides and code), and I got good advice. During the following days of the conference, more people told me that they liked the presentation, and some others mentioned something I had not thought of at the beginning, but that makes perfect sense: these ideas are really useful for people using Python in scientific environments.

I am glad it was useful for the community.

EuroPython 2016 remarks

Last week EuroPython 2016 finished. It was an amazing conference, and I had the pleasure of attending it. Here is my review of those days.

The conference

I arrived in Bilbao, Spain, on Saturday at noon, the day before the conference, so I had some time to get to know the city, see the venues, etc. Sunday was for two separate workshops: Django Girls and Beginners’ Day. I attended Beginners’ Day as a coach, and helped a group of intermediate developers with several exercises aimed at explaining some Python concepts, such as context managers, decorators, magic methods, generators, etc. It was a curious coincidence that some of these topics were the ones I was going to cover in my talk on Wednesday, so I felt really glad about that. I took an oath (a very funny one, BTW) to become a beginners’ mentor, and so I did (it was really good, actually). I had a great time helping other developers, exchanging ideas and experiences during lunch, solving problems, and getting a first glimpse of what the conference was going to be like.

../ep2016-beginners-mentor-oath.png

The moment of the oath for becoming a mentor, and earning the badge.

After the workshop finished, I walked to the main venue and gave a hand packing the conference bags. After that, it was time to look around Bilbao.

From Monday to Friday was the conference itself, with all the talks and trainings.

Monday started with the introduction to the conference and, shortly thereafter, the very first keynote by Rachel Willmer, who gave a great presentation, sharing a lot of experience and interesting ideas.

At around noon there was a keynote by N. Tollervey about MicroPython. The presentation was excellent (one of the ones I liked the most), and the idea of the project is awesome. On top of that, it was announced that the BBC was giving away micro:bits to the attendees of the conference, so it was a great surprise to pick up mine at the conference desk. I even started playing around with it a bit (more in a future post).

For the rest of the afternoon, I attended several talks. At the end there were, of course, the lightning talks, which were amazing.

Tuesday started with the keynote by P. Hildebrant, presenting how Disney uses several technologies, including Python, to support movies and productions. It was very good and enlightening to see an endeavour of such scale built with Python. After that, during the morning, I attended a workshop about async web development, covering several Python technologies for asynchronous computation.

During the afternoon, I watched several great talks, including “Protect your users with Circuit Breakers”, and several other good ones, closing with the lightning talks.

Wednesday was the day of my talk, so I attended some talks during the morning and then, in the afternoon, I presented mine. I really liked how it went. Moreover, it was really good to receive positive feedback from some attendees, saying they liked it and that it was useful for them. Shortly thereafter, I published the slides and the source code.

On Thursday, there were some talks about async/await and asynchronous programming in Python 3, mocks, and high-availability architecture.

On Friday, the keynote was about how Python is used by the scientific community. It was very enlightening and interesting to see another use case of Python, and how it is becoming the main technology in this area.

The morning talks were divided among several topics, the main ones being: instrumentation for performance metrics, “How to migrate from PostgreSQL to HDF5 and live happily ever after”, and “Split Up! Fighting the monolith”. During the afternoon, I joined a workshop about Docker, in which we built an application using docker-compose, following good practices.

It is worth mentioning that on Friday there was a special edition of the lightning talks, which was not in the original schedule. After some arrangements and on-the-fly changes, it was possible to have another lightning talk session, right before the sprints orientation and the closing session.

Saturday and Sunday were for sprints (hackathons). On Saturday I joined the aiohttp sprint and actually submitted a pull request that was merged, whereas on Sunday I wanted to look into a pytest issue.

My talk

It was great to have the opportunity to present at EuroPython. What was even better was the positive feedback I got from other attendees, and the fact that the talk was useful and interesting for them (which was, in the end, what I cared about most). I found the experience very positive.

From the comments, I gathered something I had not noticed when I first envisioned the talk: how useful these concepts might be for people using Python for scientific applications. It seems that scientists using Python for data processing or computation do not usually have a developer’s background, so concepts like code readability, technical debt, and maintainability are helpful in order to improve the code base. This gave me the idea of adapting the examples, perhaps adding one related to these areas.

Python use cases

There were people from many countries, industries, and companies, with different backgrounds. The trend now seems to be data science, but Python is widely used in many areas.

I believe the main areas of focus for Python are: software development, system administration / DevOps, and science.

There were talks, tracks, sessions, and trainings for all of them, with a lot of technical detail.

Highlights

There were so many great talks and resources that I cannot name every single one of them, so I will point out the main topics and some of the talks that grabbed my attention the most, but please keep in mind that they were all great.

Among the many things pending to test and research, there are also books. I learned about Pyro4, for managing Python remote objects, which seems like a promising technology. I will dive into more detail on conda and its build system, conda channels, etc. The talk “Exploring your Python interpreter” was really interesting, and it was a good introduction for getting involved in CPython development.

I attended many talks about the latest features of Python 3.5, such as asyncio, coroutines, and all the new functionality for asynchronous programming, and they were all really interesting. In particular, “The report of Twisted’s Death” was very interesting and (spoiler alert) it looks like Twisted still has an interesting future, competing with the new libraries and standards.

In the lightning talks, a reverse debugger (revdb) was presented, and its demo was amazing.

Conclusion

After attending many talks and trainings, and talking to many other experienced developers, system administrators, and data scientists, I can state that the conference offers an amazing learning environment, and the outcome was completely positive. It was useful for catching up with technology, surveying the ecosystem and seeing how Python is being used or deployed in the wild, learning from use cases and experiences, and exchanging ideas.

The content was really inspiring and eye-opening. I have lots of items to check as points for research, which I will cover in following entries.

Python 3 is much more widely used than one would expect. It is actually the standard now, and many talks (including mine) used Python 3 code; most importantly, most projects are now on this version, whereas Python 2 looks like the legacy option. Good news :-)

All in all, this edition of EuroPython was awesome, and I am looking forward to presenting again next year!

Upcoming talk at EuroPython 2016

I am glad to announce that I will be speaking at the EuroPython 2016 conference.

My submission about clean code in Python was accepted, so at the next edition, EuroPython 2016 in Bilbao, Spain, I will talk about clean code principles for Python. Here is the abstract:

https://ep2016.europython.eu/conference/talks/clean-code-in-python

The full list of talks is available at:

https://ep2016.europython.eu/en/events/sessions/

If you are interested, subscribe to the EuroPython blog and YouTube channel. I will include more details in a separate post.

Glimpses of a Vim configuration

It’s been a while since I started tracking versions of my custom Vim configuration and making it available as open source software on GitHub. The best thing about this project is, in my opinion, having it under version control, so I can track changes and releases.

Every once in a while, when I find a new setting or a great new feature, I modify the configuration, so it becomes available in the next release. Besides the features mentioned in the project and the customizations made, I feel very comfortable with the colour scheme I made.

Here are some glimpses of it:

../vim-capture1.png

First capture of colours and layout

The colour scheme applies to the syntax highlighting of all file types recognized by Vim. Please note this might also depend on your terminal configuration.

../vim-capture2.png

The tabs are also themed according to the menus.

Any suggestions or improvements to the code and configuration can be made on the GitHub project.

Deleting commented out code

This is a rule I always encourage in software development. Moreover, I consider it something that has to be included in the coding guidelines of every good project.

There are several reasons for this, but probably the best explanation can be found in the book “Clean Code”[1] by Uncle Bob, which explains that commented-out code rots as the surrounding code evolves, so it becomes a source of confusion and an error-prone spot.

There are, however, people who seem to find arguments for commenting out code, or for leaving it in. Some common arguments/reasons are:

  • “I might need this functionality later...”

We have source control systems (for example git) for this. In git, anything can be restored from a previous point. If the software is properly under version control, there is no reason to fear data loss. Trust git, code fearlessly.

  • “This is temporarily disabled... It will be restored later.”

Again, the same principle applies: rely on the version control system. Save a patch and restore it later, stash the changes, revert the commit, etc. As you can see, there are plenty of better options for handling this scenario.

  • “Code that was left over from the first version”

Probably debugging leftovers. No doubt here: seek, locate, destroy.

There is a clear problem with commented-out code: it is “frozen” in time. It was good at some point, but then it was left there while the code around it evolved, so this old code quite possibly no longer works (hence it is “rotten”). Uncommenting it is a bad idea, because it will probably crash.

Another problem is that it can bias other developers who want to maintain that code in the future. Whoever left the rotten code might have thought of it as a source of inspiration for when this functionality was going to be applied, but instead it just biases the new developer with this skeleton, preventing a brand-new, fresh idea for that code.

Therefore, for these main reasons (and probably many more), having commented-out code in a code base is a poor practice (not to mention a code smell). If you are a seasoned developer who cares about code quality and best practices, you should not hesitate to delete it. Delete commented-out code mercilessly: seek, locate and destroy.

[1] A book I highly recommend: https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882

strncpy and strncat for copying strings in C

Recently, I read an interesting article [1] explaining why the strncpy function in C is not safer than strcpy. The post was very interesting, but what’s more, it suggested an alternative idiom for copying strings in C that might well be the way to go.

Later, in another article [2] that compared some functionality in C, Python and Go, one of the comments pointed out that very same idiom. That grabbed my attention, so I decided to try it in an example.

The problem with strncpy seems to be the way it handles the source string. Based on the sample code provided in the documentation [3] (which should be taken just as a reference), the copy stops after n characters (the third parameter) or when the source string is exhausted, whichever happens first. This is not a problem unless n <= strlen(source_string): in that case strncpy finishes before it can put a \0 character at the end of the target string, leaving an invalid array of characters [5].

This is an example.

#include <stdio.h>  /* stdout, fprintf */
#include <string.h>  /* strncpy, strlen */

#define TARGET 10

int main(int argc, char* argv[]) {

    char *src = "Castle";  /* 6 chars long */
    char dst[TARGET] = "__________";

    dst[TARGET - 1] = '\0';

    fprintf(stdout, "%s\n", dst);   /* must be: '_________' */
    /* What happens if I pass a wrong length (lower than the actual
    * strlen) */
    strncpy(dst, src, 3);
    fprintf(stdout, "%s\n", dst);   /* must be: 'Cas______' */
    /* If I copy the string correctly, by passing the right
    * length, then strncpy behaves as expected */
    strncpy(dst, src, strlen(src) + 1);
    fprintf(stdout, "%s\n", dst);   /* must be: 'Castle' */

    return 0;
}

In this example, the target array is represented by the variable dst, and I used a fixed-length string on purpose for the demonstration, simulating what would actually happen. I null-terminated it so the program can finish successfully: otherwise, the operations on it would not stop until a delimiter is reached, and we cannot know when that will happen, since it depends on what is in memory at that time. In addition, that unpredictable behaviour will lead to errors, and probably to memory corruption. The underscores should be interpreted as slots: regions of reserved memory that are there, but empty.

The proposed idiom uses strncat (see [4]), tricking the function by passing it an empty string as the first parameter (the destination, with its first byte set to '\0'), and then the actual string we need to copy. This call renders the same result, but without the previous side effect. Let’s see an example:

#include <stdio.h>  /* stdout, fprintf */
#include <string.h>  /* strncat, strlen */

#define TARGET 10

int main(int argc, char* argv[]) {

    char *src = "Castle";  /* 6 chars long */
    char dst[TARGET] = "__________";
    dst[TARGET - 1] = '\0';

    fprintf(stdout, "%s\n", dst);   /* must be: '_________' */
    /* Prepare destination string */
    dst[0] = '\0';
    /* Copy with strncat */
    strncat(dst, src, 3);
    fprintf(stdout, "%s\n", dst);   /* must be: 'Cas' */
    /* If I copy the string correctly, by passing the right
    * length, then strncat behaves as expected */
    dst[0] = '\0';
    strncat(dst, src, strlen(src) + 1);
    fprintf(stdout, "%s\n", dst);   /* must be: 'Castle' */
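    /* Passing an n larger than strlen(src) does not overrun dst here,
    * but only because src fits: n limits how many characters are read
    * from src, not how much space is left in dst */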
    dst[0] = '\0';
    strncat(dst, src, strlen(src) + 10);
    fprintf(stdout, "%s\n", dst);   /* must be: 'Castle' */

    return 0;
}

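Here we see that the error is no longer present, because strncat always writes a terminating null character after the characters it appends (the snippet in the documentation [4] gives a hint of what it does, so we can spot the difference). Note, however, that the destination must have room for n + 1 more characters, so n should be derived from the size of dst, not from the length of src.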

This might seem like a minor issue, but it raised some concerns in Linux kernel development, to the point that a new function was developed. The strscpy function is being included in the kernel for Linux 4.3-rc4 [6] because it provides a better interface. Some of the problems mentioned in the commit message that motivated this new function are the ones described in the previous paragraphs.

This makes me wonder whether this should be the “correct” way of performing this operation “safely” in C. In all cases, the underlying error is the same (not checking boundaries and trusting the input), and it should be avoided. What I mean is that we cannot simply rely on those functions being secure: the security must be in our code, so the proper way to handle these situations is to code defensively. Do not trust user input, and always check boundaries, error codes, memory allocation, and the state of pointers (one free for every successful malloc, etc.).

[1] https://the-flat-trantor-society.blogspot.com.ar/2012/03/no-strncpy-is-not-safer-strcpy.html
[2] https://blog.surgut.co.uk/2015/08/go-enjoy-python3.html
[3] strncpy documentation.
[4] strncat manual page.
[5] An array of characters that is not null-terminated is not a valid C string.
[6] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=30c44659f4a3e7e1f9f47e895591b4b40bf62671

Setting user permissions in KVM

In a previous article I mentioned how to install a library in Fedora in order to make KVM virtualization easier, managing the NAT network configuration between the guest virtual machine and the host by means of libvirt.

Besides that, while using KVM locally for development, I use virt-manager, a helpful application that manages the different virtual machines. This application, as well as the rest of the commands that interact with libvirt (virsh, for example), requires superuser privileges, so it prompts for the sudo password every time.

This can be avoided by adding the user to the following groups: kvm and libvirt.

Therefore, just by running the following command (and logging out and back in, so that the new group membership takes effect), we can skip the password prompt every time.

sudo usermod -a -G kvm,libvirt mariano

This is an option I would use only for local development on my machine. Production environments must have strict permissions management.

Notes on ArqConf 2015

This is my summary of ArqConf 2015, the software architecture conference that took place at UCA on April 30, 2015. The idea is to synthesize the main ideas I took away and highlight the most important points.

What follows is a list of the main ideas from each talk, with a brief list of what I found most noteworthy in each one. Note that the list is by no means exhaustive, and each section is really a short illustrative paragraph, serving as a very high-level summary.

Each section has a title alluding to the topic of the presentation, a brief summary, and a list of the main points I highlight.

Order in an architecture, and agility as a quality attribute

It was based on the experience of an architect leading an architecture team for a challenging project. The speaker explained the problems to be solved, the technological framework in which the solution was developed, and how an elegant, simple architecture with relatively few components can carry out a large-scale implementation that supports 150,000 concurrent transactions.

  • The key success factor of an architecture is communication.
  • The agility of a team as a quality attribute. This is interesting, because when one thinks of quality attributes, things like security, maintainability, usability, scalability, etc. come to mind, but not being agile. However, it is generally desirable for the team to be agile and able to adapt easily to change, so in that case, why not add it as a quality attribute?
  • The flexibility of the team, also as a quality attribute. Analogous to the previous one, but with one detail: if it is a quality attribute, it has to be measurable. What is interesting is not only the originality of these “non-traditional” quality attributes, but also how to consider them within the quality scenarios.
  • Requirements must be prioritized within the global framework of the organization.

Architecture and Big Data Analytics

An excellent presentation, with a lot of detail and technical depth, naming technologies, methodologies, techniques, and architectural styles oriented to Big Data.

  • Kafka as a tool for processing information in non-traditional message queues (with persistent state). It is an example of a technology with a real success story in Big Data projects.
  • Always keep the data source, called “the source of truth”: it is a good practice, since it has several advantages, for example:
    • It allows errors to be corrected in case of failure (the data can be reprocessed, so there is no irrecoverable loss of information).
    • By preserving the original data (raw data), it is possible in the future to derive or compute new metrics if required, which would be impossible if only the processed data were stored.
    • The extra storage cost should not be a problem, considering the benefits.
  • Process information idempotently: this is perhaps the idea that best reflects a good general practice, applicable not only to Big Data. Instead of processing by modifying records (for example, running an SQL statement that adds one to some column), simply add a new entry and then compute the result over the total. This way the data is not modified and, again, a potential error can be repaired, there is no irreversible loss of information, etc. This idea already existed in BI systems, but it is interesting to note that recording facts can be used in many more cases.
  • Simplify the technological variables. Instead of an extensive repertoire of many special-purpose technologies, an environment with fewer technologies is better and easier to maintain; even if they do not fit every particular problem perfectly, pragmatism should be favoured, making the necessary adjustments.
  • Have a data schema in order to integrate the information processed from different sources.

Microservice architectures

It is very interesting to hear about microservices, and how this type of architecture allows for more flexible scalability.

  • Microservice architectures make it possible to obtain the same functionality, but in a distributed way, as opposed to a monolithic architecture.
  • This allows scaling more flexibly; for example, subsystems can be managed independently, allocating resources to each, or maintaining more components that are each simpler.
  • This separation can also be reflected in work teams, areas, or processes.

Architecture and agile methods

On this occasion, software architecture was discussed from the point of view of agile methodologies and development processes aligned with the functional requirements of the business.

  • The team can discuss the architecture with the PM in terms of the requirements, without necessarily going into many technical details, focusing on functionality and expected behaviour.
  • This conversation about the architecture must be constant throughout the entire development cycle.

Architecture applied to production

An excellent close to the conference. It placed a lot of emphasis on how architecture, and the role of the architect or the architecture team, are seen from the CIO’s point of view. This shed considerable light on what is expected from the architecture team for the organization to work.

The highlight was seeing what is and what is NOT expected from the architect, and how the most important thing is being able to provide a solution, as engineers, that responds to the needs of the business. The main value was that the ideas were illustrated with real experiences in real data centers.

Something striking is that many of the ideas mentioned are actually things that are taken for granted in a software project but, as we know, in practice this does not always happen, and that leads to bad results.

  • Conceptual integrity is fundamental: solutions must be provided uniformly, applying the same styles and technologies to the same types of problems. Similarly, if many different technologies are used across different projects, the result is a gigantic architecture that is very hard to maintain.
  • The internal technical components of the engineering team are not the main objective of the organization; rather, they exist to serve its objectives.
  • Adopting new technologies just because they offer some partial advantages is not always a good idea. In the long run, it often ends up having harmful consequences for the project.
  • Systems must be designed and built to last several years (~10), which implies that the technologies used to build them must have existed for several years already, so that it is reasonable to assert they will remain available for as long as the production system lasts. It would not be desirable to have to maintain or take over obsolete technologies (frameworks, toolkits, etc.).
  • Question so-called “good practices” (or revealed truths). When something is labelled a good practice, one must ask whether it really is and, even if it is, whether its advantages apply to the project in question. This is another, more general idea: it is ultimately about critical thinking, yet in many cases it does not happen, and we often see projects applying “design patterns” (or architecture patterns), “agile good practices”, etc., without really thinking about how they apply to the project (something may have given excellent results in another project, in another company, in another country, but the architect must consider whether those variables really match or are relevant to the context).

Conclusions

I consider the conference to have been very good, given the quality of the presentations, the experience of the speakers, and the fact that everything was conceptually aligned, which gave the transitions between topics a remarkable continuity.

It is also important to highlight that conferences of this kind, besides enriching the professional experience of everyone involved (speakers, organizers, and attendees), benefit the architecture community.

Running RabbitMQ server on Docker

If you use RabbitMQ for development frequently, you might have found that it sometimes uses too many resources (while programming, it is normal to have a lot of queues or tasks being queued, which makes CPU usage spike).

Having RabbitMQ installed on the OS seems the perfect approach for production, but in development I’d rather do something different, in order to isolate the process. I know I could constrain it by means of systemd (for example, telling it not to start automatically as a dæmon), but a few weeks ago I decided to try Docker and see how it turned out.

It turned out to be just the tool for the job, and so far, with a little simple configuration, it runs as expected.

There is already a Docker image for RabbitMQ, which can be pulled and then run, for example:

sudo docker pull rabbitmq:3
sudo docker run -d -e RABBITMQ_NODENAME=my-rabbit --cpuset-cpus="1" --name docker.rabbitmq -p 5672:5672 rabbitmq:3

The -d option starts the container detached; then, with -e, we pass environment variables (in this case, RABBITMQ_NODENAME is a RabbitMQ-specific variable that sets the name of the node being started). Optionally, we can restrict the CPUs the container may use with --cpuset-cpus (called --cpuset in older Docker releases); here it pins the process to the second core of the machine (numbering starts at 0). Then --name sets a name for the container being created.

The most important part in this case is the port mapping, made by the -p option, which here maps the port used by RabbitMQ (5672) directly (1:1) to the host machine. This makes the container run transparently: other applications that try to communicate with RabbitMQ won’t notice any difference, as it looks like an actual RabbitMQ service is running. Finally, there is the name of the Docker image to run.

What I usually do is manage the container by its instance_id (the container ID, a hexadecimal identifier displayed when listing the containers with sudo docker ps -a). Then we can manage it with sudo docker [start|stop] <instance_id>.

There is another command to see the output generated by the process: docker logs docker.rabbitmq. Notice that in this case the name assigned to the container was used instead of the instance_id. In addition, we can see internal data about the container by running the inspect command (again, we can use the instance_id or the name we assigned).

sudo docker inspect docker.rabbitmq
sudo docker logs docker.rabbitmq

It’s important to notice that Docker is actually not a virtualization platform, but a mechanism that runs processes in containers, meaning that in this case the entire RabbitMQ server runs as an ordinary process inside a container, with some limits and constraints imposed by Docker.

I found this approach very versatile for a development environment and, with RabbitMQ being the first pilot, I think I can migrate more applications to Docker instead of having them installed on the system (whenever possible).

Find Options

Among the core command-line utilities, find (part of GNU findutils) is one of the most useful. Though I use its basic functions most of the time, find has a wide range of parameters, and it comes in handy not only for finding files, but also for operating on a bunch of them at once. Here is a very simple example.

Imagine you have to move many files to a directory, but they all have different names, so a glob is no use, and moving all of them manually is not an option. A possible approach would be to locate the first file of the batch (for example, by running ls -lrth). Suppose the first one of the batch is called /tmp/checkpoint (for this example, let’s assume the files reside in /tmp).

The com­mand would be:

find /tmp -maxdepth 1 -type f -newer /tmp/checkpoint -exec mv '{}' <target_directory> \;

The -type f part is important in order not to move entire directories (find only the files), and -maxdepth 1 keeps find from descending into subdirectories. Then we have -newer, which receives a file as a parameter and keeps only the files whose modification time is more recent than that of the reference file (hence, the reference must be the start of the batch; the comparison is strict, so /tmp/checkpoint itself is not matched and has to be moved separately). The related -anewer option compares each file’s access time instead. Finally, the -exec part is interesting because, as mentioned at the beginning, it allows performing arbitrary operations on the group of files (in this case, moving them to another location, but other actions, such as modifying them with sed, are also possible).

Another trait I like about find is that it presents a safe and well-defined interface, meaning that in some cases I can check the results before executing an action. For example, if we would like to check which unnecessary files we are about to delete:

find . -name "*.pyc"

By issuing this command, we list the files to erase. Then we can simply delete them by appending -delete to the very same command.

This is just the tip of the iceberg of what is possible with the find command and its various options.