Running PostgreSQL in memory with docker

Introduction

Interacting with a database can be a regular task for a developer, and for that we would like to make sure we are developing and testing in situations close to a real deployment; therefore, using the same database as in production can help detect issues early.

However, setting up an entire database server for development can be cumbersome. Fortunately, modern operating systems like Linux nowadays have great tools and features that we can take advantage of.

In particular, I would like to set up a simple database locally using Docker, storing the data in memory.

The idea is simple: run a Docker container with the image for PostgreSQL, using a tmpfs [1] [2] as storage for the database (a ramfs could also be used).

Procedure

First, we pull a PostgreSQL image suitable for the platform, for example:

docker pull fedora/postgresql

Then, we create a mount point and mount a tmpfs on it for the data (capped at 50 MiB here):

sudo mkdir /mnt/dbtempdisk
sudo mount -t tmpfs -o size=50m tmpfs /mnt/dbtempdisk
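To avoid mounting the tmpfs twice when repeating these steps, the two commands above can be wrapped in a small helper. This is a sketch rather than part of the original procedure: the function names is_tmpfs and ensure_db_tmpfs and the DB_TMPFS_DIR/DB_TMPFS_SIZE variables are hypothetical, and it assumes findmnt from util-linux is available.

```shell
# Hypothetical settings; the defaults match the commands above.
DB_TMPFS_DIR="${DB_TMPFS_DIR:-/mnt/dbtempdisk}"
DB_TMPFS_SIZE="${DB_TMPFS_SIZE:-50m}"

# Succeeds only if the given directory is itself a tmpfs mount point.
is_tmpfs() {
    [ "$(findmnt -n -o FSTYPE --mountpoint "$1" 2>/dev/null)" = "tmpfs" ]
}

# Create the mount point and mount the tmpfs, unless that is already done.
ensure_db_tmpfs() {
    if is_tmpfs "$DB_TMPFS_DIR"; then
        echo "$DB_TMPFS_DIR is already a tmpfs"
        return 0
    fi
    sudo mkdir -p "$DB_TMPFS_DIR"
    sudo mount -t tmpfs -o "size=$DB_TMPFS_SIZE" tmpfs "$DB_TMPFS_DIR"
}
```

Calling ensure_db_tmpfs a second time then only reports that the directory is already mounted.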

Now we can run the database container using this directory:

docker run --name mempostgres \
    -v "/mnt/dbtempdisk:/var/lib/pgsql/data:Z" \
    -e POSTGRES_USER=<username-for-the-db> \
    -e POSTGRES_PASSWORD=<password-for-the-user> \
    -e POSTGRES_DB=<name-of-the-db> \
    -p 5432:5432 \
    fedora/postgresql

The first line sets the name of the container we are running (if none is specified, Docker assigns a random one). The second line is the important one, since it sets up the directory mapping: the tmpfs directory on the host is mounted as /var/lib/pgsql/data inside the container (the target). The latter is the directory PostgreSQL uses by default in this image for initializing and storing the data of the database. The Z at the end of the mapping tells Docker to relabel that directory in case SELinux is enabled, so access does not fail with permission errors (processes in a container run under a confined SELinux context, and we are mounting a host directory that this context is not allowed to access by default) [3].

The remaining three lines are environment variables that the image uses to initialize the database (they are optional, and defaults will be used if they are not provided). Then follows the port mapping, which in this case maps port 5432 inside the container to the same port on the host. And finally, the name of the Docker image we will run.

Once this is running, it looks as if we had an actual instance of PostgreSQL up and running on our machine (actually we do, but it is inside a container :-), so we can connect to it with any client (even a Python application, etc.).
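Most clients built on libpq, such as psql on the host or psycopg2 in a Python application, accept a connection URI. As a small illustration, the URI can be assembled from the same pieces passed with -e and -p above; the function name pg_uri and the credentials in the example are hypothetical placeholders, and a password containing characters such as @ or / would have to be percent-encoded, which this sketch does not do.

```shell
# Build a libpq connection URI: postgresql://user:password@host:port/dbname
# Host and port default to the mapping published by "-p 5432:5432".
pg_uri() {
    local user="$1" password="$2" db="$3"
    local host="${4:-localhost}" port="${5:-5432}"
    # No percent-encoding is performed on any of the parts.
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$db"
}
```

With psql installed on the host, this could be used as: psql "$(pg_uri devuser devpass devdb)".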

For example, if we want to use the psql client from another container linked to this one, the command would be:

docker run -it --rm \
--link mempostgres:postgres \
fedora/postgresql \
psql -h mempostgres -U <username-in-db> <db-name>
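The --link flag is a legacy Docker feature; on newer Docker versions, a user-defined network achieves the same, since containers attached to it can reach each other by name. A sketch of that alternative, where the function name psql_via_network and the network name pgnet are hypothetical, and the user and database names are passed as arguments instead of the placeholders above:

```shell
# Run psql in a throwaway container that reaches "mempostgres" over a
# user-defined network instead of a legacy --link.
psql_via_network() {
    local user="$1" db="$2" net="${3:-pgnet}"
    # Create the network only if it does not exist yet.
    docker network inspect "$net" >/dev/null 2>&1 || docker network create "$net"
    # Attach the running server container; the error Docker reports when it is
    # already attached is silenced (which would also hide other failures).
    docker network connect "$net" mempostgres 2>/dev/null || true
    # Docker's embedded DNS resolves the container name on this network.
    docker run -it --rm --network "$net" fedora/postgresql \
        psql -h mempostgres -U "$user" "$db"
}
```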

Applications

If we have PostgreSQL installed, we could simply start a new instance as our own user with the postgres command, passing the -D parameter with the desired path where the database is going to store its data (which would be the tmpfs/ramdisk, initialized beforehand with initdb). This would be another way of achieving the same result.
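A minimal sketch of that native alternative, assuming the PostgreSQL server binaries (initdb, pg_ctl) are on the PATH and that the tmpfs from the procedure is mounted at /mnt/dbtempdisk but not used by the container at the same time. pg_ctl starts postgres in the background with the given -D directory; the function names, the pgdata subdirectory and port 5433 are illustrative choices.

```shell
# Illustrative settings: a data directory on the tmpfs and a non-default port,
# so the instance does not clash with anything already listening on 5432.
PGDATA_DIR="${PGDATA_DIR:-/mnt/dbtempdisk/pgdata}"
PGPORT_NATIVE="${PGPORT_NATIVE:-5433}"

start_native_pg() {
    # The tmpfs is owned by root; create a directory owned by the current user
    # (PostgreSQL refuses to run as root).
    sudo install -d -o "$(id -un)" -m 700 "$PGDATA_DIR"
    # Initialize an empty cluster in RAM.
    initdb -D "$PGDATA_DIR"
    # -p sets the port, -k puts the Unix socket in /tmp (writable by us);
    # -w waits until the server is accepting connections.
    pg_ctl -D "$PGDATA_DIR" -o "-p $PGPORT_NATIVE -k /tmp" \
        -l "$PGDATA_DIR/server.log" -w start
}

stop_native_pg() {
    pg_ctl -D "$PGDATA_DIR" -m fast stop
}
```

After start_native_pg, the instance can be reached with, for instance, psql -h /tmp -p 5433 postgres (initdb creates a superuser named after the current OS user).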

Regardless of the implementation, here are some potential applications:

  1. Local development without requiring disk storage, and running faster at the same time, since the data never has to be written to disk.
  2. Unit testing: unit tests should be fast, granted. Sometimes, though, it makes perfect sense to run the tests against an actual database (practicality beats purity), even if this makes them “integration/functional” tests. In this regard, having a lightweight database container running locally could achieve the goal without compromising performance.
  3. Isolation (this only applies to the container approach): running PostgreSQL in a Docker container encapsulates the libraries, tools, packages, etc. in Docker, so the rest of the system does not have to keep many other packages installed. Think of it as a sort of “virtual environment” for packages.
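For the testing scenario in particular, a test run usually has to wait until the server accepts connections before the first test starts. A minimal sketch using pg_isready, the readiness check shipped with the PostgreSQL client tools (assumed to be installed on the host); the function name and the one-second polling interval are arbitrary choices.

```shell
# Poll until PostgreSQL at host:port accepts connections, or give up.
# Usage: wait_for_postgres [host] [port] [max_attempts]
wait_for_postgres() {
    local host="${1:-localhost}" port="${2:-5432}" attempts="${3:-30}"
    while [ "$attempts" -gt 0 ]; do
        # pg_isready exits with status 0 once the server accepts connections.
        if pg_isready -q -h "$host" -p "$port"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "PostgreSQL at $host:$port is not ready" >&2
    return 1
}
```

A test script could start the container as shown in the procedure, call wait_for_postgres localhost 5432, and only then run the test suite.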

All in all, I think it’s an interesting approach worth considering, at least to have alternatives when working on projects that require intense interaction with the database.

[1] : https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt
[2] : https://en.wikipedia.org/wiki/Tmpfs
[3] : http://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/