postgresql and some grsec kernels = FAIL
Short message: Do not run postgres on some grsec - patched kernels.
Disclaimer: I write this just because I didn't find any clear explanations of the problem on the net, and I feel that such note can be useful for other who have this problem. I'm not an expert on grsec.
The story
A friend of mine had a problem with his postgres server (PostgreSQL 8.3.8).
The application (Perl/Java) was quite simple but it generated heavy load on the machine.
While running some INSERT/UPDATE queries, there happened random segmentation faults, like this one:
Dec 7 07:24:45 nsXXXXXX kernel: postgres[22481]: segfault at 7fda5e1d5000 ip 00007fda604553c3 sp 00007fffe41faf28 error 4 in libc-2.9.so [7fda603d1000+168000]
Dec 7 07:24:45 nsXXXXXX kernel: grsec: From XX.YY.ZZ.51: Segmentation fault occurred at 00007fda5e1d5000 in /usr/lib/postgresql/8.3/bin/postgres[postgres:22481] uid/euid:103/103 gid/egid:114/114, parent /usr/lib/postgresql/8.3/bin/postgres[postgres:29857] uid/euid:103/103 gid/egid:114/114
As you imagine, one of the backends went away. It happened few times a day.
After one of these segfaults, the server detected some corrupted pages ("ERROR: compressed data is corrupt") - effectively, the database was trashed, junk, byebye. Not possible to pg_dump, randomly damaged pages.
So you understand, this made the server completely useless.
After taking some advice from the community, and investigating some false traces (bad memory? OOM/overcommit?) we finally tracked this down to nonstandard kernel version:
# uname -a
Linux nsXXXXXX.ovh.net 2.6.31.5-grsec-xxxx-grs-ipv4-64 #2 SMP Thu Nov 5 12:36:20 UTC 2009 x86_64 GNU/Linux
I was a bit suspicious about this, and my friend found some grsec problems reported by users (but not postgres-related).
So we gave it a try. And Voila! After replacing the kernel with non-grsec version, the problem went away.
Of course the database had to be rebuilt.
Lesson learned.
UPDATE:
This for sure does not apply to ALL grsec'ed kernels. But it definitely applies to this version which we had problems with - it comes from the hosting provider ovh.pl
PL/Perl regex vs builtin regex_replace in postgres
Maybe you (like me) wondered which one is faster... to use builtin regexp_replace function or to use Perl's regex engine via PL/Perl.
Here you are - this is a test I made (PostgreSQL 8.3.7, Perl 5.8.8, on CentOS 5,3)
First we have a plperl function:
CREATE OR REPLACE FUNCTION test_perl( TEXT ) RETURNS TEXT AS
$code$
my $str = shift;
$str =~ s{[[:space:]]+}{}g;
$str =~ s{^\+48}{};
$str =~ s{[^[:digit:]]+}{#}g;
$str =~ s{#$}{};
$str =~ s{^([[:digit:]]{3})#([[:digit:]]{3})#([[:digit:]]{3})$}{$1$2$3};
return $str if $str;
return undef;
$code$
LANGUAGE plperl IMMUTABLE STRICT;
Then we have it's regexp_replace equivalent (sorry - it's not pretty, but it does exactly the same thing, five replace operations ):
CREATE OR REPLACE FUNCTION test_rr( TEXT ) RETURNS TEXT AS
$code$
SELECT nullif( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace(
$1, '[[:space:]]+', '', 'g' ), '^\+48', '' ),
'[^[:digit:]]+', '#', 'g' ), '#$', '' ),
'^([[:digit:]]{3})#([[:digit:]]{3})#([[:digit:]]{3})$', '\1\2\3' ), '' )
$code$
LANGUAGE SQL IMMUTABLE STRICT;
Test run:
=> select count(*) from testtable where test_rr(telefon) is not null;
Time: 7051.000 ms
Time: 7629.000 ms
Time: 7534.000 ms
=> select count(*) from testtable where test_perl(telefon) is not null;
Time: 4087.000 ms
Time: 3931.000 ms
Time: 4056.000 ms
It appears that Perl regex engine is almost twice faster than the builtin one.
I will follow with more tests, because I have a very intensive usage of regular expressions here... Probably it will be even faster to move proessing out of the database :-(
The use of database as do-everything place is tempting, especially in postgres - but maybe it's not the right tool for the job.
Update: yes, we moved processing out of the database. Flat files processed with regular expressions... Works like a charm :)
running databases inside Vmware
This is just a warning note for people running databases inside Vmware.
The problem:
At my employers company we develop some business critical application for one of our customers. This customer decided to employ Vmware ESX host as a container for a testing farm for this application. One of servers in this farm is a PostgreSQL machine.
The Very Bad Effect that we noticed while debugging some nasty db-related problem is:
Inside a guest OS, time flows differently.
That is, one second can last 1/2 second or 3/4 second or 2 seconds - depending on the disk I/O. The higher disk I/O rates, the slower time flows.
Somehow, this has fatal impact on performance of PostgreSQL. Algorithms which rely on precise timing simply stop to work. This could be a matter of interesting analysis which I don't have time for right now.
I am not a Vmware expert, and probably this is just a matter of bad configuration.
Probably it appears only when you run many single-core guests inside one multi-core host, but anyway - be warned.
psql variables – quoting and more
psql, postgreSQL powerful command line interface, has it's variables system. I do use it sometimes but... how many single quotes should I put there?
I have puzzled over this so many times, that I decided to put it here as a memo.
Proper quoting of text variables
Variable declaration:
\set d1 '''2008-01-01 00:00:00.000'''
\set d2 '''2008-12-31 23:59:59.999'''
Variable usage:
SELECT something FROM somewhere
WHERE now() BETWEEN :d1 AND :d2;
To make this post a bit more useful, here are some other
examples of psql variables usage
A shortcut for frequently used SQL phrase:
\set cnt 'SELECT COUNT(*) FROM'
:cnt users;
Another one:
\set csn 'SELECT * FROM DBA.CLIENTS WHERE name LIKE'
\set csni 'SELECT * FROM DBA.CLIENTS WHERE name ILIKE'
:csni 'kowalski'\g all_kowalskis.txt
This one if for pretty-ascii-report-printout lovers:
\set hr **************************************
\set inform '\\echo :hr \\echo :msg \\echo :hr'
Usage of the above:
\set msg 'starting mass update\nat ' `date`
:inform
**************************************
starting mass update
at Fri Dec 19 21:15:49 CET 2008
**************************************
Nice, eh?
You can put such "macros" into your .psqlrc file to save some keystrokes.
You can even create your very own "library" of psql macros and include it in your scripts with \i.
Have fun!
Note: these are not variables in SQL stored procedures sense. This is just plain text substitution.
A bit similar to Host Variables (note :colon usage), but not so powerful.
Replikacja w PostgreSQL za pomocą Slony-I – update
W poprzednim artykule opisałem jak zestawić replikację w postgresie za pomocą slon-tools.
Dziś chciałem dodać parę rzeczy których tam zabrakło, a które są często potrzebne w praktycznym zastosowaniu Slony-I.
Dodawanie tabeli do replikacji
Załóżmy że chcemy dodać do replikacji tabelę froobles.
- Tabela musi istnieć na obu serwerach i mieć klucz główny. Możemy utworzyć ją ręcznie, lub wykorzystać skrypt który to zrobi:
slonik_execute_script set1 /path/to/froobles.sql
- W konfigu (plik slon_tools.conf) musimy dodać kolejną publikację (set) Slony-I. Dopisujemy go do hasha $SLONY_SETS.
$SLONY_SETS = { set1 => { set_id => 1, table_id => 1, sequence_id => 1, pkeyedtables => [ 'accounts', 'branches', 'tellers', ], }, # tymczasowy set dla dodawanych tabel set2 => { set_id => 2, table_id => 100, sequence_id => 100, pkeyedtables => [ 'froobles', ], }, }; 1; -
Następnie musimy utworzyć publikację w klastrze.
postgres@merry:~$ slonik_create_set set2 | slonik
-
Utoworzoną publikację (set #2) musimy zasubskrybować do slave'a (node #2):
postgres@merry:~$ slonik_subscribe_set set2 2 | slonik
-
Zawartość tabeli zostanie przekopiowana na serwer slave. Może to potrwać zależnie od wielkości tabeli.
-
Gdy już upewniliśmy się że tabela jest replikowana, możemy złączyć publikacje z powrotem w jedną. Służy do tego polecenie slonik_merge_sets:
postgres@merry:~$ slonik_merge_sets 1 set1 set2 | slonik
-
Uaktualniamy plik slon_tools.conf (patrz wyżej) tak by odzwierciedlał aktualną konfigurację klastra. Nowo dodane tabele powinny się znaleźć w "set1". Robimy to aby uniknąć zamieszania w razie konieczności odtworzenia klastra.
$SLONY_SETS = { set1 => { set_id => 1, table_id => 1, sequence_id => 1, pkeyedtables => [ 'accounts', 'branches', 'tellers', 'froobles', ], }, }; 1;
Usuwanie tabeli z replikacji
Jeśli chcemy dodawać, to pewnie czasem i usuwać.
Przykład: Usuwamy tabelę z publikacji nr 1. Musimy znać też ID tabeli nadane przez slonika; możemy je podejrzeć w bazie w tabeli _webdev.sl_table. W tym przykładzie jest to 100.
postgres@merry:~$ slonik_drop_table 100 set1 | slonik
Podsumowanie
Narzędzia slon-tools są naprawdę proste.
Wszystkie skrypty dostępne w pakiecie rozpoczynają się od prefiksu slonik_ lub slon_. Polecam zapoznanie się z nimi oraz z doumentacją Slony-I dostępną pod adresem http://slony.info.
Powodzenia!