PL/Perl regex vs builtin regex_replace in postgres
Maybe you (like me) wondered which one is faster... to use builtin regexp_replace function or to use Perl's regex engine via PL/Perl.
Here you are - this is a test I made (PostgreSQL 8.3.7, Perl 5.8.8, on CentOS 5,3)
First we have a plperl function:
CREATE OR REPLACE FUNCTION test_perl( TEXT ) RETURNS TEXT AS
$code$
my $str = shift;
$str =~ s{[[:space:]]+}{}g;
$str =~ s{^\+48}{};
$str =~ s{[^[:digit:]]+}{#}g;
$str =~ s{#$}{};
$str =~ s{^([[:digit:]]{3})#([[:digit:]]{3})#([[:digit:]]{3})$}{$1$2$3};
return $str if $str;
return undef;
$code$
LANGUAGE plperl IMMUTABLE STRICT;
Then we have it's regexp_replace equivalent (sorry - it's not pretty, but it does exactly the same thing, five replace operations ):
CREATE OR REPLACE FUNCTION test_rr( TEXT ) RETURNS TEXT AS
$code$
SELECT nullif( regexp_replace( regexp_replace( regexp_replace( regexp_replace( regexp_replace(
$1, '[[:space:]]+', '', 'g' ), '^\+48', '' ),
'[^[:digit:]]+', '#', 'g' ), '#$', '' ),
'^([[:digit:]]{3})#([[:digit:]]{3})#([[:digit:]]{3})$', '\1\2\3' ), '' )
$code$
LANGUAGE SQL IMMUTABLE STRICT;
Test run:
=> select count(*) from testtable where test_rr(telefon) is not null;
Time: 7051.000 ms
Time: 7629.000 ms
Time: 7534.000 ms
=> select count(*) from testtable where test_perl(telefon) is not null;
Time: 4087.000 ms
Time: 3931.000 ms
Time: 4056.000 ms
It appears that Perl regex engine is almost twice faster than the builtin one.
I will follow with more tests, because I have a very intensive usage of regular expressions here... Probably it will be even faster to move proessing out of the database :-(
The use of database as do-everything place is tempting, especially in postgres - but maybe it's not the right tool for the job.
Update: yes, we moved processing out of the database. Flat files processed with regular expressions... Works like a charm :)