I have a bunch of data that contains a phone number and a birthday as well as other data.
{1997-06-28,07742367858}
{07791100873,1996-07-14}
{30/01/1997,07974335488}
{1997-04-04,07701003703}
{1996-03-11,07480227283}
{1998-06-20,07713817233}
{1996-09-13,07435148920}
{"21 03 2000",07548542539,1st}
{1996-03-09,07539248008}
{07484642432,1996-03-01}
I am trying to extract the phone number from this, however unsure on how to get this out when the data is not always in the same order.
I would expect to one column that return a phone number, the next which returned a birthday then another which return any arbitrary value in the 3rd column slot.
You can try to sort parts of each string by the number of digits they contain. This can be done with the expression:
select length(regexp_replace('1997-06-28', 'D', '', 'g'))
length
--------
8
(1 row)
The query removes curly brackets from strings, splits them by comma, sorts elements by the number of digits and aggregates back to arrays:
with my_data(str) as (
values
('{1997-06-28,07742367858}'),
('{07791100873,1996-07-14}'),
('{30/01/1997,07974335488}'),
('{1997-04-04,07701003703}'),
('{1996-03-11,07480227283}'),
('{1998-06-20,07713817233}'),
('{1996-09-13,07435148920}'),
('{"21 03 2000",07548542539,1st}'),
('{1996-03-09,07539248008}'),
('{07484642432,1996-03-01}')
)
select id, array_agg(elem order by length(regexp_replace(elem, 'D', '', 'g')) desc)
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
Result:
id | array_agg
----+--------------------------------
1 | {07742367858,1997-06-28}
2 | {07791100873,1996-07-14}
3 | {07974335488,30/01/1997}
4 | {07701003703,1997-04-04}
5 | {07480227283,1996-03-11}
6 | {07713817233,1998-06-20}
7 | {07435148920,1996-09-13}
8 | {07548542539,"21 03 2000",1st}
9 | {07539248008,1996-03-09}
10 | {07484642432,1996-03-01}
(10 rows)
See also this answer Looking for solution to swap position of date format DMY to YMD if you want to normalize dates. You should modify the function:
create or replace function iso_date(text)
returns date language sql immutable as $$
select case
when $1 like '__/__/____' then to_date($1, 'DD/MM/YYYY')
when $1 like '____/__/__' then to_date($1, 'YYYY/MM/DD')
when $1 like '____-__-__' then to_date($1, 'YYYY-MM-DD')
when trim($1, '"') like '__ __ ____' then to_date(trim($1, '"'), 'DD MM YYYY')
end
$$;
and use it:
select id, a[1] as phone, iso_date(a[2]) as birthday, a[3] as comment
from (
select id, array_agg(elem order by length(regexp_replace(elem, 'D', '', 'g')) desc) as a
from (
select id, trim(unnest(string_to_array(str, ',')), '"') as elem
from (
select trim(str, '{}') as str, row_number() over () as id
from my_data
) s
) s
group by id
) s
id | phone | birthday | comment
----+-------------+------------+---------
1 | 07742367858 | 1997-06-28 |
2 | 07791100873 | 1996-07-14 |
3 | 07974335488 | 1997-01-30 |
4 | 07701003703 | 1997-04-04 |
5 | 07480227283 | 1996-03-11 |
6 | 07713817233 | 1998-06-20 |
7 | 07435148920 | 1996-09-13 |
8 | 07548542539 | 2000-03-21 | 1st
9 | 07539248008 | 1996-03-09 |
10 | 07484642432 | 1996-03-01 |
(10 rows)