
PostgreSQL Split Single Column Values to Multiple Columns Based on Pattern

I am using SAS EG against a Postgres database with explicit pass-through SQL. I was given a messy table that I need to "clean up" for the end user. One of the columns (varchar(255)) in the table has multiple date values smashed together. Most of the values contain just one date (e.g. 10/11/2018), but a few look like this: 8/11/201810/6/2019. I need to split these out, but there is no delimiter to split on. The only thing to go on is a pattern, and even that is variable length, so I don't know how to do it. Basically, each date in the sequence is a 1-2 digit month, a 1-2 digit day, and a 4-digit year, immediately followed by the next date. How can I split these and separate them with a delimiter, so that I can then perform a split_to_array, count the highest number of separate values in the array, and make the appropriate number of new columns to accommodate the separate dates? Ordinarily I would provide a code example, but in this case I don't even know where to start to parse these out.

Original Value Example:

row1 6/4/2017
row2 8/11/201810/6/2019
row3 10/16/20134/12/201812/18/2019

Desired Value Example:

row1 6/4/2017
row2 8/11/2018, 10/6/2019
row3 10/16/2013, 4/12/2018, 12/18/2019

Thanks in advance!

You should be able to write a regular expression that matches your dates. This works in SAS. Note that it adds an extra comma after the last date, but you could remove that, work out a more complex regex that avoids it, or just ignore the extra comma.

WANT=prxchange('s/(\d{1,2}\/\d{1,2}\/\d{4})/$1,/',-1,HAVE);

It looks like Postgres has a regexp_replace() function, so something like this should work:

regexp_replace(HAVE,'(\d{1,2}/\d{1,2}/\d{4})','\1,','g') as WANT
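One way to avoid the extra comma mentioned above is a lookahead: only add the delimiter when another digit follows the year. The query below is a minimal sketch, not a tested part of the original answer. The CTE name `messy` and the column names `id`, `have`, and `want` are placeholders, and the sample rows are the ones from the question. It assumes the default `standard_conforming_strings = on`, so the backslashes in the pattern are taken literally.

```sql
-- Sample rows copied from the question; "messy", "id", "have" and "want"
-- are placeholder names used only for this illustration.
WITH messy(id, have) AS (
    VALUES
        ('row1', '6/4/2017'),
        ('row2', '8/11/201810/6/2019'),
        ('row3', '10/16/20134/12/201812/18/2019')
)
SELECT id,
       -- (?=\d) is a positive lookahead: the date is only replaced when a
       -- digit follows it, so the last date in each value gets no comma.
       regexp_replace(have,
                      '(\d{1,2}/\d{1,2}/\d{4})(?=\d)',
                      '\1, ',
                      'g') AS want
FROM messy
ORDER BY id;
```

With the sample rows this should give the comma-separated values shown in the question, with no trailing delimiter to strip afterwards.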

Use the PostgreSQL function regexp_match.

From PostgreSQL docs https://www.postgresql.org/docs/current/functions-matching.html

In the common case where you just want the whole matching substring or NULL for no match, write something like

SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
regexp_match
--------------
barbeque

You might need several select expressions, one per date you want to pull out. In each one, the regex pattern has one more uncaptured date pattern in front of the captured date group.

(regexp_match(mashed, '(\d{1,2}/\d{1,2}/\d{4})'))[1] as datestring1,
(regexp_match(mashed, '\d{1,2}/\d{1,2}/\d{4}(\d{1,2}/\d{1,2}/\d{4})'))[1] as datestring2,
etc … 
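As an alternative to writing one ever-longer pattern per date, regexp_matches with the 'g' flag returns every match as its own row, which can then be numbered and pivoted into columns. This is a rough sketch under assumptions: the CTE names `messy` and `dates` and the aliases `parts` and `ord` are made up for illustration, the sample rows come from the question, and WITH ORDINALITY and FILTER need PostgreSQL 9.4 or later.

```sql
-- Sample rows copied from the question; the table name "messy" and the
-- column names are placeholders for illustration.
WITH messy(id, mashed) AS (
    VALUES
        ('row1', '6/4/2017'),
        ('row2', '8/11/201810/6/2019'),
        ('row3', '10/16/20134/12/201812/18/2019')
),
dates AS (
    -- With the 'g' flag, regexp_matches returns one row per match.
    -- The pattern has no capture group, so each row is a one-element
    -- array holding the whole matched date; ord numbers the matches.
    SELECT t.id, m.ord, m.parts[1] AS datestring
    FROM messy AS t
    CROSS JOIN LATERAL
        regexp_matches(t.mashed, '\d{1,2}/\d{1,2}/\d{4}', 'g')
        WITH ORDINALITY AS m(parts, ord)
)
-- Pivot the numbered matches into one column per position.
SELECT id,
       max(datestring) FILTER (WHERE ord = 1) AS datestring1,
       max(datestring) FILTER (WHERE ord = 2) AS datestring2,
       max(datestring) FILTER (WHERE ord = 3) AS datestring3
FROM dates
GROUP BY id
ORDER BY id;
```

Running `SELECT max(ord) FROM dates` against the same CTE would show how many datestring columns the real data needs. Rows with fewer dates get NULL in the unused columns.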
