I am using SAS EG against a Postgres database with explicit pass-through SQL. I was given a messy table that I need to "clean up" for the end user. One of the columns (varchar(255)) has multiple date values smashed together. Most rows hold a single date (e.g. 10/11/2018), but a few look like this: 8/11/201810/6/2019. I need to split these out, but there is no delimiter to split on. The only thing to go on is the pattern, and even that is variable length: each date is a 1-2 digit month, a 1-2 digit day, and a 4 digit year, followed immediately by the next date. How can I separate the dates with a delimiter so that I can then split them into an array (e.g. with string_to_array), find the highest number of values in any array, and create the appropriate number of new columns to hold the separate dates? Ordinarily I would provide a code example, but in this case I don't even know where to start parsing these out.
Original Value Example:
row1 6/4/2017
row2 8/11/201810/6/2019
row3 10/16/20134/12/201812/18/2019
Desired Value Example:
row1 6/4/2017
row2 8/11/2018, 10/6/2019
row3 10/16/2013, 4/12/2018, 12/18/2019
Thanks in advance!
You should be able to write a regular expression that matches your dates. This works in SAS. Note that it also adds a comma after the last date; you could remove that afterwards, write a more complex regex that avoids it, or simply ignore the extra comma.
WANT=prxchange('s/(\d{1,2}\/\d{1,2}\/\d{4})/$1,/',-1,HAVE);
It looks like Postgres has a regexp_replace() function, so something like this should work:
regexp_replace(HAVE,'(\d{1,2}/\d{1,2}/\d{4})','\1,','g') as WANT
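To avoid the extra trailing comma, one option is a lookahead so that the delimiter is inserted only when another digit follows a date. The query below is a minimal sketch, not a tested solution against the real table: the CTE name messy and the column names id and have are hypothetical, the sample rows are copied from the question, and it assumes a reasonably recent PostgreSQL with standard_conforming_strings on (the default), so the backslashes need no doubling.

```sql
-- Hypothetical stand-in for the real table; rows copied from the question.
WITH messy(id, have) AS (
    VALUES ('row1', '6/4/2017'),
           ('row2', '8/11/201810/6/2019'),
           ('row3', '10/16/20134/12/201812/18/2019')
),
delimited AS (
    SELECT id,
           -- Append ', ' to a date only when it is directly followed by
           -- another digit, i.e. when a further date is glued onto it.
           regexp_replace(have,
                          '(\d{1,2}/\d{1,2}/\d{4})(?=\d)',
                          '\1, ',
                          'g') AS want
    FROM messy
)
SELECT id,
       want,
       string_to_array(want, ', ')              AS dates,    -- text[] of dates
       cardinality(string_to_array(want, ', ')) AS n_dates   -- dates per row
FROM delimited
ORDER BY id;
```

For the three sample rows, n_dates should come out as 1, 2, and 3, and taking max(n_dates) over the real table would give the number of new columns needed.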
Use the PostgreSQL function regexp_match.
From the PostgreSQL docs (https://www.postgresql.org/docs/current/functions-matching.html):
In the common case where you just want the whole matching substring or NULL for no match, write something like
SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1];
regexp_match
--------------
barbeque
You might need several select expressions, one per date you want to pull out. Each successive pattern puts one more uncaptured date pattern in front of the captured group, so the first expression captures the first date string, the second captures the second, and so on. Since regexp_match returns an array, take element [1] to get the text value.
(regexp_match(mashed, '(\d{1,2}/\d{1,2}/\d{4})'))[1] as datestring1,
(regexp_match(mashed, '\d{1,2}/\d{1,2}/\d{4}(\d{1,2}/\d{1,2}/\d{4})'))[1] as datestring2,
etc …
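An alternative that avoids writing one expression per position is regexp_matches with the 'g' flag, which returns one row per match. The following is only a sketch under assumptions: the CTE name messy and the column names id and mashed are hypothetical, the sample rows are copied from the question, and the three output columns are hard-coded because the longest sample row holds three dates. It relies on WITH ORDINALITY and FILTER, both available since PostgreSQL 9.4.

```sql
-- Hypothetical stand-in for the real table; rows copied from the question.
WITH messy(id, mashed) AS (
    VALUES ('row1', '6/4/2017'),
           ('row2', '8/11/201810/6/2019'),
           ('row3', '10/16/20134/12/201812/18/2019')
)
SELECT m.id,
       -- Pivot the numbered matches into one column per position.
       max(d.parts[1]) FILTER (WHERE d.ord = 1) AS datestring1,
       max(d.parts[1]) FILTER (WHERE d.ord = 2) AS datestring2,
       max(d.parts[1]) FILTER (WHERE d.ord = 3) AS datestring3
FROM messy AS m
-- One row per date found; ord numbers the matches from left to right.
CROSS JOIN LATERAL regexp_matches(m.mashed, '\d{1,2}/\d{1,2}/\d{4}', 'g')
     WITH ORDINALITY AS d(parts, ord)
GROUP BY m.id
ORDER BY m.id;
```

Rows with fewer dates get NULL in the unused columns. Note that a row containing no date at all would drop out of this result because of the CROSS JOIN; LEFT JOIN LATERAL ... ON true would keep it.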