I have some data (below) which I am trying to align.
| 24 | 11 | 506 | -1 | -829.99||
| 24 | 11 | 1910 | 506 | 1 | 829.99|3|
| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|
It seems whenever the 3rd to last value for each row is negative, the row is missing a "|" delimiter. I am trying to use regex to add a vertical bar mid-way through the records to re-align the data like so:
| 24 | 11 | | 506 | -1 | -829.99||
| 24 | 11 | 1910 | 506 | 1 | 829.99 | 3|
| 12 | 11 | 1933 | 531 | 2 | 7.78 | N|
Disregard the white space, I included it to make the data more readable for the purpose of this question.
I know the below expression will locate the correct text group and place an additional "|" after it but can this be modified to put the "|" before the group?
re.sub(r'(\|*\|*\|\|)', r'\1',DATA)
Just getting started with regex so any help is appreciated!
PS - I am using python to do the actual regex substitutions/additions for this data munging task.
There are some problems in your regex. The asterisk * indicates that the previous element (whether a single character or a group) can repeat zero or more times. Therefore, \|* would match "" (the empty string), "|", "||", etc., and \|*\|*\|\| would match two consecutive bars "||" preceded by any number of bars (zero or more). In your data it therefore matches only the trailing "||" of the first row.
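A quick way to check this is to run the pattern on its own. The snippet below is only a sketch; the sample strings and variable names are made up for illustration.

```python
import re

pattern = r'\|*\|*\|\|'

# \|* on its own accepts the empty string as well as any run of bars.
star_only = [bool(re.fullmatch(r'\|*', s)) for s in ("", "|", "||")]
print(star_only)

# On the misaligned record, the only place two bars sit side by side
# is the end of the line, so that is the only match.
row = "| 24 | 11 | 506 | -1 | -829.99||"
matches = re.findall(pattern, row)
print(matches)

# A longer run of bars is swallowed as a single match.
run = re.findall(pattern, "a|||b")
print(run)
```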
To prove this with re.sub, you can surround the back-reference (i.e. \1) with some marker characters. I used curly braces, i.e. {\1}, below.
import re

data="""| 24 | 11 | 506 | -1 | -829.99||
| 24 | 11 | 1910 | 506 | 1 | 829.99|3|
| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|
"""
print("using regex above, with curly braces around captured match:")
print(re.sub(r'(\|*\|*\|\|)', r'{\1}', data))
print("desired output:")
print(re.sub(r'(\|[^|]+\|[^|]+\|[^|]+\|\|)', r'|\1', data))
Output:
using regex above, with curly braces around captured match:
| 24 | 11 | 506 | -1 | -829.99{||}
| 24 | 11 | 1910 | 506 | 1 | 829.99|3|
| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|
desired output:
| 24 | 11 || 506 | -1 | -829.99||
| 24 | 11 | 1910 | 506 | 1 | 829.99|3|
| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|
The solution looks for bars separated by one or more characters that are not bars. [^|] matches any character other than |. Note that inside the brackets the bar does not need escaping. The + means "one or more of the previous element". Because the replacement is |\1, the extra bar lands before the captured group rather than after it.
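If you would rather not capture and re-emit the group at all, a zero-width lookahead can mark the insertion point instead. This is a minimal sketch under the assumption that a short row always ends in three non-empty fields followed by "||"; the names MISSING_BAR and realign are hypothetical.

```python
import re

# Zero-width lookahead: find the spot that is followed by three
# bar-delimited, non-empty fields and then an empty field ("||").
# [^|\n] keeps a field from running across a line break.
MISSING_BAR = re.compile(r'(?=(?:\|[^|\n]+){3}\|\|)')

def realign(text):
    """Insert one '|' in front of the last three fields of short rows."""
    return MISSING_BAR.sub('|', text)

sample = (
    "| 24 | 11 | 506 | -1 | -829.99||\n"
    "| 24 | 11 | 1910 | 506 | 1 | 829.99|3|\n"
    "| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|\n"
)
fixed = realign(sample)
print(fixed)
```

Nothing is consumed by the match, so the replacement string is just the bar itself. Note that this version is not idempotent: running it a second time on already-fixed data would insert another bar.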
Does this work for you? It gives me the desired output.
re.sub(r'(\|.*\|.*)(\|.*\|.*\|.*\|\|\n)', r'\g<1>|\g<2>', DATA)
Group 1 holds everything before 506 and group 2 holds 506 and everything after it, and a '|' is added in between. Note that the pattern expects each record to end with a newline.
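A stricter variant of the two-group idea can be sketched by anchoring the pattern to whole lines and counting fields explicitly. This assumes a short row has exactly two leading fields, three more fields, and then a trailing "||"; SHORT_ROW and add_bar are hypothetical names.

```python
import re

# Group 1: the first two fields. Group 2: the next three fields plus "||".
# re.MULTILINE lets ^ and $ match at every line boundary, so the data
# does not need to end with a newline.
SHORT_ROW = re.compile(
    r'^((?:\|[^|\n]*){2})((?:\|[^|\n]*){3}\|\|)$',
    re.MULTILINE,
)

def add_bar(text):
    """Put the missing '|' between group 1 and group 2 of short rows."""
    return SHORT_ROW.sub(r'\1|\2', text)

sample = (
    "| 24 | 11 | 506 | -1 | -829.99||\n"
    "| 24 | 11 | 1910 | 506 | 1 | 829.99|3|\n"
    "| 12 | 11 | 1933 | 531 | 2 | 7.78 |N|"
)
once = add_bar(sample)
twice = add_bar(once)
print(once)
```

Because the field counts are fixed, a row that has already been repaired no longer matches, so applying add_bar twice leaves the data unchanged.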