Regex + Python to remove specific leading and trailing characters from a value in a tab-delimited file

Question

It's been years (and years) since I've done any regex, so I'm turning to the experts here, since this is likely a trivial exercise :)

I have a tab-delimited file, and on each line certain fields have values such as:

foo
bar
b"foo's bar"
b'bar foo'
b'carbar'

A complete line in the file might look something like:

123\t b'bar foo' \tabc\t123\r\n

I want to get rid of the leading b' or b" and the trailing ' or " from that field on every line. So given the example line above, after running the regex, I'd get:

123\t bar foo \tabc\t123\r\n

Bonus points if you can give me the python blurb to run this over the file.

Answer 1

(^|\t)b["'] should match the leading part, and for the trailing part:

["'](\t|$) should do it.

In Python, you do:

import re
r1 = re.compile("(^|\t)b[\"']")
r2 = re.compile("[\"'](\t|$)")

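then apply both in sequence, feeding the result of the first substitution into the second:

yourString = r1.sub("\\1", yourString)
yourString = r2.sub("\\1", yourString)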
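
As a runnable illustration of Answer 1, here is a minimal sketch that wraps both patterns in one function. The helper name `strip_b_quotes` and the sample line are made up for illustration, and the sketch assumes the b' or b" sits directly after the tab, with no padding space.

```python
import re

# Leading b' or b" at the start of the line or right after a tab.
LEADING = re.compile(r"""(^|\t)b["']""")
# Trailing ' or " right before a tab or at the end of the line.
TRAILING = re.compile(r"""["'](\t|$)""")


def strip_b_quotes(line):
    """Hypothetical helper: run the two substitutions one after the other."""
    line = LEADING.sub(r"\1", line)
    return TRAILING.sub(r"\1", line)


print(repr(strip_b_quotes("123\tb'bar foo'\tabc\t123")))
# prints '123\tbar foo\tabc\t123'
```

Two caveats with this approach: a field padded with spaces, as in the question's sample line, is not matched, and the trailing pattern also strips a closing quote from fields that never had the b prefix.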

Answer 2

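For each line, you can use

re.sub(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''', r'\2', line)

The lazy quantifiers keep a single match from running across several quoted fields on the same line.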

and for bonus points:

import re

pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')
# Open both files in one with statement so each is closed when the loop ends.
with open('infile') as infile, open('outfile', 'w') as outfile:
    for line in infile:
        outfile.write(pattern.sub(r'\2', line))
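
Note that the \W* parts of this pattern also swallow the spaces around the field, while the question's expected output keeps them. If the padding matters, a variant along these lines might work. It is only a sketch: the name `unwrap` and the extra capture groups are my own additions, not part of the answer.

```python
import re

# Same field-boundary lookarounds as above, but the padding around the
# wrapper is captured (groups 1 and 4) so it can be written back.
FIELD = re.compile(r"""(?<![^\t\n])( *)b(["'])(.*?)\2([ \r]*)(?![^\t\n])""")


def unwrap(line):
    """Hypothetical helper: drop the b'...' wrapper, keep surrounding spaces."""
    return FIELD.sub(r"\1\3\4", line)


sample = "123\t b'bar foo' \tabc\t123\r\n"
print(repr(unwrap(sample)))
# prints '123\t bar foo \tabc\t123\r\n'
```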

Answer 3

>>> "b\"foo's bar\"".replace('b"',"").replace("b'","").rstrip("\"'")
"foo's bar"
>>> "b'bar foo'".replace('b"',"").replace("b'","").rstrip("\"'")
'bar foo'
>>>
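
One thing to watch with plain replace(): it removes b' anywhere in the value, so a field such as crab's would come out as cras. A field-by-field sketch that only touches a wrapper at the very start and end of a value could look like the following. The function names are hypothetical, and it assumes fields are separated by single tabs.

```python
def unwrap_field(value):
    """Hypothetical helper: strip a b'...' or b"..." wrapper from one field."""
    core = value.strip(" ")
    wrapped = (
        len(core) >= 3
        and core[0] == "b"
        and core[1] in "'\""
        and core[-1] == core[1]
    )
    if not wrapped:
        return value
    # Put the unwrapped text back between the original padding spaces.
    start = value.index(core)
    return value[:start] + core[2:-1] + value[start + len(core):]


def unwrap_line(line):
    """Apply unwrap_field to every tab-separated field, keeping the line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return "\t".join(unwrap_field(f) for f in body.split("\t")) + ending


print(repr(unwrap_line("123\t b'bar foo' \tabc\t123\r\n")))
# prints '123\t bar foo \tabc\t123\r\n'
```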

3 answers

solution1
1 2010-03-05 22:57:53

solution2
1 ACCEPTED 2010-03-05 23:05:37

solution3
0 2010-03-06 00:28:14
