It's been years (and years) since I've done any regex, so turning to experts on here since it's likely a trivial exercise :)
I have a tab-delimited file, and on each line certain fields hold values wrapped in b'...' or b"...".
A complete line in the file might look something like:
123\t b'bar foo' \tabc\t123\r\n
I want to get rid of all the leading b', b" and trailing ", ' from that field on every line. So given the example line above, after running the regex, I'd get:
123\t bar foo \tabc\t123\r\n
Bonus points if you can give me the Python blurb to run this over the file.
(^|\t)b["'] should match the leading part, and for the trailing part:
["'](\t|$) should do it
In Python, you do:
import re
r1 = re.compile("(^|\t)b[\"']")
r2 = re.compile("[\"'](\t|$)")
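then just apply both, keeping each result (sub returns a new string rather than changing yourString in place):
yourString = r1.sub("\\1", yourString)
yourString = r2.sub("\\1", yourString)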
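One caveat: the sample line in the question has a space between the tab and the b'...' value, and `(^|\t)b` will not match across that space. Below is a minimal sketch that tolerates such padding, assuming the spaces really are in the file. The names `LEADING`, `TRAILING` and `strip_bytes_markers` are made up for illustration.

```python
import re

# Leading b' or b" at the start of a field (start of line or right after a tab).
# The optional spaces are an assumption taken from the sample line, where the
# field is padded: "\t b'bar foo' \t".
LEADING = re.compile(r"(^|\t)( *)b[\"']")

# Closing quote at the end of a field: optional padding, then a tab,
# a line ending, or the end of the string.
TRAILING = re.compile(r"[\"']( *)(?=\t|\r?\n|$)")


def strip_bytes_markers(line):
    """Remove b'...' / b"..." wrappers from fields, keeping padding spaces."""
    line = LEADING.sub(r"\1\2", line)   # keep the tab and padding, drop b + quote
    return TRAILING.sub(r"\1", line)    # keep the padding, drop the quote


sample = "123\t b'bar foo' \tabc\t123\r\n"
result = strip_bytes_markers(sample)
print(repr(result))
```

Note that `TRAILING` removes a closing quote from any field that ends in one, not only from fields that started with b' or b", so this is only safe if no other field ends with a quote.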
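For each line you can use the following (the quantifiers are non-greedy so that two quoted fields on one line are not merged into a single match):
re.sub(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''', r'\2', line)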
And for bonus points:
import re
pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')
with open('infile') as infile, open('outfile', 'w') as outfile:
    for line in infile:
        # Keep only the text between the quotes (group 2).
        outfile.write(pattern.sub(r'\2', line))
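Here is a quick check of that pattern on the sample line from the question, as a sketch only. It shows one thing worth knowing: because of the `\W*` on both sides, the padding spaces around the field are removed along with the quotes, which differs slightly from the output the question asks for.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'''(?<![^\t\n])\W*b(["'])(.*?)\1\W*?(?![^\t\n])''')

# Sample line from the question, with the line ending already
# normalised to "\n" as Python's text mode would do when reading.
sample = "123\t b'bar foo' \tabc\t123\n"
cleaned = pattern.sub(r'\2', sample)
print(repr(cleaned))  # '123\tbar foo\tabc\t123\n'

# An apostrophe inside a double-quoted value is left alone.
embedded = pattern.sub(r'\2', 'b"foo\'s bar"')
print(repr(embedded))
```

If the padding has to survive, the surrounding whitespace would need to be captured and put back in the replacement.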
>>> "b\"foo's bar\"".replace('b"',"").replace("b'","").rstrip("\"'")
"foo's bar"
>>> "b'bar foo'".replace('b"',"").replace("b'","").rstrip("\"'")
'bar foo'
>>>
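The session above works on a single value. On a whole line, `replace` would also remove b' or b" wherever they occur, and `rstrip` would only look at the end of the line. Below is a rough sketch that applies the same idea one field at a time and only touches fields that are fully wrapped. The helper names `clean_field` and `clean_line` are hypothetical, and keeping the padding spaces is an assumption based on the expected output in the question.

```python
def clean_field(field):
    """Unwrap b'...' or b"..." in one field, leaving padding spaces alone."""
    stripped = field.strip(" ")
    for quote in ("'", '"'):
        wrapped = stripped.startswith("b" + quote) and stripped.endswith(quote)
        # len >= 3 rules out a bare b' or b" with no closing quote.
        if len(stripped) >= 3 and wrapped:
            # Swap only the wrapped core, so surrounding spaces are kept.
            return field.replace(stripped, stripped[2:-1], 1)
    return field


def clean_line(line):
    """Clean every tab-separated field and keep the original line ending."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]
    return "\t".join(clean_field(f) for f in body.split("\t")) + ending


sample = "123\t b'bar foo' \tabc\t123\r\n"
result = clean_line(sample)
print(repr(result))
```

Fields that do not both start with b plus a quote and end with the same quote are passed through unchanged.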