I have a csv file with a column with strings. Part of the string is in parentheses. I wish to move the part of string in parentheses to a different column and retain the rest of the string as it is.
For instance: I wish to convert:
LC(Carbamidomethyl)RLK
to
LCRLK Carbamidomethyl
If you only have one parentheses group in your string, you can use this regex:
>>> a = "LC(Carbamidomethyl)RLK"
>>> re.sub('(.*)\((.+)\)(.*)', '\g<1>\g<3> \g<2>', a)
'LCRLK Carbamidomethyl'
>>> a = "LCRLK"
>>> re.sub('(.*)\((.+)\)(.*)', '\g<1>\g<3> \g<2>', a)
'LCRLK' # works with no parentheses too
Regex decomposed:
(.*) #! Capture begin of the string
\( # match first parenthesis
(.+) #! Capture content into parentheses
\) # match the second
(.*) #! Capture everything after
---------------
\g<1>\g<3> \g<2> # Write each capture in the correct order
A faster solution, for huge data set is:
begin, end = a.find('('), a.find(')')
if begin != -1 and end != -1:
a = a[:begin] + a[end+1:] + " " + a[begin+1:end]
The process is to get the positions of parentheses (if there's any) and cut the string where we want. Then, we concatenate the result.
It's clear that the string manipulation is the fastest method:
>>> timeit.timeit("re.sub('(.*)\((.+)\)(.*)', '\g<1>\g<3> \g<2>', a)", setup="a = 'LC(Carbadidomethyl)RLK'; import re")
15.214869976043701
>>> timeit.timeit("begin, end = a.find('('), a.find(')') ; b = a[:begin] + a[end+1:] + ' ' + a[begin+1:end]", setup="a = 'LC(Carbamidomethyl)RL'")
1.44008207321167
See comments
>>> a = "DRC(Carbamidomethyl)KPVNTFVHESLADVQAVC(Carbamidomethyl)SQKNVACK"
>>> while True:
... begin, end = a.find('('), a.find(')')
... if begin != -1 and end != -1:
... a = a[:begin] + a[end+1:] + " " + a[begin+1:end]
... else:
... break
...
>>> a
'DRCKPVNTFVHESLADVQAVCSQKNVACK Carbamidomethyl Carbamidomethyl'
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.