Extracting part of a string in parentheses using Python

Question

I have a CSV file with a column of strings. Part of each string is in parentheses. I wish to move the part in parentheses to a different column and keep the rest of the string as it is.

For instance, I wish to convert:

LC(Carbamidomethyl)RLK

to

LCRLK Carbamidomethyl

Answer 1

Regex solution

If each string contains only one parenthesized group, you can use this regex:

>>> import re
>>> a = "LC(Carbamidomethyl)RLK"
>>> re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)
'LCRLK Carbamidomethyl'
>>> a = "LCRLK"  # no parentheses: the string is returned unchanged
>>> re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)
'LCRLK'

Regex decomposed:

(.*)       #! Capture the beginning of the string
\(         # Match the opening parenthesis
  (.+)     #! Capture the content inside the parentheses
\)         # Match the closing parenthesis
(.*)       #! Capture everything after it

---------------
\g<1>\g<3> \g<2>  # Write the captures back in the desired order
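The decomposition above can also be written directly in Python with re.VERBOSE, so the comments live inside the pattern. This is only a sketch: the names PAREN_GROUP and move_group_to_end are made up for illustration, and it assumes at most one parenthesized group per string, as in the regex above.

```python
import re

# Same pattern as above; re.VERBOSE ignores whitespace and allows comments.
PAREN_GROUP = re.compile(
    r"""
    (.*)      # group 1: everything before the opening parenthesis
    \(        # a literal opening parenthesis
    (.+)      # group 2: the content inside the parentheses
    \)        # a literal closing parenthesis
    (.*)      # group 3: everything after the closing parenthesis
    """,
    re.VERBOSE,
)


def move_group_to_end(text):
    """Move the parenthesized part to the end (hypothetical helper name)."""
    return PAREN_GROUP.sub(r"\g<1>\g<3> \g<2>", text)


print(move_group_to_end("LC(Carbamidomethyl)RLK"))
print(move_group_to_end("LCRLK"))  # no parentheses, so nothing is replaced
```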

String manipulation solution

A faster solution for huge data sets is:

begin, end = a.find('('), a.find(')')
if begin != -1 and end != -1:
    a = a[:begin] + a[end+1:] + " " + a[begin+1:end]

This finds the positions of the opening and closing parentheses (if there are any) and slices the string around them. The result is the concatenation of the text before the group, the text after it, a space, and the content of the group.
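Since the question asks for the parenthesized part to end up in a different column, it may be more convenient to return the two pieces separately instead of joining them with a space. A minimal sketch, assuming one group per string; split_parenthesized is a hypothetical helper name, not part of the original answer:

```python
def split_parenthesized(text):
    """Return (text_without_group, group_content) for the first "(...)" group.

    Hypothetical helper: the two parts are returned separately so that each
    can be written to its own column.
    """
    begin = text.find("(")
    end = text.find(")", begin + 1)  # only accept a ")" that follows the "("
    if begin == -1 or end == -1:
        return text, ""  # nothing to extract
    return text[:begin] + text[end + 1:], text[begin + 1:end]


rest, group = split_parenthesized("LC(Carbamidomethyl)RLK")
print(rest, group)
```

Searching for the closing parenthesis only after the opening one is a small change from the snippet above; it avoids slicing with a ")" that appears before the "(".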

Performance of each method

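In these timings the string manipulation is roughly ten times faster than the regex (timeit runs each statement 1,000,000 times by default, so the figures are total seconds):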

>>> import timeit
>>> timeit.timeit(r"re.sub(r'(.*)\((.+)\)(.*)', r'\g<1>\g<3> \g<2>', a)", setup="a = 'LC(Carbadidomethyl)RLK'; import re")
15.214869976043701


>>> timeit.timeit("begin, end  = a.find('('), a.find(')') ; b = a[:begin] + a[end+1:] + ' ' + a[begin+1:end]", setup="a = 'LC(Carbamidomethyl)RL'")
1.44008207321167
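To repeat the comparison on your own machine, a sketch like the following may help. It differs from the timings above in a few ways that are assumptions of this sketch: both methods are wrapped in functions (with_regex and with_find are made-up names), both run on the same input, the pattern is precompiled, and the string version keeps its `if` check. Absolute numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = re.compile(r"(.*)\((.+)\)(.*)")


def with_regex(a):
    # Regex solution: rewrite the string in one substitution.
    return PATTERN.sub(r"\g<1>\g<3> \g<2>", a)


def with_find(a):
    # String manipulation solution: locate the parentheses and slice.
    begin, end = a.find("("), a.find(")")
    if begin != -1 and end != -1:
        a = a[:begin] + a[end + 1:] + " " + a[begin + 1:end]
    return a


sample = "LC(Carbamidomethyl)RLK"
# Both methods should agree before their speed is compared.
assert with_regex(sample) == with_find(sample)

for func in (with_regex, with_find):
    seconds = timeit.timeit(lambda: func(sample), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```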

Multi parentheses set

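When a string contains several parenthesized groups, repeat the string manipulation until no parentheses are left: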

>>> a = "DRC(Carbamidomethyl)KPVNTFVHESLADVQAVC(Carbamidomethyl)SQKNVACK"
>>> while True:
...     begin, end  = a.find('('), a.find(')')
...     if begin != -1 and end != -1:
...         a = a[:begin] + a[end+1:] + " " + a[begin+1:end]
...     else:
...         break
...
>>> a
'DRCKPVNTFVHESLADVQAVCSQKNVACK Carbamidomethyl Carbamidomethyl'
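For the CSV case in the question, the same idea can be sketched with the csv module and a regex that handles any number of non-nested groups. Everything specific here is an assumption for illustration: the in-memory file, the column names "sequence" and "modifications", and the helper split_all_groups. Unlike the loop above, it makes a single pass over each string, so unbalanced parentheses are simply left in place.

```python
import csv
import io
import re

GROUP = re.compile(r"\(([^()]*)\)")  # one non-nested "(...)" group


def split_all_groups(text):
    """Return (text without any groups, list of the group contents)."""
    return GROUP.sub("", text), GROUP.findall(text)


# Hypothetical input; a real script would open the CSV file instead.
source = io.StringIO(
    "sequence\n"
    "LC(Carbamidomethyl)RLK\n"
    "DRC(Carbamidomethyl)KPVNTFVHESLADVQAVC(Carbamidomethyl)SQKNVACK\n"
    "LCRLK\n"
)
output = io.StringIO()

reader = csv.DictReader(source)
writer = csv.DictWriter(
    output, fieldnames=["sequence", "modifications"], lineterminator="\n"
)
writer.writeheader()
for row in reader:
    stripped, groups = split_all_groups(row["sequence"])
    # The extracted parts go into their own column, separated by spaces.
    writer.writerow({"sequence": stripped, "modifications": " ".join(groups)})

print(output.getvalue())
```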

solution1 2 ACCEPTED 2014-01-28 20:37:43
