
get partial string contained in “()” from a pandas DataFrame

I have a df:

      MinMaleTA

28    888(G2M)
29    888(AAM)
30    888(G2M)
31    888(G2M)
32    888(AAM)
33    888(G2M)
34    888(G2M)
35    888(AAM)
36    888(G2M)
37    888(G2M)
38    888(G2M)
39    888(G2M)
40    888(AAM)
41    888(G2M)
42    888(G2M)
43    888(G2M)

Sometimes the string inside '()' is longer than 3 characters, like:

 28 888(G2MPTM) 

How can I get the string between '()' in MinMaleTA?

something like:

result = df['MinMaleTA'].startwith"(" and endwith")"

the output for the first 2 rows should be:

G2M AAM

Use the str.extract method with a regex:

>>> df['MinMaleTA'].str.extract(r'\((.*)\)')
      0
28  G2M
29  AAM
30  G2M
31  G2M
32  AAM
33  G2M
34  G2M
35  AAM
36  G2M
37  G2M
38  G2M
39  G2M
40  AAM
41  G2M
42  G2M
43  G2M

\( and \) match the literal characters ( and ).

(.*) is the capturing group; it matches any number of characters. Because .* is greedy, it runs to the last ) in the string.
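As a minimal sketch of the pattern above, the snippet below builds a small frame modelled on the question (the sample values and the names `codes` and `greedy` are illustrative assumptions, not from the original post). It uses `[^)]*` in place of `.*` so the capture stops at the first closing parenthesis, and passes `expand=False` so that a Series comes back instead of a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample modelled on the question, including one longer code.
df = pd.DataFrame(
    {"MinMaleTA": ["888(G2M)", "888(AAM)", "888(G2MPTM)"]},
    index=[28, 29, 30],
)

# [^)]* captures everything up to the first closing parenthesis.
# expand=False returns a Series rather than a one-column DataFrame.
codes = df["MinMaleTA"].str.extract(r"\(([^)]*)\)", expand=False)
print(codes.tolist())  # ['G2M', 'AAM', 'G2MPTM']

# For comparison: the greedy .* runs to the LAST ')' in the string,
# which only matters when a value holds more than one pair of parentheses.
greedy = pd.Series(["888(G2M)(AAM)"]).str.extract(r"\((.*)\)", expand=False)
print(greedy.tolist())  # ['G2M)(AAM']
```

For the data shown in the question, where each value has exactly one pair of parentheses, both patterns give the same result.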

If the strings always have the same structure, and the text inside ( ) always has the same length:

result = df['MinMaleTA'].str[-4:-1]
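A short sketch of why the fixed-length condition matters for the slice, using two made-up values in the style of the question (the Series `s` and the name `fixed` are assumptions for illustration). The slice always takes the three characters before the final one, so a longer code is cut short.

```python
import pandas as pd

# Hypothetical values: one 3-character code and one longer code.
s = pd.Series(["888(G2M)", "888(G2MPTM)"])

# Positional slice: drop the trailing ')' and keep the 3 characters before it.
fixed = s.str[-4:-1]
print(fixed.tolist())  # ['G2M', 'PTM']
```

The second value comes back as 'PTM' rather than 'G2MPTM', so the slice is only safe when every code inside the parentheses is exactly three characters long.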


 