
pandas/regex: Remove everything after a hyphen or parenthesis (inclusive) in each comma-separated item of a pandas dataframe column

I have a dataframe with one column that holds several strings separated by commas. In each of these strings I want to remove everything from the hyphen onward (including the hyphen). In some cases there is no hyphen but there is an opening parenthesis, and I want to remove everything from that parenthesis onward as well, while keeping the items that follow the comma. How can I do it? The last row shows such a case.

import pandas as pd

dd = pd.DataFrame()
dd['sin'] = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

Expected output

dd['sin']
# output 
U147 U35
P01 P02
P3 P032
P034
P23F5 P04

I want to keep only the part of each string that comes before the hyphen, the parenthesis, or any other special character.

The following code seems to reproduce your desired result:

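# Split each cell into a list of items
dd['sin'] = dd['sin'].str.split(", ")
# Give every item its own row
dd = dd.explode('sin').reset_index()
# Drop everything from the first non-word character onward (raw string avoids an invalid-escape warning)
dd['sin'] = dd['sin'].str.replace(r'\W.*', '', regex=True)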

Which gives dd['sin'] as:

0     U147
1      U35
2      P01
3      P02
4       P3
5     P032
6     P034
7    P23F5
8      P04
Name: sin, dtype: object

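The call to .reset_index() in the second line is optional. explode keeps the original row label on every piece, so without .reset_index() the index still tells you which row each piece came from. With it, that label is moved into a column named 'index' and the rows are renumbered from 0 to 8, as shown above.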
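The result above has one code per row, while the expected output in the question has one space-separated string per original row. A minimal sketch of one way to get back to that shape, assuming the same sample data: skip .reset_index() so the original row labels survive, then group on them and join. The variable name pieces is only illustrative.

```python
import pandas as pd

dd = pd.DataFrame({
    'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
            'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
            'P23F5(PCM), P04-00(ECM)']
})

# Split and explode without reset_index(), so every piece keeps
# the index label of the row it came from.
pieces = dd['sin'].str.split(', ').explode()

# Drop everything from the first non-word character onward.
pieces = pieces.str.replace(r'\W.*', '', regex=True)

# Group the pieces by their original row label and join them with a space.
dd['sin'] = pieces.groupby(level=0).agg(' '.join)

print(dd['sin'].tolist())
# ['U147 U35', 'P01 P02', 'P3 P032', 'P034', 'P23F5 P04']
```

Grouping on the index level works here only because explode repeats the source row's label for each piece; after .reset_index() you would group on the 'index' column instead.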

You can use the following regex:

r"-\d{2}|\([EBP]CM\)|\s"
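To see what each alternative of this pattern matches, here is a small sketch that uses Python's re module on one of the sample strings. The second string, 'X1-123(TCM)', is a made-up value that does not appear in the question; it is included only to show that the pattern is tied to a two-digit suffix and to the BCM, ECM and PCM tags.

```python
import re

# Three alternatives: a hyphen followed by two digits,
# one of (ECM)/(BCM)/(PCM), or any whitespace character.
pattern = re.compile(r'-\d{2}|\([EBP]CM\)|\s')

sample = 'P23F5(PCM), P04-00(ECM)'
print(pattern.findall(sample))   # ['(PCM)', ' ', '-00', '(ECM)']
print(pattern.sub('', sample))   # P23F5,P04

# Hypothetical input, not from the question: a three-digit suffix
# and an unlisted tag are only partly removed.
odd = 'X1-123(TCM)'
print(pattern.sub('', odd))      # X13(TCM)
```

If the real data can contain other suffix lengths or tags, a less specific pattern is probably safer.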


Here is the code:

import pandas as pd

sin = ['U147(BCM), U35(BCM)','P01-00(ECM), P02-00(ECM)', 'P3-00(ECM), P032-00(ECM)','P034-00(ECM)', 'P23F5(PCM), P04-00(ECM)']

dd = pd.DataFrame()
dd['sin'] = sin
# Remove "-NN" suffixes, the (ECM)/(BCM)/(PCM) tags and all whitespace
dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True)
print(dd)

OUTPUT:

         sin
0   U147,U35
1    P01,P02
2    P3,P032
3       P034
4  P23F5,P04



EDIT

Or use this line to replace the commas with spaces as well:

dd['sin'] = dd['sin'].str.replace(r'-\d{2}|\([EBP]CM\)|\s', '', regex=True).str.replace(',',' ')

OUTPUT:

         sin
0   U147 U35
1    P01 P02
2    P3 P032
3       P034
4  P23F5 P04
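The question also asks for a rule that does not depend on the exact suffix. As a hedged sketch, and assuming that a hyphen or an opening parenthesis never occurs inside the part you want to keep, the pattern below removes everything from the first hyphen or parenthesis up to (but not including) the next comma. This pattern is an illustration and is not taken from either answer above.

```python
import pandas as pd

dd = pd.DataFrame({
    'sin': ['U147(BCM), U35(BCM)', 'P01-00(ECM), P02-00(ECM)',
            'P3-00(ECM), P032-00(ECM)', 'P034-00(ECM)',
            'P23F5(PCM), P04-00(ECM)']
})

# [-(]   : a hyphen or an opening parenthesis
# [^,]*  : followed by everything up to the next comma (or end of string)
cleaned = dd['sin'].str.replace(r'[-(][^,]*', '', regex=True)

# The items are now separated by ", "; turn that into a single space.
dd['sin'] = cleaned.str.replace(', ', ' ', regex=False)

print(dd['sin'].tolist())
# ['U147 U35', 'P01 P02', 'P3 P032', 'P034', 'P23F5 P04']
```

To cut at any special character rather than just these two, the character class [-(] could be widened to [^\w\s,], at the cost of also cutting at characters you may not have anticipated.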
