Pandas: how can I extract by regex from a column into multiple rows?

Question

I have the following data:

ID	content	date
1	2429(sach:MySpezialItem:16.59)	2022-04-12
2	2429(sach:item 13:18.59)(sach:this and that costs:16.59)	2022-06-12

And I want to achieve the following:

ID	number	price	date
1	2429		2022-04-12
1		16.59	2022-04-12
2	2429		2022-06-12
2		18.59	2022-06-12
2		16.59	2022-06-12

What I tried

df['sach'] = df['content'].str.split(r'\(sach:.*\)').explode('content')
df['content'] = df['content'].str.replace(r'\(sach:.*\)','', regex=True)

Answer 1

You can use a single regex with str.extractall:

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

df = df.join(df.pop('content').str.extractall(regex).droplevel(1))

NB: if you want a new DataFrame instead of modifying df in place, don't pop:

df2 = (df.drop(columns='content')
         .join(df['content'].str.extractall(regex).droplevel(1))
       )

output:

   ID        date number  price
0   1  2022-04-12   2429    NaN
0   1  2022-04-12    NaN  16.59
1   2  2022-06-12   2429    NaN
1   2  2022-06-12    NaN  18.59
1   2  2022-06-12    NaN  16.59
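
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and keeps the intermediate extractall result in a variable (the names matches and out are only illustrative). The final pd.to_numeric step is an optional addition and not part of the answer above; it assumes you want numbers rather than the strings that extractall returns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "content": [
        "2429(sach:MySpezialItem:16.59)",
        "2429(sach:item 13:18.59)(sach:this and that costs:16.59)",
    ],
    "date": ["2022-04-12", "2022-06-12"],
})

regex = r"(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)"

# extractall returns one row per match, indexed by (original row, match number);
# the named groups become the column names.
matches = df["content"].str.extractall(regex)

# Dropping the match level leaves only the original row label,
# so join can align each match with its source row.
out = df.drop(columns="content").join(matches.droplevel(1))

# Optional (assumption: numeric values are wanted): convert the extracted strings.
out[["number", "price"]] = out[["number", "price"]].apply(pd.to_numeric)

print(out)
```

Because the join aligns on the index, rows with several matches are repeated once per match, which is what produces the duplicated index labels in the output above.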

regex demo
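
The pattern can also be checked outside pandas with the standard re module. This is only a sketch to show how the two alternatives behave on the second sample row; the variable names text and found are illustrative.

```python
import re

regex = r"(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)"
text = "2429(sach:item 13:18.59)(sach:this and that costs:16.59)"

# Each match satisfies exactly one alternative, so one named group is
# filled and the other is None (NaN in the pandas output).
# The "13" in "item 13" is skipped: it is neither followed by "("
# nor preceded by ":" and followed by ")".
found = [m.groupdict() for m in re.finditer(regex, text)]

for groups in found:
    print(groups)
```

The first alternative captures digits directly before an opening parenthesis (the number), and the second captures a decimal directly between a colon and a closing parenthesis (the price).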

solution1
2 ACCEPTED 2022-08-23 13:11:02
