Pandas: how to extract from a column into multiple rows using a regular expression?

Question

I have the following data:

ID  content                                                   date
1   2429(sach:MySpezialItem:16.59)                            2022-04-12
2   2429(sach:item 13:18.59)(sach:this and that costs:16.59)  2022-06-12

And I want to achieve the following:

ID  number  price  date
1   2429           2022-04-12
1           16.59  2022-04-12
2   2429           2022-06-12
2           18.59  2022-06-12
2           16.59  2022-06-12

What I tried

df['sach'] = df['content'].str.split(r'\(sach:.*\)').explode('content')
df['content'] = df['content'].str.replace(r'\(sach:.*\)','', regex=True)

Answer 1

You can use a single regex with str.extractall:

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

df = df.join(df.pop('content').str.extractall(regex).droplevel(1))
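To see what each alternative of the pattern captures, here is a minimal sketch that runs the same regex with the standard `re` module on the second sample row. The variable names `text` and `matches` are only illustrative and are not part of the answer.

```python
import re

# Same pattern as above: two alternatives, each with one named group.
#   (?P<number>\d+)\(           digits immediately followed by "("
#   :(?P<price>\d+(?:\.\d+)?)\) ":" then an integer or decimal, then ")"
regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

# Second sample row from the question.
text = '2429(sach:item 13:18.59)(sach:this and that costs:16.59)'

# Each match fills exactly one of the two groups; the other one is None.
matches = [m.groupdict() for m in re.finditer(regex, text)]
for row in matches:
    print(row)
```

Because every match fills only one group, str.extractall produces one row per match with NaN in the other column, which is the layout requested in the question. Note that the "13" in "item 13" is not captured, since it is followed by neither "(" nor ")".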

NB. If you want a new DataFrame, don't pop:

df2 = (df.drop(columns='content')
         .join(df['content'].str.extractall(regex).droplevel(1))
       )

output:

   ID        date number  price
0   1  2022-04-12   2429    NaN
0   1  2022-04-12    NaN  16.59
1   2  2022-06-12   2429    NaN
1   2  2022-06-12    NaN  18.59
1   2  2022-06-12    NaN  16.59
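For anyone who wants to reproduce this, the following is a self-contained sketch. The sample DataFrame is rebuilt from the table in the question, the names `extracted` and `out` are introduced here for illustration, and the final numeric conversion is an optional extra step that the answer does not include.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    'ID': [1, 2],
    'content': [
        '2429(sach:MySpezialItem:16.59)',
        '2429(sach:item 13:18.59)(sach:this and that costs:16.59)',
    ],
    'date': ['2022-04-12', '2022-06-12'],
})

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

# extractall returns one row per match, indexed by
# (original row index, match number); the second level is named 'match'.
extracted = df['content'].str.extractall(regex)

# Dropping the 'match' level leaves only the original row index,
# so join can align each match with its source row (repeating ID and date).
out = df.drop(columns='content').join(extracted.droplevel(1))

# The captured values are strings; convert them if arithmetic is needed.
out['price'] = pd.to_numeric(out['price'])

print(out)
```

The droplevel(1) call is what makes the join work: without it the extracted frame keeps a two-level index that does not line up with the original single-level index.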

regex demo

1 solution

Solution 1
2 Accepted 2022-08-23 13:11:02
