Pandas: how can I extract by regex from a column into multiple rows?
I have the following data:
ID | content | date
---|---|---
1 | 2429(sach:MySpezialItem:16.59) | 2022-04-12
2 | 2429(sach:item 13:18.59)(sach:this and that costs:16.59) | 2022-06-12
And I want to achieve the following:
ID | number | price | date
---|---|---|---
1 | 2429 | | 2022-04-12
1 | | 16.59 | 2022-04-12
2 | 2429 | | 2022-06-12
2 | | 18.59 | 2022-06-12
2 | | 16.59 | 2022-06-12
What I tried:
df['sach'] = df['content'].str.split(r'\(sach:.*\)').explode('content')
df['content'] = df['content'].str.replace(r'\(sach:.*\)','', regex=True)
You can use a single regex with str.extractall:
regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'
df = df.join(df.pop('content').str.extractall(regex).droplevel(1))
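To see why droplevel(1) is part of that line, here is a minimal sketch on the sample data. The DataFrame construction is an assumption rebuilt from the question's table, and the names matches and flat are only illustrative. str.extractall returns one row per regex match, indexed by the original row label plus a match counter, so the counter level has to be dropped before the result can be joined back on the original index.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about the dtypes).
df = pd.DataFrame({
    "ID": [1, 2],
    "content": [
        "2429(sach:MySpezialItem:16.59)",
        "2429(sach:item 13:18.59)(sach:this and that costs:16.59)",
    ],
    "date": ["2022-04-12", "2022-06-12"],
})

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

# One row per match; the index has two levels: original row label and match number.
matches = df["content"].str.extractall(regex)
print(matches)

# Dropping the match level leaves only the original row labels (with repeats),
# which is what join needs to align each match with its source row.
flat = matches.droplevel(1)
print(flat.index.tolist())
```

Each alternative of the regex fills only its own named group, which is why every matched row has a value in one of number or price and NaN in the other.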
NB. If you want a new DataFrame, don't pop:
df2 = (df.drop(columns='content')
.join(df['content'].str.extractall(regex).droplevel(1))
)
Output:
ID date number price
0 1 2022-04-12 2429 NaN
0 1 2022-04-12 NaN 16.59
1 2 2022-06-12 2429 NaN
1 2 2022-06-12 NaN 18.59
1 2 2022-06-12 NaN 16.59
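Note that str.extractall returns the captured text as strings, so number and price in this result are not numeric yet. If numeric columns are needed, one possible follow-up is to convert them with pd.to_numeric. This is a sketch on data rebuilt from the question's table, not part of the original answer; the name out is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption about the dtypes).
df = pd.DataFrame({
    "ID": [1, 2],
    "content": [
        "2429(sach:MySpezialItem:16.59)",
        "2429(sach:item 13:18.59)(sach:this and that costs:16.59)",
    ],
    "date": ["2022-04-12", "2022-06-12"],
})

regex = r'(?P<number>\d+)\(|:(?P<price>\d+(?:\.\d+)?)\)'

# Same approach as the non-mutating variant: keep df intact, build a new frame.
out = df.drop(columns="content").join(
    df["content"].str.extractall(regex).droplevel(1)
)

# Convert the extracted strings to numbers; missing matches stay NaN,
# so both columns end up as floats.
out[["number", "price"]] = out[["number", "price"]].apply(pd.to_numeric)

print(out)
print(out.dtypes)
```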