Pandas split into columns with regex
I have a column in a DataFrame like this:
Column1
message1 message2 notmessage
message1 message2
message1 message2 message3 notmessage
I want a DataFrame like this:
Column1 | A | b | c
message1 message2 notmessage | message1 | message2 | null
message1 message2 | message1 | message2 | null
message1 message2 message3 notmessage | message1 | message2 | message3
Getting the first value from Column1 works fine with:
df['A'] = df['Column1'].str.extract('(my_regex)',expand=True)
But how can I obtain 3 new columns? I tried the approach from https://stackoverflow.com/a/39358924. It worked for me when I used the split method in other files, but it doesn't work with regex extraction like below:
df.join(df['Column1'].str.extract('(my_regex)',expand=True).rename(columns={0:'A', 1:'B', 2:'C'}))
Please help :)
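A likely reason the attempt above produces only one new column: `str.extract` returns one column per capture group and only looks at the first match, so `'(my_regex)'` (a single group) yields just column `0`, and the `1`/`2` entries in the `rename` mapping have nothing to rename. A minimal sketch, assuming the wanted values are the `messageN` tokens (the pattern below is a hypothetical stand-in for `my_regex`), gives the pattern three groups so the same `join`/`rename` produces `A`, `B` and `C`:

```python
import pandas as pd

df = pd.DataFrame(
    {"Column1": ["message1 message2 notmessage",
                 "message1 message2",
                 "message1 message2 message3 notmessage"]}
)

# Hypothetical pattern: up to three consecutive "message<digits>" tokens.
# Each capture group becomes one output column (0, 1, 2); optional groups
# that do not take part in the match come back as NaN.
pattern = r"(message\d+)(?:\s+(message\d+))?(?:\s+(message\d+))?"

df = df.join(
    df["Column1"].str.extract(pattern, expand=True)
    .rename(columns={0: "A", 1: "B", 2: "C"})
)
print(df)
```

Named groups such as `(?P<A>message\d+)` would also work and make the `rename` step unnecessary, since `str.extract` uses group names as column names.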
I believe you need Series.str.extractall, select the first column 0, and reshape with Series.unstack:
d = {0:'A', 1:'B', 2:'C'}
df = df.join(df['Column1'].str.extractall('(my_regex)')[0].unstack().rename(columns=d))
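A self-contained version of the `extractall` answer on the sample data, assuming `my_regex` is something like `message\d+` (a hypothetical stand-in; substitute the real pattern). `extractall` returns one row per match, indexed by (original row, `match`); `[0]` keeps the first capture group, and `unstack()` turns the `match` level into columns `0`, `1`, `2`:

```python
import pandas as pd

df = pd.DataFrame(
    {"Column1": ["message1 message2 notmessage",
                 "message1 message2",
                 "message1 message2 message3 notmessage"]}
)

d = {0: "A", 1: "B", 2: "C"}
# Hypothetical pattern standing in for my_regex: every "message<digits>" token.
matches = df["Column1"].str.extractall(r"(message\d+)")[0]  # Series indexed by (row, match)
wide = matches.unstack().rename(columns=d)  # one column per match number
df = df.join(wide)  # rows with fewer matches get NaN in the missing columns
print(df)
```

If a row can contain more than three matches, `unstack()` produces extra columns (`3`, `4`, ...) that the mapping `d` leaves unnamed.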
Get all 3 columns:
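import pandas as pd

df = pd.DataFrame(
    ["message1 message2 notmessage",
     "message1 message2",
     "message1 message2 message3 notmessage"],
    columns=["Column1"],
)

# Raw strings avoid invalid-escape warnings for \w and \s;
# expand=False returns a Series for a single capture group.
df['A'] = df['Column1'].str.extract(r'(^\w+)', expand=False)         # first word
df['b'] = df['Column1'].str.extract(r'(?<=\s)(\w+).*', expand=False)  # first word after whitespace
df['c'] = df['Column1'].str.extract(r'(\w+3).*', expand=False)        # first word ending in "3", else NaN

print(df)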
Result:
Column1 A b c
0 message1 message2 notmessage message1 message2 NaN
1 message1 message2 message1 message2 NaN
2 message1 message2 message3 notmessage message1 message2 message3