How to split a column into two or more columns in Python using either str.split or regex?
How can I split this column into 2 or more columns? I've used `str.split('/',2)` to split, but it just removed the '/' and did not split into 2 columns.
| X |
|---|
| East Bound: 6900 / West Bound: 7700 |
| East Bound: 7800 / West Bound: 8700 |
| North Bound: 5000 / South Bound: 4900 |
| North Bound: 7000 / South Bound: 9000 |
| East Bound: 4900 / West Bound: 9700 |
What I want is:
| First Direction | Second direction |
|---|---|
| East Bound: 6900 | West Bound: 7700 |
| East Bound: 7800 | West Bound: 8700 |
| North Bound: 5000 | South Bound: 4900 |
| North Bound: 7000 | South Bound: 9000 |
| East Bound: 4900 | West Bound: 9700 |
Even better would be four column headers, one for each of the four cardinal directions, filled with the values from the first table, such as:
| North | South | East | West |
|---|---|---|---|
| 0 | 0 | 6900 | 7700 |
| 0 | 0 | 7800 | 8700 |
| 5000 | 4900 | 0 | 0 |
| 7000 | 9000 | 0 | 0 |
| 0 | 0 | 4900 | 9700 |
If I have read the documentation correctly, I believe this can be done with regex patterns, but is there an efficient way to do this concisely?
Here is the original df for use:

    df = ['East Bound: 6900 / West Bound: 7700',
          'East Bound: 7800 / West Bound: 8700',
          'North Bound: 5000 / South Bound: 4900',
          'North Bound: 7000 / South Bound: 9000',
          'East Bound: 4900 / West Bound: 9700']
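The answers below index `df['X']`, so they appear to assume the list has been loaded into a DataFrame with a single column named `X`. A minimal sketch of that setup, where the variable name `data` is my own assumption and not part of the question:

```python
import pandas as pd

# The list from the question; the name `data` is an assumption.
data = [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]

# Wrap the list in a one-column DataFrame so that df['X'] works.
df = pd.DataFrame({'X': data})
print(df.shape)  # (5, 1)
```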
For Q1, you can try `.str.split` with `expand=True`:

    df[['First Direction', 'Second direction']] = df['X'].str.split(' / ', expand=True)
    print(df)

                                           X    First Direction   Second direction
    0    East Bound: 6900 / West Bound: 7700   East Bound: 6900   West Bound: 7700
    1    East Bound: 7800 / West Bound: 8700   East Bound: 7800   West Bound: 8700
    2  North Bound: 5000 / South Bound: 4900  North Bound: 5000  South Bound: 4900
    3  North Bound: 7000 / South Bound: 9000  North Bound: 7000  South Bound: 9000
    4    East Bound: 4900 / West Bound: 9700   East Bound: 4900   West Bound: 9700
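As for why the original attempt seemed to only remove the '/': without `expand=True`, `.str.split` returns a single Series whose elements are lists, and the second argument is the maximum number of splits (`n`), not the number of columns. The sketch below contrasts the two forms and also shows a regex alternative with `.str.extract`; the group names `first` and `second` are my own choice, and the DataFrame is rebuilt here only so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# Without expand=True the result is still ONE column: each element is a list.
as_lists = df['X'].str.split(' / ')
print(type(as_lists.iloc[0]))  # <class 'list'>

# With expand=True each piece becomes its own column.
as_columns = df['X'].str.split(' / ', expand=True)
print(as_columns.shape)  # (5, 2)

# Regex alternative: two named groups separated by a slash with optional spaces.
# The group names are hypothetical and become the column names.
pattern = r'^(?P<first>.+?)\s*/\s*(?P<second>.+)$'
by_regex = df['X'].str.extract(pattern)
print(by_regex.head())
```

For a fixed ' / ' separator the plain split is probably the simpler choice; the regex form may be worth it only if the spacing around the slash varies.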
For Q2, you can convert each value in the `X` column to a dictionary, then expand the dictionaries into separate columns:

    # Each row becomes {'East Bound': ' 6900', ...}; the values are still strings.
    out = df['X'].apply(lambda x: dict([direction.split(':') for direction in x.split(' / ')])).apply(pd.Series)
    print(out)

       East Bound  West Bound  North Bound  South Bound
    0        6900        7700          NaN          NaN
    1        7800        8700          NaN          NaN
    2         NaN         NaN         5000         4900
    3         NaN         NaN         7000         9000
    4        4900        9700          NaN          NaN
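To get from that result to the table the question asks for (integer values, zeros instead of NaN, and columns named North, South, East, West), a few more steps are needed. This is only a sketch: it builds the intermediate frame with `pd.DataFrame` on a list of dicts rather than `.apply(pd.Series)`, splits on `': '` so the values carry no leading space, and the names `records`, `wide` and `tidy` are my own.

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# One dict per row, e.g. {'East Bound': '6900', 'West Bound': '7700'}.
records = [
    dict(part.split(': ') for part in value.split(' / '))
    for value in df['X']
]

# Keys missing from a row become NaN in the resulting frame.
wide = pd.DataFrame(records, index=df.index)

tidy = (
    wide.rename(columns=lambda c: c.replace(' Bound', ''))  # 'East Bound' -> 'East'
        .apply(pd.to_numeric)                               # strings -> numbers
        .fillna(0)                                          # missing direction -> 0
        .astype(int)
        .reindex(columns=['North', 'South', 'East', 'West'], fill_value=0)
)
print(tidy)
```

The final `reindex` fixes the column order and would also add an all-zero column if one direction never appeared in the data.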
My approach would be to use `Series.str.extractall` with a specific pattern to get the direction and the amount, convert the amount to a suitable type (I've just gone for integer here), then use `pivot_table`, filling in with zeros where appropriate, e.g.:

    out = (
        df['X'].str.extractall(r'(?P<bound>North|South|West|East) (?:Bound): (?P<n>\d+)')
        .astype({'n': int})
        .pivot_table(index=pd.Grouper(level=0), columns='bound', values='n', fill_value=0)
    )
This'll give you:

    bound  East  North  South  West
    0      6900      0      0  7700
    1      7800      0      0  8700
    2         0   5000   4900     0
    3         0   7000   9000     0
    4      4900      0      0  9700
This retains your original DataFrame's index, so you can merge/join the result back to your original DataFrame at some point.
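A sketch of that join, under a couple of assumptions: it reshapes with `unstack` instead of `pivot_table` (a swapped-in technique that should give the same table here, since each row has at most one value per direction), and the names `extracted` and `combined` are my own.

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# One row per match, indexed by (original row label, match number).
extracted = df['X'].str.extractall(
    r'(?P<bound>North|South|East|West) Bound: (?P<n>\d+)'
)

out = (
    extracted.astype({'n': int})
    .droplevel('match')                    # keep only the original row label
    .set_index('bound', append=True)['n']  # index is now (row label, bound)
    .unstack(fill_value=0)                 # one column per direction
)

# Both frames share the original index, so a plain index join lines them up.
combined = df.join(out)
print(combined)
```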