Split column in pandas dataframe
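I would like to split the ji column in df into two columns at the comma delimiter. It would also be nice to get rid of the parentheses around the ji values. I have tried various approaches and keep getting errors. I would like to avoid a lambda expression for now. Any other ideas?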
Example
ji length
0 (75.0, 5.0) 3283.458479
1 (96.0, 5.0) 1431.312901
2 (97.0, 5.0) 1364.592959
3 (247.0, 5.0) 3736.322308
4 (81.0, 7.0) 2655.910005
5 (93.0, 7.0) 1752.293687
6 (242.0, 7.0) 427.844417
7 (248.0, 7.0) 3725.823013
8 (254.0, 7.0) 2318.937332
9 (255.0, 7.0) 2292.673905
10 (242.0, 8.0) 145.811907
11 (254.0, 8.0) 2222.447786
12 (255.0, 8.0) 2196.184360
13 (248.0, 9.0) 441.222866
14 (253.0, 9.0) 853.095032
15 (256.0, 9.0) 2076.942682
16 (91.0, 10.0) 1743.310744
17 (93.0, 10.0) 1256.337420
18 (105.0, 10.0) 523.447658
19 (174.0, 10.0) 1530.617012
20 (176.0, 10.0) 1697.614009
21 (248.0, 10.0) 440.000463
22 (253.0, 10.0) 904.706003
23 (256.0, 10.0) 1991.662604
24 (258.0, 10.0) 1850.995862
25 (172.0, 11.0) 1301.179960
26 (174.0, 11.0) 1436.984094
27 (176.0, 11.0) 1695.954099
28 (179.0, 11.0) 1548.015013
29 (228.0, 11.0) 4640.928585
30 (242.0, 11.0) 169.617203
31 (251.0, 11.0) 784.921333
32 (253.0, 11.0) 983.118859
33 (255.0, 11.0) 1181.474433
34 (256.0, 11.0) 1303.398235
You can load the example above with:
import pandas as pd
from io import StringIO
csv = """\
ji:length
(75.0,5.0):3283.458479
(96.0,5.0):1431.312901
(97.0,5.0):1364.592959
(247.0,5.0):3736.322308
(81.0,7.0):2655.910005
(93.0,7.0):1752.293687
(242.0,7.0):427.844417
(248.0,7.0):3725.823013
(254.0,7.0):2318.937332
(255.0,7.0):2292.673905
(242.0,8.0):145.811907
(254.0,8.0):2222.447786
(255.0,8.0):2196.184360
(248.0,9.0):441.222866
(253.0,9.0):853.095032
(256.0,9.0):2076.942682
(91.0,10.0):1743.310744
(93.0,10.0):1256.337420
(105.0,10.0):523.447658
(174.0,10.0):1530.617012
(176.0,10.0):1697.614009
(248.0,10.0):440.000463
(253.0,10.0):904.706003
(256.0,10.0):1991.662604
(258.0,10.0):1850.995862
(172.0,11.0):1301.179960
(174.0,11.0):1436.984094
(176.0,11.0):1695.954099
(179.0,11.0):1548.015013
(228.0,11.0):4640.928585
(242.0,11.0):169.617203
(251.0,11.0):784.921333
(253.0,11.0):983.118859
(255.0,11.0):1181.474433
(256.0,11.0):1303.398235
"""
df = pd.read_csv(StringIO(csv), sep=":")
Solution if the values in column ji are strings: use pop to extract the column, then strip and split, with expand=True to get a DataFrame:
print (type(df.loc[0, 'ji']))
<class 'str'>
# split on ',' so that values with or without a space after the comma both work
df[['a','b']] = df.pop('ji').str.strip('()').str.split(',', expand=True).astype(float)
Or, if there are no missing values and performance matters, use a list comprehension:
L = [x.strip('()').split(',') for x in df.pop('ji')]
df[['a','b']] = pd.DataFrame(L, index=df.index).astype(float)
print (df)
length a b
0 3283.458479 75.0 5.0
1 1431.312901 96.0 5.0
2 1364.592959 97.0 5.0
3 3736.322308 247.0 5.0
4 2655.910005 81.0 7.0
5 1752.293687 93.0 7.0
6 427.844417 242.0 7.0
7 3725.823013 248.0 7.0
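As a self-contained check of the string-based approach above, here is a minimal sketch on a three-row frame. The sample rows and the column names a and b are illustrative, and it assumes every ji value is a string of the form (x, y), with or without a space after the comma.

```python
import pandas as pd

# Small sample frame; one value deliberately has no space after the comma.
df = pd.DataFrame({
    "ji": ["(75.0, 5.0)", "(96.0,5.0)", "(97.0, 5.0)"],
    "length": [3283.458479, 1431.312901, 1364.592959],
})

# pop removes 'ji' from df and returns it as a Series.
# strip drops the surrounding parentheses, split with expand=True
# returns a DataFrame with one column per piece.
parts = df.pop("ji").str.strip("()").str.split(",", expand=True)

# The float conversion ignores the optional leading space.
df[["a", "b"]] = parts.astype(float)

print(df)
```

Because pop removes the column in place, the frame ends up with only length, a and b, so no separate drop call is needed.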
If the values are tuples, build a nested list of tuples and pass it to the DataFrame constructor:
print (type(df.loc[0, 'ji']))
<class 'tuple'>
df[['a','b']] = pd.DataFrame(df.pop('ji').values.tolist(), index=df.index)
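The tuple case can be tried on a small frame as follows. This is only a sketch: the sample rows, the non-default index and the column names a and b are assumptions made for illustration.

```python
import pandas as pd

# Sample frame whose 'ji' column holds real tuples, with a non-default index.
df = pd.DataFrame(
    {
        "ji": [(75.0, 5.0), (96.0, 5.0), (81.0, 7.0)],
        "length": [3283.458479, 1431.312901, 2655.910005],
    },
    index=[10, 20, 30],
)

# Each tuple becomes one row of a two-column frame.
# Passing index=df.index keeps the new rows aligned with the original ones.
pairs = pd.DataFrame(df.pop("ji").tolist(), index=df.index, columns=["a", "b"])
df[["a", "b"]] = pairs

print(df)
```

Passing the original index matters whenever the frame does not use the default 0..n-1 index; without it the new columns would be aligned on mismatched labels and filled with NaN.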
Edit:
If 'ji' contains tuples, it is much simpler:
df[['j', 'i']] = df.pop('ji').apply(pd.Series)
Given
>>> df
ji length
0 (75.0,5.0) 3283.458479
1 (96.0,5.0) 1431.312901
2 (97.0,5.0) 1364.592959
3 (247.0,5.0) 3736.322308
4 (81.0,7.0) 2655.910005
>>>
>>> df.dtypes
ji object
length float64
dtype: object
i.e. when the 'ji' column contains strings, I would use ast.literal_eval here.
>>> from ast import literal_eval
>>> def split_to_df(string):
...     return pd.Series(literal_eval(string))
>>>
>>> df[['val1', 'val2']] = df.pop('ji').apply(split_to_df)
>>> df
length val1 val2
0 3283.458479 75.0 5.0
1 1431.312901 96.0 5.0
2 1364.592959 97.0 5.0
3 3736.322308 247.0 5.0
4 2655.910005 81.0 7.0
(The use of pop was inspired by jezrael's answer.)
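A variation on the literal_eval idea, not taken from the answer itself, is to parse all strings in a list comprehension and build the new columns in one step instead of creating a Series per row. The sketch below assumes every ji value is a string holding a valid Python tuple literal; the sample rows are illustrative.

```python
from ast import literal_eval

import pandas as pd

df = pd.DataFrame({
    "ji": ["(75.0,5.0)", "(96.0,5.0)", "(81.0,7.0)"],
    "length": [3283.458479, 1431.312901, 2655.910005],
})

# literal_eval parses each string into a tuple of floats
# without executing arbitrary code.
tuples = [literal_eval(s) for s in df.pop("ji")]

# One DataFrame constructor call replaces the per-row pd.Series calls.
df[["val1", "val2"]] = pd.DataFrame(tuples, index=df.index)

print(df)
```

On larger frames this is likely to be faster than apply with pd.Series, since only one DataFrame is constructed, though that has not been measured here.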
You need:
df['a'] = df['ji'].apply(lambda x: x[0])
df['b'] = df['ji'].apply(lambda x: x[1])
df.drop(['ji'], axis=1, inplace=True)
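Since the question asks to avoid lambda expressions, the same result can be sketched without them by unzipping the tuples. This is an alternative to the snippet above, not part of the original answer, and it assumes ji holds two-element tuples and the frame is not empty; the sample rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "ji": [(75.0, 5.0), (96.0, 5.0), (81.0, 7.0)],
    "length": [3283.458479, 1431.312901, 2655.910005],
})

# zip(*pairs) transposes the sequence of pairs into two tuples:
# all first elements and all second elements.
first, second = zip(*df["ji"])

df["a"] = list(first)
df["b"] = list(second)

# Drop the original column without modifying the frame in place.
df = df.drop(columns="ji")

print(df)
```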