Split pandas dataframe nested list into new named columns
I have a dataframe (df) of the form:
name  alias  col3
mark  david  ['3109892828','email@john.com','123 main st']
john  twixt  ['5468392873','email@twix.com','345 grand st']
What is a concise way to split col3 into new, named columns (perhaps using lambda and apply)?
You could apply a join to the list elements to make a comma-separated string, and then call the vectorised str.split on ',' with expand=True to create the new columns. The separator has to be passed explicitly, because str.split without one splits on whitespace:
In [12]:
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True)
df
Out[12]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, email@john.com, 123 main st]  mark  3109892828
1  twixt  [5468392873, email@twix.com, 345 grand st]  john  5468392873
            email       address
0  email@john.com   123 main st
1  email@twix.com  345 grand st
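For reference, here is a self-contained sketch of the join-then-split idea that rebuilds the sample frame from the question. The separator is passed explicitly, since str.split with no pattern splits on whitespace. The names parts and result are only illustrative, and the approach assumes that no list element contains a comma itself.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [['3109892828', 'email@john.com', '123 main st'],
             ['5468392873', 'email@twix.com', '345 grand st']],
})

# Join each list into one comma-separated string, then split on ',' into columns
parts = df['col3'].apply(','.join).str.split(',', expand=True)
parts.columns = ['UserID', 'email', 'address']

# Attach the new columns next to the original ones (aligned on the index)
result = df.join(parts)
print(result[['name', 'UserID', 'email', 'address']])
```

If an address could contain a comma, this round trip through a string would split it into extra columns, so the approach further down that avoids strings is safer.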
A cleaner method would be to apply the pd.Series constructor, which will turn each list into a Series:
In [15]:
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series)
df
Out[15]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, email@john.com, 123 main st]  mark  3109892828
1  twixt  [5468392873, email@twix.com, 345 grand st]  john  5468392873
            email       address
0  email@john.com   123 main st
1  email@twix.com  345 grand st
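On larger frames, apply(pd.Series) tends to be slow because it builds one Series per row. A commonly used alternative, sketched here as a suggestion rather than as part of the answer above, is to hand the lists straight to the DataFrame constructor. It assumes every list holds the same three fields in the same order; new_cols, expanded and via_apply are illustrative names.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [['3109892828', 'email@john.com', '123 main st'],
             ['5468392873', 'email@twix.com', '345 grand st']],
})

new_cols = ['UserID', 'email', 'address']

# Build the new columns in one call from the list of lists,
# reusing the original index so the rows stay aligned
expanded = pd.DataFrame(df['col3'].tolist(), index=df.index, columns=new_cols)

# The apply(pd.Series) version, kept only for comparison
via_apply = df['col3'].apply(pd.Series)
via_apply.columns = new_cols

# Join the expanded columns back and drop the nested list column
df = df.join(expanded).drop(columns='col3')
print(df)
```

Both versions give the same values for this sample; the constructor form just skips the per-row Series creation.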
Here's what I came up with. It includes a bit of scrubbing of the raw file and a conversion to a dictionary.
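import pandas as pd

# Read the raw file as text (on Python 3 the str methods below need text, not bytes)
with open('/path/to/file', 'r') as f:
    data = f.readlines()

# Split every line on '}' (a list, because map is lazy on Python 3)
data = [line.split('}') for line in data]
data_df = pd.DataFrame(data)
data_dfn = data_df.transpose()
# Strip leading brackets/braces, drop the quotes, then split each record on commas
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'", "").split(','))
s = pd.DataFrame(data_new)
d = dict(data_new)
# Wrap each list in a Series so that lists of unequal length are padded with NaN
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
D = D.transpose()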
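The last two lines rely on a handy property: wrapping each list in a pd.Series lets pandas align lists of different lengths and pad the short ones with NaN. A minimal sketch with a made-up dictionary (the keys and values are illustrative, not taken from the original file):

```python
import pandas as pd

# Hypothetical records of unequal length
d = {
    0: ['3109892828', 'email@john.com', '123 main st'],
    1: ['5468392873', 'email@twix.com'],
}

# One column per key; the shorter list is padded with NaN
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

# Transpose so that each record becomes a row
D = D.transpose()
print(D)
```

Passing the raw dictionary to pd.DataFrame directly would raise an error here, because the lists do not have the same length.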