Splitting a Pandas DataFrame nested list into new named columns

Question

I have a dataframe (df) of the form:

name alias col3
mark david ['3109892828','email@john.com','123 main st']
john twixt ['5468392873','email@twix.com','345 grand st']

What is a concise way to split col3 into new, named columns (perhaps using lambda and apply)?

Answer 1

You could apply a join to the list elements to make a comma-separated string, and then call the vectorised str.split with ',' as the separator and expand=True to create the new columns. The separator has to be passed explicitly; without it, str.split splits on whitespace and the address ends up spread across the three new columns:

In [12]:
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True)
df

Out[12]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, email@john.com, 123 main st]  mark  3109892828   
1  twixt  [5468392873, email@twix.com, 345 grand st]  john  5468392873   

            email       address  
0  email@john.com   123 main st  
1  email@twix.com  345 grand st
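For anyone who wants to reproduce the join-and-split approach end to end, here is a minimal self-contained sketch. It rebuilds the question's frame by hand and attaches the result with `join`, which is one way of adding the columns and not the only one. Note that this approach assumes no list element contains a comma itself.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "name": ["mark", "john"],
    "alias": ["david", "twixt"],
    "col3": [
        ["3109892828", "email@john.com", "123 main st"],
        ["5468392873", "email@twix.com", "345 grand st"],
    ],
})

# Join each list into one comma-separated string, then split on the comma.
# With no separator given, str.split would split on whitespace instead.
parts = df["col3"].apply(",".join).str.split(",", expand=True)
parts.columns = ["UserID", "email", "address"]

# Attach the three new columns next to the original ones.
df = df.join(parts)
```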

A cleaner method would be to apply the pd.Series constructor, which turns each list into a Series:

In [15]:
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series)
df

Out[15]:
   alias                                        col3  name      UserID  \
0  david   [3109892828, email@john.com, 123 main st]  mark  3109892828   
1  twixt  [5468392873, email@twix.com, 345 grand st]  john  5468392873   

            email       address  
0  email@john.com   123 main st  
1  email@twix.com  345 grand st
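Another option, not part of the answer above, is to hand the lists straight to the DataFrame constructor. This is a sketch that assumes every list in col3 has the same three elements in the same order; it tends to be quicker than `apply(pd.Series)` on larger frames because it avoids building one Series per row.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "name": ["mark", "john"],
    "alias": ["david", "twixt"],
    "col3": [
        ["3109892828", "email@john.com", "123 main st"],
        ["5468392873", "email@twix.com", "345 grand st"],
    ],
})

# Build a new frame from the list of lists, reusing the original index
# so the rows line up when the two frames are joined.
expanded = pd.DataFrame(
    df["col3"].tolist(),
    index=df.index,
    columns=["UserID", "email", "address"],
)

# Drop the nested column and attach the expanded ones.
result = df.drop(columns="col3").join(expanded)
```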

Answer 2

Here's what I came up with. It includes a bit of scrubbing of the raw file, and a conversion to a dictionary.

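import pandas as pd

# Read the raw file as text so the str methods below work on Python 3.
with open('/path/to/file', 'r') as f:
    data = f.readlines()

# Split each raw line on '}' to separate the records.
data = [line.split('}') for line in data]
data_df = pd.DataFrame(data)
data_dfn = data_df.transpose()
# Strip leading '[', ',', '{' and ')' characters, drop quotes, then split into fields.
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'", "").split(','))

s = pd.DataFrame(data_new)
d = dict(data_new)
# Wrap each list in a Series so records of unequal length are padded with NaN.
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
D = D.transpose()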
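The last two lines are the part that does the reshaping, so here is that step in isolation. The `records` dictionary is made-up data standing in for `d`, with the second record deliberately one field short to show the padding.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary `d`: key -> list of fields.
records = {
    0: ["3109892828", "email@john.com", "123 main st"],
    1: ["5468392873", "email@twix.com"],  # one field short on purpose
}

# A dict of Series becomes one column per key; shorter Series are
# padded with NaN because pandas aligns them on their integer index.
wide = pd.DataFrame({key: pd.Series(values) for key, values in records.items()})

# Transpose so each record is a row and each field position is a column.
table = wide.transpose()
```

Passing the plain lists directly would fail here because they differ in length, which is why each one is wrapped in a Series first.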

Answer 1
2 2015-09-18 15:01:36

Answer 2
0 2015-09-18 21:42:03
