How to convert a data frame with a column of lists into a long data frame where each list element gets its own row, in Python?

Question

I have a data frame that looks like this:

import pandas as pd

mydata = [{'col_A' : 'A', 'col_B': [1,2,3]},
          {'col_A' : 'B', 'col_B': [7,8]}]
df = pd.DataFrame(mydata)
df


col_A   col_B
    A   [1, 2, 3]
    B   [7, 8]

How can I split the values in each list and create a data frame that looks like this:

col_A   col_B
A   1
A   2
A   3
B   7
B   8

Answer 1

Try this:

pd.DataFrame([{'col_A':row['col_A'], 'col_B':val} 
               for ind, row in df.iterrows()
               for val in row['col_B']])

You might also be able to do something clever with the apply() function, but off the top of my head, I can't think of how.

Answer 2

Here is a solution using apply:

df['col_B'].apply(pd.Series).set_index(df['col_A']).stack().reset_index(level=0)

  col_A  0
0     A  1
1     A  2
2     A  3
3     B  7
4     B  8
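The one-liner above packs several steps into one chain. As a rough sketch of what each step contributes, using the question's sample data (the intermediate names wide, stacked and result are made up for illustration, and the final rename and integer cast are additions that are not part of the answer):

```python
import pandas as pd

df = pd.DataFrame([{'col_A': 'A', 'col_B': [1, 2, 3]},
                   {'col_A': 'B', 'col_B': [7, 8]}])

# Step 1: expand each list into its own row of columns 0, 1, 2.
# The shorter list is padded with NaN, so the values become floats.
wide = df['col_B'].apply(pd.Series)

# Step 2: label each row with its col_A value.
wide = wide.set_index(df['col_A'])

# Step 3: stack() moves the column labels into an inner index level.
# Whether stack() drops the NaN padding by itself depends on the pandas
# version, so dropna() is called explicitly here (an added safeguard).
stacked = wide.stack().dropna()

# Step 4: move col_A out of the index and back into a regular column.
result = stacked.reset_index(level=0)

# Step 5 (addition): name the value column and restore integer values.
result = result.rename(columns={0: 'col_B'})
result['col_B'] = result['col_B'].astype(int)
print(result)
```

The remaining index still holds the position of each value inside its original list; append .reset_index(drop=True) if a plain running index is preferred.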

Answer 3

If your DataFrame is big, the fastest approach is to use the DataFrame constructor with stack and a double reset_index:

print(pd.DataFrame(x for x in df['col_B']).set_index(df['col_A']).stack()
        .reset_index(drop=True, level=1).reset_index().rename(columns={0:'col_B'}))

Testing:

import pandas as pd

mydata = [{'col_A' : 'A', 'col_B': [1,2,3]},
      {'col_A' : 'B', 'col_B': [7,8]}]
df = pd.DataFrame(mydata)

print(df)

# Repeat the sample 1000 times to get a frame large enough for timing
df = pd.concat([df]*1000).reset_index(drop=True)

print(pd.DataFrame(x for x in df['col_B']).set_index(df['col_A']).stack().reset_index(drop=True, level=1).reset_index().rename(columns={0:'col_B'}))

print(pd.DataFrame(x for x in df['col_B']).set_index(df['col_A']).stack().reset_index().drop('level_1', axis=1).rename(columns={0:'col_B'}))

print(df['col_B'].apply(pd.Series).set_index(df['col_A']).stack().reset_index().drop('level_1', axis=1).rename(columns={0:'col_B'}))

print(pd.DataFrame([{'col_A':row['col_A'], 'col_B':val} for ind, row in df.iterrows() for val in row['col_B']]))

Timing:

In [1657]: %timeit pd.DataFrame(x for x in df['col_B']).set_index(df['col_A']).stack().reset_index().drop('level_1', axis=1).rename(columns={0:'col_B'})
100 loops, best of 3: 4.01 ms per loop

In [1658]: %timeit pd.DataFrame(x for x in df['col_B']).set_index(df['col_A']).stack().reset_index(drop=True, level=1).reset_index().rename(columns={0:'col_B'})
100 loops, best of 3: 3.09 ms per loop

In [1659]: %timeit pd.DataFrame([{'col_A':row['col_A'], 'col_B':val} for ind, row in df.iterrows() for val in row['col_B']])
10 loops, best of 3: 153 ms per loop

In [1660]: %timeit df['col_B'].apply(pd.Series).set_index(df['col_A']).stack().reset_index().drop('level_1', axis=1).rename(columns={0:'col_B'})
1 loops, best of 3: 357 ms per loop
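For readers on a newer pandas: DataFrame.explode, added in pandas 0.25 (after these answers were written), handles this case directly. A minimal sketch, assuming pandas 0.25 or later and the question's sample data; it was not part of the timings above, so no speed claim is made here. The name long_df is just illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'col_A': 'A', 'col_B': [1, 2, 3]},
                   {'col_A': 'B', 'col_B': [7, 8]}])

# explode() emits one row per list element and repeats col_A alongside it.
# The original row labels get duplicated, so rebuild a clean 0..n-1 index.
long_df = df.explode('col_B').reset_index(drop=True)

# Exploded values keep object dtype; cast when the lists hold plain integers.
long_df['col_B'] = long_df['col_B'].astype(int)
print(long_df)
```

The cast on the last step is optional and only makes sense when every list element is an integer.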

Answer 1
1 Accepted 2016-02-09 03:38:22

Answer 2
1 2016-02-09 08:00:12

Answer 3
1 2016-02-09 08:43:33
