Python Pandas: Flatten a DataFrame using an array in a column

Question

I have a pandas DataFrame in which one column contains arrays. I'd like to "flatten" it by repeating the values of the other columns for each element of the arrays.

I managed to do it by iterating over every row and building a temporary list of values, but that is "pure Python" and slow.

Is there a way to do this in pandas/numpy? In other words, I am trying to improve the flatten function in the example below.

Thanks a lot.

import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []
    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})
    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))

This gives one row per element of z, with the corresponding x and y values repeated.

Answer 1

Use numpy.repeat with str.len to build columns x and y, and flatten z with itertools.chain.from_iterable:

import pandas as pd
import numpy as np
from itertools import chain

df = pd.DataFrame({
        "x": np.repeat(toConvert.x.values, toConvert.z.str.len()),
        "y": np.repeat(toConvert.y.values, toConvert.z.str.len()),
        "z": list(chain.from_iterable(toConvert.z))})

print(df)
   x   y    z
0  1  10  101
1  1  10  102
2  1  10  103
3  2  20  201
4  2  20  202
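
The same idea can be wrapped in a reusable function that works for any number of other columns. This is only a minimal sketch, assuming Python 3 and that every value in the target column is a sequence (no missing values); the name flatten_column and its parameters are hypothetical and not part of pandas:

```python
from itertools import chain

import numpy as np
import pandas as pd


def flatten_column(df, col):
    """Hypothetical helper: one output row per element of the sequences in `col`."""
    # Number of elements in each row's sequence.
    lengths = df[col].str.len().values
    # Repeat every other column according to those lengths.
    data = {c: np.repeat(df[c].values, lengths) for c in df.columns if c != col}
    # Concatenate all sequences into one flat list.
    data[col] = list(chain.from_iterable(df[col]))
    # Restore the original column order.
    return pd.DataFrame(data, columns=df.columns)


toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})
flat = flatten_column(toConvert, 'z')
print(flat)
```

On pandas 0.25 or later, the built-in toConvert.explode('z') may be a simpler option; note that it keeps the original index values (repeated) rather than renumbering the rows.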

Answer 2

Here's a NumPy-based solution:

# list(...) is needed on Python 3, where map returns an iterator
np.column_stack((toConvert[['x','y']].values.\
     repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))

Sample run:

In [78]: toConvert
Out[78]: 
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)

In [79]: np.column_stack((toConvert[['x','y']].values.\
    ...:      repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]: 
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
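
The result above is a plain NumPy array, so the column names are lost. One possible way to get a DataFrame back is sketched below, assuming Python 3; the variable names lengths, arr and result are illustrative only:

```python
import numpy as np
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

# Length of each sequence in z; list(...) because map is lazy on Python 3.
lengths = list(map(len, toConvert.z))

# Repeat the (x, y) rows, then append the flattened z values as a third column.
arr = np.column_stack((
    toConvert[['x', 'y']].values.repeat(lengths, axis=0),
    np.hstack(toConvert.z.tolist()),
))

# Re-attach the column names.
result = pd.DataFrame(arr, columns=['x', 'y', 'z'])
print(result)
```

Because everything passes through a single NumPy array, all columns end up with one common dtype. That is fine for the all-integer sample here, but columns of mixed types would be upcast.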

2 answers

Answer 1
1 2016-10-27 08:59:09

Answer 2
1 Accepted 2016-10-27 09:04:21
