Python pandas: flatten with arrays in column
I have a pandas DataFrame in which one column contains arrays. I'd like to "flatten" it by repeating the values of the other columns for each element of the arrays.
I managed to do it by iterating over every row and building a temporary list of values, but that is "pure Python" and slow.
Is there a way to do this in pandas/numpy? In other words, I'm trying to improve the flatten function in the example below.
Thanks a lot.
import pandas as pd

toConvert = pd.DataFrame({
    'x': [1, 2],
    'y': [10, 20],
    'z': [(101, 102, 103), (201, 202)]
})

def flatten(df):
    tmp = []  # one dict per output row

    def backend(r):
        x = r['x']
        y = r['y']
        zz = r['z']
        # emit one row per element of the tuple in z
        for z in zz:
            tmp.append({'x': x, 'y': y, 'z': z})

    df.apply(backend, axis=1)
    return pd.DataFrame(tmp)

print(flatten(toConvert).to_string(index=False))
Which gives:
x y z
1 10 101
1 10 102
1 10 103
2 20 201
2 20 202
You need numpy.repeat with str.len to create columns x and y; for z, flatten the tuples with itertools.chain.from_iterable:
import pandas as pd
import numpy as np
from itertools import chain

df = pd.DataFrame({
    "x": np.repeat(toConvert.x.values, toConvert.z.str.len()),
    "y": np.repeat(toConvert.y.values, toConvert.z.str.len()),
    "z": list(chain.from_iterable(toConvert.z))})

print(df)
   x   y    z
0  1  10  101
1  1  10  102
2  1  10  103
3  2  20  201
4  2  20  202
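The same idea can be wrapped in a small helper so that it works for any number of "other" columns. This is only a sketch: the name flatten_column is made up for illustration, and it assumes every cell in the target column is a list or tuple (no missing values).

```python
from itertools import chain

import numpy as np
import pandas as pd


def flatten_column(df, col):
    """Hypothetical helper: repeat all other columns once per element in `col`."""
    lengths = df[col].str.len()        # number of elements in each row's sequence
    others = df.columns.drop(col)      # every column except the one being flattened
    # Repeat each remaining column row-wise, once per element of that row's sequence.
    data = {name: np.repeat(df[name].values, lengths) for name in others}
    # Chain all sequences together into one flat list.
    data[col] = list(chain.from_iterable(df[col]))
    return pd.DataFrame(data, columns=df.columns)


toConvert = pd.DataFrame({
    "x": [1, 2],
    "y": [10, 20],
    "z": [(101, 102, 103), (201, 202)],
})
flat = flatten_column(toConvert, "z")
print(flat.to_string(index=False))
```

On pandas 0.25 or later, toConvert.explode("z") produces the same rows in a single call, although z then has object dtype and the original index labels are repeated.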
Here's a NumPy-based solution:
# list(...) is needed on Python 3, where map() returns an iterator
np.column_stack((toConvert[['x','y']].values.\
     repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Sample run:
In [78]: toConvert
Out[78]:
   x   y                z
0  1  10  (101, 102, 103)
1  2  20       (201, 202)
In [79]: np.column_stack((toConvert[['x','y']].values.\
    ...:      repeat(list(map(len,toConvert.z)),axis=0),np.hstack(toConvert.z)))
Out[79]:
array([[  1,  10, 101],
       [  1,  10, 102],
       [  1,  10, 103],
       [  2,  20, 201],
       [  2,  20, 202]])
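The NumPy version returns a plain array, so if a DataFrame is wanted the column names have to be put back. A minimal sketch of that last step follows, with the variable names lengths, left, right and result chosen only for illustration. Note that column_stack produces a single dtype, so this fits best when x, y and z are all numeric.

```python
import numpy as np
import pandas as pd

toConvert = pd.DataFrame({
    "x": [1, 2],
    "y": [10, 20],
    "z": [(101, 102, 103), (201, 202)],
})

# Row-wise repeat counts as a real list (map() alone is an iterator on Python 3).
lengths = [len(t) for t in toConvert.z]
# Repeat each (x, y) row once per element of its tuple.
left = toConvert[["x", "y"]].values.repeat(lengths, axis=0)
# Concatenate all tuples into one flat 1-D array.
right = np.hstack(toConvert.z.tolist())
# Stack side by side and restore the column names.
result = pd.DataFrame(np.column_stack((left, right)), columns=["x", "y", "z"])
print(result)
```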