
Pivoting a DataFrame with two columns of lists

I have a DataFrame like:

import pandas as pd

matrix = [(222, ['A','B','C'], [1,2,3]),
         (333, ['A','B','D'], [1,3,5])]

df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])
timestamp     variable         value   

222           ['A','B','C']    [1,2,3]
333           ['A','B','D']    [1,3,5]

and would like to pivot it so that the timestamp column is kept, the unique values in the variable column become additional columns, and each entry in value is placed in the column named by the matching entry in variable.

The output should look as follows:

timestamp   A    B    C    D 

222         1    2    3    nan
333         1    3    nan  5 

Any help would be greatly appreciated! :)

Create one dictionary per row with zip, then pass the list of dictionaries to the DataFrame constructor:

a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]

df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

If there are many other columns, use DataFrame.pop to extract the two list columns and keep the rest:

a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]

df = df.join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0
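
One thing worth noting about the pop variant is that it modifies df in place, so the list columns are gone afterwards while every other column is carried over. A minimal sketch of that, assuming a frame with one extra column (the sensor column and the names records and wide are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: same data as in the question plus one extra
# column ("sensor") that should survive the reshaping untouched.
df = pd.DataFrame({
    "timestamp": [222, 333],
    "sensor": ["s1", "s2"],
    "variable": [["A", "B", "C"], ["A", "B", "D"]],
    "value": [[1, 2, 3], [1, 3, 5]],
})

# pop() returns the column and removes it from df in place.
records = [dict(zip(names, vals))
           for names, vals in zip(df.pop("variable"), df.pop("value"))]

# df now holds only the remaining columns; join the expanded ones back on.
wide = df.join(pd.DataFrame(records, index=df.index))
print(wide)
```

If the original frame is still needed afterwards, work on df.copy() instead.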

Unnest the list columns first, then just pivot:

unnesting(df, ['variable', 'value']).pivot(index='timestamp', columns='variable', values='value')
Out[79]: 
variable     A    B    C    D
timestamp                    
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

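import numpy as np

def unnesting(df, explode):
    # Helper used above: repeat each row's index once per list element
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten every list column and place the results side by side
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    # Re-attach the remaining columns (timestamp here) by index
    return df1.join(df.drop(columns=explode), how='left')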
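
On newer pandas versions the custom helper may not be needed. A rough sketch of the same unnest-then-pivot idea using the built-in DataFrame.explode, assuming pandas 1.3 or later (where explode accepts a list of columns); the names long and wide are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [(222, ["A", "B", "C"], [1, 2, 3]),
     (333, ["A", "B", "D"], [1, 3, 5])],
    columns=["timestamp", "variable", "value"],
)

# Explode both list columns in lockstep (assumes pandas >= 1.3).
# explode leaves object dtype behind, so cast value back to integers.
long = df.explode(["variable", "value"]).astype({"value": "int64"})

# One row per timestamp, one column per distinct variable.
wide = long.pivot(index="timestamp", columns="variable", values="value")
print(wide)
```

The lists in each row must have the same length for the multi-column explode to work, which is the case in the question.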

You can pass the values and the column names to the pd.Series constructor, one row at a time. apply then aligns the resulting Series on their labels, which expands the values into the desired shape.

df.set_index('timestamp').apply(lambda row: pd.Series(row.value, index=row.variable), axis=1)

# outputs:
             A    B    C    D
timestamp
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0
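
To see why this works, it may help to build the per-row Series by hand. The sketch below is only an illustration of the label alignment, not a replacement for the one-liner; the names row_222, row_333 and wide are made up:

```python
import pandas as pd

# One Series per row: values from `value`, labels from `variable`,
# and the timestamp as the Series name.
row_222 = pd.Series([1, 2, 3], index=["A", "B", "C"], name=222)
row_333 = pd.Series([1, 3, 5], index=["A", "B", "D"], name=333)

# Stacking the Series as rows aligns them on the union of their labels;
# labels missing from a row are filled with NaN.
wide = pd.DataFrame([row_222, row_333])
print(wide)
```

Each Series name becomes a row label, and the union of the Series labels becomes the set of columns.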
