Python - Pandas, split long column to multiple columns

Given the following DataFrame:

>>> pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
  key  value
0   a      1
1   a      2
2   b      3
3   b      4
4   c      5
5   c      6
6   d      7
7   d      8
8   d      9
9   e     10

I'm looking for a method that restructures the data by key, so that each key becomes its own column, like so:

   a  b  c  d   e
0  1  3  5  7  10
1  2  4  6  8  10 <- 10 is duplicated
2  2  4  6  9  10 <- 10 is duplicated

The number of rows in the result equals the size of the largest group (d in the above example), and missing values are filled by repeating the last available value in that column.

Create a MultiIndex with set_index, using a per-group counter from cumcount as the first level, reshape with unstack, replace missing values with the last non-missing ones using ffill, and finally convert all data to integers if necessary:

df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
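The one-liner above can be easier to follow when split into its steps. The following is a minimal sketch of the same chain on the question's data; the intermediate names `pos`, `wide` and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5], ['c', 6],
          ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

# Position of each row inside its own key group: 0, 1, 0, 1, 0, 1, 0, 1, 2, 0
pos = df.groupby('key').cumcount()

# Index the rows by (position, key), then move the 'key' level into the columns.
# Groups shorter than the longest one get NaN in their trailing rows.
wide = df.set_index([pos, 'key'])['value'].unstack()

# Fill each NaN with the value above it, then undo the float conversion
# that the intermediate NaNs forced on the columns.
result = wide.ffill().astype(int)
print(result)
```

Because every group has a row at position 0, the first row of `wide` never contains NaN, so a forward fill is enough to remove all missing values.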

Another solution, with a custom lambda function:

df = (df.groupby('key')['value']
        .apply(lambda x: pd.Series(x.values))
        .unstack(0)
        .ffill()
        .astype(int))

print(df)
key  a  b  c  d   e
0    1  3  5  7  10
1    2  4  6  8  10
2    2  4  6  9  10
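If this reshaping is needed in more than one place, the cumcount-based solution could be wrapped in a small function. This is only a sketch: the name `spread_by_key` and its parameters are hypothetical, and it assumes the value column contains no missing values to begin with.

```python
import pandas as pd


def spread_by_key(df, key, value):
    """Hypothetical helper: one column per distinct `key`, with shorter
    groups padded by repeating their last value."""
    # Counter that restarts at 0 for every key group
    pos = df.groupby(key).cumcount()
    # Rows indexed by (position, key); unstack moves the key level to columns
    wide = df.set_index([pos, key])[value].unstack()
    # Forward-fill the padding, then restore the original dtype of the values
    return wide.ffill().astype(df[value].dtype)


df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5], ['c', 6],
          ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

out = spread_by_key(df, 'key', 'value')
# The row count matches the size of the largest group
print(len(out) == df.groupby('key').size().max())
```

Note that unstack sorts the column labels, so the columns come out in sorted key order rather than in order of first appearance.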

Using pivot with groupby + cumcount:

# pivot takes keyword arguments only in pandas 2.0 and later
df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int)
Out[214]: 
key   a  b  c  d   e
key2                
0     1  3  5  7  10
1     2  4  6  8  10
2     2  4  6  9  10
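The pivot route and the set_index + unstack route should produce the same table, differing only in the name of the row index (`key2` versus none). A quick way to check this, as a sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5], ['c', 6],
          ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

# Per-group counter shared by both approaches
pos = df.groupby('key').cumcount()

# pivot needs each (index, columns) pair to be unique; the counter guarantees that
by_pivot = (df.assign(key2=pos)
              .pivot(index='key2', columns='key', values='value')
              .ffill()
              .astype(int))

by_unstack = (df.set_index([pos, 'key'])['value']
                .unstack()
                .ffill()
                .astype(int))

# Drop the 'key2' index name before comparing the two frames
print(by_pivot.rename_axis(index=None).equals(by_unstack))
```

Without the counter column, pivot would raise an error here, because keys such as `a` appear more than once and there would be no unique row label to place each value on.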

