Python - Pandas, split long column to multiple columns
Given the following DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
>>> df
  key  value
0   a      1
1   a      2
2   b      3
3   b      4
4   c      5
5   c      6
6   d      7
7   d      8
8   d      9
9   e     10
I'm looking for a way to reshape it so that each key becomes its own column, like so:
   a  b  c  d   e
0  1  3  5  7  10
1  2  4  6  8  10  <- 10 is duplicated
2  2  4  6  9  10  <- 10 is duplicated
The number of rows in the result equals the size of the largest group (d in the example above), and missing values are filled with the last available value in the same column.
Create a MultiIndex with set_index, using a counter level built with cumcount, reshape with unstack, replace missing values with the last non-missing ones using ffill, and finally convert all data to integers if necessary:
df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
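The one-liner above can be easier to follow when split into its individual steps. The following is only a sketch of that breakdown; the intermediate names (counter, indexed, wide, result) are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
          ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

# Position of each row inside its own key group (0 for the first 'a', 1 for the second, ...)
counter = df.groupby('key').cumcount()

# Two-level index (position, key); keep only the 'value' column as a Series
indexed = df.set_index([counter, 'key'])['value']

# Move the 'key' level into the columns; groups shorter than the longest one get NaN
wide = indexed.unstack()

# Carry the last seen value down each column, then cast back to integers,
# since the NaN placeholders forced a float dtype
result = wide.ffill().astype(int)
print(result)
```

The final astype(int) only succeeds because ffill leaves no NaN behind; that holds here because every key has at least one value in row 0.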
Another solution with a custom lambda function:
df = (df.groupby('key')['value']
.apply(lambda x: pd.Series(x.values))
.unstack(0)
.ffill()
.astype(int))
print(df)
key  a  b  c  d   e
0    1  3  5  7  10
1    2  4  6  8  10
2    2  4  6  9  10
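To see why the lambda variant calls unstack(0), it may help to look at what apply returns before reshaping. This is a small sketch under the same sample data; the name stacked is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
          ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

# Rebuilding each group as a fresh Series resets its index to 0, 1, 2, ...
# The result is one long Series indexed by (key, position within the group).
stacked = df.groupby('key')['value'].apply(lambda x: pd.Series(x.values))
print(stacked)

# unstack(0) moves level 0 (the key) into the columns and leaves the
# position as the row index; ffill and astype then work as before.
result = stacked.unstack(0).ffill().astype(int)
print(result)
```

Here the position counter comes from the fresh index of pd.Series(x.values), which plays the same role as cumcount in the first solution.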
Using pivot, with groupby + cumcount:
df.assign(key2=df.groupby('key').cumcount()).pivot(index='key2', columns='key', values='value').ffill().astype(int)
Out[214]:
key   a  b  c  d   e
key2
0     1  3  5  7  10
1     2  4  6  8  10
2     2  4  6  9  10
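As a rough check that the pivot approach and the set_index/unstack approach agree, the sketch below wraps the pivot version in a helper and compares the two results. The helper name spread_by_key, its parameters, and the temporary column name _pos are hypothetical and chosen only for this example; pivot is called with keyword arguments, which recent pandas versions require.

```python
import pandas as pd


def spread_by_key(frame, key='key', value='value'):
    """Hypothetical helper: one column per key, short columns forward-filled."""
    # Counter of each row's position within its key group
    pos = frame.groupby(key).cumcount()
    wide = frame.assign(_pos=pos).pivot(index='_pos', columns=key, values=value)
    wide.index.name = None  # drop the temporary counter name from the row index
    # Assumes integer data, as in the example above
    return wide.ffill().astype(int)


df = pd.DataFrame(
    data=[['a', 1], ['a', 2], ['b', 3], ['b', 4], ['c', 5],
          ['c', 6], ['d', 7], ['d', 8], ['d', 9], ['e', 10]],
    columns=['key', 'value'],
)

via_pivot = spread_by_key(df)
via_unstack = (df.set_index([df.groupby('key').cumcount(), 'key'])['value']
                 .unstack()
                 .ffill()
                 .astype(int))

print(via_pivot)
print(via_pivot.equals(via_unstack))
```

Note that pivot raises an error if the same (position, key) pair occurs twice; the cumcount counter avoids that because it is unique within each group.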