Split a column into multiple ones based on a groupby with Pandas
I've just started using Python and I'm stuck on a problem with a dataset I'm working with.
I have the following dataset:
C1 C2 C3 C4 C5 C6
99 069 99002068 3348117 3230802 T6
99 069 99002063 4599974 178885 T4
99 069 99002063 4599974 4606066 T4
99 069 99002063 4599974 236346 T4
99 069 99002063 4599974 310114 T4
I need to transpose column C5 into multiple columns, grouping by columns C1, C2, C3, C4 and C6.
The code I've written so far is the following:
# load libraries
import pandas as pd
# import CSV
data = pd.read_csv(
"C:/Users/mcatuogno/Desktop/lista_collegamenti_onb.csv",
sep=";",
header=None,
dtype=str,
usecols=[0, 1, 2, 3, 4, 5],
names=["C1", "C2", "C3", "C4", "C5", "C6"]
)
# sort values
dataSort = data.sort_values(["C1", "C2", "C3", "C4"])
# transpose column based on group by function
dataTranspose = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].apply(list)
With the code above, the result is:
C1 C2 ... C6 C5
99 000 ... 09900000001100 [102995, 102997, 102996]
99 000 ... 09900000001135 [103042]
I don't know how to split column C5 into multiple columns named CN_1, CN_2, ..., CN_x.
Which Python function can I use?
Thanks in advance!
You can create a helper Series that numbers the rows within each group with GroupBy.cumcount, add it to the MultiIndex, and reshape with Series.unstack:
g = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].cumcount()
print (g)
1 0
2 1
3 2
4 3
0 0
dtype: int64
df = (dataSort.set_index(["C1", "C2", "C3", "C4", "C6", g])['C5']
.unstack()
.add_prefix('Cn_')
.reset_index())
print (df)
C1 C2 C3 C4 C6 Cn_0 Cn_1 Cn_2 Cn_3
0 99 69 99002063 4599974 T4 178885.0 4606066.0 236346.0 310114.0
1 99 69 99002068 3348117 T6 3230802.0 NaN NaN NaN
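The output above shows C2 as 69 and the new columns as floats, which suggests the values were parsed as numbers. As a minimal sketch of the same cumcount and unstack approach, the version below reads the question's sample rows with dtype=str so the leading zero in C2 is kept, and adds 1 to the counter so the columns start at CN_1 as the question asks. The inline `raw` string is a hypothetical stand-in for the CSV file, and the names `keys`, `pos` and `wide` are only illustrative.

```python
import io
import pandas as pd

# Inline copy of the question's sample rows (hypothetical stand-in for the CSV file).
raw = """99;069;99002068;3348117;3230802;T6
99;069;99002063;4599974;178885;T4
99;069;99002063;4599974;4606066;T4
99;069;99002063;4599974;236346;T4
99;069;99002063;4599974;310114;T4
"""

keys = ["C1", "C2", "C3", "C4", "C6"]
data = pd.read_csv(io.StringIO(raw), sep=";", header=None, dtype=str,
                   names=["C1", "C2", "C3", "C4", "C5", "C6"])

# Position of each row inside its group, shifted so numbering starts at 1.
pos = data.groupby(keys).cumcount() + 1

# Append the counter as the last index level, then pivot that level into columns.
wide = (data.set_index(keys + [pos])["C5"]
            .unstack()
            .add_prefix("CN_")
            .reset_index())
print(wide)
```

Groups with fewer rows than the largest group get missing values in the trailing CN_ columns.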
Alternatively, your own solution can be adapted by passing the lists to the DataFrame constructor:
dataTranspose = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].apply(list)
df = (pd.DataFrame(dataTranspose.values.tolist(), index = dataTranspose.index)
.add_prefix('Cn_')
.reset_index())
print (df)
C1 C2 C3 C4 C6 Cn_0 Cn_1 Cn_2 Cn_3
0 99 69 99002063 4599974 T4 178885 4606066.0 236346.0 310114.0
1 99 69 99002068 3348117 T6 3230802 NaN NaN NaN
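For reference, here is a self-contained sketch of the list-based variant. It assumes a small hand-built frame that mirrors the question's columns, with values kept as strings, and it renames the columns explicitly so that numbering starts at CN_1. The names `lists` and `wide` are illustrative only.

```python
import pandas as pd

# Small hypothetical frame mirroring the question's columns (values kept as strings).
data = pd.DataFrame({
    "C1": ["99"] * 5,
    "C2": ["069"] * 5,
    "C3": ["99002068", "99002063", "99002063", "99002063", "99002063"],
    "C4": ["3348117", "4599974", "4599974", "4599974", "4599974"],
    "C5": ["3230802", "178885", "4606066", "236346", "310114"],
    "C6": ["T6", "T4", "T4", "T4", "T4"],
})
keys = ["C1", "C2", "C3", "C4", "C6"]

# One list of C5 values per group.
lists = data.groupby(keys)["C5"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
wide = pd.DataFrame(lists.tolist(), index=lists.index)
wide.columns = [f"CN_{i + 1}" for i in range(wide.shape[1])]
wide = wide.reset_index()
print(wide)
```

Both variants should give the same wide layout. The list-based one first builds a Python list per group, so the cumcount and unstack version may be the more natural choice on large data.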