根据Pandas的分组依据将一列分为多个

Question

I've just started using Python and I'm stuck with a problem related to a dataset I'm working with. 我刚开始使用Python，但遇到与我正在使用的数据集有关的问题。

I have the following dataset: 我有以下数据集：

    C1  C2  C3          C4      C5      C6
    99  069 99002068    3348117 3230802 T6
    99  069 99002063    4599974 178885  T4
    99  069 99002063    4599974 4606066 T4
    99  069 99002063    4599974 236346  T4
    99  069 99002063    4599974 310114  T4

I need to group by transpose column C5 into multiple columns based on a group by of columns C1,C2,C3,C4,C6. 我需要根据列C1，C2，C3，C4，C6的分组依据将列C5换组为多个列。

The code I've written so far is the following: 到目前为止，我编写的代码如下：

    # load plugins
    import pandas as pd

    # import CSV
    data = pd.read_csv(
        "C:/Users/mcatuogno/Desktop/lista_collegamenti_onb.csv",
        sep=";",
        header=None,
        dtype=str,
        usecols=[0, 1, 2, 3, 4, 5],
        names=["C1", "C2", "C3", "C4", "C5", "C6"]
    )

    # sort values
    dataSort = data.sort_values(["C1", "C2", "C3", "C4"])

    # transpose column based on group by function
    dataTranspose = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].apply(list)

With the code above the result is 使用上面的代码，结果是

    C1   C2  ...              C6      C5
    99  000  ...  09900000001100      [102995, 102997, 102996]
    99  000  ...  09900000001135      [103042]

I don't know how I can split the column C5 into multiple columns, each with the following name CN_1, CN_2, ..., CN_x. 我不知道如何将列C5拆分为多个列，每个列的名称分别为CN_1，CN_2，...，CN_x。

Which python function can I use? 我可以使用哪个python函数？

Thanks in advance! 提前致谢！

Answer 1

You can create helper Series for count consecutive values per groups by GroupBy.cumcount , add to MultiIndex and reshape by Series.unstack : 您可以创建助手系列，以通过GroupBy.cumcount为每个组计算连续值，添加到MultiIndex并通过Series.unstack重塑Series.unstack ：

g = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].cumcount()
print (g)
1    0
2    1
3    2
4    3
0    0
dtype: int64

df = (dataSort.set_index(["C1", "C2", "C3", "C4", "C6", g])['C5']
              .unstack()
              .add_prefix('Cn_')
              .reset_index())
print (df)
   C1  C2        C3       C4  C6       Cn_0       Cn_1      Cn_2      Cn_3
0  99  69  99002063  4599974  T4   178885.0  4606066.0  236346.0  310114.0
1  99  69  99002068  3348117  T6  3230802.0        NaN       NaN       NaN

Your solution should be changed for create new DataFrame per constructor: 应该更改您的解决方案，以便为每个构造函数创建新的DataFrame ：

dataTranspose = dataSort.groupby(["C1", "C2", "C3", "C4", "C6"])["C5"].apply(list)

df = (pd.DataFrame(dataTranspose.values.tolist(), index = dataTranspose.index)
        .add_prefix('Cn_')
        .reset_index())
print (df)
   C1  C2        C3       C4  C6     Cn_0       Cn_1      Cn_2      Cn_3
0  99  69  99002063  4599974  T4   178885  4606066.0  236346.0  310114.0
1  99  69  99002068  3348117  T6  3230802        NaN       NaN       NaN

根据Pandas的分组依据将一列分为多个

问题描述

1 个解决方案

解决方案1
3 已采纳 2019-07-10 13:01:01

根据Pandas的分组依据将一列分为多个

问题描述

1 个解决方案

解决方案1 3 已采纳 2019-07-10 13:01:01

解决方案1
3 已采纳 2019-07-10 13:01:01