Convert a grouped pandas DataFrame (multiple columns, but not all columns) from long to wide

Question

The problem:

I have a dataset with yearly data for different companies. The data is stored in long format: each year is a row, so company ids are duplicated. The data looks like this (the original DataFrame has many more columns).

I need to transform the long format to a wide format, so that each company is shown in one row (no duplication).

This is the result I would like:

As you can see, I need the following:

some columns (like "year") are not needed any more
some columns (like "sales", "sales_change_in_2_years", "sales_change_over_year") should be transformed from long format to wide format, keeping their names (with a number appended)
some columns (like "ind1" and "ind2") should remain as they are (no transformation from long to wide)

So far I have only worked out a solution that handles a single column, so it is not really a solution for me.

This is my code:

test.groupby("comp_id")['sales_change_1'].apply(list).apply(pd.Series).rename(columns=lambda x: 'sales_{}'.format(x+1))

Is there a better solution to my problem?

Answer 1

After you drop the years:

del test['Year']

You can group the rows together by adding an extra column that holds the row "index" of each row within its company.

test['idx'] = test.groupby('Comp_id').cumcount() + 1
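As a small illustration of what that line produces, here is a sketch on made-up data. The values are hypothetical; the column names simply follow the spelling used in this answer. cumcount() numbers the rows within each group starting at 0, which is why 1 is added.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
test = pd.DataFrame({
    "Comp_id": [1, 1, 1, 2, 2],
    "Sales": [100, 110, 120, 50, 55],
})

# Number the rows of each company: 1, 2, 3, ... restarting per Comp_id.
test["idx"] = test.groupby("Comp_id").cumcount() + 1

idx_values = test["idx"].tolist()
print(idx_values)
```

Note that the numbering follows the current row order, so if the years are not already sorted within each company, it is probably worth sorting before dropping the year column.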

Then set it as part of the DataFrame index and use unstack() to turn it into columns.

test = test.set_index(['Comp_id', 'idx']).unstack()

At this point, your columns will be a MultiIndex with the created 'idx' as the second level, so you can already use the DataFrame as it stands, referring to columns as ('Sales', 1), ('Sales', 2), etc.
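A minimal sketch of what the MultiIndex columns look like and how a single one can be selected by its tuple label. The data and the 'Sales_change' column are hypothetical, chosen only to show two value columns side by side.

```python
import pandas as pd

# Hypothetical sample data with two rows per company.
test = pd.DataFrame({
    "Comp_id": [1, 1, 2, 2],
    "Sales": [100, 110, 50, 55],
    "Sales_change": [0.0, 0.1, 0.0, 0.1],
})
test["idx"] = test.groupby("Comp_id").cumcount() + 1

# Move 'idx' from the row index into a second column level.
wide = test.set_index(["Comp_id", "idx"]).unstack()

# Each column label is now a (name, idx) tuple.
column_labels = list(wide.columns)
print(column_labels)

# Select one column by its tuple label.
first_sales = wide[("Sales", 1)]
print(first_sales)
```

If companies have different numbers of rows, the missing positions are filled with NaN after unstack().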

If you want to flatten your columns, using an underscore as the separator, you can do so with:

test.columns = ['{}_{}'.format(col, idx) for (col, idx) in test.columns]
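Putting the steps together, the following is an end-to-end sketch on made-up data. It also shows one possible way to keep a column such as "ind1" untouched, which the steps above do not cover: placing it in the index alongside the company id before unstacking. That only works under the assumption that the column is constant within each company, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; 'ind1' is assumed constant per company.
test = pd.DataFrame({
    "Comp_id": [1, 1, 2, 2],
    "Year": [2018, 2019, 2018, 2019],
    "ind1": ["a", "a", "b", "b"],
    "Sales": [100, 110, 50, 55],
})

# Drop the year and number the rows within each company.
test = test.drop(columns="Year")
test["idx"] = test.groupby("Comp_id").cumcount() + 1

# Keep 'ind1' in the index so it is not spread across numbered columns.
wide = test.set_index(["Comp_id", "ind1", "idx"]).unstack()

# Flatten the (name, idx) column tuples into 'name_idx' strings.
wide.columns = ["{}_{}".format(col, idx) for (col, idx) in wide.columns]

# Turn 'Comp_id' and 'ind1' back into ordinary columns.
wide = wide.reset_index()
print(wide)
```

If a column like "ind1" actually varies within a company, this would yield more than one row per company, so the assumption should be checked on the real data first.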
