Pandas dataframe groupby: drop a column

Question

I have run into a problem with a dataframe. I use the snippet below to generate the dataframe, and then group it by the 'chr' column.

import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
                   'y': [10, 20, 30, 40, 50, 90],
                   'ds': ['2018-01-01', '2018-01-02', '2018-01-01',
                          '2018-01-01', '2018-01-01', '2018-12-01']})

DF.head(n=10)

    chr     y       ds
0   chr3    10  2018-01-01
1   chr3    20  2018-01-02
2   chr7    30  2018-01-01
3   chr6    40  2018-01-01
4   chr1    50  2018-01-01
5   chr7    90  2018-12-01


ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)]
ans

[    chr   y          ds
4  chr1  50  2018-01-01,
     chr   y          ds
0  chr3  10  2018-01-01
1  chr3  20  2018-01-02,
     chr   y          ds
3  chr6  40  2018-01-01,
     chr   y          ds
2  chr7  30  2018-01-01
5  chr7  90  2018-12-01]

Please note that after the groupby I store the result in a list, so I end up with a list of sub-dataframes, one per chr value. How can I delete the chr column from each sub-dataframe in the list? I simply need to drop chr from every dataframe in the list. Please note that the solution should scale to larger lists.

Answer 1

You can do it while creating your original list, by keeping only the columns other than chr (note that DF.columns.difference returns the remaining column names in sorted order):

ans = [pd.DataFrame(y, columns=DF.columns.difference(['chr'])) for x, y in DF.groupby('chr', as_index=False)]

Alternatively, drop chr from each sub-dataframe explicitly:

ans = [pd.DataFrame(y).drop('chr', axis=1) for x, y in DF.groupby('chr', as_index=False)]

If you can't drop the column while creating the original list (as shown above), you can update the list afterwards like this:

# Create `ans` as you're currently doing:
ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)] 
#
# some processing on `ans`
#
# Now update `ans` by dropping "chr" from each subDf
ans = [df.drop('chr', axis=1) for df in ans]
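The approach above can be put together as a small runnable sketch. It reuses the question's DF and assumes the default groupby behaviour, where groups are yielded in sorted key order; drop(columns='chr') is simply another spelling of drop('chr', axis=1).

```python
import pandas as pd

DF = pd.DataFrame({
    'chr': ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    'y': [10, 20, 30, 40, 50, 90],
    'ds': ['2018-01-01', '2018-01-02', '2018-01-01',
           '2018-01-01', '2018-01-01', '2018-12-01'],
})

# One sub-dataframe per distinct 'chr' value.
ans = [group for _, group in DF.groupby('chr')]

# Drop 'chr' from every sub-dataframe; drop() returns a new frame each time,
# so the original DF is left untouched.
ans = [df.drop(columns='chr') for df in ans]

for df in ans:
    print(df)
```

The list comprehension makes a single pass over the list, so the work grows linearly with the number of sub-dataframes.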

Answer 2

This will drop the chr column while the list is being built:

ans = [pd.DataFrame(y).drop('chr', axis=1) for x, y in DF.groupby('chr', as_index=False)]
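Once chr is dropped, the list no longer records which group each sub-dataframe came from. If that matters, one possible variation (not part of the original answers; the name by_chr is made up for illustration) is to keep the group key in a dict instead of a list:

```python
import pandas as pd

DF = pd.DataFrame({
    'chr': ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    'y': [10, 20, 30, 40, 50, 90],
    'ds': ['2018-01-01', '2018-01-02', '2018-01-01',
           '2018-01-01', '2018-01-01', '2018-12-01'],
})

# Map each 'chr' value to its sub-dataframe with the 'chr' column removed.
# `by_chr` is a hypothetical name used only in this sketch.
by_chr = {key: group.drop(columns='chr') for key, group in DF.groupby('chr')}

print(by_chr['chr7'])
```

If a plain list is still needed, list(by_chr.values()) gives the sub-dataframes in the same order as the groupby.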
