
Pandas dataframe groupby remove column

I stumbled upon a problem with a DataFrame. I am using this snippet to generate a DataFrame, and then I group the DataFrame by the 'chr' column.

import pandas as pd

DF = pd.DataFrame({'chr': ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
                   'y': [10, 20, 30, 40, 50, 90],
                   'ds': ['2018-01-01', '2018-01-02', '2018-01-01', '2018-01-01', '2018-01-01', '2018-12-01']})

DF.head(n=10)

    chr     y       ds
0   chr3    10  2018-01-01
1   chr3    20  2018-01-02
2   chr7    30  2018-01-01
3   chr6    40  2018-01-01
4   chr1    50  2018-01-01
5   chr7    90  2018-12-01


ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)]
ans

[    chr   y          ds
4  chr1  50  2018-01-01,
     chr   y          ds
0  chr3  10  2018-01-01
1  chr3  20  2018-01-02,
     chr   y          ds
3  chr6  40  2018-01-01,
     chr   y          ds
2  chr7  30  2018-01-01
5  chr7  90  2018-12-01]
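Iterating over a GroupBy already yields (group key, sub-DataFrame) pairs, so each y in the comprehension is a DataFrame even without the pd.DataFrame(...) wrapper. A minimal sketch that rebuilds the example frame and inspects what the loop produces (the names keys and frames are just for illustration):

```python
import pandas as pd

DF = pd.DataFrame({
    "chr": ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    "y": [10, 20, 30, 40, 50, 90],
    "ds": ["2018-01-01", "2018-01-02", "2018-01-01",
           "2018-01-01", "2018-01-01", "2018-12-01"],
})

# Each iteration gives a (key, sub-DataFrame) pair; keys come out
# in sorted order because groupby uses sort=True by default.
keys = []
frames = []
for key, sub in DF.groupby("chr"):
    keys.append(key)
    frames.append(sub)

print(keys)                      # ['chr1', 'chr3', 'chr6', 'chr7']
print(type(frames[0]).__name__)  # DataFrame
```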

Please note that once I use groupby, I store the result in a list. As a result, I have a list of sub-DataFrames, one per chr value. How can I delete the chr column from each sub-DataFrame in my list? I simply need to drop chr from each DataFrame in the list. Please note that the solution should scale to larger lists.

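You can drop it while creating the original list by selecting every column except chr. This works for any number of columns, but note that Index.difference sorts the remaining labels by default, so the column order may change: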

ans = [pd.DataFrame(y, columns=DF.columns.difference(['chr'])) for x, y in DF.groupby('chr', as_index=False)]    
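To illustrate the ordering caveat, the sketch below compares the labels returned by Index.difference with a boolean column mask, which keeps the original order. The mask approach is a swapped-in alternative, not part of the answer above, and the name keep is hypothetical:

```python
import pandas as pd

DF = pd.DataFrame({
    "chr": ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    "y": [10, 20, 30, 40, 50, 90],
    "ds": ["2018-01-01", "2018-01-02", "2018-01-01",
           "2018-01-01", "2018-01-01", "2018-12-01"],
})

# Index.difference sorts the remaining labels by default (sort=None),
# so the columns come back as ['ds', 'y'] instead of ['y', 'ds'].
sorted_cols = DF.columns.difference(["chr"])

# A boolean mask over the columns drops "chr" but keeps the original order.
keep = DF.columns != "chr"
ans = [sub.loc[:, keep] for _, sub in DF.groupby("chr")]

print(list(sorted_cols))     # ['ds', 'y']
print(list(ans[0].columns))  # ['y', 'ds']
```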

Alternatively, drop chr from each sub-DataFrame explicitly:

ans = [pd.DataFrame(y).drop('chr', axis=1) for x, y in DF.groupby('chr', as_index=False)]    
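The same idea can be written with the columns= keyword of DataFrame.drop, which is equivalent to drop('chr', axis=1). A minimal sketch, assuming the example frame from the question; it also shows that each sub-frame keeps its original row labels:

```python
import pandas as pd

DF = pd.DataFrame({
    "chr": ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    "y": [10, 20, 30, 40, 50, 90],
    "ds": ["2018-01-01", "2018-01-02", "2018-01-01",
           "2018-01-01", "2018-01-01", "2018-12-01"],
})

# drop returns a new DataFrame; the groups themselves are not modified.
ans = [sub.drop(columns="chr") for _, sub in DF.groupby("chr")]

print(list(ans[1].columns))  # ['y', 'ds']
print(list(ans[1].index))    # [0, 1]  (row labels from DF are kept)
```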

If you can't drop the column while creating the original list (as shown above), you can update the list afterwards like this:

# Create `ans` as you're currently doing:
ans = [pd.DataFrame(y) for x, y in DF.groupby('chr', as_index=False)] 
#
# some processing on `ans`
#
# Now update `ans` by dropping "chr" from each subDf
ans = [df.drop('chr', axis=1) for df in ans]
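Since the question asks for something that scales, another option (not taken from the answers above) is to drop chr once on the full frame and group the remaining columns by the original chr column passed as a Series; groupby aligns that Series on the row index. This avoids one drop call per group, though whether it is measurably faster on large data is an assumption worth profiling. Because chr no longer appears inside each sub-frame, the sketch keeps the group keys in a dict; by_chr is a hypothetical name:

```python
import pandas as pd

DF = pd.DataFrame({
    "chr": ["chr3", "chr3", "chr7", "chr6", "chr1", "chr7"],
    "y": [10, 20, 30, 40, 50, 90],
    "ds": ["2018-01-01", "2018-01-02", "2018-01-01",
           "2018-01-01", "2018-01-01", "2018-12-01"],
})

# Drop "chr" once, then group by the original "chr" Series.
grouped = DF.drop(columns="chr").groupby(DF["chr"])

# Keep each group's key next to its sub-frame so the label is not lost.
by_chr = {key: sub for key, sub in grouped}
ans = list(by_chr.values())

print(list(by_chr))          # ['chr1', 'chr3', 'chr6', 'chr7']
print(list(ans[0].columns))  # ['y', 'ds']
```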

