Subset a df by row sums and column sums

Question

I have a df of candidate votes by county (600 x 1192).

I need to subset the original df to keep the candidates with total votes > 50 (row sum) and the counties with total votes > 100 (column sum).

The original data does not include totals by candidate or by county.

import pandas as pd
import numpy as np


df1 = pd.DataFrame([["cand1", 10,100, 1, 1000, 10, 100],["cand2",20,1000, 2, 20, 0, 20],["cand3", 30,5000, 3, 30, 0, 3], ["cand4",40, 1, 4, 1, 0, 4], ["cand5",50, 50, 0,20, 0,2]],
                   columns=['candidate',"code", 'county1', 'county2', 'county3', 'county4', 'county5'])
df1
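Because the totals are not stored in the data, they can be computed on the fly. A minimal sketch using the sample df1 above: moving candidate and code into the index (as the answer below also does) keeps the numeric code column out of the row sums, so sum(axis=1) gives per-candidate totals and sum(axis=0) gives per-county totals. The name votes is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["cand1", 10, 100, 1, 1000, 10, 100],
     ["cand2", 20, 1000, 2, 20, 0, 20],
     ["cand3", 30, 5000, 3, 30, 0, 3],
     ["cand4", 40, 1, 4, 1, 0, 4],
     ["cand5", 50, 50, 0, 20, 0, 2]],
    columns=["candidate", "code", "county1", "county2", "county3", "county4", "county5"],
)

# Move the identifier columns into the index so only vote counts remain as columns;
# otherwise the numeric 'code' column would be added into every row total.
votes = df1.set_index(["candidate", "code"])

# Row sums: total votes per candidate, across all counties
cand_totals = votes.sum(axis=1)
# Column sums: total votes per county, across all candidates
county_totals = votes.sum(axis=0)

print(cand_totals)
print(county_totals)
```

Here cand4 (10 votes) is the only candidate at or below 50, and county2 and county4 (10 votes each) are the only counties at or below 100, which matches the expected result below.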

The result should be:

df2 = pd.DataFrame([["cand1", 10,100, 1000, 100],["cand2",20,1000, 20, 20],["cand3",30, 5000, 30, 3], ["cand5",50, 50, 20, 2]],
                   columns=['candidate',"code", 'county1', 'county3', 'county5'])
df2

I would appreciate your help solving this.

Answer 1

Use boolean indexing:

df1.set_index(['candidate', 'code']).loc[
    lambda x: x.sum(axis=1) > 50, lambda x: x.sum(axis=0) > 100
]
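Each callable passed to .loc is called with the DataFrame being indexed (here the frame returned by set_index) and its return value, a boolean Series in this case, is used as the indexer for that axis. A minimal sketch that evaluates the two conditions by hand to show the masks they produce; the function names rows_over_50 and cols_over_100 are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["cand1", 10, 100, 1, 1000, 10, 100],
     ["cand2", 20, 1000, 2, 20, 0, 20],
     ["cand3", 30, 5000, 3, 30, 0, 3],
     ["cand4", 40, 1, 4, 1, 0, 4],
     ["cand5", 50, 50, 0, 20, 0, 2]],
    columns=["candidate", "code", "county1", "county2", "county3", "county4", "county5"],
)
votes = df1.set_index(["candidate", "code"])


def rows_over_50(x):
    # One True/False per (candidate, code) row: is the candidate's total > 50?
    return x.sum(axis=1) > 50


def cols_over_100(x):
    # One True/False per county column: is the county's total > 100?
    return x.sum(axis=0) > 100


# Calling the functions ourselves gives the masks .loc would compute internally
row_mask = rows_over_50(votes)
col_mask = cols_over_100(votes)

via_callables = votes.loc[rows_over_50, cols_over_100]
via_masks = votes.loc[row_mask, col_mask]

print(row_mask.tolist())
print(col_mask.tolist())
```

Both selections are identical; the callable form simply lets the condition refer to the intermediate frame inside a method chain without naming it.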

The lambdas allow method chaining, but if you prefer a cleaner form you can also do:

df1 = df1.set_index(['candidate', 'code'])
df1.loc[df1.sum(axis=1) > 50, df1.sum(axis=0) > 100]
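Both forms compute the candidate totals over all counties, including the counties that are later dropped. If the candidate totals should instead count only the counties that pass the column filter (the question does not say which is intended), the two filters can be applied in sequence. A sketch under that assumption; on the sample data it happens to select the same rows and columns.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["cand1", 10, 100, 1, 1000, 10, 100],
     ["cand2", 20, 1000, 2, 20, 0, 20],
     ["cand3", 30, 5000, 3, 30, 0, 3],
     ["cand4", 40, 1, 4, 1, 0, 4],
     ["cand5", 50, 50, 0, 20, 0, 2]],
    columns=["candidate", "code", "county1", "county2", "county3", "county4", "county5"],
)
votes = df1.set_index(["candidate", "code"])

# Step 1: keep only the counties whose column total exceeds 100
kept_counties = votes.loc[:, votes.sum(axis=0) > 100]

# Step 2: recompute candidate totals over the surviving counties only,
# then keep the candidates whose total exceeds 50
result = kept_counties.loc[kept_counties.sum(axis=1) > 50]

print(result)
```

With larger real data the two interpretations can diverge, for example when a candidate clears 50 only thanks to votes in a county that is dropped.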

Both yield:

                county1  county3  county5
candidate code                           
cand1     10        100     1000      100
cand2     20       1000       20       20
cand3     30       5000       30        3
cand5     50         50       20        2

where candidate and code are the index of the DataFrame. You can call reset_index() at the end if you want them as regular columns.
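For the real 600 x 1192 frame, the same steps can be wrapped in a small function. This is a sketch: filter_by_totals and its parameters are hypothetical names, and it assumes every column other than the identifier columns holds vote counts. It returns candidate and code as regular columns via reset_index() and checks the result against the expected df2.

```python
import pandas as pd


def filter_by_totals(df, id_cols, min_row_total, min_col_total):
    """Hypothetical helper: keep rows whose total exceeds min_row_total and
    value columns whose total exceeds min_col_total. The id_cols are excluded
    from both sums and returned as regular columns."""
    values = df.set_index(id_cols)
    row_mask = values.sum(axis=1) > min_row_total
    col_mask = values.sum(axis=0) > min_col_total
    return values.loc[row_mask, col_mask].reset_index()


df1 = pd.DataFrame(
    [["cand1", 10, 100, 1, 1000, 10, 100],
     ["cand2", 20, 1000, 2, 20, 0, 20],
     ["cand3", 30, 5000, 3, 30, 0, 3],
     ["cand4", 40, 1, 4, 1, 0, 4],
     ["cand5", 50, 50, 0, 20, 0, 2]],
    columns=["candidate", "code", "county1", "county2", "county3", "county4", "county5"],
)
df2 = pd.DataFrame(
    [["cand1", 10, 100, 1000, 100],
     ["cand2", 20, 1000, 20, 20],
     ["cand3", 30, 5000, 30, 3],
     ["cand5", 50, 50, 20, 2]],
    columns=["candidate", "code", "county1", "county3", "county5"],
)

result = filter_by_totals(df1, ["candidate", "code"], 50, 100)

# Raises AssertionError if the result differs from the expected df2
pd.testing.assert_frame_equal(result, df2)
print(result)
```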

Answer 1: score 2, accepted (2017-12-30 21:36:38)