
With pandas, filter rows on the sum of a column

I would like to select rows in a dataframe based on a sum criterion on one of its columns. For example, I want the indexes of the first rows of the dataframe where the sum of column B is less than 3:

import pandas as pd

df = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [1, 1, 1, 1]})

The only solution I have is a separate dataframe and a while loop:

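df2 = df.iloc[:0]  # empty frame with the same columns and dtypes as df
index = 0
# Add one row per iteration until the B total reaches 3 (or the rows run out);
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
while df2['B'].sum() < 3 and index < len(df):
    df2 = pd.concat([df2, df.loc[[index]]])
    index += 1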
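For comparison, here is a minimal sketch of a loop that avoids the repeated copying. It tracks the running total in a plain Python variable and slices df once at the end. It assumes a positional (iloc) slice of the leading rows is what is wanted, and it uses the same stopping rule as the while loop above, so the row that brings the total to 3 is included. take_until_total is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def take_until_total(frame, col, limit):
    """Hypothetical helper: return the leading rows of `frame`, stopping
    after the row that brings the running total of `col` to `limit`
    (the same stopping rule as the while loop above)."""
    total = 0
    n = 0
    for value in frame[col]:
        if total >= limit:
            break
        total += value
        n += 1
    # Slice once at the end instead of copying the DataFrame every iteration
    return frame.iloc[:n]


df = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [1, 1, 1, 1]})
df2 = take_until_total(df, 'B', 3)
```

If the total never reaches the limit, this sketch simply returns every row instead of running past the end of the frame.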

The logic gets me where I need, but it seems unnecessarily inefficient. Does anyone have a creative way of using pandas to filter the dataframe based on a condition on the sum of a column?

What you describe is a cumulative sum (cumsum).

Appending rows to a DataFrame within a loop is horribly inefficient, because every iteration copies the entire DataFrame just to add a small amount of data. Instead, you should slice your original DataFrame with a Boolean mask; in this case, the mask checks where the cumsum is less than 3.

df2 = df[df['B'].cumsum().lt(3)]

#   A  B
#0  z  1
#1  y  1

df['B'].cumsum()
#0    1
#1    2
#2    3
#3    4

df['B'].cumsum().lt(3)
#0     True     <- Slicing with this Boolean Series
#1     True     <- keeps only these True rows
#2    False
#3    False
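The mask above keeps rows whose running total is itself below 3, so it returns two rows, whereas the while loop in the question also keeps the row that brings the total to 3. A minimal sketch of a few variations, assuming the same example frame (the variable names and the df_neg data are illustrative only): getting the index labels the question asked for; comparing the total before each row (via shift) to reproduce the loop's three rows; and, for data that may contain negative values, where the cumsum can drop back below 3 further down, forcing the mask to stay False after its first False so only a leading block of rows is kept.

```python
import pandas as pd

df = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [1, 1, 1, 1]})

# Index labels of the rows whose running total of B is below 3
idx = df[df['B'].cumsum().lt(3)].index

# Compare the total *before* each row instead: this keeps the row that
# brings the total to 3, matching the while loop in the question
before = df['B'].cumsum().shift(fill_value=0)
df_loop_equiv = df[before.lt(3)]

# If B can be negative, the cumsum may fall below 3 again later on.
# A cumulative product of the 0/1 mask stays 0 after the first failure,
# so only the leading block of rows is kept
df_neg = pd.DataFrame({'A': ['z', 'y', 'x', 'w'], 'B': [2, 2, -5, 1]})
mask = df_neg['B'].cumsum().lt(3)
prefix = mask.astype(int).cumprod().astype(bool)
first_rows = df_neg[prefix]
```

With df_neg, the plain mask would keep rows z, x and w (cumsum 2, -1, 0), while the prefix version keeps only z.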
