Pandas dataframe conditional mean

I'm trying to find the average number of cigarettes smoked per day among women who smoked during pregnancy in a given dataset. Currently, I'm trying:

mean = data.groupby(['male', 'cigs']).mean()
print(mean)

That gives me the mean family income for each number of cigarettes smoked per day (i.e. 0 per day, 2 per day, 8 per day, etc.). How do I get the average family income for those who smoked >= 1 per day?

Also, this is my first post on Stack Overflow, so forgive me if there isn't enough detail.

I assume "cigs" refers to the number of cigarettes smoked per day. You can first filter the data on cigs >= 1 and then apply what you were already doing.

# Keep only respondents who smoke at least one cigarette per day
data_on_people_who_smoke = data[data.cigs >= 1]
mean = data_on_people_who_smoke.groupby(['male', 'cigs']).mean()
print(mean)

mean = data[data['cigs'] >= 1]['income'].mean()
print(mean)

This gives you the mean income of all respondents who smoke at least 1 cigarette per day. Don't group by gender or cigs. Filter first, then take the mean.
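As a rough, self-contained sketch of the filter-then-mean idea, the snippet below builds a small toy DataFrame and computes both quantities mentioned in the question. The values are made up for illustration, the column names `male`, `cigs`, and `income` follow the snippets above, and it assumes that `male == 0` marks women, which the original dataset may encode differently.

```python
import pandas as pd

# Toy data: all values here are invented for illustration only.
data = pd.DataFrame({
    "male":   [0, 0, 0, 1, 1, 0],
    "cigs":   [0, 5, 15, 10, 0, 10],
    "income": [30.0, 20.0, 40.0, 50.0, 60.0, 30.0],
})

# Filter first: keep rows with at least one cigarette per day.
smokers = data[data["cigs"] >= 1]

# Mean income over all smokers, with no grouping.
mean_income_smokers = smokers["income"].mean()

# Mean cigarettes per day among women who smoke.
# Assumption: male == 0 encodes women in this dataset.
women_smokers = data[(data["male"] == 0) & (data["cigs"] >= 1)]
mean_cigs_women_smokers = women_smokers["cigs"].mean()

print(mean_income_smokers)
print(mean_cigs_women_smokers)
```

Combining the two conditions with `&` (each wrapped in parentheses) keeps the filter in a single boolean mask, so no groupby is needed when only one subgroup is of interest.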
