How do I query more than one column in a data frame?
I'm taking a Data Science class that uses Python, and this is a question that stumped me today: "How many babies are named 'Oliver' in the state of Utah for all years?"
To answer this question, we were supposed to use data from this set: https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv
So I started by loading pandas:
import pandas as pd
Then I loaded the data set and created a data frame:
url='https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv'
names=pd.read_csv(url)
Next, I used the .query() method to single out the rows I wanted, the ones with the name Oliver:
oliver=names.query("name == 'Oliver'")
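Since .query() takes a boolean expression as a string, conditions on more than one column can be combined with and / or. A minimal sketch on a few made-up rows in the same layout as names_year.csv (one row per name and year, one column per state; only UT and ID are included here and the counts are invented), filtering on both name and year:

```python
import pandas as pd

# Hypothetical sample rows in the wide layout of names_year.csv:
# one row per name and year, one column per state (only UT and ID shown).
# The counts are made up for illustration.
names = pd.DataFrame({
    "name": ["Oliver", "Oliver", "Oliver", "Emma"],
    "year": [1999, 2005, 2010, 2010],
    "UT":   [10, 20, 30, 40],
    "ID":   [1, 2, 3, 4],
})

# Conditions on two different columns (name and year) combined with `and`
recent_oliver = names.query("name == 'Oliver' and year >= 2000")
print(recent_oliver)
```

With the real file, the same query string would run unchanged on the frame returned by pd.read_csv(url).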
I eventually found the total number of babies named Oliver in Utah using this code:
total=pd.DataFrame.sum(quiz)
print(total)
But I wasn't sure how to single out the data for both the name and the state, or whether that is even possible. Does anyone know of a better way to find this answer?
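For reference, pd.DataFrame.sum(df) is the same call as df.sum(), and it totals each column separately instead of returning a single number. A small sketch with made-up Oliver rows (hypothetical counts, UT and ID columns only) shows the difference between summing the whole frame and summing just the UT column:

```python
import pandas as pd

# Made-up rows for the name Oliver in the same wide layout (UT and ID only).
oliver = pd.DataFrame({
    "name": ["Oliver", "Oliver"],
    "year": [2005, 2010],
    "UT":   [20, 30],
    "ID":   [2, 3],
})

# Summing the whole frame gives one total per column (year included);
# numeric_only=True skips the text column "name".
per_column = oliver.sum(numeric_only=True)
print(per_column)

# Selecting a single column first gives a single number.
utah_total = oliver["UT"].sum()
print(utah_total)
```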
You already have almost all the code; you just need one more line to sum according to the state. Each state is its own column in this data set, so select the UT column of oliver and sum it:
print(oliver.UT.sum())  # this gives the total for the state of Utah
You don't need the quiz variable at all.
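The filter and the column selection can also be done in one step with .loc, using a boolean mask on the name column for the rows and a column label for the state; a list of labels selects several states at once. A sketch on invented sample data in the same wide layout (only UT and ID are shown, and the counts are hypothetical):

```python
import pandas as pd

# Hypothetical rows in the wide layout of names_year.csv; counts are made up.
names = pd.DataFrame({
    "name": ["Oliver", "Oliver", "Emma"],
    "year": [2005, 2010, 2010],
    "UT":   [20, 30, 40],
    "ID":   [2, 3, 4],
})

# Rows where name is Oliver, column UT, summed: one number for Utah.
utah_oliver = names.loc[names["name"] == "Oliver", "UT"].sum()
print(utah_oliver)

# A list of columns selects several states at once; sum() gives one total per state.
by_state = names.loc[names["name"] == "Oliver", ["UT", "ID"]].sum()
print(by_state)
```

This avoids keeping the intermediate oliver frame around, though the two-step version in the answer gives the same Utah total.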