
pandas sort_values is working outside a for loop, but not inside?

I am trying to write a for loop that takes a dataframe of census data, sums the populations of the three largest counties in each state, and writes each sum to a new Series. Here is the function that is not working:

import numpy as np
import pandas as pd

##created a dataframe earlier with a census csv file called 'census_df'


def bad_function():
    only_counties = census_df.set_index(['STNAME'])

    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates() ##get a unique list of all 50 states from the dataframe

    state_name = pd.Series(index = ser)


    for i in state_name.index:
        a = only_counties.loc[i, 'CENSUS2010POP']
        a = a.sort_values(ascending=False)

        population = np.sum(a[0:3])

        state_name.loc[i] = population

    return state_name

When I call this function, I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-59-dc2686648261> in <module>()
     26     return state_name
     27 
---> 28 answer_six()

<ipython-input-59-dc2686648261> in answer_six()
     18     for i in state_name.index:
     19         a = only_counties.loc[i, 'CENSUS2010POP']
---> 20         a = a.sort_values(ascending=False)
     21 
     22         population = np.sum(a[0:3])

AttributeError: 'numpy.int64' object has no attribute 'sort_values'

However, when I drop the loop for testing purposes, select a single item ('Alabama') from the index I am trying to iterate over, and call sort_values in exactly the same way, it works just fine. Like this:

def bad_function():
    only_counties = census_df.set_index(['STNAME'])

    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates()

    state_name = pd.Series(index = ser)

    a = only_counties.loc['Alabama', 'CENSUS2010POP']
    a = a.sort_values(ascending=False)

    b = np.sum(a[0:3])

    return a, b

It returns exactly what I want: a, the state's counties sorted by population, and b, the sum of the three most populous counties. So what is happening?
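One likely explanation, offered as a sketch rather than a confirmed diagnosis: when an index label matches several rows, `.loc[label, column]` returns a Series, but when it matches exactly one row it returns a bare scalar, which has no `sort_values` method. That would fit the traceback if some state in the data has only one county row (the District of Columbia is a plausible candidate, but that is an assumption about `census_df`). The toy data below is made up purely for illustration; passing the label as a one-element list keeps the result a Series.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df; the numbers are invented.
toy_df = pd.DataFrame({
    'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alabama',
               'District of Columbia'],
    'CENSUS2010POP': [100, 400, 300, 200, 600],
})
only_counties = toy_df.set_index('STNAME')

# Label matching several rows: the result is a Series, so sort_values works.
many = only_counties.loc['Alabama', 'CENSUS2010POP']
print(type(many))

# Label matching exactly one row: the result is a scalar, not a Series.
single = only_counties.loc['District of Columbia', 'CENSUS2010POP']
print(type(single))

# Wrapping the label in a list makes .loc return a Series in both cases.
top3 = {}
for state in only_counties.index.unique():
    pops = only_counties.loc[[state], 'CENSUS2010POP']
    top3[state] = pops.sort_values(ascending=False).iloc[:3].sum()

print(top3)
```

With this change the loop body no longer depends on how many rows a state happens to have.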

Does the following loop print the state names, or an index?

for i in state_name.index:
    print(i)
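Extending that check a little, a small diagnostic loop can print the type that `.loc` returns for each state label and collect the labels that come back as something other than a Series. This is only a sketch: the toy dataframe is invented, and the names `toy_df` and `scalar_states` are hypothetical stand-ins, not part of the original code.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df; the numbers are invented.
toy_df = pd.DataFrame({
    'STNAME': ['Alabama', 'Alabama', 'Alabama', 'Alabama',
               'District of Columbia'],
    'CENSUS2010POP': [100, 400, 300, 200, 600],
})
only_counties = toy_df.set_index('STNAME')

# Collect every state whose lookup does not return a Series.
scalar_states = []
for state in only_counties.index.unique():
    value = only_counties.loc[state, 'CENSUS2010POP']
    print(state, type(value).__name__)
    if not isinstance(value, pd.Series):
        scalar_states.append(state)

print(scalar_states)
```

Running the same loop against the real data would show which state, if any, triggers the AttributeError.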
