pandas sort_values is working outside a for loop, but not inside?
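I am trying to write a for loop that takes a dataframe of census data, computes the combined population of the three most populous counties in each state, and writes the sums to a new Series. Here is the function that does not work: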
import numpy as np
import pandas as pd

## created a dataframe earlier with a census csv file called 'census_df'
def bad_function():
    only_counties = census_df.set_index(['STNAME'])
    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates() ## get a unique list of all 50 states from the dataframe
    state_name = pd.Series(index = ser)
    for i in state_name.index:
        a = only_counties.loc[i, 'CENSUS2010POP']
        a = a.sort_values(ascending=False)
        population = np.sum(a[0:3])
        state_name.loc[i] = population
    return state_name
When I call this function, I get the following error:
AttributeError Traceback (most recent call last)
<ipython-input-59-dc2686648261> in <module>()
26 return state_name
27
---> 28 answer_six()
<ipython-input-59-dc2686648261> in answer_six()
18 for i in state_name.index:
19 a = only_counties.loc[i, 'CENSUS2010POP']
---> 20 a = a.sort_values(ascending=False)
21
22 population = np.sum(a[0:3])
AttributeError: 'numpy.int64' object has no attribute 'sort_values'
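However, when I drop the loop for testing purposes and pick a single item ('Alabama') from the index I want to iterate over, using the same sort_values method in the same way, it works fine. Like this: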
def bad_function():
    only_counties = census_df.set_index(['STNAME'])
    ser = pd.Series(index = only_counties.index)
    ser = ser.index.drop_duplicates()
    state_name = pd.Series(index = ser)
    a = only_counties.loc['Alabama', 'CENSUS2010POP']
    a = a.sort_values(ascending=False)
    b = np.sum(a[0:3])
    return a, b
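It returns exactly what I want: a, the Alabama county populations sorted in descending order, and b, the sum of the three most populous counties. So what is going on?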
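One plausible explanation, given the AttributeError in the traceback: when a label matches several rows, .loc[label, column] returns a Series, but when it matches exactly one row it returns a scalar, and a scalar has no sort_values. That would happen for any state that appears only once in the index (possibly District of Columbia, if it has a single county row in the data). The sketch below is only an illustration under that assumption; toy_df, its made-up numbers, and the helper top3_sum are hypothetical stand-ins, not the real census_df.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df; the numbers are made up.
toy_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alabama", "District of Columbia"],
    "CENSUS2010POP": [100, 400, 300, 200, 600],
})
only_counties = toy_df.set_index("STNAME")

# A label that matches several rows gives back a Series.
many = only_counties.loc["Alabama", "CENSUS2010POP"]
# A label that matches exactly one row gives back a scalar, not a Series.
one = only_counties.loc["District of Columbia", "CENSUS2010POP"]
print(type(many).__name__, type(one).__name__)

def top3_sum(frame, state):
    # Hypothetical helper: wrapping the label in a list makes .loc
    # return a Series even when only one row matches.
    pops = frame.loc[[state], "CENSUS2010POP"]
    return pops.sort_values(ascending=False).iloc[:3].sum()

# Loop version with the list selector.
result = pd.Series({s: top3_sum(only_counties, s) for s in only_counties.index.unique()})

# Loop-free alternative: group by state and sum the three largest values.
alt = toy_df.groupby("STNAME")["CENSUS2010POP"].apply(lambda s: s.nlargest(3).sum())

print(result)
print(alt)
```

If this is indeed the cause, either the list selector or the groupby version should avoid the error, since both always hand sort_values or nlargest a Series.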
What do you get if you run the following?
for i in state_name.index:
    print(i)
Does it print state names or index positions?
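The check in the comment above could be extended to print the type that .loc returns for each label, which would show directly which states come back as something other than a Series. This is a rough sketch on made-up data; toy_df and its values are hypothetical, and the real census_df may behave differently.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df; the numbers are made up.
toy_df = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alabama", "District of Columbia"],
    "CENSUS2010POP": [100, 400, 300, 200, 600],
})
only_counties = toy_df.set_index("STNAME")

# Print each label together with the type that .loc returns for it.
kinds = {}
for i in only_counties.index.unique():
    selected = only_counties.loc[i, "CENSUS2010POP"]
    kinds[i] = type(selected).__name__
    print(i, kinds[i])

# Labels that occur only once in the index are the ones returned as scalars.
counts = only_counties.index.value_counts()
single_row_labels = counts[counts == 1].index.tolist()
print(single_row_labels)
```

Running the same loop against the real dataframe should point at the label on which the original function fails.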