Reading .xls files that contain Brazilian population estimates since 2000, I start with the 2000.xls file, populating a dataframe called main_df that at first looks like:
STATE STATE_CODE CITY CITY_CODE 2000_POP
SP X Sao Paulo Y 10.000.000 ...
After iterating over *.xls files from 2001 until 2020 main_df should look like:
STATE STATE_CODE CITY CITY_CODE 2000_POP 2001_POP 2002_POP ... 2020_POP
SP X Sao Paulo Y 10.000.000 m n ... p ...
To make it happen I'm using Pandas in a not very efficient way, iterating over the rows of df, but that was the only way I found to look up the population size by city and state codes.
Here df is the dataframe that holds the city population estimates for one year between 2001 and 2020.
Here's the code snippet that iterates over every row of df, trying to populate main_df:
df = pd.read_excel(filename, encoding='latin_1', sep=',')
column_year_id = filename.strip('.xls')
df.columns = ['STATE', 'STATE_CODE', 'CITY', 'CITY_CODE', column_year_id]
for index, row in df.iterrows():
    target_uf = (row['STATE_CODE'])
    target_city_code = (str(row['CITY_CODE']))
    population_on_current_year = row[-1]
    selection = (main_df['STATE_CODE'] == target_uf) & (main_df['CITY_CODE'] == target_city_code)
    main_df.loc[selection, column_year_id] = population_on_current_year
The problem is that in the end main_df has only its original 2000 population column filled; the columns from 2001 to 2020 are filled with NaN values, looking like:
STATE STATE_CODE CITY CITY_CODE 2000_POP 2001_POP 2002_POP ... 2020_POP
SP X Sao Paulo Y 10.000.000 NaN NaN ... NaN ...
Why is this happening, and what should I do to make it work?
It seems the problem is that I can't insert an element at a specific position, as if main_df were an array, using main_df[index, column]. Does Pandas allow this kind of insertion?
Edit 1: This is how I create main_df :
main_df = pd.read_excel(filename, encoding='latin_1', sep=',')
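One possible explanation, offered as a guess rather than a confirmed diagnosis: since main_df comes straight from read_excel, a numeric CITY_CODE column would normally be parsed as integers, while the loop compares it against str(row['CITY_CODE']). In that case the boolean selection is False for every row, so .loc creates the new column but never writes a value into it. The sketch below reproduces this on a one-row frame; the codes 35 and 50308 and the population figures are made-up placeholders, not values from the real files.

```python
import pandas as pd

# Hypothetical one-row stand-in for main_df; codes and populations are placeholders.
main_df = pd.DataFrame({
    "STATE": ["SP"],
    "STATE_CODE": [35],
    "CITY": ["Sao Paulo"],
    "CITY_CODE": [50308],          # integer dtype, as read_excel would typically infer
    "2000_POP": [10_000_000],
})

target_uf = 35
target_city_code = 50308

# Comparing the integer column with a string matches no row at all.
str_mask = (main_df["STATE_CODE"] == target_uf) & (main_df["CITY_CODE"] == str(target_city_code))
# Comparing like with like selects the expected row.
int_mask = (main_df["STATE_CODE"] == target_uf) & (main_df["CITY_CODE"] == target_city_code)

broken = main_df.copy()
broken.loc[str_mask, "2001_POP"] = 10_100_000   # column is created, but no row is selected

fixed = main_df.copy()
fixed.loc[int_mask, "2001_POP"] = 10_100_000    # the matching row receives the value

print(broken)
print(fixed)
```

If this is what is going on, checking main_df.dtypes and df.dtypes and dropping the str(...) cast (or casting both sides the same way) should be enough for the original loop to fill the columns.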
I was able to do what I wanted with:
selection = (main_df['COD_UF'] == target_state) & (main_df['COD_MUN'] == target_city)
index = main_df.loc[selection].index
main_df.loc[index.values[0], column_year_id] = population_on_current_year
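Since the row-by-row loop was described as not very efficient, here is a sketch of an alternative that avoids iterrows by merging each yearly dataframe into main_df on the two code columns. It assumes that the codes have the same dtype in both frames and that each (STATE_CODE, CITY_CODE) pair appears once per year; the small frames and the `yearly` dict are invented for illustration and stand in for the dataframes read from the .xls files.

```python
import pandas as pd

# Hypothetical stand-in for main_df after reading the 2000 file (placeholder values).
main_df = pd.DataFrame({
    "STATE": ["SP", "RJ"],
    "STATE_CODE": [35, 33],
    "CITY": ["Sao Paulo", "Rio de Janeiro"],
    "CITY_CODE": [50308, 4557],
    "2000_POP": [10_000_000, 5_800_000],
})

# Hypothetical yearly frames, keyed by the column name they contribute.
yearly = {
    "2001_POP": pd.DataFrame({
        "STATE_CODE": [33, 35],
        "CITY_CODE": [4557, 50308],
        "2001_POP": [5_850_000, 10_100_000],
    }),
    "2002_POP": pd.DataFrame({
        "STATE_CODE": [35, 33],
        "CITY_CODE": [50308, 4557],
        "2002_POP": [10_200_000, 5_900_000],
    }),
}

keys = ["STATE_CODE", "CITY_CODE"]
for column_year_id, df in yearly.items():
    # A left merge keeps every row of main_df and adds the year's column;
    # cities missing from a given year end up as NaN in that column.
    main_df = main_df.merge(df[keys + [column_year_id]], on=keys, how="left")

print(main_df)
```

The merge aligns rows by key rather than by position, so the yearly files do not need to list the cities in the same order as the 2000 file.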