I'm trying to group by a different column for each year, sum only the values that belong to that same year, and finally store the result in a .csv file.
My data and code are:
ISO3 Income_Cat_1980 Income_Cat_1985 DWWC1980 DWWC1985
AFG L LM 5 10
AGO LM H 15 25
ALB LM UM 30 40
ARE H H 40 50
for i in range(1980, 1990, 5):
    df = pd.DataFrame(pd.read_csv("mydata.csv"))
    df = df.groupby("Income_Cat_" + str(i)).sum()
    print(df)
    df.to_csv('country-surplus' + str(i) + '.csv', index="Income_Cat_" + str(i))
The result of my code is:
Income_Cat_1980 DWWC1980 DWWC1985
H 40 50
L 5 10
LM 45 65
Income_Cat_1985 DWWC1980 DWWC1985
H 55 75
LM 5 10
UM 30 40
Each result is stored in a different .csv file, but I need the sum of DWWC1980 and DWWC1985 based on the Income_Cat of the same year, so the result should be:
Income_Cat DWWC1980 DWWC1985
H 40 75
L 5 0
LM 45 10
UM 0 40
and the output should be stored in a single .csv file.
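The code should look like this:
import pandas as pd

# You should only be loading the data once
df = pd.read_csv("mydata.csv")
dfl = []
for i in range(1980, 1990, 5):
    # Sum only the DWWC column that belongs to the same year as the grouping column
    temp = df.groupby("Income_Cat_" + str(i))[['DWWC' + str(i)]].sum()
    temp.index.rename('Income_Cat', inplace=True)
    dfl.append(temp)
# Stack the per-year results, fill the gaps with 0, and combine rows that share a category
out = pd.concat(dfl, sort=False).fillna(0).groupby('Income_Cat').sum()
# Write a single file after the loop, so the name no longer depends on i
out.to_csv('country-surplus.csv', index_label='Income_Cat')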
The output is not exactly the same as you've described, because its index includes all 6 income categories. I don't understand why you'd only need 4 of them, but I hope the snippet is helpful.
This should produce the desired output, if I understood the question correctly:
init = True
for i in range(1980, 1990, 5):
    _df = df[["Income_Cat_" + str(i), 'DWWC' + str(i)]]
    _df = _df.groupby("Income_Cat_" + str(i)).sum()
    if init:
        out = _df
        init = False
    else:
        out = out.merge(_df, how='outer', left_index=True, right_index=True)
out.fillna(0, inplace=True)
out.index.rename('Income_cat', inplace=True)
You can make this slightly more general by replacing the first line inside the loop with:
_df = df[[a for a in df.columns if str(i) in a]]
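For reference, here is a minimal self-contained sketch of the loop above with that replacement line, run on the four sample rows from the question. The inline DataFrame, the `year_cols` and `buffer` names, and the in-memory `io.StringIO` target (used instead of a real file so the snippet has no side effects) are assumptions made for illustration only.

```python
import io

import pandas as pd

# Sample rows copied from the question (stand-in for mydata.csv).
df = pd.DataFrame({
    "ISO3": ["AFG", "AGO", "ALB", "ARE"],
    "Income_Cat_1980": ["L", "LM", "LM", "H"],
    "Income_Cat_1985": ["LM", "H", "UM", "H"],
    "DWWC1980": [5, 15, 30, 40],
    "DWWC1985": [10, 25, 40, 50],
})

out = None
for i in range(1980, 1990, 5):
    # Keep every column whose name contains the year, e.g. Income_Cat_1980 and DWWC1980.
    year_cols = [a for a in df.columns if str(i) in a]
    _df = df[year_cols].groupby("Income_Cat_" + str(i)).sum()
    if out is None:
        out = _df
    else:
        out = out.merge(_df, how="outer", left_index=True, right_index=True)

# Categories missing in one year become NaN after the outer merge; replace them with 0.
out = out.fillna(0).astype(int)
out.index.name = "Income_Cat"

# Write everything to one CSV; a real file path could be passed instead of the buffer.
buffer = io.StringIO()
out.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that the substring test `str(i) in a` would also pick up any unrelated column that happens to contain the year, so it is only safe when the column names follow the pattern shown in the question.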
I guess this is what you need.
Input:
df
ISO3 Income_Cat_1980 Income_Cat_1985 DWWC1980 DWWC1985
0 AFG L LM 5 10
1 AGO LM H 15 25
2 ALB LM UM 30 40
3 ARE H H 40 50
Use the following code:
pd.concat([df.groupby('Income_Cat_' + str(year))['DWWC' + str(year)].sum()
           for year in range(1980, 1986)
           if 'Income_Cat_' + str(year) in df.columns],
          axis=1).fillna(0).astype(int)
Output
DWWC1980 DWWC1985
H 40 75
L 5 0
LM 45 10
UM 0 40
Explanation:
pd.concat([list of series], axis=1) concatenates the pd.Series side by side as columns, aligning them on their indices. If one pd.Series (with column name series1) does not have an index label i that another one has, the corresponding cell in the resulting dataframe gets the value NaN, so print(df.loc[i, series1]) results in NaN. Therefore we use fillna(0) to fill the NaNs with zeros. Casting to integers is the final step to arrive at the desired dataframe.
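The alignment behaviour described above can be seen in isolation with a small sketch. The two Series below are typed in by hand from the per-year sums in the question; the names `s1980`, `s1985`, `aligned`, and `result` are made up for this illustration.

```python
import pandas as pd

# Per-year sums from the question, each indexed by that year's income category.
s1980 = pd.Series({"H": 40, "L": 5, "LM": 45}, name="DWWC1980")
s1985 = pd.Series({"H": 75, "LM": 10, "UM": 40}, name="DWWC1985")

# axis=1 places the Series next to each other and aligns them on the index labels.
# Labels present in only one Series ("L" and "UM") yield NaN in the other column.
aligned = pd.concat([s1980, s1985], axis=1)
print(aligned)

# Fill the gaps with 0 and cast back to integers (NaN forces the columns to float).
result = aligned.fillna(0).astype(int)
print(result)
```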