How can I add a date-based new column to a pivot table in pandas?
I am working with the COVID-19 tables from the ECDC (source: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide).
Using pandas, I have transformed the very long table into a more useful pivot table. Now I have a table indexed by date with cases and deaths for some selected countries:
import pandas as pd
from datetime import datetime

def downloadECDC(url):
    world = pd.read_csv(url)
    today = datetime.today().strftime("%d%m%Y")
    world.to_csv('ECDC' + today + '.csv')  # keep a dated local copy
    # Build one datetime column from the separate year/month/day columns
    world['date'] = pd.to_datetime(world[['year', 'month', 'day']])
    # .copy() avoids SettingWithCopyWarning when adding DperHab below
    dt = world[['date', 'deaths', 'cases', 'countriesAndTerritories', 'popData2018']].copy()
    dt['DperHab'] = dt['deaths'] / dt['popData2018']  # deaths per inhabitant

    def pivot(countries, values):
        # One row per date, columns = (value, country), countries in the listed order
        table = pd.pivot_table(dt[dt['countriesAndTerritories'].isin(countries)],
                               index='date', values=values, columns='countriesAndTerritories',
                               aggfunc='sum', fill_value=0)
        return table.reindex(axis=1, level=1, labels=countries)

    selected = ['Spain', 'Italy', 'Germany', 'France', 'United_Kingdom',
                'Portugal', 'Netherlands', 'Iran', 'China', 'South_Korea']
    four = ['Spain', 'Italy', 'France', 'Netherlands']
    output = pivot(selected, ['deaths', 'cases'])
    cases = pivot(four, ['cases'])
    deaths = pivot(four, ['deaths'])
    dxhab = pivot(four, ['DperHab'])
    output.to_excel('ECDC' + today + '.xlsx')
    return output, cases, deaths, dxhab
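To see the shape of the pivot without downloading anything, here is a minimal sketch on a few invented rows in the ECDC long format (the numbers are made up; only the column names year, month, day, deaths, cases and countriesAndTerritories come from the code above). Each date becomes one row, the columns form a two-level (value, country) MultiIndex, and pivot_table sorts the dates in ascending order even though the input rows run newest first.

```python
import pandas as pd

# Hypothetical sample in the ECDC "long" layout: one row per country per day,
# newest dates first (values are invented for illustration)
world = pd.DataFrame({
    "day": [3, 2, 1, 3, 2, 1],
    "month": [4, 4, 4, 4, 4, 4],
    "year": [2020] * 6,
    "cases": [30, 20, 10, 300, 200, 100],
    "deaths": [3, 2, 1, 30, 20, 10],
    "countriesAndTerritories": ["Spain"] * 3 + ["Italy"] * 3,
})

# Combine the separate year/month/day columns into one datetime column
world["date"] = pd.to_datetime(world[["year", "month", "day"]])

countries = ["Spain", "Italy"]
table = pd.pivot_table(
    world[world["countriesAndTerritories"].isin(countries)],
    index="date",
    values=["deaths", "cases"],
    columns="countriesAndTerritories",
    aggfunc="sum",
    fill_value=0,
)
# Put the country level of the columns in the order chosen above
table = table.reindex(axis=1, level=1, labels=countries)
print(table)
```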
What I want is to create a new pivot table whose values are the total deaths from each date back to the start of the timeline (i.e. cumulative deaths). I have tried several options without success. Something like this, I guess:
preaggdeath= pd.pivot_table(dt.loc[(dt['countriesAndTerritories']=='Spain') | (dt['countriesAndTerritories']=='Netherlands')| (dt['countriesAndTerritories']=='Italy') | (dt['countriesAndTerritories']=='France')], index = ['date'], values=[XXXXX], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0) # XXXXX stands for something that adds up the deaths from each date back to the start of the series
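The quantity described here (for every date, the total of all deaths from the start of the series up to and including that date) is a running, or cumulative, sum. A minimal sketch on an invented date-indexed table computes it literally with date slicing and compares it with pandas' DataFrame.cumsum(); the table and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical daily-deaths pivot: dates as index, one column per country
deaths = pd.DataFrame(
    {"Spain": [1, 2, 3], "Italy": [10, 20, 30]},
    index=pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
)

# Literal definition: for each date d, add up every row from the start up to d
literal = pd.DataFrame({d: deaths.loc[:d].sum() for d in deaths.index}).T

# cumsum() computes the same running total in one vectorized call
running = deaths.cumsum()
print(running)
```

The slicing version rescans the table once per date, so cumsum() is the better choice on real data; the literal version is only there to show that the two agree.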
Thanks in advance
Edit: What I have
What I would like to have
Like this?
df["aggregated deaths"] = df["daily deaths"].cumsum()
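Applied to the raw ECDC rows rather than to the pivot, the same idea needs two extra steps: in the download the rows for each country typically run newest first, and all countries share one deaths column. A minimal sketch with invented rows (the new column name cumDeaths is hypothetical) sorts by date and restarts the running total for each country with groupby(...).cumsum():

```python
import pandas as pd

# Hypothetical long-format rows, newest first as in the downloaded file
dt = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-03", "2020-04-02", "2020-04-01"] * 2),
    "countriesAndTerritories": ["Spain"] * 3 + ["Italy"] * 3,
    "deaths": [3, 2, 1, 30, 20, 10],
})

# Sort oldest first so the running total accumulates forward in time,
# then restart the total for each country with groupby
dt = dt.sort_values("date")
dt["cumDeaths"] = dt.groupby("countriesAndTerritories")["deaths"].cumsum()
print(dt)
```

The groupby result keeps the original row index, so the assignment lines each total up with its own row.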
@trigonom pointed me in the right direction. It is even simpler than that, because pandas applies cumsum() to every column of a DataFrame at once.
The only additional step I need is to cumulatively sum each column with this expression:
deathsCumulative = deaths.cumsum()
This generates a new pivot table with the same dates and countries, where each row holds the cumulative values up to that date.
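Because cumsum() runs down each column independently, it works unchanged on the (value, country) MultiIndex columns of the pivot. A minimal sketch with an invented table shaped like deaths; the cumDeaths label and the side-by-side pd.concat are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Hypothetical pivot shaped like `deaths`: columns are a (value, country)
# MultiIndex, rows are dates
columns = pd.MultiIndex.from_product([["deaths"], ["Spain", "Italy"]])
deaths = pd.DataFrame(
    [[1, 10], [2, 20], [3, 30]],
    index=pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"]),
    columns=columns,
)

# Running total down each column; sort_index() guards against unordered dates
deathsCumulative = deaths.sort_index().cumsum()

# Relabel the top level so daily and cumulative values can sit side by side
deathsCumulative = deathsCumulative.rename(columns={"deaths": "cumDeaths"}, level=0)
combined = pd.concat([deaths, deathsCumulative], axis=1)
print(combined)
```

pivot_table already returns the dates in ascending order, so sort_index() only matters if the table was built some other way.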