
How can I add a date-based new column to a pivot table in pandas?

I am working with the ECDC's COVID-19 tables: source = https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

I have transformed the very long table into a more useful pivot table using pandas. Now I have a table indexed by date, with cases and deaths for some selected countries:

import numpy as np
import pandas as pd
from datetime import datetime

def downloadECDC(url):
    world = pd.read_csv(url)
    today = datetime.today().strftime("%d%m%Y")
    world.to_csv('ECDC' + today + '.csv')

    # Build a single datetime column from the year/month/day columns
    world['date'] = pd.to_datetime((world.year*10000+world.month*100+world.day).apply(str),format='%Y%m%d')

    # .copy() so that adding a column does not trigger SettingWithCopyWarning
    dt = world[['date','deaths','cases','countriesAndTerritories', 'popData2018']].copy()
    dt['DperHab'] = dt['deaths']/dt['popData2018']

    # Country selections, in the order the columns should appear
    allCountries = ['Spain','Italy','Germany','France','United_Kingdom','Portugal','Netherlands','Iran','China','South_Korea']
    fourCountries = ['Spain','Italy','France','Netherlands']
    inAll = dt['countriesAndTerritories'].isin(allCountries)
    inFour = dt['countriesAndTerritories'].isin(fourCountries)

    # One pivot per measure: dates as rows, countries as columns
    preoutput = pd.pivot_table(dt.loc[inAll], index = ['date'], values=['deaths','cases'], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0)
    precases = pd.pivot_table(dt.loc[inFour], index = ['date'], values=['cases'], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0)
    predeaths = pd.pivot_table(dt.loc[inFour], index = ['date'], values=['deaths'], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0)
    predxh = pd.pivot_table(dt.loc[inFour], index = ['date'], values=['DperHab'], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0)

    # Reorder the country level of the columns
    output = preoutput.reindex(axis = 1, level = 1, labels = allCountries)
    cases = precases.reindex(axis = 1, level = 1, labels = fourCountries)
    deaths = predeaths.reindex(axis = 1, level = 1, labels = fourCountries)
    dxhab = predxh.reindex(axis = 1, level = 1, labels = fourCountries)

    output.to_excel('ECDC' + today + '.xlsx')

What I want is to create a new pivot table whose values are calculated by summing deaths from a given date back to the start of the timeline. I have tried several options, without success. Something like this, I guess:

# XXXXX stands for the deaths summed from each date back to the start of the series
preaggdeath = pd.pivot_table(dt.loc[dt['countriesAndTerritories'].isin(['Spain','Netherlands','Italy','France'])], index = ['date'], values=[XXXXX], columns = 'countriesAndTerritories', aggfunc=np.sum, fill_value = 0)

Thanks in advance

Edit: What I have

[Image: my table]

What I would like to have

[Image: new table]

Like this?

df["aggregated deaths"] = df["daily deaths"].cumsum()
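One caveat if this is applied to the long (un-pivoted) table: a plain cumsum() would keep adding across countries, so the running total probably needs to be grouped per country. A minimal sketch, assuming ECDC-style column names; the sample figures are made up for illustration.

```python
import pandas as pd

# Made-up sample in the long ECDC layout (values are illustrative only)
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "countriesAndTerritories": ["Spain"] * 3 + ["Italy"] * 3,
    "deaths": [1, 2, 3, 10, 20, 30],
})

# Sort by date so the running total goes forward in time within each country
df = df.sort_values(["countriesAndTerritories", "date"])

# Running total of deaths, restarted for each country
df["aggregated deaths"] = df.groupby("countriesAndTerritories")["deaths"].cumsum()
```

The result can then be pivoted on "aggregated deaths" in the same way as the daily figures.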

@trigonom put me in the right direction. It is even simpler thanks to pandas' compound expressions.

The only additional step I need is to cumulatively aggregate each column using this expression:

deathsCumulative = deaths.cumsum()

This generates a new pivot table in which each row holds, for every country column, the running total up to that date.
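To see the whole thing in one place, here is a small self-contained sketch of the pivot followed by cumsum(). The data is made up, and it assumes the pivot is indexed by date in ascending order, which pivot_table produces by default.

```python
import pandas as pd

# Made-up daily figures (illustrative only)
dt = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01",
                            "2020-03-02", "2020-03-02",
                            "2020-03-03", "2020-03-03"]),
    "countriesAndTerritories": ["Spain", "Italy"] * 3,
    "deaths": [1, 10, 2, 20, 3, 30],
})

# Same shape as the pivots in the question: dates as rows, countries as columns
deaths = pd.pivot_table(dt, index=["date"], values=["deaths"],
                        columns="countriesAndTerritories",
                        aggfunc="sum", fill_value=0)

# cumsum() runs down the rows (axis=0) by default, one column at a time
deathsCumulative = deaths.cumsum()
```

Because cumsum() keeps the index and the column MultiIndex, deathsCumulative can be reindexed or exported with to_excel exactly like the other pivots.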


 