Replace NaN in pandas DataFrame at certain dates (upsampling)

I'm new to Python and I'm struggling with the following example: I have a pandas DataFrame with a DatetimeIndex and a column of feast days (public holidays). The data is in daily resolution.

import pandas as pd
import holidays

hd = holidays.Switzerland(years=[2018])
f = pd.DataFrame(hd.items())
f.columns = ['date', 'feastday']
f['date'] = pd.to_datetime(f['date'])
f = f.set_index('date')

This looks like this:

date                feastday        
2018-01-01      Neujahrestag
2018-04-01            Ostern
2018-03-30        Karfreitag
2018-04-02       Ostermontag
2018-05-10          Auffahrt
2018-05-20         Pfingsten
2018-05-21     Pfingstmontag
2018-08-01  Nationalfeiertag
2018-12-25       Weihnachten

Now I want the data not in daily resolution but in, for example, 6H resolution:

f1 = f.resample('6H').asfreq()

That works as I wished and leads to:

date                     feastday        
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00           NaN
2018-01-01 12:00:00           NaN
2018-01-01 18:00:00           NaN
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN

But now I want to fill in, for example, 'Neujahrestag' for every row on 2018-01-01 and not only for the first one. The result should look like this (not only for 'Neujahrestag' but for all items in my DataFrame f). All rows with the same date should have the same value in feastday. The time of day doesn't matter:

 date                     feastday        
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00  Neujahrestag
2018-01-01 12:00:00  Neujahrestag
2018-01-01 18:00:00  Neujahrestag
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN

I can replace one item manually with:

f1.loc['2018-01-01', 'feastday'] = f1.loc['2018-01-01', 'feastday'].iloc[0]

That works without problem, but I can't get it to run automatically for all the data. I tried a for loop but didn't succeed. Can anybody help me? Maybe there is another (simpler) way to reach my goal? Thanks in advance for your help.

Marco

Grouping by calendar date with the df.groupby(df.index.date) pattern is one way to do this (df.index.day would be the day of the month, which pools e.g. 1 January with 1 February and lets the fill leak across months):

f1 = f.resample('6H').asfreq()
res = f1.groupby(f1.index.date).ffill()[['feastday']]
res.head(7)
                         feastday
date
2018-01-01 00:00:00  Neujahrestag
2018-01-01 06:00:00  Neujahrestag
2018-01-01 12:00:00  Neujahrestag
2018-01-01 18:00:00  Neujahrestag
2018-01-02 00:00:00           NaN
2018-01-02 06:00:00           NaN
2018-01-02 12:00:00           NaN
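
The difference between grouping by calendar date and by day of month can be checked with a small sketch. It assumes a hand-built frame holding two of the holidays from the question (so it does not need the holidays package), and it writes the frequency as "6h"; the variable names by_date and by_day_of_month are only illustrative.

```python
import pandas as pd

# Two holidays from the question, entered by hand (assumption: this stands
# in for the frame built from the holidays package).
f = pd.DataFrame(
    {"feastday": ["Neujahrestag", "Karfreitag"]},
    index=pd.to_datetime(["2018-01-01", "2018-03-30"]).rename("date"),
)

# Upsample to a 6-hour grid; new rows are NaN.
f1 = f.resample("6h").asfreq()

# Forward-fill within each calendar date: the fill stops at midnight.
by_date = f1.groupby(f1.index.date)["feastday"].ffill()

# Forward-fill within each day of the month: 1 January, 1 February and
# 1 March share one group, so the value leaks into later months.
by_day_of_month = f1.groupby(f1.index.day)["feastday"].ffill()

print(by_date.loc["2018-01-01"])
print(by_day_of_month.loc["2018-02-01"])
```

With this frame, the rows of 2018-02-01 stay empty under the date grouping but pick up 'Neujahrestag' under the day-of-month grouping.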

In this particular case, use .ffill with the limit argument: your frequency is 6 hours and there are 24 hours in a day, so each midnight value has to be carried forward over exactly 3 further rows (df here is the original daily frame, f in the question).

df.resample('6H').ffill(limit=3)

#                         feastday
#date                             
#2018-01-01 00:00:00  Neujahrestag
#2018-01-01 06:00:00  Neujahrestag
#2018-01-01 12:00:00  Neujahrestag
#2018-01-01 18:00:00  Neujahrestag
#2018-01-02 00:00:00           NaN
#2018-01-02 06:00:00           NaN
#2018-01-02 12:00:00           NaN
#...
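
The limit of 3 follows from the step size, and a rough sketch can derive it instead of hard-coding it. The frame below is a hand-built stand-in for the question's data, and the names step and limit are illustrative assumptions; the shortcut only makes sense when the step divides 24 hours evenly and the source values sit at midnight.

```python
import pandas as pd

# Hand-built stand-in for the frame from the question (assumption).
f = pd.DataFrame(
    {"feastday": ["Neujahrestag", "Karfreitag"]},
    index=pd.to_datetime(["2018-01-01", "2018-03-30"]).rename("date"),
)

step = pd.Timedelta("6h")

# Rows per day minus the midnight row that already holds the value.
limit = pd.Timedelta(days=1) // step - 1

# Upsample and carry each value forward over at most `limit` rows.
filled = f.resample(step).ffill(limit=limit)

print(limit)
print(filled.head(7))
```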

In general, you could use groupby with transform, which does not depend on how many rows of the new frequency fit into a day:

df = df.resample('6H').asfreq()
df.groupby(df.index.date).transform('first')
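
As a minimal sketch of the transform approach, the example below uses a 90-minute grid, where working out a limit by hand would be less convenient. The frame is a hand-built stand-in for the question's data and the name res is illustrative. transform('first') broadcasts the first non-null value of each calendar date to every row of that date.

```python
import pandas as pd

# Hand-built stand-in for the frame from the question (assumption).
f = pd.DataFrame(
    {"feastday": ["Neujahrestag", "Karfreitag"]},
    index=pd.to_datetime(["2018-01-01", "2018-03-30"]).rename("date"),
)

# Upsample to a 90-minute grid: 16 rows per full day.
f1 = f.resample("90min").asfreq()

# Give every row the first non-null feastday of its calendar date.
res = f1.groupby(f1.index.date).transform("first")

print(res.loc["2018-01-01"])
```

Note that resample stops at the last source timestamp, so the final holiday (2018-03-30 here) only has its midnight row. It may also be worth checking that every source timestamp still lands on the new grid when the step does not divide a day evenly.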
