Pandas: fill in row counts for missing datetimes

Question

I have a dataframe with a timestamp column. I can group its rows into 10-minute timestamp bins, as the code below shows:

minutes = '10T'
grouped_df=df.loc[df['id_area'] == 3].groupby(pd.to_datetime(df["timestamp"]).dt.floor(minutes))["x"].count()

When I print the result, I get this:

timestamp
2022-11-09 14:10:00    2
2022-11-09 14:20:00    1
2022-11-09 15:10:00    1
2022-11-09 15:30:00    1
2022-11-09 16:10:00    2
Name: x, dtype: int64

As you can see, there are no values between, for example, 14:20 and 15:10. I need to fill those steps with 0. How can I do that?

Answer 1

Sample data:

import numpy as np
import pandas as pd

np.random.seed(2022)

N = 20
df = pd.DataFrame({'id_area': np.random.choice([1, 2, 3], size=N),
                   'x': np.random.choice([1, np.nan], size=N),
                   'timestamp': pd.date_range('2022-11-11', freq='7Min', periods=N)})

If you only need to add the missing datetimes to the DatetimeIndex, append Series.asfreq:

minutes = '10T'
grouped_df1=(df.loc[df['id_area'] == 3]
              .groupby(pd.to_datetime(df["timestamp"]).dt.floor(minutes))["x"]
              .count()
              .asfreq(minutes, fill_value=0))

print (grouped_df1)
timestamp
2022-11-11 00:50:00    1
2022-11-11 01:00:00    0
2022-11-11 01:10:00    0
2022-11-11 01:20:00    0
2022-11-11 01:30:00    0
2022-11-11 01:40:00    0
2022-11-11 01:50:00    0
2022-11-11 02:00:00    1
Freq: 10T, Name: x, dtype: int64
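
As a minimal sketch of what asfreq does here, the snippet below rebuilds the counts printed in the question by hand (the values are copied from that output, not computed from real data) and fills the gaps. It uses the alias '10min' rather than '10T', because pandas 2.2 deprecated the 'T' spelling.

```python
import pandas as pd

# Counts per 10-minute bucket, copied from the output shown in the question.
counts = pd.Series(
    [2, 1, 1, 1, 2],
    index=pd.to_datetime([
        "2022-11-09 14:10", "2022-11-09 14:20", "2022-11-09 15:10",
        "2022-11-09 15:30", "2022-11-09 16:10",
    ]),
    name="x",
)

# asfreq builds a regular 10-minute index from the first to the last
# timestamp and puts 0 in every bucket that was missing.
filled = counts.asfreq("10min", fill_value=0)
print(filled)
```

Note that asfreq only fills between the first and the last existing timestamp; buckets before 14:10 or after 16:10 are not created.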

Or use Grouper:

minutes = '10T'
grouped_df1=(df.assign(timestamp = pd.to_datetime(df["timestamp"]))
               .loc[df['id_area'] == 3]
               .groupby(pd.Grouper(freq=minutes, key='timestamp'))["x"]
               .count())

print (grouped_df1)
timestamp
2022-11-11 00:50:00    1
2022-11-11 01:00:00    0
2022-11-11 01:10:00    0
2022-11-11 01:20:00    0
2022-11-11 01:30:00    0
2022-11-11 01:40:00    0
2022-11-11 01:50:00    0
2022-11-11 02:00:00    1
Freq: 10T, Name: x, dtype: int64
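
Grouping with pd.Grouper(freq=...) can, as far as I know, also be spelled with DataFrame.resample and its on= argument. The sketch below compares the two on a small hand-made frame (the rows are invented for illustration and are not the sample data above).

```python
import numpy as np
import pandas as pd

# Hypothetical data: five rows, one of them outside area 3.
df = pd.DataFrame({
    "id_area": [3, 3, 1, 3, 3],
    "x": [1.0, np.nan, 1.0, 1.0, 1.0],
    "timestamp": pd.to_datetime([
        "2022-11-11 00:03", "2022-11-11 00:07", "2022-11-11 00:15",
        "2022-11-11 00:41", "2022-11-11 00:44",
    ]),
})
area3 = df.loc[df["id_area"] == 3]

# Grouper version, as in the answer.
by_grouper = area3.groupby(pd.Grouper(freq="10min", key="timestamp"))["x"].count()

# resample version: same 10-minute bins, empty bins are counted as 0.
by_resample = area3.resample("10min", on="timestamp")["x"].count()

print(by_resample)
```

Both variants emit every bin between the first and last matching row, so no separate fill step is needed.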

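If rows that do not match the filter should also be counted as 0, replace x with NaN for those rows using Series.where, instead of dropping them: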

grouped_df2=(df['x'].where(df['id_area'] == 3)
                   .groupby(pd.to_datetime(df["timestamp"]).dt.floor(minutes))
                   .count())
print (grouped_df2)  
timestamp
2022-11-11 00:00:00    0
2022-11-11 00:10:00    0
2022-11-11 00:20:00    0
2022-11-11 00:30:00    0
2022-11-11 00:40:00    0
2022-11-11 00:50:00    1
2022-11-11 01:00:00    0
2022-11-11 01:10:00    0
2022-11-11 01:20:00    0
2022-11-11 01:30:00    0
2022-11-11 01:40:00    0
2022-11-11 01:50:00    0
2022-11-11 02:00:00    1
2022-11-11 02:10:00    0
Name: x, dtype: int64
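
One caveat with the Series.where variant: it only produces buckets in which the full dataframe has at least one row, for any id_area. The sketch below uses invented rows with a gap where no area has data, and chains asfreq to close it; combining the two steps is my suggestion, not something the answer states.

```python
import numpy as np
import pandas as pd

# Hypothetical data: nothing at all is recorded between 00:20 and 00:40.
df = pd.DataFrame({
    "id_area": [3, 3, 1, 3, 3],
    "x": [1.0, np.nan, 1.0, 1.0, 1.0],
    "timestamp": pd.to_datetime([
        "2022-11-11 00:03", "2022-11-11 00:07", "2022-11-11 00:15",
        "2022-11-11 00:41", "2022-11-11 00:44",
    ]),
})
buckets = df["timestamp"].dt.floor("10min")

# Rows outside area 3 become NaN, so count() reports 0 for their buckets.
masked = df["x"].where(df["id_area"] == 3).groupby(buckets).count()

# Buckets with no rows at all are still absent; asfreq adds them as 0.
complete = masked.asfreq("10min", fill_value=0)
print(complete)
```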

Answer 2

For clarity, you can always build a parallel dataframe that contains every datetime you need (in this case, at 10-minute intervals):

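# Name the index 'time' and the counts 'values' so the columns match the code and output below
grouped_df = grouped_df.rename_axis('time').reset_index(name='values')
times = pd.date_range(start=grouped_df['time'].min(), end=grouped_df['time'].max(), freq='10min')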

Now every datetime you need should be in the times object:

    times:
DatetimeIndex(['2022-11-09 14:10:00', '2022-11-09 14:20:00',
               '2022-11-09 14:30:00', '2022-11-09 14:40:00',
               '2022-11-09 14:50:00', '2022-11-09 15:00:00',
               '2022-11-09 15:10:00', '2022-11-09 15:20:00',
               '2022-11-09 15:30:00', '2022-11-09 15:40:00',
               '2022-11-09 15:50:00', '2022-11-09 16:00:00',
               '2022-11-09 16:10:00'],
              dtype='datetime64[ns]', freq='10T')

Then we can join it with the earlier dataframe grouped_df and fill the blank values with zeros.

final_df = pd.merge(grouped_df, pd.DataFrame(times, columns=['time']), how='outer', on='time').sort_values('time').fillna(0)

Your final result should look much like this (keep in mind that I made up some values to reproduce your original result):

                   time  values
0   2022-11-09 14:10:00    10.0
1   2022-11-09 14:20:00     5.0
2   2022-11-09 14:30:00     0.0
3   2022-11-09 14:40:00     0.0
4   2022-11-09 14:50:00     0.0
5   2022-11-09 15:00:00     0.0
6   2022-11-09 15:10:00    20.0
7   2022-11-09 15:20:00     0.0
8   2022-11-09 15:30:00    15.0
9   2022-11-09 15:40:00     0.0
10  2022-11-09 15:50:00     0.0
11  2022-11-09 16:00:00     0.0
12  2022-11-09 16:10:00    30.0
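
The merge followed by fillna turns the counts into floats, because the unmatched rows pass through NaN. A possible alternative, sketched below, applies the same "build the full range first" idea directly to the grouped Series with reindex, which keeps the integer dtype. The counts are copied from the question's output, and the 14:00 to 17:00 window is a hypothetical reporting range chosen for illustration.

```python
import pandas as pd

# Counts per 10-minute bucket, copied from the output shown in the question.
counts = pd.Series(
    [2, 1, 1, 1, 2],
    index=pd.to_datetime([
        "2022-11-09 14:10", "2022-11-09 14:20", "2022-11-09 15:10",
        "2022-11-09 15:30", "2022-11-09 16:10",
    ]),
    name="x",
)

# Hypothetical fixed window; it may extend beyond the first and last data point.
times = pd.date_range("2022-11-09 14:00", "2022-11-09 17:00", freq="10min")

# reindex aligns the counts to the full range and fills absent buckets with 0.
filled = counts.reindex(times, fill_value=0)
print(filled)
```

Because the range is given explicitly, this also covers empty buckets before the first and after the last observation, which asfreq and Grouper do not.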

Answer 1
1 accepted 2022-12-21 11:33:47

Answer 2
0 2022-12-21 11:51:11
