How to add zero values to a datetime-indexed Pandas DataFrame, e.g. for subsequent graphing

I have the following Pandas datetime-indexed dataframe:

date_time            category  num_files  num_lines  worst_index
2022-07-15 23:50:00  black             2        868         0.01
2022-07-15 23:50:00  red               5       5631         0.01
2022-07-15 23:50:00  green             1       1891         0.00
2022-07-15 23:50:00  all               8       8390         0.01
2022-07-16 00:00:00  all               0          0         0.00
2022-07-16 00:10:00  all               0          0         0.00
2022-07-16 00:20:00  black             1        656         0.00
2022-07-16 00:20:00  red               2       4922         0.00
2022-07-16 00:20:00  green             1       1847         0.00
2022-07-16 00:20:00  all               4       7425         0.00
2022-07-16 00:30:00  all               0          0         0.00
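For anyone who wants to experiment, the table above can be rebuilt as a DataFrame. This is a sketch that assumes date_time is stored as a regular datetime64 column (which is what the answer's code expects) rather than as the index:

```python
import pandas as pd

# Sample data transcribed from the table above.
# Assumption: date_time is a datetime64 column, not the index.
df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            ["2022-07-15 23:50:00"] * 4
            + ["2022-07-16 00:00:00", "2022-07-16 00:10:00"]
            + ["2022-07-16 00:20:00"] * 4
            + ["2022-07-16 00:30:00"]
        ),
        "category": ["black", "red", "green", "all", "all", "all",
                     "black", "red", "green", "all", "all"],
        "num_files": [2, 5, 1, 8, 0, 0, 1, 2, 1, 4, 0],
        "num_lines": [868, 5631, 1891, 8390, 0, 0, 656, 4922, 1847, 7425, 0],
        "worst_index": [0.01, 0.01, 0.00, 0.01, 0.00, 0.00,
                        0.00, 0.00, 0.00, 0.00, 0.00],
    }
)
print(df)
```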

The data is collected every 10 minutes for the categories "black", "red" and "green". In addition, there is a summary category "all" that holds the aggregated values of "num_files", "num_lines" and "worst_index".

If num_files, num_lines or worst_index of the "all" category at a measurement point is 0 (zero), I would like to set the same values for the three categories "black", "red" and "green" to 0 (zero) as well, i.e. insert a corresponding row for each of them if none exists for that timestamp yet.
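A literal implementation of this rule, as a sketch: the helper name add_zero_rows and the METRICS list are hypothetical. One assumption is worth flagging: in the sample data worst_index is also 0.00 at 2022-07-16 00:20:00, where real data exists, so using worst_index as the trigger would wipe valid values. The sketch therefore treats num_files == 0 in the "all" row as "nothing was collected".

```python
import pandas as pd

# Hypothetical list of the value columns to zero out
METRICS = ["num_files", "num_lines", "worst_index"]


def add_zero_rows(df, cats=("black", "red", "green")):
    """Hypothetical helper. For every timestamp whose "all" row has
    num_files == 0 (assumed to mean "no data"), zero the metrics of the
    given categories and append zero rows for categories with no row."""
    df = df.copy()
    is_empty = (df["category"] == "all") & (df["num_files"] == 0)
    empty_ts = df.loc[is_empty, "date_time"]

    # Zero out category rows that already exist at those timestamps
    mask = df["date_time"].isin(empty_ts) & df["category"].isin(cats)
    df.loc[mask, METRICS] = 0

    # Build the (timestamp, category) rows that are missing entirely
    existing = set(zip(df["date_time"], df["category"]))
    new_rows = [
        {"date_time": ts, "category": cat, **{m: 0 for m in METRICS}}
        for ts in empty_ts
        for cat in cats
        if (ts, cat) not in existing
    ]
    if new_rows:
        df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
    return df.sort_values(["date_time", "category"], ignore_index=True)


# Usage on a trimmed-down subset of the question's data
df = pd.DataFrame({
    "date_time": pd.to_datetime(["2022-07-15 23:50:00", "2022-07-15 23:50:00",
                                 "2022-07-16 00:00:00"]),
    "category": ["black", "all", "all"],
    "num_files": [2, 2, 0],
    "num_lines": [868, 868, 0],
    "worst_index": [0.01, 0.01, 0.0],
})
result = add_zero_rows(df)
print(result)
```

This only touches timestamps that already have an "all" row; timestamps with no rows at all are not handled.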

The background is that the matplotlib graphs I generate from this data are misleading for the three categories. For example, for category "black" there should not be a direct line from "num_files" = 2 at timestamp "2022-07-15 23:50:00" to "num_files" = 1 at timestamp "2022-07-16 00:20:00", because "num_files" for "black" was actually 0 (zero) at the timestamps "2022-07-16 00:00:00" and "2022-07-16 00:10:00" in between. Unfortunately, the data is collected this way and I cannot change that.
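The effect can be reproduced with a subset of the values from the table above (a sketch; the DataFrame below is trimmed to the rows that matter). Selecting only the "black" rows leaves two timestamps, so a line plot of that series has no points between 23:50 and 00:20 and simply connects the two:

```python
import pandas as pd

# Subset of the question's data: the "all" rows plus the "black" rows
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2022-07-15 23:50:00", "2022-07-16 00:00:00",
        "2022-07-16 00:10:00", "2022-07-16 00:20:00",
        "2022-07-15 23:50:00", "2022-07-16 00:20:00",
    ]),
    "category": ["all", "all", "all", "all", "black", "black"],
    "num_files": [8, 0, 0, 4, 2, 1],
})

# The per-category series a plot would use: only two timestamps survive
black = df[df["category"] == "black"].set_index("date_time")["num_files"]
print(black)
```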

I tried to iterate through the datetime-indexed dataframe using iterrows and to select/filter with loc, but with my limited Python and Pandas experience I did not manage to get it working.

You can do this with a reindexing operation, treating date_time and category as a MultiIndex. First, construct the final desired index, i.e. timestamps spaced 10 minutes apart with an entry for every category. The MultiIndex.from_product method does this neatly:

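import pandas as pd
# Assumes date_time is a datetime64 column; if it is the index, call df.reset_index() first
drange = pd.date_range(df['date_time'].min(), df['date_time'].max(), freq='10min')
cats = ['black', 'green', 'red', 'all']
new_idx = pd.MultiIndex.from_product([drange, cats], names=['date_time', 'category'])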
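To see what from_product builds, here is a standalone sketch using the first and last timestamps from the question (hard-coded here instead of taken from df). The index is the Cartesian product of the timestamps and the categories, in that order:

```python
import pandas as pd

# Timestamps from the question, 10 minutes apart
drange = pd.date_range("2022-07-15 23:50:00", "2022-07-16 00:30:00", freq="10min")
cats = ["black", "green", "red", "all"]
new_idx = pd.MultiIndex.from_product([drange, cats], names=["date_time", "category"])

print(len(drange))   # 5 timestamps
print(len(new_idx))  # 5 * 4 = 20 (timestamp, category) pairs
print(new_idx[:4].tolist())
```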

Then, reindex your data with new_idx (after temporarily moving the date_time and category columns into the index). Fill the NaN values created for the missing rows with 0:

# Missing (date_time, category) pairs become NaN rows, which fillna(0) turns into zeros
df = df.set_index(['date_time', 'category']).reindex(new_idx).reset_index().fillna(0)

Result:

             date_time category  num_files  num_lines  worst_index
0  2022-07-15 23:50:00    black        2.0      868.0         0.01
1  2022-07-15 23:50:00    green        1.0     1891.0         0.00
2  2022-07-15 23:50:00      red        5.0     5631.0         0.01
3  2022-07-15 23:50:00      all        8.0     8390.0         0.01
4  2022-07-16 00:00:00    black        0.0        0.0         0.00
5  2022-07-16 00:00:00    green        0.0        0.0         0.00
6  2022-07-16 00:00:00      red        0.0        0.0         0.00
7  2022-07-16 00:00:00      all        0.0        0.0         0.00
8  2022-07-16 00:10:00    black        0.0        0.0         0.00
9  2022-07-16 00:10:00    green        0.0        0.0         0.00
10 2022-07-16 00:10:00      red        0.0        0.0         0.00
11 2022-07-16 00:10:00      all        0.0        0.0         0.00
12 2022-07-16 00:20:00    black        1.0      656.0         0.00
13 2022-07-16 00:20:00    green        1.0     1847.0         0.00
14 2022-07-16 00:20:00      red        2.0     4922.0         0.00
15 2022-07-16 00:20:00      all        4.0     7425.0         0.00
16 2022-07-16 00:30:00    black        0.0        0.0         0.00
17 2022-07-16 00:30:00    green        0.0        0.0         0.00
18 2022-07-16 00:30:00      red        0.0        0.0         0.00
19 2022-07-16 00:30:00      all        0.0        0.0         0.00
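A possible variation, offered as a sketch: DataFrame.reindex also accepts fill_value, which fills the inserted rows directly, so the integer columns stay integers instead of becoming floats such as 2.0 as in the result above. The block then pivots the long result into one column per category, a convenient shape for plotting. It assumes date_time is a regular datetime64 column; if it is the index, call df.reset_index() first.

```python
import pandas as pd

# Sample data from the question (date_time as a datetime64 column, not the index)
df = pd.DataFrame({
    "date_time": pd.to_datetime(
        ["2022-07-15 23:50:00"] * 4
        + ["2022-07-16 00:00:00", "2022-07-16 00:10:00"]
        + ["2022-07-16 00:20:00"] * 4
        + ["2022-07-16 00:30:00"]
    ),
    "category": ["black", "red", "green", "all", "all", "all",
                 "black", "red", "green", "all", "all"],
    "num_files": [2, 5, 1, 8, 0, 0, 1, 2, 1, 4, 0],
    "num_lines": [868, 5631, 1891, 8390, 0, 0, 656, 4922, 1847, 7425, 0],
    "worst_index": [0.01, 0.01, 0.00, 0.01, 0.00, 0.00,
                    0.00, 0.00, 0.00, 0.00, 0.00],
})

cats = ["black", "green", "red", "all"]
drange = pd.date_range(df["date_time"].min(), df["date_time"].max(), freq="10min")
new_idx = pd.MultiIndex.from_product([drange, cats], names=["date_time", "category"])

# fill_value=0 fills the new rows during reindexing, so int columns stay int
out = (
    df.set_index(["date_time", "category"])
    .reindex(new_idx, fill_value=0)
    .reset_index()
)

# Wide format: one row per timestamp, one column per category
wide = out.pivot(index="date_time", columns="category", values="num_files")
print(wide)
```

With matplotlib installed, wide.plot() draws one line per category, and the "black" line now drops to 0 at 00:00 and 00:10 instead of connecting 23:50 and 00:20 directly.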
