
Fill missing data into a DataFrame (Python)

Is there any way to add the hours that are missing from the dataframe and set their counts to 0?

+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 02|     1|         "DNS"|
| 13|     2|         "DNS"|
+---+------+--------------+

And I want it to be:

+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 00|     0|         "DNS"|
| 01|     0|         "DNS"|
| 02|     1|         "DNS"| <---- from the original data
| 03|     0|         "DNS"|
| 04|     0|         "DNS"|
| 05|     0|         "DNS"|
           .
           .
| 11|     0|         "DNS"|
| 12|     0|         "DNS"|
| 13|     2|         "DNS"| <---- from the original data
| 14|     0|         "DNS"|
| 15|     0|         "DNS"|
| 16|     0|         "DNS"|
           .
           .
| 22|     0|         "DNS"|
| 23|     0|         "DNS"|
+---+------+--------------+

dtypes:
counts    float64
app        object
dtype: object

Here's one way, using set_index and reindex:

In [4971]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').reset_index()
Out[4971]:
    hr  counts  app
0    0     0.0  DNS
1    1     0.0  DNS
2    2     1.0  DNS
3    3     0.0  DNS
4    4     0.0  DNS
5    5     0.0  DNS
6    6     0.0  DNS
7    7     0.0  DNS
8    8     0.0  DNS
9    9     0.0  DNS
10  10     0.0  DNS
11  11     0.0  DNS
12  12     0.0  DNS
13  13     2.0  DNS
14  14     0.0  DNS
15  15     0.0  DNS
16  16     0.0  DNS
17  17     0.0  DNS
18  18     0.0  DNS
19  19     0.0  DNS
20  20     0.0  DNS
21  21     0.0  DNS
22  22     0.0  DNS
23  23     0.0  DNS
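
As a point of comparison, the same result can also be reached with a left merge against a frame that lists every hour. This is a different technique from the set_index/reindex chain above, shown only as a minimal sketch: the construction of `df` is an assumption reconstructed from the question's sample (integer `hr`, a single app), and the names `hours` and `out` are illustrative.

```python
import pandas as pd

# Assumed sample input, reconstructed from the question (integer hours).
df = pd.DataFrame({"hr": [2, 13], "counts": [1, 2], "app": ["DNS", "DNS"]})

# One row per hour of the day; this is the "complete" side of the merge.
hours = pd.DataFrame({"hr": range(24)})

# A left merge keeps all 24 hours; hours absent from df get NaN.
out = hours.merge(df, on="hr", how="left")

# Fill each column with a value that suits it.
out["counts"] = out["counts"].fillna(0).astype(int)  # assumes whole-number counts
out["app"] = out["app"].fillna("DNS")                # assumes a single app label

print(out.shape)  # (24, 3)
```

The merge keeps `hr` as an ordinary column throughout, so no index round trip is needed.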

Details

In [4954]: df
Out[4954]:
   hr  counts  app
0   2       1  DNS
1  13       2  DNS

Steps

Showing only the first 5 rows for brevity.

In [4974]: df.set_index('hr').reindex(range(24)).head()
Out[4974]:
    counts  app
hr
0      NaN  NaN
1      NaN  NaN
2      1.0  DNS
3      NaN  NaN
4      NaN  NaN

In [4975]: df.set_index('hr').reindex(range(24)).fillna(0).head()
Out[4975]:
    counts  app
hr
0      0.0    0
1      0.0    0
2      1.0  DNS
3      0.0    0
4      0.0    0

In [4976]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').head()
Out[4976]:
    counts  app
hr
0      0.0  DNS
1      0.0  DNS
2      1.0  DNS
3      0.0  DNS
4      0.0  DNS
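
The steps above show two side effects: `counts` becomes float because of the temporary NaN values, and a blanket `fillna(0)` briefly writes 0 into `app`. A possible variation, sketched below, fills each column with its own value by passing a dict to `fillna` and then casts `counts` back to integers. The sample `df` is an assumption reconstructed from the question, `filled` is an illustrative name, and the integer cast only makes sense if the counts are whole numbers.

```python
import pandas as pd

# Assumed sample input, reconstructed from the question (integer hours).
df = pd.DataFrame({"hr": [2, 13], "counts": [1, 2], "app": ["DNS", "DNS"]})

filled = (
    df.set_index("hr")                         # hours become the index
      .reindex(range(24))                      # missing hours appear as NaN rows
      .fillna({"counts": 0, "app": "DNS"})     # per-column fill values
      .astype({"counts": "int64"})             # undo the float upcast (whole counts assumed)
      .rename_axis("hr")                       # keep the index name explicit
      .reset_index()                           # turn `hr` back into a column
)

print(filled["counts"].dtype)  # int64
```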

EDIT

If, however, your hr column is a string/object type because of leading zeros, reindex with a custom zero-padded sequence of hour labels, hrs:

In [5006]: df
Out[5006]:
   hr  counts  app
0  02       1  DNS
1  13       2  DNS

In [5007]: hrs = list(map('{:02d}'.format, range(24)))

In [5008]: df.set_index('hr').reindex(hrs).fillna(0).assign(app='DNS').reset_index()
Out[5008]:
    hr  counts  app
0   00     0.0  DNS
1   01     0.0  DNS
2   02     1.0  DNS
3   03     0.0  DNS
4   04     0.0  DNS
5   05     0.0  DNS
6   06     0.0  DNS
7   07     0.0  DNS
8   08     0.0  DNS
9   09     0.0  DNS
10  10     0.0  DNS
11  11     0.0  DNS
12  12     0.0  DNS
13  13     2.0  DNS
14  14     0.0  DNS
15  15     0.0  DNS
16  16     0.0  DNS
17  17     0.0  DNS
18  18     0.0  DNS
19  19     0.0  DNS
20  20     0.0  DNS
21  21     0.0  DNS
22  22     0.0  DNS
23  23     0.0  DNS
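
For a standalone script, the string-hour case might look like the sketch below. The sample `df` with zero-padded string hours is an assumption reconstructed from the question, and `filled` is an illustrative name. The labels are built as a list because, in Python 3, `map` returns a one-shot iterator that is exhausted after its first use.

```python
import pandas as pd

# Assumed sample input: hours stored as zero-padded strings.
df = pd.DataFrame({"hr": ["02", "13"], "counts": [1, 2], "app": ["DNS", "DNS"]})

# "00", "01", ..., "23" as a reusable list of labels.
hrs = [f"{h:02d}" for h in range(24)]

filled = (
    df.set_index("hr")
      .reindex(hrs)                            # labels must match the string format exactly
      .fillna({"counts": 0, "app": "DNS"})     # per-column fill values
      .rename_axis("hr")                       # keep the index name explicit
      .reset_index()
)

print(filled["hr"].tolist()[:3])  # ['00', '01', '02']
```

If the labels did not match the stored format (for example `"2"` versus `"02"`), reindex would treat every hour as missing, so it is worth checking a few values of `hr` first.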
