fill data to dataframe python
Is there any way to add the hours that are missing from the dataframe and set their counts to 0?
+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 02|     1|         "DNS"|
| 13|     2|         "DNS"|
+---+------+--------------+
And I want it to be
+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 00|     0|         "DNS"|
| 01|     0|         "DNS"|
| 02|     1|         "DNS"| <---- new
| 03|     0|         "DNS"|
| 04|     0|         "DNS"|
| 05|     0|         "DNS"|
.
.
| 11|     0|         "DNS"|
| 12|     0|         "DNS"|
| 13|     2|         "DNS"| <---- new
| 14|     0|         "DNS"|
| 15|     0|         "DNS"|
| 16|     0|         "DNS"|
.
.
| 22|     0|         "DNS"|
| 23|     0|         "DNS"|
+---+------+--------------+
dtypes:
counts    float64
app        object
dtype: object
Here's one way using set_index and reindex:
In [4971]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').reset_index()
Out[4971]:
hr counts app
0 0 0.0 DNS
1 1 0.0 DNS
2 2 1.0 DNS
3 3 0.0 DNS
4 4 0.0 DNS
5 5 0.0 DNS
6 6 0.0 DNS
7 7 0.0 DNS
8 8 0.0 DNS
9 9 0.0 DNS
10 10 0.0 DNS
11 11 0.0 DNS
12 12 0.0 DNS
13 13 2.0 DNS
14 14 0.0 DNS
15 15 0.0 DNS
16 16 0.0 DNS
17 17 0.0 DNS
18 18 0.0 DNS
19 19 0.0 DNS
20 20 0.0 DNS
21 21 0.0 DNS
22 22 0.0 DNS
23 23 0.0 DNS
Details
In [4954]: df
Out[4954]:
hr counts app
0 2 1 DNS
1 13 2 DNS
Steps
Showing only the top 5 rows for brevity
In [4974]: df.set_index('hr').reindex(range(24)).head()
Out[4974]:
counts app
hr
0 NaN NaN
1 NaN NaN
2 1.0 DNS
3 NaN NaN
4 NaN NaN
In [4975]: df.set_index('hr').reindex(range(24)).fillna(0).head()
Out[4975]:
counts app
hr
0 0.0 0
1 0.0 0
2 1.0 DNS
3 0.0 0
4 0.0 0
In [4976]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').head()
Out[4976]:
counts app
hr
0 0.0 DNS
1 0.0 DNS
2 1.0 DNS
3 0.0 DNS
4 0.0 DNS
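The intermediate output above shows fillna(0) also writing 0 into app, which assign(app='DNS') then overwrites. A minimal sketch of a variant that fills only counts, assuming the integer-hr sample frame from the question. The variable name out and the cast back to int are my additions, not part of the original answer:

```python
import pandas as pd

# Sample frame from the question; an integer 'hr' column is assumed here.
df = pd.DataFrame({"hr": [2, 13], "counts": [1, 2], "app": ["DNS", "DNS"]})

# Reindex on the hour so that every hour 0..23 has a row (missing ones are NaN).
out = df.set_index("hr").reindex(range(24))

# Fill only the numeric column, then cast back to int (reindex made it float).
out["counts"] = out["counts"].fillna(0).astype(int)

# Constant label for every row, equivalent to assign(app='DNS').
out["app"] = "DNS"

out = out.reset_index()
print(out.head())
```

This leaves app untouched by the fill and keeps counts as integers, which may or may not matter depending on what consumes the result.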
EDIT
If, however, your hr column is string/object type because of leading zeros, then reindex with a custom formatted list of zero-padded hour labels, hrs
In [5006]: df
Out[5006]:
hr counts app
0 02 1 DNS
1 13 2 DNS
In [5007]: hrs = list(map('{:02d}'.format, range(24)))  # list() so hrs can be reused on Python 3
In [5008]: df.set_index('hr').reindex(hrs).fillna(0).assign(app='DNS').reset_index()
Out[5008]:
hr counts app
0 00 0.0 DNS
1 01 0.0 DNS
2 02 1.0 DNS
3 03 0.0 DNS
4 04 0.0 DNS
5 05 0.0 DNS
6 06 0.0 DNS
7 07 0.0 DNS
8 08 0.0 DNS
9 09 0.0 DNS
10 10 0.0 DNS
11 11 0.0 DNS
12 12 0.0 DNS
13 13 2.0 DNS
14 14 0.0 DNS
15 15 0.0 DNS
16 16 0.0 DNS
17 17 0.0 DNS
18 18 0.0 DNS
19 19 0.0 DNS
20 20 0.0 DNS
21 21 0.0 DNS
22 22 0.0 DNS
23 23 0.0 DNS
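assign(app='DNS') works because the sample holds a single app. If the frame held several apps, one possible generalization is to reindex on every (app, hr) pair. The sketch below is not from the original answer: the "HTTP" row is hypothetical data added for illustration, and full_index and out are names I made up.

```python
import pandas as pd

# Hypothetical data: the question's two DNS rows plus an invented HTTP row.
df = pd.DataFrame({
    "hr": ["02", "13", "07"],
    "counts": [1, 2, 5],
    "app": ["DNS", "DNS", "HTTP"],
})

# Zero-padded hour labels "00".."23", matching the string-typed hr column.
hrs = [f"{h:02d}" for h in range(24)]

# Every (app, hr) combination that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [df["app"].unique(), hrs], names=["app", "hr"]
)

out = (
    df.set_index(["app", "hr"])
      .reindex(full_index, fill_value=0)   # missing pairs get counts = 0
      .reset_index()[["hr", "counts", "app"]]
)
print(out.shape)
```

Using fill_value=0 in reindex avoids the separate fillna step and keeps counts as integers, since no NaN is ever introduced.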