Fill missing rows into a DataFrame in Python

Is there a way to add the hours that are missing from the DataFrame and set their counts to 0?

+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 02|     1|         "DNS"|
| 13|     2|         "DNS"|
+---+------+--------------+

I want it to look like this:

+---+------+--------------+
| hr|counts|           app|
+---+------+--------------+
| 00|     0|         "DNS"|
| 01|     0|         "DNS"|
| 02|     1|         "DNS"| <---- original
| 03|     0|         "DNS"|
| 04|     0|         "DNS"|
| 05|     0|         "DNS"|
           .
           .
| 11|     0|         "DNS"|
| 12|     0|         "DNS"|
| 13|     2|         "DNS"| <---- original
| 14|     0|         "DNS"|
| 15|     0|         "DNS"|
| 16|     0|         "DNS"|
           .
           .
| 22|     0|         "DNS"|
| 23|     0|         "DNS"|
+---+------+--------------+

dtypes:
counts    float64
app        object
dtype: object

Here's one way, using set_index and reindex:

In [4971]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').reset_index()
Out[4971]:
    hr  counts  app
0    0     0.0  DNS
1    1     0.0  DNS
2    2     1.0  DNS
3    3     0.0  DNS
4    4     0.0  DNS
5    5     0.0  DNS
6    6     0.0  DNS
7    7     0.0  DNS
8    8     0.0  DNS
9    9     0.0  DNS
10  10     0.0  DNS
11  11     0.0  DNS
12  12     0.0  DNS
13  13     2.0  DNS
14  14     0.0  DNS
15  15     0.0  DNS
16  16     0.0  DNS
17  17     0.0  DNS
18  18     0.0  DNS
19  19     0.0  DNS
20  20     0.0  DNS
21  21     0.0  DNS
22  22     0.0  DNS
23  23     0.0  DNS
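A self-contained sketch of the same idea, assuming hr is stored as integers and the frame is built as below. It passes fill_value=0 to reindex instead of calling fillna afterwards, so no NaN is ever introduced and counts keeps an integer dtype (the output above shows 0.0 because the intermediate NaN upcasts counts to float). fill_value applies to every column, so app still has to be reset with assign.

```python
import pandas as pd

# Sample data from the question, assuming hr holds integers
df = pd.DataFrame({"hr": [2, 13], "counts": [1, 2], "app": ["DNS", "DNS"]})

out = (
    df.set_index("hr")
      .reindex(range(24), fill_value=0)  # missing hours get 0 directly, no NaN step
      .assign(app="DNS")                 # fill_value also wrote 0 into app, so reset it
      .reset_index()                     # index name "hr" is kept, so it becomes a column again
)
print(out.head())
```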

Details

In [4954]: df
Out[4954]:
   hr  counts  app
0   2       1  DNS
1  13       2  DNS

Steps

Showing only the first 5 rows for brevity:

In [4974]: df.set_index('hr').reindex(range(24)).head()
Out[4974]:
    counts  app
hr
0      NaN  NaN
1      NaN  NaN
2      1.0  DNS
3      NaN  NaN
4      NaN  NaN

In [4975]: df.set_index('hr').reindex(range(24)).fillna(0).head()
Out[4975]:
    counts  app
hr
0      0.0    0
1      0.0    0
2      1.0  DNS
3      0.0    0
4      0.0    0

In [4976]: df.set_index('hr').reindex(range(24)).fillna(0).assign(app='DNS').head()
Out[4976]:
    counts  app
hr
0      0.0  DNS
1      0.0  DNS
2      1.0  DNS
3      0.0  DNS
4      0.0  DNS
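The steps show that fillna(0) also writes 0 into the app column, which is why assign is needed. As an alternative, fillna accepts a dict mapping column names to fill values, so both columns can be filled in one step. A sketch, again assuming integer hours and a single app, "DNS", as in the question:

```python
import pandas as pd

df = pd.DataFrame({"hr": [2, 13], "counts": [1, 2], "app": ["DNS", "DNS"]})

out = (
    df.set_index("hr")
      .reindex(range(24))                   # missing hours appear as all-NaN rows
      .fillna({"counts": 0, "app": "DNS"})  # per-column fill values instead of one scalar
      .reset_index()
)
print(out.head())
```

Unlike assign, this only touches the rows that were missing, so existing app values are left as they are.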

EDIT

If, however, your hr column has string/object dtype because of the leading zero, reindex on a custom-formatted list of hour strings, hrs, instead:

In [5006]: df
Out[5006]:
   hr  counts  app
0  02       1  DNS
1  13       2  DNS

In [5007]: hrs = ['{:02d}'.format(h) for h in range(24)]  # '00', '01', ..., '23' as a list (a bare map is a one-shot iterator in Python 3)

In [5008]: df.set_index('hr').reindex(hrs).fillna(0).assign(app='DNS').reset_index()
Out[5008]:
    hr  counts  app
0   00     0.0  DNS
1   01     0.0  DNS
2   02     1.0  DNS
3   03     0.0  DNS
4   04     0.0  DNS
5   05     0.0  DNS
6   06     0.0  DNS
7   07     0.0  DNS
8   08     0.0  DNS
9   09     0.0  DNS
10  10     0.0  DNS
11  11     0.0  DNS
12  12     0.0  DNS
13  13     2.0  DNS
14  14     0.0  DNS
15  15     0.0  DNS
16  16     0.0  DNS
17  17     0.0  DNS
18  18     0.0  DNS
19  19     0.0  DNS
20  20     0.0  DNS
21  21     0.0  DNS
22  22     0.0  DNS
23  23     0.0  DNS
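Another option for the string case is to convert hr to integers, reindex on range(24) as in the first approach, and format the hours back to two-digit strings at the end. A sketch, assuming every hr value is a numeric string such as "02":

```python
import pandas as pd

# hr stored as zero-padded strings, as in the EDIT above
df = pd.DataFrame({"hr": ["02", "13"], "counts": [1, 2], "app": ["DNS", "DNS"]})

out = (
    df.assign(hr=df["hr"].astype(int))    # "02" -> 2 so the labels match range(24)
      .set_index("hr")
      .reindex(range(24), fill_value=0)   # add the missing hours with counts 0
      .assign(app="DNS")                  # fill_value also put 0 in app; reset it
      .reset_index()
)
out["hr"] = out["hr"].map("{:02d}".format)  # back to "00".."23"
print(out.head())
```

Working in integers avoids building a matching string index by hand; the trade-off is that astype(int) raises on any hr value that is not a plain number.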
