Insert zero for missing data in pandas.DataFrame
I have the following kind of pandas.DataFrame:
sales_with_missing = pd.DataFrame({'month':[1,2,3,6,7,8,9,10,11,12],'code':[111]*10, 'sales':[np.random.randint(1500) for _ in np.arange(10)]})
As you can see, the records for April and May are missing, and I'd like to insert rows with zero sales for those missing months:
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
How can I implement the insert_zero_for_missing method?
You could set month as the index, use reindex to add rows for the missing months, use fillna to fill the missing values with zero, and then reset the index (to make month a column again):
import numpy as np
import pandas as pd
month = list(range(1,4)) + list(range(6,13))
sales = np.array(month)*100
df = pd.DataFrame(dict(month=month, sales=sales))
print(df.set_index('month').reindex(range(1,13)).fillna(0).reset_index())
yields
    month  sales
0       1    100
1       2    200
2       3    300
3       4      0
4       5      0
5       6    600
6       7    700
7       8    800
8       9    900
9      10   1000
10     11   1100
11     12   1200
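The set_index / reindex / fillna / reset_index chain can be wrapped into the insert_zero_for_missing function the question asks for. This is only a minimal sketch, not a definitive implementation: the months parameter and its default of 1 to 12 are assumptions, and so is the idea that the frame holds a single code value that can be reused for the inserted rows. Reindexing introduces NaN, which turns an integer column into floats, so the sketch casts sales back to int after filling.

```python
import numpy as np
import pandas as pd


def insert_zero_for_missing(df, months=range(1, 13)):
    """Return a frame with one row per month, using sales=0 for absent months."""
    # Reindex on month; months not present in df become rows full of NaN.
    out = df.set_index('month').reindex(months).rename_axis('month')
    # Zero the missing sales and restore the integer dtype lost to NaN.
    out['sales'] = out['sales'].fillna(0).astype(int)
    # Assumption: df contains a single code, so reuse it for the new rows.
    out['code'] = df['code'].iloc[0]
    return out.reset_index()


sales_with_missing = pd.DataFrame({
    'month': [1, 2, 3, 6, 7, 8, 9, 10, 11, 12],
    'code': [111] * 10,
    'sales': np.random.randint(1500, size=10),
})
sales = insert_zero_for_missing(sales_with_missing)
print(sales)
```

If every column should simply default to zero, reindex also accepts fill_value=0, which avoids the NaN round trip; that would set code to 0 for the inserted rows as well, which is why the sketch fills the columns separately.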
# Create a Series of all months
all_months = pd.Series(data=range(1, 13))
# Find the months missing from the DataFrame (4 and 5 in this example)
missing_months = all_months[~all_months.isin(sales_with_missing.month)]
# Build a DataFrame for the missing months; the next step concatenates it to the original DataFrame
missing_df = pd.DataFrame({'month': missing_months.values, 'code': 111, 'sales': 0})
Out[36]:
code  month  sales
 111      4      0
 111      5      0
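# Then concatenate both DataFrames and sort by month
# (sort_values replaces sort_index(by=...), which recent pandas versions no longer accept)
pd.concat([sales_with_missing, missing_df]).sort_values(by='month')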
Out[39]:
code  month  sales
 111      1   1028
 111      2   1163
 111      3    961
 111      4      0
 111      5      0
 111      6    687
 111      7     31
 111      8    607
 111      9   1236
 111     10    863
 111     11   1233
 111     12    780
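The concatenation steps above can also be collected into one function. The following is a rough sketch under a few assumptions: the name insert_zero_by_concat and the months parameter are made up for illustration, and the code value for the new rows is taken from the first row of the input rather than hard-coded as 111.

```python
import pandas as pd


def insert_zero_by_concat(df, months=range(1, 13)):
    """Append zero-sales rows for months absent from df, then sort by month."""
    all_months = pd.Series(list(months))
    missing = all_months[~all_months.isin(df['month'])]
    if missing.empty:
        # Nothing to add; just return the rows ordered by month.
        return df.sort_values(by='month').reset_index(drop=True)
    missing_df = pd.DataFrame({
        'month': missing.values,
        'code': df['code'].iloc[0],  # assumption: a single code per frame
        'sales': 0,
    })
    # ignore_index avoids duplicate index labels after concatenation.
    combined = pd.concat([df, missing_df], ignore_index=True)
    return combined.sort_values(by='month').reset_index(drop=True)


example = pd.DataFrame({
    'month': [1, 2, 3, 6, 7, 8, 9, 10, 11, 12],
    'code': [111] * 10,
    'sales': [100, 200, 300, 600, 700, 800, 900, 1000, 1100, 1200],
})
result = insert_zero_by_concat(example)
print(result)
```

Compared with the reindex approach, this keeps every column's dtype intact because no NaN values are ever created.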