

Pandas: How to fill missing Year, Week columns?

I have a dataframe in which some [Year] & [Week] entries are missing. I have another dataframe that serves as a reference calendar, from which I can get these missing values. How can I fill in the missing entries using pandas?

I have tried using reindex to set them up, but I am getting the following error:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

import pandas as pd

d1 = {'Year': [2019,2019,2019,2019,2019], 'Week':[1,2,4,6,7], 'Value': [20,40,60,75,90]}
d2 = {'Year': [2019,2019,2019,2019,2019,2019,2019,2019,2019,2019], 'Week':[1,2,3,4,5,6,7,8,9,10]}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1 = df1.set_index(['Year', 'Week'])
df2 = df2.set_index(['Year', 'Week'])

df1 = df1.reindex(df2, fill_value=0)

print(df1)

You should pass the index, i.e. df2.index, to reindex rather than the DataFrame itself:

df1.reindex(df2.index,fill_value=0)
Out[851]: 
           Value
Year Week       
2019 1        20
     2        40
     3         0
     4        60
     5         0
     6        75
     7        90

df2.index.difference(df1.index)
Out[854]: 
MultiIndex(levels=[[2019], [3, 5]],
           labels=[[0, 0], [0, 1]],
           names=['Year', 'Week'],
           sortorder=0)
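
Putting the two calls together, here is a minimal self-contained sketch. The names `calendar`, `filled` and `missing` are illustrative and not from the question, and the calendar is assumed to cover weeks 1 to 7 only, matching the output shown above.

```python
import pandas as pd

# Observed data: weeks 3 and 5 are absent.
df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]}).set_index(['Year', 'Week'])

# Reference calendar (assumed here to cover weeks 1-7 only).
calendar = pd.DataFrame({'Year': [2019] * 7,
                         'Week': list(range(1, 8))}).set_index(['Year', 'Week'])

# reindex expects index labels, so pass calendar.index, not the DataFrame.
filled = df1.reindex(calendar.index, fill_value=0)

# Labels that exist in the calendar but not in the observed data.
missing = calendar.index.difference(df1.index)

print(filled)
print(list(missing))
```

Passing the DataFrame itself hands reindex a two-dimensional object where a one-dimensional set of labels is expected, which is consistent with the "expected 1, got 2" message in the question.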

Update

s=df1.reindex(df2.index)
s[s.bfill().notnull().values].fillna(0)
Out[877]: 
           Value
Year Week       
2019 1      20.0
     2      40.0
     3       0.0
     4      60.0
     5       0.0
     6      75.0
     7      90.0
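
The mask in the update is two-dimensional because it comes from a one-column DataFrame. A variant with a one-dimensional mask taken from the `Value` column may be easier to read. This is only a sketch, assuming the 10-week calendar from the question; the names `keep` and `result` are illustrative. Back-filling leaves NaN only in the rows after the last observed week, so those rows are dropped while gaps inside the observed range are set to 0.

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]}).set_index(['Year', 'Week'])
df2 = pd.DataFrame({'Year': [2019] * 10,
                    'Week': list(range(1, 11))}).set_index(['Year', 'Week'])

# Weeks 3, 5, 8, 9 and 10 have no data and become NaN.
s = df1.reindex(df2.index)

# After a backward fill, NaN survives only past the last observed week.
keep = s['Value'].bfill().notna()

# Drop the trailing weeks, then zero-fill the remaining inner gaps.
result = s[keep].fillna(0)
print(result)
```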

import pandas as pd

d1 = {'Year': [2019,2019,2019,2019,2019], 'Week':[1,2,4,6,7], 'Value': [20,40,60,75,90]}
d2 = {'Year': [2019,2019,2019,2019,2019,2019,2019], 'Week':[1,2,3,4,5,6,7]}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

df1 = df1.set_index(['Year', 'Week'])
df2 = df2.set_index(['Year', 'Week'])

# Value used for the missing rows; the mean is one choice, use other logic if you prefer
fill_value = df1['Value'].mean()
# A right join against the calendar index adds the missing (Year, Week) rows as NaN
df1 = df1.join(df2, how='right')

# fillna returns a new DataFrame, so assign the result back
df1 = df1.fillna(value=fill_value)
print(df1)
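
If Year and Week are kept as ordinary columns instead of an index, a left merge from the calendar gives a similar result. This is a different technique from the join above, shown only as a sketch; the name `merged` is illustrative, and the mean is used as the fill value to match the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019] * 5,
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]})
df2 = pd.DataFrame({'Year': [2019] * 7,
                    'Week': list(range(1, 8))})

# Keep every calendar row; weeks without data get NaN in 'Value'.
merged = df2.merge(df1, on=['Year', 'Week'], how='left')

# Fill the gaps with the mean of the observed values.
fill_value = df1['Value'].mean()
merged['Value'] = merged['Value'].fillna(fill_value)
print(merged)
```

Starting the merge from the calendar keeps its row order, so the result is already sorted by week when the calendar is.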
