
Pandas Merge a Grouped-by dataframe with another dataframe for each group

I have a dataframe like:

id  date        temperature

1   2011-09-12   12
    2011-09-15   12
    2011-10-13   12
2   2011-12-12   14
    2011-12-24   15

I want to make sure that each device id has a temperature recording for each day: if a value exists it will be copied from above, and if it doesn't I will put 0.

So, I prepare another dataframe which has dates for the entire year, using pd.DataFrame(0, index=pd.date_range('2011-01-01', '2011-12-12'), columns=['temperature']):


date        temperature

2011-01-01     0
.
.
.
2011-12-12    0

Now, for each id I want to merge this dataframe so that I have the entire year's entries for each id.

I am stuck at the merge step; just merging on the date column does not work, i.e.

pd.merge(df1, df2, on=['date'])

gives a blank dataframe.
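A likely cause, assuming (as the displayed output suggests) that id and date live in the question's MultiIndex rather than in columns: merging on a 'date' column finds nothing to match against the calendar frame's index. A minimal sketch of this diagnosis, resetting both indexes before merging (the data reconstruction is hypothetical):

```python
import pandas as pd

# Hypothetical reconstruction of the question's data: (id, date) as a MultiIndex.
df = pd.DataFrame(
    {"temperature": [12, 12, 12, 14, 15]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 1, 2, 2],
         pd.to_datetime(["2011-09-12", "2011-09-15", "2011-10-13",
                         "2011-12-12", "2011-12-24"])],
        names=["id", "date"],
    ),
)

# The full-year calendar frame from the question.
calendar = pd.DataFrame(
    0,
    index=pd.date_range("2011-01-01", "2011-12-12", name="date"),
    columns=["temperature"],
)

# 'date' lives in both indexes, not in the columns, so a column-on-column
# merge has nothing to join on.  Resetting the indexes makes it work:
merged = pd.merge(
    calendar.reset_index(),
    df.reset_index(),
    on="date",
    how="left",
    suffixes=("_fill", ""),
)
```

This keeps one row per calendar day (346 days from 2011-01-01 to 2011-12-12), with the measured temperature where one exists and NaN otherwise.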

Create a MultiIndex with MultiIndex.from_product and merge on both MultiIndexes:

mux = pd.MultiIndex.from_product([df.index.levels[0], 
                                  pd.date_range('2011-01-01', '2011-12-12')],
                                  names=['id','date'])
df1 = pd.DataFrame(0, index=mux, columns=['temperature'])

df = pd.merge(df1, df, left_index=True, right_index=True, how='left')

If you want only one temperature column:

df = pd.merge(df1, df, left_index=True, right_index=True, how='left', suffixes=('','_'))
df['temperature'] = df.pop('temperature_').fillna(df['temperature'])
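Assuming the (id, date) MultiIndex layout shown in the question, the two snippets above can be run end-to-end; a self-contained sketch (the input reconstruction is an assumption):

```python
import pandas as pd

# Hypothetical reconstruction of the question's df with an (id, date) MultiIndex.
df = pd.DataFrame(
    {"temperature": [12, 12, 12, 14, 15]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 1, 2, 2],
         pd.to_datetime(["2011-09-12", "2011-09-15", "2011-10-13",
                         "2011-12-12", "2011-12-24"])],
        names=["id", "date"],
    ),
)

# One row per (id, calendar day): 2 ids x 346 days = 692 rows of zeros.
mux = pd.MultiIndex.from_product(
    [df.index.levels[0], pd.date_range("2011-01-01", "2011-12-12")],
    names=["id", "date"],
)
df1 = pd.DataFrame(0, index=mux, columns=["temperature"])

# Left-merge the measurements in, then collapse the two temperature
# columns: measured value where present, 0 (from df1) elsewhere.
out = pd.merge(df1, df, left_index=True, right_index=True,
               how="left", suffixes=("", "_"))
out["temperature"] = out.pop("temperature_").fillna(out["temperature"])
```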

Another idea is to use itertools.product to build a 2-column DataFrame:

from  itertools import product
data = list(product(df.index.levels[0],  pd.date_range('2011-01-01', '2011-12-12')))

df1 = pd.DataFrame(data, columns=['id','date'])
df = pd.merge(df1, df, left_on=['id','date'], right_index=True, how='left')
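The same variant as a self-contained run (again assuming the (id, date) MultiIndex from the question; the trailing fillna(0) is an addition, since the question asks for 0 rather than NaN on missing days):

```python
from itertools import product

import pandas as pd

# Hypothetical reconstruction of the question's df with an (id, date) MultiIndex.
df = pd.DataFrame(
    {"temperature": [12, 12, 12, 14, 15]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 1, 2, 2],
         pd.to_datetime(["2011-09-12", "2011-09-15", "2011-10-13",
                         "2011-12-12", "2011-12-24"])],
        names=["id", "date"],
    ),
)

# Cartesian product of ids and calendar days, as plain columns.
data = list(product(df.index.levels[0],
                    pd.date_range("2011-01-01", "2011-12-12")))
df1 = pd.DataFrame(data, columns=["id", "date"])

# Join columns on the left against the MultiIndex on the right.
out = pd.merge(df1, df, left_on=["id", "date"], right_index=True, how="left")
out["temperature"] = out["temperature"].fillna(0)  # 0 instead of NaN, per the question
```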

Another idea is to use DataFrame.reindex:

mux = pd.MultiIndex.from_product([df.index.levels[0], 
                                  pd.date_range('2011-01-01', '2011-12-12')],
                                  names=['id','date'])

df = df.reindex(mux, fill_value=0)
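reindex is the most direct route here, since fill_value=0 does the zero-filling in the same call; a runnable sketch under the same assumed input (note that rows outside the new index, such as 2011-12-24, are dropped):

```python
import pandas as pd

# Hypothetical reconstruction of the question's df with an (id, date) MultiIndex.
df = pd.DataFrame(
    {"temperature": [12, 12, 12, 14, 15]},
    index=pd.MultiIndex.from_arrays(
        [[1, 1, 1, 2, 2],
         pd.to_datetime(["2011-09-12", "2011-09-15", "2011-10-13",
                         "2011-12-12", "2011-12-24"])],
        names=["id", "date"],
    ),
)

mux = pd.MultiIndex.from_product(
    [df.index.levels[0], pd.date_range("2011-01-01", "2011-12-12")],
    names=["id", "date"],
)

# Conform df to the full (id, day) grid; days without a measurement get 0.
out = df.reindex(mux, fill_value=0)
```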

As an alternative to jezrael's answer, you could also do the following iteration, especially if you want to keep your device id intact:

import pandas as pd

data = {"date": [pd.Timestamp('2011-09-12'), pd.Timestamp('2011-09-15'),
                 pd.Timestamp('2011-10-13'), pd.Timestamp('2011-12-12'),
                 pd.Timestamp('2011-12-24')],
        "temperature": [12, 12, 12, 14, 15],
        "sensor_id": [1, 1, 1, 2, 2]}
df1 = pd.DataFrame(data, index=data["sensor_id"])

df2=pd.DataFrame(0, index=pd.date_range('2011-01-01', '2011-12-12'), columns=['temperature','sensor_id'])

for i,row in df1.iterrows():
    df2.loc[df2.index==row["date"], ['temperature']] = row['temperature']
    df2.loc[df2.index==row["date"], ['sensor_id']] = row['sensor_id']

for t in data["date"]:
    print(df2[df2.index==t])

Note that df2 in your question only goes up to 2011-12-12, hence the last print() will return an empty DataFrame. I wasn't sure whether you did this on purpose.

Also, depending on the variability and density of your actual data, it might make sense to use:

for s in [1,2]: ## iterate over device ids
    ma=(df['sensor_id']==s)
    df.loc[ma]=df.loc[ma].fillna(method='ffill') # fill forward

hence an incomplete time series would be filled forward with the last measured temperature value. It depends on the quality of your data, of course, and df.resample() might make more sense.
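A self-contained sketch of that forward-fill idea on hypothetical long-format data (one row per sensor and day, NaN where no measurement exists; .ffill() is used here in place of the older fillna(method='ffill')):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format frame: 5 days x 2 sensors, gaps as NaN.
days = pd.date_range("2011-09-12", "2011-09-16")
df = pd.DataFrame({
    "date": list(days) * 2,
    "sensor_id": [1] * 5 + [2] * 5,
    "temperature": [12, np.nan, np.nan, 13, np.nan,
                    14, np.nan, 15, np.nan, np.nan],
})

for s in df["sensor_id"].unique():   # iterate over device ids
    ma = df["sensor_id"] == s
    # carry each sensor's last measured value forward over its gaps
    df.loc[ma, "temperature"] = df.loc[ma, "temperature"].ffill()
```

The loop keeps each sensor's series separate so one device's readings never bleed into another's; df.groupby("sensor_id")["temperature"].ffill() expresses the same thing in one line.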
