Find values in a data frame with repeated ids using another data frame with unique ids in Python

I am really stuck on this problem and have no idea how to solve it. I have two data frames. The first holds humidity readings, which are reported every 15 minutes by three different sensors. The table includes the sensor id, the date, and the time of each reading. Here it is:

df_h = pd.DataFrame({'id_h': {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3}, 'date': {0: '2021-01-01', 1: '2021-01-01', 2: '2021-01-01', 3: '2021-01-01', 4: '2021-01-01', 5: '2021-01-01'}, 'time_hour': {0: '6:00:00', 1: '6:15:00', 2: '6:00:00', 3: '6:15:00', 4: '6:00:00', 5: '6:15:00'}, 'VALUE': {0: 10, 1: 12, 2: 20, 3: 22, 4: 30, 5: 32}})

   id_h        date time_hour  VALUE
0     1  2021-01-01   6:00:00     10
1     1  2021-01-01   6:15:00     12
2     2  2021-01-01   6:00:00     20
3     2  2021-01-01   6:15:00     22
4     3  2021-01-01   6:00:00     30
5     3  2021-01-01   6:15:00     32

With the following code I can pivot the data so that, for each id and each day, all humidity readings sit in one row:

humidity_sticked = df_h.pivot(index=["id_h", "date"], columns="time_hour", values="VALUE")
humidity_sticked.columns = [f"value_{i+1}" for i in range(humidity_sticked.shape[1])]
humidity_sticked = humidity_sticked.reset_index()

As we can see, the result has three rows (one per id) and two value columns (value_1 and value_2).

I also have another table with the temperature, but the ids of the weather stations are different. For example, for id_h (humidity id) = 1 and 2 there is only id_t (temperature id) = 5. The temperature table has exactly the same layout, but because id_t repeats, I cannot build the same pivoted table as for humidity. Here is the temperature table:

df_t = pd.DataFrame({'id_t': {0: 5, 1: 5, 2: 5, 3: 5, 4: 7}, 'date': {0: '2021-01-01', 1: '2021-01-01', 2: '2021-01-01', 3: '2021-01-01', 4: '2021-01-01'}, 'time_hour': {0: '6:00:00', 1: '6:15:00', 2: '6:00:00', 3: '6:15:00', 4: '6:00:00'}, 'VALUE': {0: -1, 1: -8, 2: -2, 3: -9, 4: -3}})

   id_t        date time_hour  VALUE
0     5  2021-01-01   6:00:00     -1
1     5  2021-01-01   6:15:00     -8
2     5  2021-01-01   6:00:00     -2
3     5  2021-01-01   6:15:00     -9
4     7  2021-01-01   6:00:00     -3

When I try to pivot the values for id_t=5, I get an error. The desired output is: [image: desired output]
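The error comes from the repeated keys: `DataFrame.pivot` needs each (index, columns) combination to be unique, and id_t=5 appears twice for the same date and time. A minimal sketch that reproduces this with the temperature table above (the names `dupes` and `error_message` are only for illustration):

```python
import pandas as pd

df_t = pd.DataFrame({
    "id_t": [5, 5, 5, 5, 7],
    "date": ["2021-01-01"] * 5,
    "time_hour": ["6:00:00", "6:15:00", "6:00:00", "6:15:00", "6:00:00"],
    "VALUE": [-1, -8, -2, -9, -3],
})

# All rows whose (id_t, date, time_hour) key occurs more than once
dupes = df_t[df_t.duplicated(["id_t", "date", "time_hour"], keep=False)]
print(dupes)

# pivot refuses to reshape when a key is duplicated
try:
    df_t.pivot(index=["id_t", "date"], columns="time_hour", values="VALUE")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Only the four id_t=5 rows are flagged as duplicates, which is why the humidity table (unique ids) pivots cleanly and this one does not.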

Explanation: id_h=1 and id_h=2 both map to id_t=5. So the first two rows with id_t=5 belong to id_h=1, the next two rows with id_t=5 belong to id_h=2, and the last rows, with id_t=7, belong to id_h=3.
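One possible way to express this rule in code is to number the repeated readings with `groupby(...).cumcount()` and treat each (id_t, occurrence) pair as its own sensor. This is only a sketch under an assumption that is not stated in the question: that the k-th (id_t, occurrence) block, in sorted order, corresponds to the k-th smallest id_h. The column names `occurrence`, `temp_*` and `hum_*` are made up for the example.

```python
import pandas as pd

df_h = pd.DataFrame({
    "id_h": [1, 1, 2, 2, 3, 3],
    "date": ["2021-01-01"] * 6,
    "time_hour": ["6:00:00", "6:15:00"] * 3,
    "VALUE": [10, 12, 20, 22, 30, 32],
})
df_t = pd.DataFrame({
    "id_t": [5, 5, 5, 5, 7],
    "date": ["2021-01-01"] * 5,
    "time_hour": ["6:00:00", "6:15:00", "6:00:00", "6:15:00", "6:00:00"],
    "VALUE": [-1, -8, -2, -9, -3],
})

# Number repeated readings: the first row for a given
# (id_t, date, time_hour) gets 0, the second gets 1, and so on.
df_t["occurrence"] = df_t.groupby(["id_t", "date", "time_hour"]).cumcount()

# One row per (id_t, occurrence) block, sorted by id_t then occurrence
blocks = (
    df_t[["id_t", "occurrence"]]
    .drop_duplicates()
    .sort_values(["id_t", "occurrence"])
    .reset_index(drop=True)
)

# ASSUMPTION: the k-th block belongs to the k-th smallest id_h
h_ids = sorted(df_h["id_h"].unique())
if len(h_ids) != len(blocks):
    raise ValueError("number of humidity ids and temperature blocks differ")
blocks["id_h"] = h_ids

# With id_h attached, every (id_h, id_t, date, time_hour) is unique, so pivot works
temp_wide = df_t.merge(blocks, on=["id_t", "occurrence"]).pivot(
    index=["id_h", "id_t", "date"], columns="time_hour", values="VALUE"
)
temp_wide.columns = [f"temp_{i+1}" for i in range(temp_wide.shape[1])]
temp_wide = temp_wide.reset_index()

hum_wide = df_h.pivot(index=["id_h", "date"], columns="time_hour", values="VALUE")
hum_wide.columns = [f"hum_{i+1}" for i in range(hum_wide.shape[1])]

# Keep every humidity row; missing temperatures show up as NaN
result = hum_wide.reset_index().merge(temp_wide, on=["id_h", "date"], how="left")
print(result)
```

In the sample data id_t=7 has no 6:15 reading, so that cell is NaN in `result` rather than being filled from another sensor.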

Any help would be appreciated. Thanks!

Update: I tried merging by index. However, when a value is missing in one of the data frames (for example, for a specific date I have the humidity at 6:00 but not the temperature), the results are wrong. Here is the result of the merge by index: the times are not the same, yet they are all placed in one row. [image: result of the merge by index]
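A merge by index pairs rows purely by position, so one missing reading shifts every later row. Merging on explicit keys avoids that. The sketch below assumes, hypothetically, that each temperature row has already been tagged with its matching id_h; the columns `humidity` and `temperature` are renamed for the example, and the 6:00 temperature for id_h=2 is deliberately left out.

```python
import pandas as pd

df_h = pd.DataFrame({
    "id_h": [1, 1, 2, 2],
    "date": ["2021-01-01"] * 4,
    "time_hour": ["6:00:00", "6:15:00", "6:00:00", "6:15:00"],
    "humidity": [10, 12, 20, 22],
})

# Hypothetical: temperature rows already carry the matching id_h.
# The 6:00 reading for id_h=2 is missing on purpose.
df_t = pd.DataFrame({
    "id_h": [1, 1, 2],
    "date": ["2021-01-01"] * 3,
    "time_hour": ["6:00:00", "6:15:00", "6:15:00"],
    "temperature": [-1, -8, -9],
})

# Outer merge on the real keys: rows line up by sensor, date and time,
# and indicator=True records which side each row came from.
merged = df_h.merge(
    df_t, on=["id_h", "date", "time_hour"], how="outer", indicator=True
)
print(merged)

# Rows present in only one of the two frames
gaps = merged[merged["_merge"] != "both"]
```

The missing temperature appears as NaN in its own row (marked `left_only`) instead of pulling the 6:15 value into the 6:00 row.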

df_t['rank'] = df_t.id_t.rank(method='dense')
df_h['rank'] = df_h.id_h.rank(method='dense')
df = df_t.merge(df_h, on=['rank', 'date', 'time_hour'], suffixes=['_1', '_2'])
print(df)

Output:

   id_t        date time_hour  VALUE_1  rank  id_h  VALUE_2
0     5  2021-01-01   6:00:00       -1   1.0     1       10
1     5  2021-01-01   6:00:00       -2   1.0     1       10
2     5  2021-01-01   6:15:00       -8   1.0     1       12
3     5  2021-01-01   6:15:00       -9   1.0     1       12
4     7  2021-01-01   6:00:00       -3   2.0     2       20

You can use pd.merge by index. This is a shortcut for building your 'sticked' data frame.

pd.merge(df_t, df_h, left_index=True, right_index=True, suffixes=['_t', '_h'])

Output:

   id_t      date_t time_hour_t  VALUE_t  id_h      date_h time_hour_h  \
0     5  2021-01-01     6:00:00       -1     1  2021-01-01     6:00:00   
1     5  2021-01-01     6:15:00       -8     1  2021-01-01     6:15:00   
2     5  2021-01-01     6:00:00       -2     2  2021-01-01     6:00:00   
3     5  2021-01-01     6:15:00       -9     2  2021-01-01     6:15:00   
4     7  2021-01-01     6:00:00       -3     3  2021-01-01     6:00:00   

   VALUE_h  
0       10  
1       12  
2       20  
3       22  
4       30  

The output above contains redundant columns, so you can merge df_t with only the columns of df_h that you need, like below:

pd.merge(df_t, df_h[['id_h','VALUE']], left_index=True, right_index=True, suffixes=['_t', '_h'])

Output:

   id_t        date time_hour  VALUE_t  id_h  VALUE_h
0     5  2021-01-01   6:00:00       -1     1       10
1     5  2021-01-01   6:15:00       -8     1       12
2     5  2021-01-01   6:00:00       -2     2       20
3     5  2021-01-01   6:15:00       -9     2       22
4     7  2021-01-01   6:00:00       -3     3       30

This is the simplest way to get the result you want.
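Because the index merge relies on both frames having their rows in matching order, it may be worth checking that the date and time columns agree after merging. A small sketch of such a check, where `find_mismatches` is a hypothetical helper and not part of pandas:

```python
import pandas as pd

df_h = pd.DataFrame({
    "id_h": [1, 1, 2, 2, 3, 3],
    "date": ["2021-01-01"] * 6,
    "time_hour": ["6:00:00", "6:15:00"] * 3,
    "VALUE": [10, 12, 20, 22, 30, 32],
})
df_t = pd.DataFrame({
    "id_t": [5, 5, 5, 5, 7],
    "date": ["2021-01-01"] * 5,
    "time_hour": ["6:00:00", "6:15:00", "6:00:00", "6:15:00", "6:00:00"],
    "VALUE": [-1, -8, -2, -9, -3],
})


def find_mismatches(t, h):
    """Merge by index and return the rows whose date or time disagree."""
    merged = pd.merge(t, h, left_index=True, right_index=True,
                      suffixes=["_t", "_h"])
    bad = (merged["date_t"] != merged["date_h"]) | (
        merged["time_hour_t"] != merged["time_hour_h"]
    )
    return merged[bad]


# Sample data: rows are aligned, so nothing is reported
mismatched = find_mismatches(df_t, df_h)

# Simulate a missing temperature reading by dropping the first row;
# every remaining row is now paired with the wrong time
df_t_gap = df_t.drop(index=0).reset_index(drop=True)
mismatched_gap = find_mismatches(df_t_gap, df_h)
print(mismatched_gap)
```

If the check reports any rows, merging on explicit key columns is likely a safer choice than merging by index.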

