Scroll through two data frames and compare a column of data
I have the following dataframes:
import pandas as pd
import numpy as np
df_Sensor = pd.DataFrame({'ID_System_Embed': ['1000', '1000', '1000', '1003', '1004'],
                          'Date_Time': ['2020-10-18 12:58:05', '2020-10-18 12:58:15',
                                        '2020-10-19 20:10:10', '2018-12-18 12:58:00',
                                        '2015-10-25 11:00:00']})
df_Period = pd.DataFrame({'ID_System_Embed': ['1000', '1000', '1001', '1002', '1003', '1004'],
                          'ID_Sensor': ['1', '2', '3', '4', '5', '6'],
                          'Date_Init': ['2020-10-18 12:58:00', '2020-10-18 19:58:00',
                                        '2019-11-18 19:58:00', '2018-12-29 12:58:00',
                                        '2019-11-20 12:58:00', '2015-10-25 10:00:00'],
                          'Date_End': ['2020-10-18 16:58:00', '2020-10-19 20:58:00',
                                       '2019-11-25 12:58:00', '2018-12-18 12:58:00',
                                       '2019-11-25 12:58:00', '2015-10-25 12:00:00']})
I need to detect whether each date in df_Sensor falls within a date range in the second dataframe (df_Period) for the same ID_System_Embed (the identifier of an embedded system).
I tried to implement the following code:
df_Period['New_Column'] = 0
for j in range(len(df_Period)):
    for i in range(len(df_Sensor)):
        # Same system, and the reading lies inside [Date_Init, Date_End]
        if (df_Sensor['ID_System_Embed'].iloc[i] == df_Period['ID_System_Embed'].iloc[j]
                and df_Sensor['Date_Time'].iloc[i] >= df_Period['Date_Init'].iloc[j]
                and df_Sensor['Date_Time'].iloc[i] <= df_Period['Date_End'].iloc[j]):
            # A single .loc assignment avoids chained indexing on a column
            df_Period.loc[df_Period.index[j], 'New_Column'] += 1
This code works and produces the expected output. However, it is inefficient because it iterates over both dataframes with nested for loops. I would like to find a faster, more efficient way to perform the operation and get the same output.
The output is:
ID_System_Embed  ID_Sensor  Date_Init            Date_End             New_Column
1000             1          2020-10-18 12:58:00  2020-10-18 16:58:00  2
1000             2          2020-10-18 19:58:00  2020-10-19 20:58:00  1
1001             3          2019-11-18 19:58:00  2019-11-25 12:58:00  0
1002             4          2018-12-29 12:58:00  2018-12-18 12:58:00  0
1003             5          2019-11-20 12:58:00  2019-11-25 12:58:00  0
1004             6          2015-10-25 10:00:00  2015-10-25 12:00:00  1
Group df_Period by ['ID_System_Embed', 'ID_Sensor'] and df_Sensor by ['ID_System_Embed'], so that each group key is unique.
Then aggregate the values of the remaining date columns into lists using the appnd function:
def appnd(col):
    # Collect the values of one grouped column into a plain Python list
    return [d for d in col]

df_p = df_Period.copy().groupby(['ID_System_Embed', 'ID_Sensor']).agg(appnd)
df_s = df_Sensor.copy().groupby(['ID_System_Embed']).agg(appnd)
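To make the aggregation step concrete, here is a minimal sketch of what the grouped sensor frame looks like. It assumes the built-in list can stand in for the appnd helper, which should give the same result here; the frame is rebuilt inline only so the snippet runs on its own.

```python
import pandas as pd

# Same sensor data as in the question, repeated so the snippet is self-contained
df_Sensor = pd.DataFrame({
    'ID_System_Embed': ['1000', '1000', '1000', '1003', '1004'],
    'Date_Time': ['2020-10-18 12:58:05', '2020-10-18 12:58:15',
                  '2020-10-19 20:10:10', '2018-12-18 12:58:00',
                  '2015-10-25 11:00:00'],
})

# One row per system; each Date_Time cell holds the list of that system's readings.
# Passing the built-in `list` is assumed equivalent to the appnd helper.
df_s = df_Sensor.groupby('ID_System_Embed').agg(list)

# The three readings of system '1000' end up in a single list
readings_1000 = df_s.loc['1000', 'Date_Time']
print(readings_1000)
```

Each system now occupies a single row, which is what lets the later join attach all of a system's readings to every one of its periods.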
Then join the two dataframes (you may fill NaN with 0):
df = df_p.join(df_s).fillna(value = 0)
df['New_Column'] = 0
df
Apply the following function to the date columns and store the results in New_Column:
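def inInterval(row):
    # Count sensor timestamps that fall inside any [Date_Init, Date_End] interval of the row
    ctr = 0
    for d in row['Date_Time']:
        for start, end in zip(row['Date_Init'], row['Date_End']):
            if start <= d <= end:
                ctr += 1
    return ctr

# Rows without sensor readings hold 0 (from fillna) instead of a list, so they count as 0
df['New_Column'] = df[['Date_Init', 'Date_End', 'Date_Time']].copy()\
    .apply(lambda x: inInterval(x) if isinstance(x['Date_Time'], list) else 0, axis=1)
df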
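As an alternative to the row-wise apply above, the same counts can probably be obtained without any Python-level loop by merging the two frames on ID_System_Embed and filtering the pairs. This is a sketch, not the method used in this answer: the names sensor, period, pairs, counts, result and the helper column row are illustrative, and the date strings are parsed with pd.to_datetime so the comparison does not depend on their text format.

```python
import pandas as pd

df_Sensor = pd.DataFrame({
    'ID_System_Embed': ['1000', '1000', '1000', '1003', '1004'],
    'Date_Time': ['2020-10-18 12:58:05', '2020-10-18 12:58:15',
                  '2020-10-19 20:10:10', '2018-12-18 12:58:00',
                  '2015-10-25 11:00:00'],
})
df_Period = pd.DataFrame({
    'ID_System_Embed': ['1000', '1000', '1001', '1002', '1003', '1004'],
    'ID_Sensor': ['1', '2', '3', '4', '5', '6'],
    'Date_Init': ['2020-10-18 12:58:00', '2020-10-18 19:58:00',
                  '2019-11-18 19:58:00', '2018-12-29 12:58:00',
                  '2019-11-20 12:58:00', '2015-10-25 10:00:00'],
    'Date_End': ['2020-10-18 16:58:00', '2020-10-19 20:58:00',
                 '2019-11-25 12:58:00', '2018-12-18 12:58:00',
                 '2019-11-25 12:58:00', '2015-10-25 12:00:00'],
})

# Parse the strings into real timestamps (working on copies, originals untouched)
sensor = df_Sensor.assign(Date_Time=pd.to_datetime(df_Sensor['Date_Time']))
period = df_Period.assign(
    Date_Init=pd.to_datetime(df_Period['Date_Init']),
    Date_End=pd.to_datetime(df_Period['Date_End']),
)

# Keep the original row label of each period in a helper column named 'row',
# then pair every period with every reading of the same system
pairs = period.rename_axis('row').reset_index().merge(
    sensor, on='ID_System_Embed', how='inner')

# Keep only the pairs whose reading lies inside the period (bounds inclusive)
in_range = pairs['Date_Time'].between(pairs['Date_Init'], pairs['Date_End'])
counts = pairs.loc[in_range].groupby('row').size()

# Periods with no matching reading get 0
result = df_Period.copy()
result['New_Column'] = counts.reindex(result.index, fill_value=0).astype(int)
print(result)
```

The merge materializes every (period, reading) pair per system, so memory grows with the product of the two group sizes. For very large inputs it may be worth processing one system at a time.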