How to add a column to a pandas DataFrame with values based on matching values in two DataFrames

Question

I am working with two pandas DataFrames. One contains the performance data of different servers for every hour and looks something like this:

Date	time	server_name	CPU	Memory
2020-10-25	300	server1	90.2	64.4
2020-10-25	300	server2	50.4	23.3

In this case, '300' in the column 'time' means 3am.
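The encoding described above can be turned into a real timestamp as follows. This is only a minimal sketch: the frame name `perf`, the extra value 1400 and the assumption that 'Date' is already a datetime column are illustrative and not taken from the original data.

```python
import pandas as pd

# Hypothetical sample; the dtypes and the value 1400 are assumptions.
perf = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-25", "2020-10-25"]),
    "time": [300, 1400],
})

# 300 -> 3 and 1400 -> 14: integer-divide by 100 to get the hour of day.
hours = perf["time"] // 100

# Add the hour as a timedelta to the midnight timestamp stored in Date.
perf["timestamp"] = perf["Date"] + pd.to_timedelta(hours, unit="h")
print(perf)
```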

The second DataFrame contains error data for the different servers and looks something like this:

server_name	timestamp
server1	2020-10-25 00:45:04
server2	2020-10-25 03:45:04

I would like to add a column to the first DataFrame (the one with the performance metrics) that indicates, for every server and every hour, whether an error occurred during that hour. Please note that an error which occurred at 3:45am should be assigned to the 3am row of the respective server. It should look something like this:

Date	time	server_name	CPU	Memory	error
2020-10-25	300	server1	90.2	64.4	0
2020-10-25	300	server2	50.4	23.3	1

In this case, '1' in the column 'error' would mean that an error occurred on the server during that hour.

I already tried merging the DataFrames on date, time and server_name, as well as many other approaches, but I just don't get the desired result.

Answer 1

Assuming df1 is your first DataFrame and df2 is the second one, you could add a timestamp column to df1 by adding the Date and time columns, and then use merge_asof to bind each row of the second frame to a row from that modified DataFrame.

From there, you could merge that new DataFrame into the first one, and a groupby and count should give the expected result.
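To see what the merge_asof step does in isolation, here is a small illustrative sketch. The frames `hourly` and `errors` and the helper column `hour_start` are hypothetical names introduced only for this demonstration. By default merge_asof searches backward, so each error is paired with the latest hourly row of the same server that is not later than the error, and the tolerance rejects matches more than one hour old.

```python
import pandas as pd

# Hypothetical hourly rows; hour_start duplicates the key so that the
# matched hour stays visible in the merged output.
hourly = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 03:00:00"] * 2),
    "hour_start": pd.to_datetime(["2020-10-25 03:00:00"] * 2),
})
errors = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 00:45:04",
                                 "2020-10-25 03:45:04"]),
})

# Both inputs must be sorted by the "on" key.
matched = pd.merge_asof(
    errors.sort_values("timestamp"),
    hourly.sort_values("timestamp"),
    on="timestamp", by="server_name",
    tolerance=pd.Timedelta("1h"),
)
print(matched)
```

The 00:45 error on server1 finds no hourly row at or before it, so its hour_start is missing, while the 03:45 error on server2 is bound to the 03:00 row.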

Possible code:

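import pandas as pd

# Assumes df1['Date'] and df2['timestamp'] are datetime64 columns.
# Build an hourly timestamp for df1 (time 300 -> 3 hours after midnight).
df1_ts = df1.assign(timestamp=df1['Date']
                    + pd.to_timedelta(df1['time'] / 100, 'h'))

# Bind each error to the most recent hourly row of the same server, looking
# back at most one hour. merge_asof needs both frames sorted on the key;
# 'h' replaces the 'H' alias, which is deprecated in recent pandas.
df3 = pd.merge_asof(df2.sort_values('timestamp'),
                    df1_ts.sort_values('timestamp'),
                    by='server_name', on='timestamp',
                    tolerance=pd.Timedelta('1h'))

print(df3)

# Merge the matched errors back into df1 and count them per hourly row.
result = df1.merge(df3[['server_name', 'timestamp', 'Date', 'time']], 'left',
                   on=['server_name', 'Date', 'time']
                   ).groupby(['server_name',  'Date', 'time', 'CPU', 'Memory']
                             ).count().rename(columns={'timestamp': 'error'}
                                              ).reset_index()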

With your data, it gives the expected result:

  server_name       Date  time   CPU  Memory  error
0     server1 2020-10-25   300  90.2    64.4      0
1     server2 2020-10-25   300  50.4    23.3      1
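Note that count returns the number of errors per hour, so an hour with several errors would show a value above 1. If a strict 0/1 flag is wanted, one alternative (not the approach of this answer) is to floor each error timestamp to its hour and left-merge on server and hour. The sketch below rebuilds the sample frames so it runs on its own; the dtypes and the helper names `hour`, `error_hours` and `out` are assumptions.

```python
import pandas as pd

# Sample frames rebuilt from the question; dtypes are assumptions.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-25", "2020-10-25"]),
    "time": [300, 300],
    "server_name": ["server1", "server2"],
    "CPU": [90.2, 50.4],
    "Memory": [64.4, 23.3],
})
df2 = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 00:45:04",
                                 "2020-10-25 03:45:04"]),
})

# One row per (server, hour) that saw at least one error.
error_hours = (
    df2.assign(hour=df2["timestamp"].dt.floor("h"))[["server_name", "hour"]]
    .drop_duplicates()
    .assign(error=1)
)

# Give df1 the same hour key, left-merge, then fill unmatched rows with 0.
out = (
    df1.assign(hour=df1["Date"] + pd.to_timedelta(df1["time"] // 100, unit="h"))
    .merge(error_hours, how="left", on=["server_name", "hour"])
    .drop(columns="hour")
)
out["error"] = out["error"].fillna(0).astype(int)
print(out)
```

Because the error hours are deduplicated before the merge, the row count of df1 is preserved and every flag is either 0 or 1.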

Solution 1
0 2021-03-01 12:40:12
