
How to add a column to a pandas DataFrame with values based on matching values in two DataFrames

I am working with two pandas DataFrames. One contains the performance data of different servers for every hour and looks something like this:

Date        time  server_name  CPU   Memory
2020-10-25  300   server1      90.2  64.4
2020-10-25  300   server2      50.4  23.3

In this case, '300' in the column 'time' means 3am.
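To make the encoding concrete, here is a minimal sketch of turning the 'Date' and 'time' columns into a real timestamp. It assumes 'time' is an HHMM-style integer that always falls on the hour and that 'Date' can be parsed as a date; the column name `hour_start` is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the table above.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-25", "2020-10-25"]),
    "time": [300, 300],
    "server_name": ["server1", "server2"],
    "CPU": [90.2, 50.4],
    "Memory": [64.4, 23.3],
})

# Integer-divide by 100 to get the hour (300 -> 3), then add it to the date.
# 'hour_start' is a hypothetical column name, not part of the original data.
df1["hour_start"] = df1["Date"] + pd.to_timedelta(df1["time"] // 100, unit="h")

print(df1[["server_name", "time", "hour_start"]])
```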

The second DataFrame contains error data for the different servers and looks something like this:

server_name  timestamp
server1      2020-10-25 00:45:04
server2      2020-10-25 03:45:04

I would like to add a column to the first DataFrame (the one with the performance metrics) that indicates, for every server and every hour, whether an error occurred during that hour. Please note that an error which occurred at 3:45am should be assigned to the 3am row of the respective server. It should look something like this:
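The "3:45am belongs to the 3am row" rule amounts to rounding each error timestamp down to the start of its hour. A small sketch of that rule on the error data, assuming 'timestamp' parses as a datetime; the column names `hour_start` and `time` added here are illustrative only:

```python
import pandas as pd

# Error rows copied from the second table above.
df2 = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 00:45:04", "2020-10-25 03:45:04"]),
})

# Round each error down to the start of its hour: 03:45:04 -> 03:00:00.
df2["hour_start"] = df2["timestamp"].dt.floor("h")

# Same hour expressed in the HHMM-style integer used by the first DataFrame.
df2["time"] = df2["hour_start"].dt.hour * 100

print(df2)
```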

Date        time  server_name  CPU   Memory  error
2020-10-25  300   server1      90.2  64.4    0
2020-10-25  300   server2      50.4  23.3    1

In this case, '1' in the column 'error' would mean that an error occurred on the server during that hour.

I already tried merging the DataFrames on date, time and server_name, as well as many other approaches, but I just don't get the desired results.

Assuming df1 is your first DataFrame and df2 is the second one, you could add a timestamp column to df1 by adding the Date and time columns together, and then use merge_asof to bind each row of the second frame to a row of that modified DataFrame.
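To illustrate what merge_asof does here, the sketch below matches each error to the latest hourly row at or before it for the same server (the default backward direction), within a one-hour tolerance. The frame names `hours` and `errors` are placeholders, and the data is the sample from the question.

```python
import pandas as pd

# Hourly rows per server; both frames must be sorted by the 'on' column.
hours = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 03:00:00", "2020-10-25 03:00:00"]),
    "time": [300, 300],
})
errors = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 00:45:04", "2020-10-25 03:45:04"]),
})

# For each error, look backward for the nearest hourly row of the same server
# that is at most one hour older; unmatched errors get NaN in 'time'.
matched = pd.merge_asof(errors, hours, on="timestamp", by="server_name",
                        tolerance=pd.Timedelta("1h"))

print(matched)
```

The server1 error at 00:45 has no hourly row at or before it, so it stays unmatched, while the server2 error at 03:45 is attached to the 03:00 row.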

From there, you could merge that new DataFrame into the first one; a groupby followed by a count should then give the expected result.

Possible code:

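import pandas as pd

# Assumes df1['Date'] and df2['timestamp'] are datetime64 columns.
# merge_asof needs both frames sorted by the 'on' key, hence the sort_values calls.
df3 = pd.merge_asof(df2.sort_values('timestamp'),
                    df1.assign(timestamp=df1['Date']
                               + pd.to_timedelta(df1['time'] // 100, unit='h')
                               ).sort_values('timestamp'),
                    by='server_name', on='timestamp',
                    tolerance=pd.Timedelta('1h'))

print(df3)

# Left-merge the matched errors back onto df1, then count errors per hourly row.
result = df1.merge(df3[['server_name', 'timestamp', 'Date', 'time']], how='left',
                   on=['server_name', 'Date', 'time']
                   ).groupby(['server_name',  'Date', 'time', 'CPU', 'Memory']
                             ).count().rename(columns={'timestamp': 'error'}
                                              ).reset_index()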

With your data, it gives the expected result:

  server_name       Date  time   CPU  Memory  error
0     server1 2020-10-25   300  90.2    64.4      0
1     server2 2020-10-25   300  50.4    23.3      1
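As a cross-check, the same 'error' column can be reached by a different route: floor each error timestamp to the hour and left-merge on server and hour. This is only a sketch of an alternative, not the approach used above; it yields a 0/1 flag rather than a count, and the names `hour_start`, `error_hours` and `flagged` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-25", "2020-10-25"]),
    "time": [300, 300],
    "server_name": ["server1", "server2"],
    "CPU": [90.2, 50.4],
    "Memory": [64.4, 23.3],
})
df2 = pd.DataFrame({
    "server_name": ["server1", "server2"],
    "timestamp": pd.to_datetime(["2020-10-25 00:45:04", "2020-10-25 03:45:04"]),
})

# One row per (server, hour) in which at least one error occurred.
error_hours = (df2.assign(hour_start=df2["timestamp"].dt.floor("h"))
                  [["server_name", "hour_start"]]
                  .drop_duplicates())

# Hour start of each performance row, then a left merge with an indicator column.
flagged = (df1.assign(hour_start=df1["Date"]
                      + pd.to_timedelta(df1["time"] // 100, unit="h"))
              .merge(error_hours, how="left",
                     on=["server_name", "hour_start"], indicator=True))

# '_merge' is 'both' only where a matching error hour exists.
flagged["error"] = (flagged["_merge"] == "both").astype(int)
flagged = flagged.drop(columns=["hour_start", "_merge"])

print(flagged)
```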


