Compare multiple columns from different dataframes of the same length in Pandas

Question

I have four dataframes with the following structure:

df1
   max_proba    chosen_class
0   0.8            class_A
1   0.92           class_B
2   0.82           class_B
3   0.74           class_B
4   0.58           class_A

df2
   max_proba    chosen_class
0   0.6            class_C
1   0.62           class_D
2   0.87           class_D
3   0.94           class_C
4   0.62           class_D

# ... df3 and df4 have the same structure; only the chosen_class values and probabilities differ.

I want to compare the "max_proba" column across all four dataframes and, for each row, keep the maximum value together with its chosen class.

(For example, for one sample: if df1 max_proba = 0.23, df2 max_proba = 0.86, df3 max_proba = 0.56 and df4 max_proba = 0.76, I want only the chosen class with the highest probability, 0.86, which could be class_E.)

Answer 1

If I understand correctly, you want to compare them row by row.

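You should combine them into one dataframe (df3 and df4 can be added to the list in the same way):

import pandas as pd
df = pd.concat([df1, df2])  # df1.append(df2) no longer works in pandas 2.0+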
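As a minimal sketch of this step, the snippet below rebuilds df1 and df2 from the question and stacks them with pd.concat. The names `frames` and `combined` are only illustrative; df3 and df4 are not shown in the question, so they are left out here and would simply be appended to the list.

```python
import pandas as pd

# df1 and df2 exactly as given in the question.
df1 = pd.DataFrame({
    "max_proba": [0.8, 0.92, 0.82, 0.74, 0.58],
    "chosen_class": ["class_A", "class_B", "class_B", "class_B", "class_A"],
})
df2 = pd.DataFrame({
    "max_proba": [0.6, 0.62, 0.87, 0.94, 0.62],
    "chosen_class": ["class_C", "class_D", "class_D", "class_C", "class_D"],
})

# Illustrative name: put every dataframe to compare in one list
# (df3 and df4 would be added here as well).
frames = [df1, df2]

# Stack the frames vertically; each frame keeps its own 0..4 row labels,
# so every label appears once per source dataframe.
combined = pd.concat(frames)
print(combined)
```

Keeping the original row labels is deliberate: they are what later identifies which rows belong to the same sample.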

Then reset the index twice. The first call moves the old row labels into a new column 'index' (the row number in the original dataframes), and the second call adds a column 'level_0' with the row number in the combined dataframe:

df = df.reset_index()
df = df.reset_index()
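To see what the two reset_index calls produce, here is a small sketch using df1 and df2 from the question. The second call cannot reuse the name 'index' because that column already exists, so pandas falls back to 'level_0'.

```python
import pandas as pd

df1 = pd.DataFrame({
    "max_proba": [0.8, 0.92, 0.82, 0.74, 0.58],
    "chosen_class": ["class_A", "class_B", "class_B", "class_B", "class_A"],
})
df2 = pd.DataFrame({
    "max_proba": [0.6, 0.62, 0.87, 0.94, 0.62],
    "chosen_class": ["class_C", "class_D", "class_D", "class_C", "class_D"],
})

df = pd.concat([df1, df2])

# First reset: old labels (0..4, 0..4) become the 'index' column.
df = df.reset_index()
# Second reset: the new 0..9 labels become the 'level_0' column.
df = df.reset_index()

print(df.columns.tolist())
print(df[["level_0", "index"]])
```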

Next, for each value of 'index', find the rows that hold the maximum max_proba:

indexes = df.groupby('index').apply(lambda x: x.max_proba == max(x['max_proba'])).reset_index()

Finally, use these indexes to select the rows with the highest max_proba from the combined dataframe:

result = df.loc[indexes[indexes.max_proba].level_1.values]
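As an alternative sketch (not the method used in this answer), the same selection can be written with groupby and idxmax, which avoids groupby.apply. The names `stacked`, `best` and `result` are illustrative. One difference to keep in mind: idxmax keeps only the first row when several dataframes tie for the maximum, whereas the equality mask above keeps every tied row.

```python
import pandas as pd

df1 = pd.DataFrame({
    "max_proba": [0.8, 0.92, 0.82, 0.74, 0.58],
    "chosen_class": ["class_A", "class_B", "class_B", "class_B", "class_A"],
})
df2 = pd.DataFrame({
    "max_proba": [0.6, 0.62, 0.87, 0.94, 0.62],
    "chosen_class": ["class_C", "class_D", "class_D", "class_C", "class_D"],
})

# One reset_index is enough here: 'index' holds the row number
# within each source dataframe, and the new labels run 0..9.
stacked = pd.concat([df1, df2]).reset_index()

# For each original row number, get the label of the row with the
# largest max_proba, then pull those rows out of the stacked frame.
best = stacked.loc[stacked.groupby("index")["max_proba"].idxmax()]

# Use the original row number as the index of the final result.
result = best.set_index("index")[["max_proba", "chosen_class"]]
print(result)
```

With the two dataframes from the question this selects the same five rows as the output shown in this answer.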

The output will look like this:

   level_0  index  max_proba chosen_class
0        0      0       0.80      class_A
1        1      1       0.92      class_B
7        7      2       0.87      class_D
8        8      3       0.94      class_C
9        9      4       0.62      class_D

You can remove the extra columns ('level_0' and 'index') with the drop function.
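A short sketch of that cleanup, assuming a `result` frame shaped like the output shown in this answer (it is rebuilt by hand here so the snippet runs on its own; `cleaned` is an illustrative name):

```python
import pandas as pd

# Hand-built copy of the output shown in the answer.
result = pd.DataFrame(
    {
        "level_0": [0, 1, 7, 8, 9],
        "index": [0, 1, 2, 3, 4],
        "max_proba": [0.80, 0.92, 0.87, 0.94, 0.62],
        "chosen_class": ["class_A", "class_B", "class_D", "class_C", "class_D"],
    },
    index=[0, 1, 7, 8, 9],
)

# Drop the two helper columns and renumber the rows 0..4.
cleaned = result.drop(columns=["level_0", "index"]).reset_index(drop=True)
print(cleaned)
```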

Answer 1
0 Accepted 2022-07-01 11:40:29
