Element-wise numeric comparison of pandas DataFrame column values with a list
I have 3 pandas dataframes with MultiIndex columns.
dataframe 1 (minimum value):
| A | B | C |
| Min | Min | Min |
|-------|-------|------|
0 | 26.47 | 17.31 | 1.26 |
1 | 27.23 | 14.38 | 1.36 |
2 | 27.23 | 18.88 | 1.28 |
dataframe 2 (values to compare against):
Rows 0, 1 and 2 are identical; I extended the dataframe to three rows so it can be compared with the min and max dataframes. The value in each dataframe cell is an ndarray.
| A | B | C |
| Val | Val | Val |
|---------------------|-----------------------|--------------------|
0 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |
1 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |
2 | [27.58,28.37,28.73] | [17.31, 18.42, 18.72] | [1.36, 1.28, 1.27] |
dataframe 3 (maximum value):
| A | B | C |
| Max | Max | Max |
|-------|-------|------|
0 | 28.68 | 18.42 | 1.37 |
1 | 29.50 | 17.31 | 1.47 |
2 | 29.87 | 20.45 | 1.39 |
Expected result:
| A | B | C |
| Result | Result | Result |
|---------------------|-----------------------|----------------------|
0 | [True, True, False] | [True, True, False] | [True, True, True] |
1 | [True, True, True] | [True, False, False] | [True, False, False] |
2 | [True, True, True] | [False, False, False] | [True, True, False] |
I'd like to perform an element-wise comparison like this:
min <= each element in ndarray <= max
i.e.
for row 0:
26.47 <= [27.58,28.37,28.73] <= 28.68
17.31 <= [17.31, 18.42, 18.72] <= 18.42
1.26 <= [1.36, 1.28, 1.27] <= 1.37
and so on.
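As a minimal sketch of the rule for row 0 only, the bounds and arrays can be checked with plain NumPy. The dictionary names `lo`, `hi` and `vals` are illustrative and not part of the original data layout.

```python
import numpy as np

# Row 0 of the example data: one scalar bound pair and one array per column.
lo = {"A": 26.47, "B": 17.31, "C": 1.26}
hi = {"A": 28.68, "B": 18.42, "C": 1.37}
vals = {
    "A": np.array([27.58, 28.37, 28.73]),
    "B": np.array([17.31, 18.42, 18.72]),
    "C": np.array([1.36, 1.28, 1.27]),
}

# A scalar broadcasts against every element of the array, so each side of the
# range check yields a boolean array; `&` combines them element by element.
row0 = {k: ((lo[k] <= v) & (v <= hi[k])).tolist() for k, v in vals.items()}
print(row0)
```

The two comparisons are written separately and joined with `&` because Python's chained form `lo <= arr <= hi` expands to `(lo <= arr) and (arr <= hi)`, which raises a `ValueError` for arrays with more than one element.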
I tried ( dataframe2 >= dataframe1 ) & ( dataframe2 <= dataframe3 ), but it did not work.
What is the simplest and fastest way to compute the result?
Example dataframe code:
import numpy as np
import pandas as pd

min_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Min' ] ] )
val_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Val' ] ] )
max_columns = pd.MultiIndex.from_product( [ [ 'A', 'B', 'C' ], [ 'Max' ] ] )
min_df = pd.DataFrame( [ [ 26.47, 17.31, 1.26 ], [ 27.23, 14.38, 1.36 ], [ 27.23, 18.88, 1.28 ] ], columns=min_columns )
# The same row repeated three times, matching the three-row table above
val_df = pd.DataFrame( [ [ [ 27.58, 28.37, 28.73 ], [ 17.31, 18.42, 18.72], [1.36, 1.28, 1.27 ] ] ] * 3, columns=val_columns )
max_df = pd.DataFrame( [ [ 28.68, 18.42, 1.37 ], [ 29.50, 17.31, 1.47 ], [ 29.87, 20.45, 1.39 ] ] , columns=max_columns )
If you have dataframes like these:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'AMin': {0: 26.47, 1: 27.23, 2: 27.23},
                    'BMin': {0: 17.31, 1: 14.38, 2: 18.88},
                    'CMin': {0: 1.26, 1: 1.36, 2: 1.28}})
df2 = pd.DataFrame({'AVal': {0: [27.58, 28.37, 28.73],
                             1: [27.58, 28.37, 28.73],
                             2: [27.58, 28.37, 28.73]},
                    'BVal': {0: [17.31, 18.42, 18.72],
                             1: [17.31, 18.42, 18.72],
                             2: [17.31, 18.42, 18.72]},
                    'CVal': {0: [1.36, 1.28, 1.27], 1: [1.36, 1.28, 1.27], 2: [1.36, 1.28, 1.27]}})
df3 = pd.DataFrame({'AMax': {0: 28.68, 1: 29.5, 2: 29.87},
                    'BMax': {0: 18.42, 1: 17.31, 2: 20.45},
                    'CMax': {0: 1.37, 1: 1.47, 2: 1.39}})
Then you can explode the 2nd dataframe and compare the values.
m = df2.apply(pd.Series.explode).values
df = pd.DataFrame(
(df1.iloc[np.arange(len(df1)).repeat(3)].values <= m) &
(m <= df3.iloc[np.arange(len(df3)).repeat(3)].values),
columns=df2.columns
)
df = df.groupby(df.index // 3).agg(list)
Output:
                  AVal                   BVal                  CVal
0  [True, True, False]    [True, True, False]    [True, True, True]
1   [True, True, True]   [True, False, False]  [True, False, False]
2   [True, True, True]  [False, False, False]   [True, True, False]
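The snippet above hard-codes the list length 3 in both `repeat(3)` and `// 3`. A possible variant, sketched here under the assumption that every cell holds a list of the same length, reads that length from the data instead. The names `n`, `lo`, `hi`, `mask` and `out` are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"AMin": [26.47, 27.23, 27.23],
                    "BMin": [17.31, 14.38, 18.88],
                    "CMin": [1.26, 1.36, 1.28]})
df2 = pd.DataFrame({"AVal": [[27.58, 28.37, 28.73]] * 3,
                    "BVal": [[17.31, 18.42, 18.72]] * 3,
                    "CVal": [[1.36, 1.28, 1.27]] * 3})
df3 = pd.DataFrame({"AMax": [28.68, 29.5, 29.87],
                    "BMax": [18.42, 17.31, 20.45],
                    "CMax": [1.37, 1.47, 1.39]})

# Assumption: all cells hold lists of the same length, so the first cell is representative.
n = len(df2.iloc[0, 0])

# One row per list element, as a float array of shape (len(df2) * n, n_columns).
m = df2.apply(pd.Series.explode).to_numpy(dtype=float)

# Repeat each bound row n times so it lines up with the exploded values.
lo = np.repeat(df1.to_numpy(), n, axis=0)
hi = np.repeat(df3.to_numpy(), n, axis=0)

mask = pd.DataFrame((lo <= m) & (m <= hi), columns=df2.columns)

# Fold every block of n consecutive rows back into one list per original row.
out = mask.groupby(mask.index // n).agg(list)
print(out)
```

This keeps the explode, compare, regroup structure of the answer and only removes the literal 3; it still relies on the columns of the three dataframes being in the same order.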
Just turn the column values into NumPy arrays and treat it as a row-wise array comparison problem.
You can use apply:
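# Here df1, df2 and df3 are taken to be the question's MultiIndex-column
# min, value and max dataframes.
def bool_check(row):
    # row is one column of df2; its name is a tuple such as ('A', 'Val')
    col = row.name[0]
    min_val = df1[pd.IndexSlice[col]].to_numpy()  # shape (n_rows, 1)
    max_val = df3[pd.IndexSlice[col]].to_numpy()
    x = np.array(row.tolist())  # shape (n_rows, list_length)
    return list((x >= min_val) & (x <= max_val))

res = df2.apply(bool_check, axis=0).rename(columns={'Val': 'Result'})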
res:
| A | B | C |
| Result | Result | Result |
|---------------------|-----------------------|----------------------|
0 | [True, True, False] | [True, True, False] | [True, True, True] |
1 | [True, True, True] | [True, False, False] | [True, False, False] |
2 | [True, True, True] | [False, False, False] | [True, True, False] |
Complete solution, based on the data you've provided:
def bool_check(row):
    # row is one column of val_df; its name is a tuple such as ('A', 'Val')
    col = row.name[0]
    min_val = min_df[pd.IndexSlice[col]].to_numpy()
    max_val = max_df[pd.IndexSlice[col]].to_numpy()
    x = np.array(row.tolist())
    return list((x >= min_val) & (x <= max_val))

res = val_df.apply(bool_check, axis=0).rename(columns={'Val': 'Result'})
Method 1 (Nk03's method 1):
CPU times: user 19.5 ms, sys: 0 ns, total: 19.5 ms
Wall time: 18.9 ms
Method 2 (Nk03's method 2):
CPU times: user 23 ms, sys: 102 µs, total: 23.1 ms
Wall time: 21.9 ms
Method 3 (using NumPy-based comparison):
CPU times: user 8.76 ms, sys: 26 µs, total: 8.79 ms
Wall time: 8.91 ms
Nk03's updated and optimized solution:
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 15.5 ms
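No code is shown for Method 3, so the following is only one way a NumPy-based comparison could look, not necessarily what was timed. It is a sketch that assumes all cells hold lists of equal length and that the three dataframes list their top-level columns in the same order. The names `cols`, `vals`, `lo`, `hi` and `mask` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

cols = ["A", "B", "C"]
min_df = pd.DataFrame(
    [[26.47, 17.31, 1.26], [27.23, 14.38, 1.36], [27.23, 18.88, 1.28]],
    columns=pd.MultiIndex.from_product([cols, ["Min"]]),
)
max_df = pd.DataFrame(
    [[28.68, 18.42, 1.37], [29.50, 17.31, 1.47], [29.87, 20.45, 1.39]],
    columns=pd.MultiIndex.from_product([cols, ["Max"]]),
)
row = [[27.58, 28.37, 28.73], [17.31, 18.42, 18.72], [1.36, 1.28, 1.27]]
val_df = pd.DataFrame([row] * 3, columns=pd.MultiIndex.from_product([cols, ["Val"]]))

# Stack the list cells into one float array of shape (rows, columns, list_length).
vals = np.array(val_df.to_numpy().tolist(), dtype=float)

# Add a trailing axis so each scalar bound broadcasts over its cell's list.
lo = min_df.to_numpy()[:, :, None]
hi = max_df.to_numpy()[:, :, None]
mask = (lo <= vals) & (vals <= hi)

# Wrap the boolean array back into a dataframe of lists.
res = pd.DataFrame(
    [[cell.tolist() for cell in r] for r in mask],
    columns=pd.MultiIndex.from_product([cols, ["Result"]]),
)
print(res)
```

The comparison itself is a single broadcast operation with no Python-level loop; the loops that remain only convert between list cells and arrays.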