For a given bin, how can I determine whether any value in one array is lower than any value in another array?

Question

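I am trying to compare different lines to find out whether one line lies above another and, if it does not, at which x the change happens.

This would be very easy if the lines had the same x values and the same length, and differed only in their y values.

However, my lines have different x values and the vectors have different lengths, although the x interval is the same for all curves.

As a very simple example, I use the following data: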

import numpy as np

# curve 1: len = 9
x1 = np.array([5,6,7,8,9,10,11,12,13])
y1 = np.array([100,101,110,130,132,170,190,192,210])

#curve 2: len = 10
x2 = np.array([3,4,5,6,7,8,9,10,11,12])
y2 = np.array([90,210,211,250,260,261,265,180,200,210])

#curve 3: len = 8
x3 = np.array([7.3,8.3,9.3,10.3,11.3,12.3,13.3,14.3])
y3 = np.array([300,250,270,350,380,400,390,380])

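These are meant to be regression lines. In this simple example, the result should be that curve 2 has higher values than curve 1 over the whole x range.

I am trying to bin x over the range 2.5-12.5 with a bin length of 1, so that I can compare the corresponding y values within each bin.

My actual data is large and this comparison has to be done many times, so I need a solution that does not take much time.

Plot

Plot of the data against the given x-axis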

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
plt.plot(x1, y1, marker='o', label='y1')
plt.plot(x2, y2, marker='o', label='y2')
plt.plot(x3, y3, marker='o', label='y3')
plt.xticks(range(15))
plt.grid()
# place the legend outside the axes (a single legend call is enough)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

Answer 1

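Functions

def get_new_x uses np.digitize to rebin the x-axis values.
def get_comparison adds a column of Booleans for each pair of columns that is compared.
- Currently, each new column is added to the main dataframe df, but this can be changed to use a separate comparison dataframe.
- combs is a list of column combinations
  - [Index(['y1', 'y2'], dtype='object'), Index(['y2', 'y3'], dtype='object')]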
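To see what the rebinning step does in isolation, here is a minimal sketch that applies np.digitize to the x values of curve 3. The bin edges are written out by hand for illustration (an assumption; the functions in this answer derive them from the data).

```python
import numpy as np

# Bin edges spaced by 1, matching the x interval of the example curves
bins = np.arange(3, 16, 1)  # 3, 4, ..., 15

# x values of curve 3, which are offset from the integer grid
x3 = np.array([7.3, 8.3, 9.3, 10.3, 11.3, 12.3, 13.3, 14.3])

# With right=True, np.digitize returns i such that bins[i-1] < x <= bins[i],
# so a rounded value that equals an edge maps to the index of that edge
idx = np.digitize(np.round(x3), bins, right=True)
x3_new = bins[idx]

print(idx)     # [ 4  5  6  7  8  9 10 11]
print(x3_new)  # [ 7  8  9 10 11 12 13 14]
```

After this step curve 3 shares the integer x grid of curves 1 and 2, which is what allows the y values to be aligned in one dataframe.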

from typing import List

import numpy as np
import pandas as pd


# function to create the bins
def get_bins(x_arrays: List[np.array]) -> np.array:
    bin_len = np.diff(x_arrays[0][:2])[0]  # bin length as a scalar (first x interval)
    all_x = np.concatenate(x_arrays)  # join arrays
    min_x = all_x.min()  # get min
    max_x = all_x.max()  # get max
    return np.arange(min_x, max_x + bin_len, bin_len)


# function using np.digitize to bin the old x-axis into new bins
def get_new_x(x_arrays: List[np.array]) -> List[np.array]:
    bins = get_bins(x_arrays)  # get the bins
    x_new = list()
    for x in x_arrays:
        x_new.append(bins[np.digitize(np.round(x), bins, right=True)])  # determine bins
    return x_new


# function to create dataframe for arrays with new x-axis as index
def get_df(x_arrays: List[np.array], y_arrays: List[np.array]) -> pd.DataFrame:
    x_new = get_new_x(x_arrays)
    return pd.concat([pd.DataFrame(y, columns=[f'y{i+1}'], index=x_new[i]) for i, y in enumerate(y_arrays)], axis=1)


# compare each successive pair of columns of the dataframe
# if the left column is greater than the right column, then True
# (a comparison involving NaN is always False)
def get_comparison(df: pd.DataFrame) -> None:
    cols = df.columns
    combs = [cols[i:i+2] for i in range(len(cols) - 1)]
    for comb in combs:
        df[f'{comb[0]} > {comb[1]}'] = df[comb[0]] > df[comb[1]]

Call the functions:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# put the arrays into a list
y = [y1, y2, y3]
x = [x1, x2, x3]

# call get_df
df = get_df(x, y)

# call get_comparison
get_comparison(df)

# get only the index of True values with Boolean indexing
for col in df.columns[3:]:
    vals = df.index[df[col]].tolist()
    if vals:
        print(f'{col}: {vals}')

[out]:
y2 > y3: [8.0]

display(df)

         y1     y2     y3  y1 > y2  y2 > y3
3.0     NaN   90.0    NaN    False    False
4.0     NaN  210.0    NaN    False    False
5.0   100.0  211.0    NaN    False    False
6.0   101.0  250.0    NaN    False    False
7.0   110.0  260.0  300.0    False    False
8.0   130.0  261.0  250.0    False     True
9.0   132.0  265.0  270.0    False    False
10.0  170.0  180.0  350.0    False    False
11.0  190.0  200.0  380.0    False    False
12.0  192.0  210.0  400.0    False    False
13.0  210.0    NaN  390.0    False    False
14.0    NaN    NaN  380.0    False    False
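In the table above, bins where one of the curves has no data show False, because any comparison with NaN evaluates to False. If "not comparable" should be kept apart from "not greater", one option is a nullable Boolean column. The sketch below is an illustration only: the helper name compare_where_both_present is hypothetical, and the small dataframe copies four rows from the table.

```python
import numpy as np
import pandas as pd


def compare_where_both_present(left: pd.Series, right: pd.Series) -> pd.Series:
    """Return left > right, with <NA> where either value is missing."""
    result = (left > right).astype("boolean")  # nullable Boolean dtype
    both_present = left.notna() & right.notna()
    result[~both_present] = pd.NA  # mark bins that cannot be compared
    return result


# Four rows taken from the dataframe shown above
df = pd.DataFrame(
    {"y2": [260.0, 261.0, 265.0, np.nan], "y3": [300.0, 250.0, 270.0, 390.0]},
    index=[7.0, 8.0, 9.0, 13.0],
)

flag = compare_where_both_present(df["y2"], df["y3"])
print(flag)

# Index values where y2 is definitely greater than y3
print(flag[flag.fillna(False)].index.tolist())  # [8.0]
```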

Plot

fig, ax = plt.subplots(figsize=(8, 6))

# add markers for problem values
for i, col in enumerate(df.columns[3:], 1):
    vals = df.iloc[:, i][df[col]]
    if not vals.empty:
        ax.scatter(vals.index, vals.values, color='red', s=110, label='bad')

df.iloc[:, :3].plot(marker='o', ax=ax)  # plot the dataframe        

plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(range(16))
plt.title('y-values plotted against rebinned x-values')
plt.grid()
plt.show()

Answer 2

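This is the answer I had in mind when I first asked the question, but I could not get it to work at the time. My idea is to bin y1 and y2 based on x and to compare the two within each bin. As an example, I have 3 curves that I want to compare. The only thing these curves have in common is delta x (the bin length), which is 1 here.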

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#curve 1
x1 = np.array([5,6,7,8,9,10,11,12,13])
y1 = np.array([100,101,110,130,132,170,190,192,210])

#curve 2
x2 = np.array([3,4,5,6,7,8,9,10,11,12])
y2 = np.array([90,210,211,250,260,261,265,180,200,210])

#curve 3
x3 = np.array([7.3,8.3,9.3,10.3,11.3,12.3,13.3,14.3])
y3 = np.array([300,250,270,350,380,400,390,380])

bin_length = 1
# x values have same intervals both in x1 and x2

x_min = min(x1[0],x2[0],x3[0])-bin_length/2
x_max = max(x1[-1],x2[-1],x3[-1])+bin_length/2

bins = np.arange(x_min,x_max+bin_length,bin_length)

# bin mid points to use as index
bin_mid = []
for i in range(len(bins)-1):
    # compute mid point of the bins
    bin_mid.append((bins[i] + bins[i+1])/2)

# This function bins y based on binning x
def bin_fun(x, y, bins, bin_length):
    # bin_length is kept so that the calls below stay unchanged; it is not used here
    c = list(zip(x, y))
    # final output of the function: one list of y members per bin
    final_y_binning = []

    for i in range(len(bins) - 1):
        # lower and upper threshold of the current bin
        low_threshold = bins[i]
        high_threshold = bins[i + 1]

        # collect the y values whose x falls in [low_threshold, high_threshold)
        bined_y_members = [
            member[1] for member in c
            if low_threshold <= member[0] < high_threshold
        ]
        final_y_binning.append(bined_y_members)

    # build the dataframe once, after all bins have been filled
    df = pd.DataFrame(final_y_binning)
    return df


Y1 = bin_fun(x1, y1, bins, bin_length)
Y1.columns = [1]

Y2 = bin_fun(x2, y2, bins, bin_length)
Y2.columns = [2]

Y3 = bin_fun(x3, y3, bins, bin_length)
Y3.columns = [3]

# DataFrame.append was removed in pandas 2.0, so join the columns with pd.concat
binned_y = pd.concat([Y1, Y2, Y3], axis=1)

binned_y.index = bin_mid

print(binned_y)

# comparing curve 2 and curve 1
for i in binned_y.index:
    if (binned_y.loc[i][2]-binned_y.loc[i][1]<0):
        print(i)

# comparing curve 3 and curve 2
for i in binned_y.index:
    if (binned_y.loc[i][3]-binned_y.loc[i][2]<0):
        print(i)

This returns 8, which is the index where y3 < y2.
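Since the real data is large, the two row-by-row loops could probably be replaced by Boolean indexing on whole columns. This is a sketch rather than a measured improvement; the small dataframe below holds a few values from the example data, and the names below_21 and below_32 are made up for illustration.

```python
import numpy as np
import pandas as pd

# A few bins from the example data (columns 1, 2, 3 are the three curves)
binned_y = pd.DataFrame(
    {
        1: [110.0, 130.0, 132.0, 210.0],
        2: [260.0, 261.0, 265.0, np.nan],
        3: [300.0, 250.0, 270.0, 390.0],
    },
    index=[7.0, 8.0, 9.0, 13.0],
)

# Bin mid points where curve 2 is below curve 1, and where curve 3 is below curve 2.
# Comparisons involving NaN are False, so bins that lack one of the curves are
# skipped, which matches the behaviour of the loops.
below_21 = binned_y.index[binned_y[2] < binned_y[1]].tolist()
below_32 = binned_y.index[binned_y[3] < binned_y[2]].tolist()

print(below_21)  # []
print(below_32)  # [8.0]
```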

`binned_y`

          1      2      3
3.0     NaN   90.0    NaN
4.0     NaN  210.0    NaN
5.0   100.0  211.0    NaN
6.0   101.0  250.0    NaN
7.0   110.0  260.0  300.0
8.0   130.0  261.0  250.0
9.0   132.0  265.0  270.0
10.0  170.0  180.0  350.0
11.0  190.0  200.0  380.0
12.0  192.0  210.0  400.0
13.0  210.0    NaN  390.0
14.0    NaN    NaN  380.0
15.0    NaN    NaN    NaN

Plot

binned_y.plot(marker='o', figsize=(6, 6))  # plot the dataframe
plt.legend(labels=['y1', 'y2', 'y3'], bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(range(16))
plt.grid()
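bin_fun loops over every point for every bin, which may become slow on large data. A possible vectorized variant is sketched below. It relies on two assumptions that hold for the example but are not guaranteed in general: all bins have the same length, and each bin contains at most one point per curve. The helper name bin_curve is hypothetical.

```python
import numpy as np
import pandas as pd


def bin_curve(x, y, x_min, bin_length, n_bins):
    """Place each y in the bin its x falls into; empty bins stay NaN.

    Assumes at most one point per bin, as in the example data.
    """
    # index of the bin [x_min + k*bin_length, x_min + (k+1)*bin_length) holding each x
    idx = np.floor((np.asarray(x) - x_min) / bin_length).astype(int)
    out = np.full(n_bins, np.nan)
    out[idx] = y
    return out


x1 = np.array([5, 6, 7, 8, 9, 10, 11, 12, 13])
y1 = np.array([100, 101, 110, 130, 132, 170, 190, 192, 210])
x2 = np.array([3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
y2 = np.array([90, 210, 211, 250, 260, 261, 265, 180, 200, 210])
x3 = np.array([7.3, 8.3, 9.3, 10.3, 11.3, 12.3, 13.3, 14.3])
y3 = np.array([300, 250, 270, 350, 380, 400, 390, 380])

bin_length = 1
x_min = min(x1[0], x2[0], x3[0]) - bin_length / 2
x_max = max(x1[-1], x2[-1], x3[-1]) + bin_length / 2
n_bins = int(np.ceil((x_max - x_min) / bin_length))

# mid point of every bin, used as the index
bin_mid = x_min + bin_length * (np.arange(n_bins) + 0.5)

curves = [(x1, y1), (x2, y2), (x3, y3)]
binned_y = pd.DataFrame(
    {i + 1: bin_curve(x, y, x_min, bin_length, n_bins) for i, (x, y) in enumerate(curves)},
    index=bin_mid,
)
print(binned_y)
```

If a bin can hold several points, the plain assignment would silently keep only the last one, so an aggregation step (for example a mean per bin) would be needed instead.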

2 answers

Answer 1: score 1, accepted, 2020-09-03 20:36:10

Answer 2: score 0, 2020-09-07 20:22:56
