How do I check whether an integer value in a column of one DataFrame falls within a range defined by two columns of a second DataFrame?

Question

To explain the problem better:

I have 2 DataFrames:

DF1 (main):

    CodeRange                                             Sector Start   End
0   0100-0999                  Agriculture, Forestry and Fishing  0100  0999
1   1000-1499                                             Mining  1000  1499
2   1500-1799                                       Construction  1500  1799
3   1800-1999                                           not used  1800  1999
4   2000-3999                                      Manufacturing  2000  3999
5   4000-4999  Transportation, Communications, Electric, Gas ...  4000  4999
6   5000-5199                                    Wholesale Trade  5000  5199
7   5200-5999                                       Retail Trade  5200  5999
8   6000-6799                 Finance, Insurance and Real Estate  6000  6799
9   7000-8999                                           Services  7000  8999
10  9100-9729                              Public Administration  9100  9729
11  9900-9999                                    Nonclassifiable  9900  9999

and DF2:

    SICCode Sector
0   1230    Agro
1   4974    Utils
2   5120    shops
3   9997    Utils

In DF1, I was able to split the "CodeRange" column into 2 columns ("Start" and "End") and convert them to int.
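The split described above could look roughly like the sketch below. The two-row frame is a made-up excerpt of DF1; only the column names come from the question, and the exact code the asker used is not shown, so this is one plausible way to do it rather than their actual method.

```python
import pandas as pd

# Hypothetical two-row excerpt of DF1, used only for illustration.
df1 = pd.DataFrame({
    "CodeRange": ["0100-0999", "1000-1499"],
    "Sector": ["Agriculture, Forestry and Fishing", "Mining"],
})

# Split each "lo-hi" string on the hyphen into two columns, then cast both to int.
# Note that the cast drops leading zeros: "0100" becomes 100.
df1[["Start", "End"]] = df1["CodeRange"].str.split("-", expand=True).astype(int)

print(df1[["CodeRange", "Start", "End"]])
```

Once "Start" and "End" are integers, they can be compared directly against an integer "SICCode".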

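Basically, I want to find which range each SICCode in DF2 falls into, and update the "Sector" value in DF2 with the corresponding value from DF1's "Division" column (shown as "Sector" in the table above).

The final DF2 should look like this:

DF2: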

    SICCode Sector
0   1230    Agriculture, Forestry and Fishing
1   4974    Transportation, Communication...
2   5120    Wholesale Trade
3   9997    Non-classifiable

Answer 1

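A more compact solution, without loops.

The key is to create a helper key, "start_idx", by integer-dividing the numbers by 1000, and to merge on it. After the merge, we check whether the SICCode lies within the range; where it does not, we set the division to blank.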

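import numpy as np

# df is DF1 (Start/End already int) and df2 is DF2; join rows that share the same thousands bucket
df3 = df.assign(start_idx=(df['Start'] // 1000).astype(int)).merge(
    df2.assign(start_idx=(df2['SICCode'] // 1000).astype(int)), on='start_idx', how='left')
# keep the DF2 label only where SICCode lies inside [Start, End]; otherwise leave it blank
df3['Divison'] = np.where((df3['SICCode'] >= df3['Start']) &
                          (df3['SICCode'] <= df3['End']), df3['Sector_y'], "")
# 'x_y' comes from an extra 'x' column in the frames used here; errors='ignore' skips it if absent
df3.drop(columns=['start_idx', 'x_y', 'SICCode', 'Sector_y'], errors='ignore')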

    x_x  CodeRange  Sector_x                                           Start  End   Divison
0   0    0100-0999  Agriculture, Forestry and Fishing                  100    999
1   1    1000-1499  Mining                                             1000   1499  Agro
2   2    1500-1799  Construction                                       1500   1799
3   3    1800-1999  not used                                           1800   1999
4   4    2000-3999  Manufacturing                                      2000   3999
5   5    4000-4999  Transportation, Communications, Electric, Gas ...  4000   4999  Utils
6   6    5000-5199  Wholesale Trade                                    5000   5199  Shops
7   7    5200-5999  Retail Trade                                       5200   5999
8   8    6000-6799  Finance, Insurance and Real Estate                 6000   6799
9   9    7000-8999  Services                                           7000   8999
10  10   9100-9729  Public Administration                              9100   9729
11  11   9900-9999  Nonclassifiable                                    9900   9999  Utils
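One caveat with the thousands key: a code such as 3500 gets key 3, while its range 2000-3999 gets key 2, so the merge would not pair them. A loop-free alternative that does not rely on that key is an interval lookup. The sketch below swaps in pandas' `IntervalIndex` instead of the merge; it assumes the ranges do not overlap (true for the table in the question), and the extra code 6900 is a hypothetical value added to show what happens when a code falls in a gap. It also writes the DF1 sector name into DF2, as the question asks.

```python
import numpy as np
import pandas as pd

# DF1 as listed in the question (the truncated sector name is kept as shown).
df1 = pd.DataFrame({
    "Sector": [
        "Agriculture, Forestry and Fishing", "Mining", "Construction", "not used",
        "Manufacturing", "Transportation, Communications, Electric, Gas ...",
        "Wholesale Trade", "Retail Trade", "Finance, Insurance and Real Estate",
        "Services", "Public Administration", "Nonclassifiable",
    ],
    "Start": [100, 1000, 1500, 1800, 2000, 4000, 5000, 5200, 6000, 7000, 9100, 9900],
    "End": [999, 1499, 1799, 1999, 3999, 4999, 5199, 5999, 6799, 8999, 9729, 9999],
})
# DF2 codes from the question, plus 6900 (hypothetical) which lies in no range.
df2 = pd.DataFrame({"SICCode": [1230, 4974, 5120, 9997, 6900]})

# Build one closed interval [Start, End] per DF1 row.
intervals = pd.IntervalIndex.from_arrays(df1["Start"], df1["End"], closed="both")

# Position of the interval containing each code; -1 means no interval contains it.
pos = intervals.get_indexer(df2["SICCode"])

# Look up the sector by position, leaving unmatched codes empty.
sectors = df1["Sector"].to_numpy()
df2["Sector"] = np.where(pos >= 0, sectors[pos], None)

print(df2)
```

With the ranges as listed, 1230 falls in 1000-1499, so this lookup returns "Mining" for it.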

Answer 2

I think you could definitely optimize my solution using masks, but you can achieve it as follows:

data = []
for i in range(len(df2)):
    code = df2["SICCode"].iloc[i]
    for j in range(len(df1)):
        start = df1["Start"].iloc[j]
        end = df1["End"].iloc[j]
        if start <= code <= end:
            data.append(df1["Sector"].iloc[j])
            break  # stop scanning ranges and move to the next i
    else:
        # no range matched; append a placeholder so data stays as long as df2
        data.append(None)

df2["Sector"] = data
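The mask-based optimization mentioned above might be sketched as follows, using NumPy broadcasting to compare every code against every range at once. This is one possible reading of "using masks", not the answerer's own code. The DF1 frame is a four-row excerpt, and the code 6900 with its label "other" is a hypothetical addition that matches no range.

```python
import numpy as np
import pandas as pd

# Excerpt of DF1 from the question (four of the twelve ranges).
df1 = pd.DataFrame({
    "Sector": ["Mining", "Transportation, Communications, Electric, Gas ...",
               "Wholesale Trade", "Nonclassifiable"],
    "Start": [1000, 4000, 5000, 9900],
    "End": [1499, 4999, 5199, 9999],
})
# DF2 from the question plus one hypothetical row (6900, "other").
df2 = pd.DataFrame({
    "SICCode": [1230, 4974, 5120, 9997, 6900],
    "Sector": ["Agro", "Utils", "shops", "Utils", "other"],
})

# Column vector of codes against row vectors of bounds: broadcasting yields
# a boolean matrix of shape (number of codes, number of ranges).
codes = df2["SICCode"].to_numpy()[:, None]
starts = df1["Start"].to_numpy()[None, :]
ends = df1["End"].to_numpy()[None, :]
mask = (codes >= starts) & (codes <= ends)

# argmax gives the first matching range per code; any() flags codes with no match,
# for which argmax would otherwise misleadingly return 0.
matched = mask.any(axis=1)
first = mask.argmax(axis=1)
df2["Sector"] = np.where(matched, df1["Sector"].to_numpy()[first], None)

print(df2)
```

The mask needs memory proportional to the number of codes times the number of ranges, which is fine for a dozen ranges but worth keeping in mind for large lookup tables.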

Answer 1
1 2022-06-22 17:17:12

Answer 2
0 2022-06-22 16:37:42
