Vlookup with Python when the data is given as a range

Question

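I have two Excel files, and I want to do a vlookup using Python (or even Excel) to find the cost differences.

My files look like this.

source_data.xlsx contains the distance covered and its price. For example, a distance in the range 1 to 100 should be charged 4800, and a distance in the range 101 to 120 should be charged 5100.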

DISTANCE     COST

1-100        4800

101-120      5100

121-140      5500

141-160      5900

161-180      6200

181-200      6600

210-220      6900

221-240      7200

analysis.xlsx

loading_station  distance_travel     total_cost    status

PUGU                  40                4000       PAID


PUGU                  80                3200       PAID

MOROGORO              50                5000       PAID

MOROGORO              220               30400      PAID

DODOMA                150               5100       PAID

KIGOMA                90                2345       PAID

DODOMA                230               6000       PAID

DODOMA                180               16500      PAID

KIGOMA                32                3000       PAID

DODOMA                45                6000       PAID

DODOMA                65                5000       PAID

KIGOMA                77                1000       PAID

KIGOMA                90                4000       PAID

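The actual cost for each distance is given in source_data.xlsx. I want to check whether the costs in analysis.xlsx match the actual values, so that I can detect both underpayments and overpayments.

The desired output should look like this, with two columns added: source_cost is looked up from source_data.xlsx (vlookup-style), and Difference is total_cost minus source_cost.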

loading_station distance_travel total_cost  status  source_cost Difference

PUGU               40                4000     PAID     4800        -800

PUGU               80                3200     PAID     4800        -1600

MOROGORO           50                5000     PAID     4800         200

MOROGORO           220               30400    PAID     6900         23500

DODOMA             150               5100     PAID     5900         -800

KIGOMA             90                2345     PAID     4800         -2455

DODOMA             230               6000     PAID     7200         -1200

DODOMA             180               16500    PAID     6200          10300

KIGOMA             32                3000     PAID     4800          -1800

DODOMA             45                6000     PAID     4800           1200

DODOMA             65                5000     PAID     4800           200

KIGOMA             77                1000     PAID     4800           -3800

KIGOMA             90                4000     PAID     4800           -800

My code so far

# import pandas
import pandas as pd

# read excel data
source_data = pd.read_excel('source_data.xlsx')
analysis_file = pd.read_excel('analysis.xlsx')
source_data.head(5)
analysis_file.head(5)

Answer 1

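You can use merge_asof. Keep only the upper bound of each range as the key, then match each distance forward to the nearest upper bound that is greater than or equal to it: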

source_data["DISTANCE"] = source_data["DISTANCE"].str.split("-").str[1].astype("int64")
res = (pd.merge_asof(analysis_file.reset_index().sort_values("distance_travel"),
                     source_data,
                     left_on="distance_travel",
                     right_on="DISTANCE",
                     direction="forward")
       .set_index("index")
       .sort_index())
res["Difference"] = res["total_cost"] - res["COST"]

print (res)

      loading_station  distance_travel  total_cost status  DISTANCE  COST  Difference
index
0                PUGU               40        4000   PAID       100  4800        -800
1                PUGU               80        3200   PAID       100  4800       -1600
2            MOROGORO               50        5000   PAID       100  4800         200
3            MOROGORO              220       30400   PAID       220  6900       23500
4              DODOMA              150        5100   PAID       160  5900        -800
5              KIGOMA               90        2345   PAID       100  4800       -2455
6              DODOMA              230        6000   PAID       240  7200       -1200
7              DODOMA              180       16500   PAID       180  6200       10300
8              KIGOMA               32        3000   PAID       100  4800       -1800
9              DODOMA               45        6000   PAID       100  4800        1200
10             DODOMA               65        5000   PAID       100  4800         200
11             KIGOMA               77        1000   PAID       100  4800       -3800
12             KIGOMA               90        4000   PAID       100  4800        -800

Note that this does not account for a travel distance of 0. You will need to handle that case separately.
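One possible way to handle that case, sketched under some assumptions: keep the lower bound of each range as well, and blank out any match where the distance falls below it. The `LOWER`, `UPPER` and `source_cost` column names and the extra sample rows (0, 205, 300) are made up for illustration and are not part of the original data. The same check also catches distances that land in the 201-209 gap of the source table, and distances above the last range.

```python
import pandas as pd

# Sample rows mirroring the question; the 0, 205 and 300 rows are hypothetical.
source_data = pd.DataFrame({
    "DISTANCE": ["1-100", "101-120", "121-140", "141-160",
                 "161-180", "181-200", "210-220", "221-240"],
    "COST": [4800, 5100, 5500, 5900, 6200, 6600, 6900, 7200],
})
analysis_file = pd.DataFrame({
    "distance_travel": [40, 220, 0, 205, 300],
    "total_cost": [4000, 30400, 1000, 7000, 9000],
})

# Keep both ends of every range instead of only the upper one.
bounds = source_data["DISTANCE"].str.split("-", expand=True).astype("int64")
lookup = source_data.assign(LOWER=bounds[0], UPPER=bounds[1]).sort_values("UPPER")

res = (pd.merge_asof(analysis_file.reset_index().sort_values("distance_travel"),
                     lookup[["LOWER", "UPPER", "COST"]],
                     left_on="distance_travel",
                     right_on="UPPER",
                     direction="forward")
       .set_index("index")
       .sort_index())

# A forward match only guarantees distance <= UPPER, so treat a row as
# unmatched when it has no range at all or lies below the range's lower bound.
outside = res["LOWER"].isna() | (res["distance_travel"] < res["LOWER"])
res["source_cost"] = res["COST"].where(~outside)
res["Difference"] = res["total_cost"] - res["source_cost"]

print(res[["distance_travel", "total_cost", "source_cost", "Difference"]])
```

Rows with no valid range end up with NaN in source_cost and Difference, so they can be filtered out or reviewed by hand.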

Answer 2

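Since this is a binning problem, I suggest using cut() and then looking up the corresponding value.

import pandas as pd
# df_source and df_analysis are the DataFrames read from source_data.xlsx and analysis.xlsx
# create bins from the lower (bh) and upper (bt) end of each range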
bh = df_source['DISTANCE'].apply(lambda x: x.split('-')).apply(pd.Series).astype(int).values[:,0]
bt = df_source['DISTANCE'].apply(lambda x: x.split('-')).apply(pd.Series).astype(int).values[:,1]
bins = pd.IntervalIndex.from_arrays(bh, bt, closed='both')

print(bins)
###
IntervalIndex([[1, 100], [101, 120], [121, 140], [141, 160], [161, 180], [181, 200], [210, 220], [221, 240]], dtype='interval[int64, both]')

As shown, the result is an IntervalIndex with dtype='interval[int64, both]', meaning both ends of every interval are included.
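To see why closed='both' matters here, a small sketch with a single interval may help. The variable names are illustrative only. With the ranges written as 101-120, the lower edge 101 has to count as inside the interval, which is not the case for a right-closed interval:

```python
import pandas as pd

both = pd.Interval(101, 120, closed="both")
right = pd.Interval(101, 120, closed="right")

print(101 in both)   # True: the lower edge belongs to the interval
print(101 in right)  # False: a right-closed interval excludes its lower edge
print(120 in both, 120 in right)  # True True: the upper edge is in both
```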

# find corresponding values
df_analysis['source_cost'] = pd.cut(df_analysis['distance_travel'], bins=bins).map(dict(zip(bins, df_source['COST']))).astype(int)

# calculation
df_analysis['Difference'] = df_analysis['total_cost'] - df_analysis['source_cost']

print(df_analysis)
###

loading_station  distance_travel  total_cost  status  source_cost  Difference
PUGU             40               4000        PAID    4800         -800
PUGU             80               3200        PAID    4800         -1600
MOROGORO         50               5000        PAID    4800         200
MOROGORO         220              30400       PAID    6900         23500
DODOMA           150              5100        PAID    5900         -800
KIGOMA           90               2345        PAID    4800         -2455
DODOMA           230              6000        PAID    7200         -1200
DODOMA           180              16500       PAID    6200         10300
KIGOMA           32               3000        PAID    4800         -1800
DODOMA           45               6000        PAID    4800         1200
DODOMA           65               5000        PAID    4800         200
KIGOMA           77               1000        PAID    4800         -3800
KIGOMA           90               4000        PAID    4800         -800
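If some distances might fall outside every range (for example 0, or a value in the 201-209 gap), the lookup above would produce a missing value that astype(int) cannot convert. A possible variant, sketched with made-up sample rows, uses IntervalIndex.get_indexer, which returns -1 for values that no interval contains. The `edges`, `pos` and `costs` names are illustrative.

```python
import numpy as np
import pandas as pd

df_source = pd.DataFrame({
    "DISTANCE": ["1-100", "101-120", "121-140", "141-160",
                 "161-180", "181-200", "210-220", "221-240"],
    "COST": [4800, 5100, 5500, 5900, 6200, 6600, 6900, 7200],
})
# The 0 and 205 rows are hypothetical out-of-range distances.
df_analysis = pd.DataFrame({"distance_travel": [40, 220, 150, 0, 205]})

# Split each "low-high" label into two integer columns and build the bins.
edges = df_source["DISTANCE"].str.split("-", expand=True).astype(int)
bins = pd.IntervalIndex.from_arrays(edges[0], edges[1], closed="both")

# Position of the containing interval for every distance, or -1 if none.
pos = bins.get_indexer(df_analysis["distance_travel"].to_numpy())
costs = df_source["COST"].to_numpy()

# Unmatched rows get NaN instead of raising on an integer cast.
df_analysis["source_cost"] = np.where(pos >= 0, costs[pos], np.nan)

print(df_analysis)
```

The source_cost column is floating point here because it may hold NaN; rows with NaN are the ones that need manual review.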

Answer 1
0 2022-08-11 17:40:16

Answer 2
0 Accepted 2022-08-11 18:18:53
