Comparing values in a list of dataframes to another dataframe column
I am stuck on the following problem. My input dataframe, df_input, contains only one column, called Site_Sector. Site_Sector has the following structure:
Site_Sector
--------------
DEP_1234
TRE_5421
YUT_0901
IOP_ABC3
POS_3456
MEC_2341
XAZ_4532
QPI_9012
KPI_1200
LPO_1300
KIN_9012
SVP_0001
....
JOP_1289
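I have 3 dataframes called df_cr, df_gt and df_ba, which are stored in a list: list_of_dfs = [df_cr, df_gt, df_ba]. They have the following structure (I will only type out two of the dataframes):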
# Let's consider some data of df_cr as an example
Date        Site      Sector     KPI_1   QA_value   Active
----------  --------  ---------  ------  ---------  ------
09/12/2015 CR_XAZ XAZ_4532 50.0 100.0 Y
09/12/2015 CR_PET PET_2312 50.0 100.0 Y
09/13/2015 CR_XAZ XAZ_4532 50.0 100.0 Y
09/13/2015 CR_PET PET_2312 50.0 100.0 Y
09/14/2015 CR_XAZ XAZ_4532 30.0 60.0 Y
09/14/2015 CR_PET PET_2312 25.0 50.0 N
09/15/2015 CR_XAZ XAZ_4532 25.0 50.0 N
09/15/2015 CR_PET PET_2312 40.0 80.0 Y
09/16/2015 CR_XAZ XAZ_4532 35.0 70.0 Y
09/16/2015 CR_PET PET_2312 45.0 90.0 Y
09/17/2015 CR_XAZ XAZ_4532 15.0 30.0 N
09/17/2015 CR_PET PET_2312 50.0 100.0 Y
.....
09/25/2015 CR_XAZ PET_4532 12.0 24.0 N
09/25/2015 CR_PET XAZ_2312 12.0 24.0 N
# Let's consider some data of df_ba as an example
Date        Site      Sector     KPI_1   QA_value   Active
----------  --------  ---------  ------  ---------  ------
09/12/2015 CR_DEP DEP_1234 35.0 70.0 Y
09/12/2015 CR_XZT XZT_1212 50.0 100.0 Y
09/13/2015 CR_DEP DEP_1234 15.0 30.0 N
09/13/2015 CR_XZT XZT_1212 50.0 100.0 Y
09/14/2015 CR_DEP DEP_1234 35.0 70.0 Y
09/14/2015 CR_XZT XZT_1212 25.0 50.0 Y
09/15/2015 CR_DEP DEP_1234 25.0 50.0 Y
09/15/2015 CR_XZT XZT_1212 40.0 80.0 Y
09/16/2015 CR_DEP DEP_1234 15.0 30.0 N
09/16/2015 CR_XZT XZT_1212 45.0 90.0 Y
09/17/2015 CR_DEP DEP_1234 50.0 100.0 Y
09/17/2015 CR_XZT XZT_1212 50.0 100.0 Y
.....
09/25/2015 CR_DEP DEP_1234 10.0 20.0 N
09/25/2015 CR_XZT XZT_1212 50.0 100.0 Y
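My goal is to compare each value in the Site_Sector column of df_input with each value in the Sector column of every dataframe in the list. Wherever Site_Sector matches Sector, the columns Date, KPI_1, QA_value and Active should be added to the df_input dataframe.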
# Expected output
Site_Sector  Date        KPI_1   QA_value   Active
-----------  ----------  ------  ---------  ------
DEP_1234 09/12/2015 35.0 70.0 Y
DEP_1234 09/13/2015 15.0 30.0 N
DEP_1234 09/14/2015 35.0 70.0 Y
DEP_1234 09/15/2015 25.0 50.0 Y
....
XAZ_4532 09/12/2015 50.0 100.0 Y
XAZ_4532 09/13/2015 50.0 100.0 Y
XAZ_4532 09/14/2015 30.0 60.0 Y
XAZ_4532 09/15/2015 25.0 50.0 N
....
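The same goal can also be phrased as a join rather than a filter. The following is a minimal sketch, not taken from the original post, that stacks the frames with pandas.concat and inner-merges df_input on Site_Sector == Sector with DataFrame.merge. The helper name match_sectors is hypothetical, and the sample frames contain only a few rows copied from the question (df_gt is left out because its data is not shown).

```python
import pandas as pd


def match_sectors(df_input: pd.DataFrame, list_of_dfs: list) -> pd.DataFrame:
    """Hypothetical helper: inner-join df_input.Site_Sector against each frame's Sector."""
    # Stack all source frames into one; ignore_index avoids duplicate row labels.
    combined = pd.concat(list_of_dfs, ignore_index=True)
    # Inner join: only Site_Sector values that also appear as a Sector survive,
    # and each match brings along its Date, KPI_1, QA_value and Active values.
    merged = df_input.merge(
        combined[["Sector", "Date", "KPI_1", "QA_value", "Active"]],
        left_on="Site_Sector",
        right_on="Sector",
        how="inner",
    )
    # Sector duplicates Site_Sector after the join, so drop it.
    return merged.drop(columns="Sector")


# Small sample built from a few rows of the question's data.
df_input = pd.DataFrame({"Site_Sector": ["DEP_1234", "TRE_5421", "XAZ_4532"]})
df_cr = pd.DataFrame({
    "Date": ["09/12/2015", "09/12/2015"],
    "Site": ["CR_XAZ", "CR_PET"],
    "Sector": ["XAZ_4532", "PET_2312"],
    "KPI_1": [50.0, 50.0],
    "QA_value": [100.0, 100.0],
    "Active": ["Y", "Y"],
})
df_ba = pd.DataFrame({
    "Date": ["09/12/2015", "09/13/2015"],
    "Site": ["CR_DEP", "CR_DEP"],
    "Sector": ["DEP_1234", "DEP_1234"],
    "KPI_1": [35.0, 15.0],
    "QA_value": [70.0, 30.0],
    "Active": ["Y", "N"],
})

result = match_sectors(df_input, [df_cr, df_ba])
print(result)
```

Because the merge is inner, Site_Sector values with no matching Sector (TRE_5421 here) simply drop out; switching to how="left" would keep them with NaN in the added columns.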
If anything is unclear or you need more details, please comment on this post and I will be happy to explain further.
I would do this with a list comprehension plus pd.Series.isin:
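import pandas as pd
data = df_input.Site_Sector
# Keep only the rows whose Sector value appears anywhere in df_input.Site_Sector
filtered_dfs = [x[x.Sector.isin(data)] for x in list_of_dfs]
# Stack the filtered frames and drop Site (columns= keyword; positional axis fails in pandas 2.x)
output = pd.concat(filtered_dfs).drop(columns='Site')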
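To see what the filter inside the list comprehension does, here is a small standalone sketch using a few Sector and Site_Sector values from the question. Series.isin returns a boolean mask aligned with the calling Series, True wherever the value occurs anywhere in the argument (position and index are ignored), and indexing with that mask keeps only the matching rows.

```python
import pandas as pd

# A few Sector values and the Site_Sector values to look for (taken from the question).
sectors = pd.Series(["XAZ_4532", "PET_2312", "XAZ_4532"])
site_sectors = pd.Series(["DEP_1234", "XAZ_4532"])

# Boolean mask: True where the Sector value occurs anywhere in site_sectors.
mask = sectors.isin(site_sectors)
print(mask.tolist())           # [True, False, True]

# Boolean indexing keeps only the matching rows, as x[x.Sector.isin(data)] does.
print(sectors[mask].tolist())  # ['XAZ_4532', 'XAZ_4532']
```

Since df_input has only the one Site_Sector column, filtering the source frames this way yields the same rows as a join would; isin alone would not be enough if df_input carried extra columns that also had to appear in the result.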
For your input, this is what you get:
print(output.sort_values('Sector'))
Date Sector KPI_1 QA_value Active
0 09/12/2015 DEP_1234 35.0 70.0 Y
2 09/13/2015 DEP_1234 15.0 30.0 N
4 09/14/2015 DEP_1234 35.0 70.0 Y
6 09/15/2015 DEP_1234 25.0 50.0 Y
8 09/16/2015 DEP_1234 15.0 30.0 N
10 09/17/2015 DEP_1234 50.0 100.0 Y
12 09/25/2015 DEP_1234 10.0 20.0 N
0 09/12/2015 XAZ_4532 50.0 100.0 Y
2 09/13/2015 XAZ_4532 50.0 100.0 Y
4 09/14/2015 XAZ_4532 30.0 60.0 Y
6 09/15/2015 XAZ_4532 25.0 50.0 N
8 09/16/2015 XAZ_4532 35.0 70.0 Y
10 09/17/2015 XAZ_4532 15.0 30.0 N
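The printed result still uses the Sector column name and carries duplicated row labels (0, 2, 4, ...) inherited from each source frame. A minimal sketch for bringing it into the layout of the question's expected output follows; the helper name to_expected_layout is hypothetical, and converting Date to datetime before sorting is an added design choice (it assumes the MM/DD/YYYY format shown in the question) so that ordering stays correct across years.

```python
import pandas as pd


def to_expected_layout(output: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rename, reorder, sort and re-index the filtered result."""
    return (
        output.rename(columns={"Sector": "Site_Sector"})
        # Parse MM/DD/YYYY strings so sorting is chronological, not lexicographic.
        .assign(Date=lambda d: pd.to_datetime(d["Date"], format="%m/%d/%Y"))
        .loc[:, ["Site_Sector", "Date", "KPI_1", "QA_value", "Active"]]
        .sort_values(["Site_Sector", "Date"])
        # Replace the duplicated labels inherited from concat with 0..n-1.
        .reset_index(drop=True)
    )


# Sample shaped like the answer's output, including duplicated index labels.
output = pd.DataFrame(
    {
        "Date": ["09/13/2015", "09/12/2015", "09/12/2015"],
        "Sector": ["XAZ_4532", "XAZ_4532", "DEP_1234"],
        "KPI_1": [50.0, 50.0, 35.0],
        "QA_value": [100.0, 100.0, 70.0],
        "Active": ["Y", "Y", "Y"],
    },
    index=[2, 0, 0],
)
result = to_expected_layout(output)
print(result)
```

If the dates should stay as strings in the final frame, the sorted Date column can be formatted back with result["Date"].dt.strftime("%m/%d/%Y").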