Comparing values in a list of dataframes to another dataframe column
I am stuck on the following problem. I have df_input as my input data frame, which contains only one column called Site_Sector. Site_Sector has the following structure:
Site_Sector
--------------
DEP_1234
TRE_5421
YUT_0901
IOP_ABC3
POS_3456
MEC_2341
XAZ_4532
QPI_9012
KPI_1200
LPO_1300
KIN_9012
SVP_0001
....
JOP_1289
I have 3 data frames called df_cr, df_gt and df_ba, which are contained in a list, list_of_dfs = [df_cr,df_gt,df_ba]. They have the following structure (I will show only two of the data frames):
#let's consider some data of df_cr as example
| Date | Site | Sector | KPI_1 | QA_value | Active |
| --------- |---------- |----------|----------|----------| ------ |
09/12/2015 CR_XAZ XAZ_4532 50.0 100.0 Y
09/12/2015 CR_PET PET_2312 50.0 100.0 Y
09/13/2015 CR_XAZ XAZ_4532 50.0 100.0 Y
09/13/2015 CR_PET PET_2312 50.0 100.0 Y
09/14/2015 CR_XAZ XAZ_4532 30.0 60.0 Y
09/14/2015 CR_PET PET_2312 25.0 50.0 N
09/15/2015 CR_XAZ XAZ_4532 25.0 50.0 N
09/15/2015 CR_PET PET_2312 40.0 80.0 Y
09/16/2015 CR_XAZ XAZ_4532 35.0 70.0 Y
09/16/2015 CR_PET PET_2312 45.0 90.0 Y
09/17/2015 CR_XAZ XAZ_4532 15.0 30.0 N
09/17/2015 CR_PET PET_2312 50.0 100.0 Y
.....
09/25/2015 CR_XAZ PET_4532 12.0 24.0 N
09/25/2015 CR_PET XAZ_2312 12.0 24.0 N
#let's consider some data of df_ba as example
| Date | Site | Sector | KPI_1 | QA_value | Active |
| --------- |--------- |----------| ---------|----------| ------ |
09/12/2015 CR_DEP DEP_1234 35.0 70.0 Y
09/12/2015 CR_XZT XZT_1212 50.0 100.0 Y
09/13/2015 CR_DEP DEP_1234 15.0 30.0 N
09/13/2015 CR_XZT XZT_1212 50.0 100.0 Y
09/14/2015 CR_DEP DEP_1234 35.0 70.0 Y
09/14/2015 CR_XZT XZT_1212 25.0 50.0 Y
09/15/2015 CR_DEP DEP_1234 25.0 50.0 Y
09/15/2015 CR_XZT XZT_1212 40.0 80.0 Y
09/16/2015 CR_DEP DEP_1234 15.0 30.0 N
09/16/2015 CR_XZT XZT_1212 45.0 90.0 Y
09/17/2015 CR_DEP DEP_1234 50.0 100.0 Y
09/17/2015 CR_XZT XZT_1212 50.0 100.0 Y
.....
09/25/2015 CR_DEP DEP_1234 10.0 20.0 N
09/25/2015 CR_XZT XZT_1212 50.0 100.0 Y
My goal is to compare each value of the Site_Sector column of df_input against the Sector column of each data frame contained in the list. Wherever a Site_Sector value matches a Sector value, I want to add the columns Date, KPI_1, QA_value and Active to the df_input data frame.
#expected output
Site_Sector| Date | KPI_1| QA_value | Active
----------------------------------------------------
DEP_1234 09/12/2015 35.0 70.0 Y
DEP_1234 09/13/2015 15.0 30.0 N
DEP_1234 09/14/2015 35.0 70.0 Y
DEP_1234 09/15/2015 25.0 50.0 N
....
XAZ_4532 09/12/2015 50.0 100.0 Y
XAZ_4532 09/13/2015 50.0 100.0 Y
XAZ_4532 09/14/2015 30.0 60.0 Y
XAZ_4532 09/15/2015 25.0 50.0 N
....
If something is not clear or more details are needed, please comment on this post and I will be glad to explain further.
I'd do this with a list comprehension + pd.Series.isin:
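import pandas as pd
data = df_input.Site_Sector
# keep only the rows of each frame whose Sector appears in Site_Sector
filtered_dfs = [x[x.Sector.isin(data)] for x in list_of_dfs]
# stack the filtered frames and drop Site (keyword form; the positional axis argument fails on pandas 2.x)
output = pd.concat(filtered_dfs).drop(columns='Site')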
For your input, this is what you get:
print(output.sort_values('Sector'))
Date Sector KPI_1 QA_value Active
0 09/12/2015 DEP_1234 35.0 70.0 Y
2 09/13/2015 DEP_1234 15.0 30.0 N
4 09/14/2015 DEP_1234 35.0 70.0 Y
6 09/15/2015 DEP_1234 25.0 50.0 Y
8 09/16/2015 DEP_1234 15.0 30.0 N
10 09/17/2015 DEP_1234 50.0 100.0 Y
12 09/25/2015 DEP_1234 10.0 20.0 N
0 09/12/2015 XAZ_4532 50.0 100.0 Y
2 09/13/2015 XAZ_4532 50.0 100.0 Y
4 09/14/2015 XAZ_4532 30.0 60.0 Y
6 09/15/2015 XAZ_4532 25.0 50.0 N
8 09/16/2015 XAZ_4532 35.0 70.0 Y
10 09/17/2015 XAZ_4532 15.0 30.0 N
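
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same isin approach. The tiny frames below are hypothetical stand-ins built from a few rows of the question's data (df_gt is left out because its contents were never shown), and the rename, column reordering and sorting steps are my own additions to bring the result closer to the expected output layout; they are not part of the snippet above.

```python
import pandas as pd

# Hypothetical miniature versions of the frames from the question
df_input = pd.DataFrame({"Site_Sector": ["DEP_1234", "XAZ_4532", "KPI_1200"]})

df_cr = pd.DataFrame({
    "Date": ["09/12/2015", "09/12/2015", "09/13/2015"],
    "Site": ["CR_XAZ", "CR_PET", "CR_XAZ"],
    "Sector": ["XAZ_4532", "PET_2312", "XAZ_4532"],
    "KPI_1": [50.0, 50.0, 50.0],
    "QA_value": [100.0, 100.0, 100.0],
    "Active": ["Y", "Y", "Y"],
})
df_ba = pd.DataFrame({
    "Date": ["09/12/2015", "09/12/2015", "09/13/2015"],
    "Site": ["CR_DEP", "CR_XZT", "CR_DEP"],
    "Sector": ["DEP_1234", "XZT_1212", "DEP_1234"],
    "KPI_1": [35.0, 50.0, 15.0],
    "QA_value": [70.0, 100.0, 30.0],
    "Active": ["Y", "Y", "N"],
})
list_of_dfs = [df_cr, df_ba]

# Keep only the rows whose Sector appears in df_input.Site_Sector
wanted = df_input["Site_Sector"]
filtered = [df[df["Sector"].isin(wanted)] for df in list_of_dfs]

# Stack the pieces, then reshape to the column layout the question asks for
columns = ["Site_Sector", "Date", "KPI_1", "QA_value", "Active"]
output = (
    pd.concat(filtered, ignore_index=True)
    .drop(columns="Site")
    .rename(columns={"Sector": "Site_Sector"})
    .loc[:, columns]
    .sort_values(["Site_Sector", "Date"])
    .reset_index(drop=True)
)
print(output)
```

Note that Site_Sector values with no match in any frame (KPI_1200 in this sketch) simply do not appear in the result. Sorting on Date here compares strings, which only works because the sample dates share the same month and year; convert with pd.to_datetime first if the real data spans longer periods.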