Comparing values in a list of dataframes to another dataframe column

I am stuck on the following problem. I have df_input as my input data frame, which contains only one column called Site_Sector. Site_Sector has the following structure:

 Site_Sector
--------------
  DEP_1234
  TRE_5421
  YUT_0901
  IOP_ABC3
  POS_3456
  MEC_2341
  XAZ_4532
  QPI_9012
  KPI_1200
  LPO_1300
  KIN_9012
  SVP_0001
  ....
  JOP_1289

I have three data frames called df_cr, df_gt and df_ba, which are contained in a list, list_of_dfs = [df_cr, df_gt, df_ba]. They all have the following structure (I show only two of them):

 #let's consider some data of df_cr as example

 |  Date     |   Site    | Sector   |  KPI_1   | QA_value | Active |
 | --------- |---------- |----------|----------|----------| ------ |
  09/12/2015     CR_XAZ    XAZ_4532     50.0        100.0       Y
  09/12/2015     CR_PET    PET_2312     50.0        100.0       Y   
  09/13/2015     CR_XAZ    XAZ_4532     50.0        100.0       Y
  09/13/2015     CR_PET    PET_2312     50.0        100.0       Y
  09/14/2015     CR_XAZ    XAZ_4532     30.0        60.0        Y
  09/14/2015     CR_PET    PET_2312     25.0        50.0        N
  09/15/2015     CR_XAZ    XAZ_4532     25.0        50.0        N
  09/15/2015     CR_PET    PET_2312     40.0        80.0        Y
  09/16/2015     CR_XAZ    XAZ_4532     35.0        70.0        Y
  09/16/2015     CR_PET    PET_2312     45.0        90.0        Y
  09/17/2015     CR_XAZ    XAZ_4532     15.0        30.0        N
  09/17/2015     CR_PET    PET_2312     50.0        100.0       Y
    .....
  09/25/2015     CR_XAZ    PET_4532     12.0        24.0        N
  09/25/2015     CR_PET    XAZ_2312     12.0        24.0        N

 #let's consider some data of df_ba as example

 |  Date     |   Site   | Sector   |  KPI_1   | QA_value | Active |
 | --------- |--------- |----------| ---------|----------| ------ |
  09/12/2015     CR_DEP   DEP_1234     35.0        70.0        Y
  09/12/2015     CR_XZT   XZT_1212     50.0        100.0       Y   
  09/13/2015     CR_DEP   DEP_1234     15.0        30.0        N
  09/13/2015     CR_XZT   XZT_1212     50.0        100.0       Y
  09/14/2015     CR_DEP   DEP_1234     35.0        70.0        Y
  09/14/2015     CR_XZT   XZT_1212     25.0        50.0        Y
  09/15/2015     CR_DEP   DEP_1234     25.0        50.0        Y
  09/15/2015     CR_XZT   XZT_1212     40.0        80.0        Y
  09/16/2015     CR_DEP   DEP_1234     15.0        30.0        N
  09/16/2015     CR_XZT   XZT_1212     45.0        90.0        Y
  09/17/2015     CR_DEP   DEP_1234     50.0        100.0       Y
  09/17/2015     CR_XZT   XZT_1212     50.0        100.0       Y
    .....
  09/25/2015     CR_DEP   DEP_1234     10.0        20.0        N
  09/25/2015     CR_XZT   XZT_1212     50.0        100.0       Y

My goal is to compare each value of the Site_Sector column of df_input against the Sector column of each data frame contained in the list. If there is a match between Site_Sector and Sector, then add the columns Date, KPI_1, QA_value and Active to the df_input data frame.

 #expected output

 Site_Sector|  Date     | KPI_1| QA_value | Active 
----------------------------------------------------
  DEP_1234   09/12/2015   35.0    70.0        Y
  DEP_1234   09/13/2015   15.0    30.0        N
  DEP_1234   09/14/2015   35.0    70.0        Y
  DEP_1234   09/15/2015   25.0    50.0        Y
   ....
  XAZ_4532   09/12/2015   50.0    100.0       Y
  XAZ_4532   09/13/2015   50.0    100.0       Y
  XAZ_4532   09/14/2015   30.0    60.0        Y
  XAZ_4532   09/15/2015   25.0    50.0        N
   ....

If something is not clear or more details are needed, please comment on this post and I will be glad to explain more.

I'd do this with a list comprehension + pd.Series.isin:

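import pandas as pd
data = df_input.Site_Sector
# keep only the rows whose Sector appears in df_input.Site_Sector
filtered_dfs = [x[x.Sector.isin(data)] for x in list_of_dfs]
# drop(columns=...) works on current pandas; drop('Site', 1) fails on pandas 2.x
output = pd.concat(filtered_dfs).drop(columns='Site')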

For your input, this is what you get:

print(output.sort_values('Sector'))
          Date    Sector  KPI_1  QA_value Active
0   09/12/2015  DEP_1234   35.0      70.0      Y
2   09/13/2015  DEP_1234   15.0      30.0      N
4   09/14/2015  DEP_1234   35.0      70.0      Y
6   09/15/2015  DEP_1234   25.0      50.0      Y
8   09/16/2015  DEP_1234   15.0      30.0      N
10  09/17/2015  DEP_1234   50.0     100.0      Y
12  09/25/2015  DEP_1234   10.0      20.0      N
0   09/12/2015  XAZ_4532   50.0     100.0      Y
2   09/13/2015  XAZ_4532   50.0     100.0      Y
4   09/14/2015  XAZ_4532   30.0      60.0      Y
6   09/15/2015  XAZ_4532   25.0      50.0      N
8   09/16/2015  XAZ_4532   35.0      70.0      Y
10  09/17/2015  XAZ_4532   15.0      30.0      N
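
If you want to try this end to end, here is a minimal self-contained sketch. The sample frames are just a few rows copied from the tables in the question (df_gt is left out because its contents are not shown), and the last steps (renaming Sector to Site_Sector, the stable sort, and the column order) are my assumption about how to match the expected output layout, not something the question specifies.

```python
import pandas as pd

# Sample data: a handful of rows copied from the question's tables.
df_input = pd.DataFrame({"Site_Sector": ["DEP_1234", "TRE_5421", "XAZ_4532"]})

df_cr = pd.DataFrame({
    "Date": ["09/12/2015", "09/12/2015", "09/13/2015", "09/13/2015"],
    "Site": ["CR_XAZ", "CR_PET", "CR_XAZ", "CR_PET"],
    "Sector": ["XAZ_4532", "PET_2312", "XAZ_4532", "PET_2312"],
    "KPI_1": [50.0, 50.0, 50.0, 50.0],
    "QA_value": [100.0, 100.0, 100.0, 100.0],
    "Active": ["Y", "Y", "Y", "Y"],
})
df_ba = pd.DataFrame({
    "Date": ["09/12/2015", "09/12/2015", "09/13/2015", "09/13/2015"],
    "Site": ["CR_DEP", "CR_XZT", "CR_DEP", "CR_XZT"],
    "Sector": ["DEP_1234", "XZT_1212", "DEP_1234", "XZT_1212"],
    "KPI_1": [35.0, 50.0, 15.0, 50.0],
    "QA_value": [70.0, 100.0, 30.0, 100.0],
    "Active": ["Y", "Y", "N", "Y"],
})
# df_gt is omitted here because the question does not show its rows.
list_of_dfs = [df_cr, df_ba]

# Keep only the rows whose Sector is listed in df_input.Site_Sector.
wanted = df_input["Site_Sector"]
filtered_dfs = [df[df["Sector"].isin(wanted)] for df in list_of_dfs]

# Stack the filtered frames and reshape to the layout of the expected output.
output = (
    pd.concat(filtered_dfs)
    .drop(columns="Site")
    .rename(columns={"Sector": "Site_Sector"})
    # mergesort is stable, so rows keep their original order within each sector
    .sort_values("Site_Sector", kind="mergesort")
    .reset_index(drop=True)
)
output = output[["Site_Sector", "Date", "KPI_1", "QA_value", "Active"]]
print(output)
```

An inner merge of the concatenated frames with df_input (Sector against Site_Sector) should select the same rows, but note that a merge repeats rows if df_input contains duplicate Site_Sector values, whereas isin does not.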
