Filtering a dataframe by two columns in another dataframe

I need some tips about a pandas issue.

I have the following DataFrame, df1, which contains the name/date pairs that I need to keep in the output DataFrame:

name      date          column_1     column_11     
Anne      2018-01-01    some info1    some info11
John      2018-01-01    some info1    some info11
Mark      2018-02-01    some info1    some info11
Ethan     2018-03-01    some info1    some info11
Anne      2018-04-01    some info1    some info11
Ethan     2018-04-01    some info1    some info11

I have this other DataFrame, df2, that contains all the names and dates in my data sample:

name     date           column_2    column_22
Bob      2018-01-01     some info2   some info22
Bob      2018-01-01     some info2   some info22
Anne     2018-01-01     some info2   some info22
John     2018-01-01     some info2   some info22
Mark     2018-02-01     some info2   some info22
Mark     2018-02-01     some info2   some info22
Ethan    2018-03-01     some info2   some info22
Anne     2018-04-01     some info2   some info22
Anne     2018-04-01     some info2   some info22
Ethan    2018-04-01     some info2   some info22
Carl     2018-01-01     some info2   some info22
Joe      2018-01-01     some info2   some info22

And, as an output, I need a DataFrame like df1, but with all the columns in df2.

Note that df1 and df2 have other columns in addition to the ones shown, so they hold different information. I want the columns from df2, but only for the name/date pairs that appear in df1.

Sample output would be:

name      date          column_2     column_22     
Anne      2018-01-01    some info2    some info22
John      2018-01-01    some info2    some info22
Mark      2018-02-01    some info2    some info22
Mark      2018-02-01    some info2    some info22
Ethan     2018-03-01    some info2    some info22
Anne      2018-04-01    some info2    some info22
Anne      2018-04-01    some info2    some info22    
Ethan     2018-04-01    some info2    some info22

NOTE:

Doing:

df = df2.merge(df1)

did not work.

NOTE 2:

df1 contains aggregated and filtered data from df2, which is why there are fewer rows in df1 than in df2. I just want to keep, in df2, the rows whose name and date both appear in df1.

None of the solutions so far work, so I thought this explanation might help get the right answer.

I would do the following:

df_out = (df1.reset_index()[["name", "date"]]
          .merge(df2.reset_index(), on=["name", "date"], how="inner"))
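As a rough, self-contained sketch of the same idea on the sample data from the question: this version assumes name and date are ordinary columns (so reset_index is left out) and that the dates are stored as strings. The drop_duplicates call is an extra precaution that is not in the one-liner above; it keeps df2 rows from being repeated if df1 ever lists the same name/date pair twice.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings for simplicity).
df1 = pd.DataFrame({
    "name": ["Anne", "John", "Mark", "Ethan", "Anne", "Ethan"],
    "date": ["2018-01-01", "2018-01-01", "2018-02-01",
             "2018-03-01", "2018-04-01", "2018-04-01"],
    "column_1": ["some info1"] * 6,
    "column_11": ["some info11"] * 6,
})

df2 = pd.DataFrame({
    "name": ["Bob", "Bob", "Anne", "John", "Mark", "Mark",
             "Ethan", "Anne", "Anne", "Ethan", "Carl", "Joe"],
    "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01",
             "2018-02-01", "2018-02-01", "2018-03-01", "2018-04-01",
             "2018-04-01", "2018-04-01", "2018-01-01", "2018-01-01"],
    "column_2": ["some info2"] * 12,
    "column_22": ["some info22"] * 12,
})

# Keep only the key columns of df1, one row per distinct name/date pair.
keys = df1[["name", "date"]].drop_duplicates()

# Inner join: rows of df2 survive only if their name/date pair is in keys.
df_out = df2.merge(keys, on=["name", "date"], how="inner")
print(df_out)
```

Because only the key columns of df1 take part in the merge, the result carries just the columns of df2.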

I'm going to do this in steps with intermediate DataFrames. This is less efficient, but it will give you more insight into what is happening.

Take only the name and date from df1:

df_key = df1.loc[:, ["name", "date"]] 

Use an inner join (referred to as a natural join in this article) of the key table and df2, which will produce only the records where both name and date match:

# Join df2 against the key table; both frames use the same key column names.
df_out_1 = df2.merge(
        df_key,
        how="inner",
        on=["name", "date"]
)

Pick out the columns you want from the resulting join and you are done:

df_out_2 = df_out_1.loc[:, ["name", "date", "column_2", "column_22"]]
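If you would rather filter df2 in place than join, a boolean mask built from the key pairs is one possible alternative. This is a different technique from the merge steps above, not part of them. The helper name filter_by_keys and the tiny demo frames are made up for illustration; the sketch assumes the key columns are ordinary columns with matching dtypes in both frames.

```python
import pandas as pd

def filter_by_keys(data, keys, on):
    """Return the rows of `data` whose values in the `on` columns
    appear as a row of `keys` (hypothetical helper, not a pandas API)."""
    wanted = pd.MultiIndex.from_frame(keys[on])
    # Element-wise membership test on the tuples of key values.
    mask = pd.MultiIndex.from_frame(data[on]).isin(wanted)
    return data[mask]

# Tiny illustrative frames (not the question's full data).
demo_keys = pd.DataFrame({"name": ["Anne", "Mark"],
                          "date": ["2018-01-01", "2018-02-01"]})
demo_data = pd.DataFrame({"name": ["Bob", "Anne", "Mark", "Mark", "Anne"],
                          "date": ["2018-01-01", "2018-01-01", "2018-02-01",
                                   "2018-02-01", "2018-04-01"],
                          "column_2": ["a", "b", "c", "d", "e"]})

demo_out = filter_by_keys(demo_data, demo_keys, ["name", "date"])
print(demo_out)
```

Unlike the merge, this keeps the original index labels of the filtered frame, and duplicate pairs in the key table cannot multiply rows.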
