I need some tips about a pandas issue.
I have the following DataFrame, df1, which contains the name and date combinations that I need to keep in the output DataFrame:
name date column_1 column_11
Anne 2018-01-01 some info1 some info11
John 2018-01-01 some info1 some info11
Mark 2018-02-01 some info1 some info11
Ethan 2018-03-01 some info1 some info11
Anne 2018-04-01 some info1 some info11
Ethan 2018-04-01 some info1 some info11
I have this other DataFrame, df2, that contains all the names and dates in my data sample:
name date column_2 column_22
Bob 2018-01-01 some info2 some info22
Bob 2018-01-01 some info2 some info22
Anne 2018-01-01 some info2 some info22
John 2018-01-01 some info2 some info22
Mark 2018-02-01 some info2 some info22
Mark 2018-02-01 some info2 some info22
Ethan 2018-03-01 some info2 some info22
Anne 2018-04-01 some info2 some info22
Anne 2018-04-01 some info2 some info22
Ethan 2018-04-01 some info2 some info22
Carl 2018-01-01 some info2 some info22
Joe 2018-01-01 some info2 some info22
And, as an output, I need a DataFrame like df1, but with all the columns in df2.
Note that df1 and df2 have other columns in addition to the ones I show, so they hold different information. What I want is the columns in df2, but only for the name and date combinations that appear in df1.
Sample output would be:
name date column_2 column_22
Anne 2018-01-01 some info2 some info22
John 2018-01-01 some info2 some info22
Mark 2018-02-01 some info2 some info22
Mark 2018-02-01 some info2 some info22
Ethan 2018-03-01 some info2 some info22
Anne 2018-04-01 some info2 some info22
Anne 2018-04-01 some info2 some info22
Ethan 2018-04-01 some info2 some info22
NOTE:
Doing
df = df2.merge(df1)
didn't work.
NOTE 2:
df1 contains aggregated and filtered data from df2, which is why there are fewer rows in df1 than in df2. I just want to keep, in df2, those rows whose name and date appear in df1.
None of the solutions so far work, so I thought this explanation might help get the right answer.
I would do the following:
df_out = (df1.reset_index()[["name", "date"]]
          .merge(df2.reset_index(), on=["name", "date"], how="inner"))
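One possible reason the bare df2.merge(df1) fails: with no on argument, merge joins on every column name the two frames share, not just name and date. The sketch below rebuilds the sample data and adds a made-up common column, shared, to stand in for the "other columns" mentioned in the question (that column and its values are assumptions for illustration only). Restricting df1 to the two key columns first avoids the problem.

```python
import pandas as pd

# Sample data from the question; "shared" is a hypothetical extra column
# present in both frames with different contents.
df1 = pd.DataFrame({
    "name": ["Anne", "John", "Mark", "Ethan", "Anne", "Ethan"],
    "date": ["2018-01-01", "2018-01-01", "2018-02-01",
             "2018-03-01", "2018-04-01", "2018-04-01"],
    "column_1": ["some info1"] * 6,
    "shared": ["from df1"] * 6,
})
df2 = pd.DataFrame({
    "name": ["Bob", "Bob", "Anne", "John", "Mark", "Mark",
             "Ethan", "Anne", "Anne", "Ethan", "Carl", "Joe"],
    "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01",
             "2018-02-01", "2018-02-01", "2018-03-01", "2018-04-01",
             "2018-04-01", "2018-04-01", "2018-01-01", "2018-01-01"],
    "column_2": ["some info2"] * 12,
    "shared": ["from df2"] * 12,
})

# No "on": pandas joins on name, date AND shared, so nothing matches.
bare = df2.merge(df1)

# Join on the two key columns only; drop_duplicates guards against
# repeated keys in df1 multiplying rows of df2.
keys = df1[["name", "date"]].drop_duplicates()
keyed = df2.merge(keys, on=["name", "date"], how="inner")

print(len(bare), len(keyed))
```

If the two date columns have different dtypes (strings in one frame, datetimes in the other), the keys will not match either, so it may be worth checking df1.dtypes and df2.dtypes before merging.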
I'm going to do this in steps with intermediate DataFrames. This is less efficient but it will give you more insight into what is happening.
Take only the name and date from df1:
df_key = df1.loc[:, ["name", "date"]]
Use an inner join (referred to as a natural join in this article) of the key table and df2, which will produce only records where name and date match:
df_out_1 = df2.merge(
    df_key,
    how="inner",
    on=["name", "date"],
)
Pick out the columns you want from the resulting join and you are done:
df_out_2 = df_out_1.loc[:, ["name", "date", "column_2", "column_22"]]
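As an alternative to merging, the same filter can be written as a boolean mask over df2, which keeps df2's original index and never duplicates rows even if df1 repeats a key. This is only a sketch on the sample data from the question (other columns omitted), swapping the inner join for MultiIndex.isin; it assumes the name and date columns have matching dtypes in both frames.

```python
import pandas as pd

# Key columns of df1 from the question.
df1 = pd.DataFrame({
    "name": ["Anne", "John", "Mark", "Ethan", "Anne", "Ethan"],
    "date": ["2018-01-01", "2018-01-01", "2018-02-01",
             "2018-03-01", "2018-04-01", "2018-04-01"],
})
df2 = pd.DataFrame({
    "name": ["Bob", "Bob", "Anne", "John", "Mark", "Mark",
             "Ethan", "Anne", "Anne", "Ethan", "Carl", "Joe"],
    "date": ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01",
             "2018-02-01", "2018-02-01", "2018-03-01", "2018-04-01",
             "2018-04-01", "2018-04-01", "2018-01-01", "2018-01-01"],
    "column_2": ["some info2"] * 12,
    "column_22": ["some info22"] * 12,
})

# Build (name, date) pairs for both frames and test membership.
wanted = pd.MultiIndex.from_frame(df1[["name", "date"]])
mask = pd.MultiIndex.from_frame(df2[["name", "date"]]).isin(wanted)

# Keep only the matching rows of df2, with the columns of interest.
df_out = df2.loc[mask, ["name", "date", "column_2", "column_22"]]
print(df_out)
```

Because nothing is joined, no columns from df1 end up in the result and there are no suffix clashes to clean up afterwards.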