I have two dataframes:
df1: contains all information
rowname a b c d
R1 1 2 0 1
R2 2 2 0 1
R3 0 2 0 0
R4 1 2 0 1
df2: contains a subset of the rows and columns:
rowname a b c
R1 1 2 0
R2 2 2 0
R4 1 2 0
I want to filter out all the rows of df1 that are not in df2. So in this case, I'm looking to get rid of R3 in df1 while keeping all columns.
I think df1.merge(df2, ...) could do this, but I've tried a variety of arguments with no success. I'm using Python 3.
Simply filter df1 with isin() on the rowname column:
df1[df1.rowname.isin(df2.rowname)]
rowname a b c d
0 R1 1 2 0 1
1 R2 2 2 0 1
3 R4 1 2 0 1
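Since the question mentions merge, here is a sketch of the same filter written as an inner merge. It assumes rowname is an ordinary column (as the isin() line above does) and rebuilds the sample frames from the tables in the question. Only df2's key column is passed to merge, so df2's a, b, c columns don't come back as suffixed duplicates.

```python
import pandas as pd

# Sample data from the question; "rowname" is assumed to be a regular column, not the index.
df1 = pd.DataFrame({
    "rowname": ["R1", "R2", "R3", "R4"],
    "a": [1, 2, 0, 1],
    "b": [2, 2, 2, 2],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 0, 1],
})
df2 = pd.DataFrame({
    "rowname": ["R1", "R2", "R4"],
    "a": [1, 2, 1],
    "b": [2, 2, 2],
    "c": [0, 0, 0],
})

# Keep only the key column of df2, deduplicated so a df1 row can't be repeated.
keys = df2[["rowname"]].drop_duplicates()

# Inner merge keeps just the df1 rows whose rowname also appears in df2.
result = df1.merge(keys, on="rowname", how="inner")
print(result)
```

Unlike the isin() filter, merge returns a fresh 0..n-1 index instead of keeping the original row labels 0, 1, 3.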
This is one way, which matches only on columns ['a', 'b', 'c']:
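import pandas as pd
# Stack both frames; df2's rows get NaN in column 'd'
df = pd.concat([df1, df2])
# Keep rows whose (a, b, c) values occur more than once, then drop df2's own rows (NaN in 'd')
df = df.loc[df.duplicated(['a', 'b', 'c'], keep=False)]\
    .dropna(subset=['d'], axis=0)
# The NaNs forced 'd' to float; cast it back to int
df = df.astype({'d': int})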
Result:
a b c d rowname
0 1 2 0 1 R1
1 2 2 0 1 R2
3 1 2 0 1 R4
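One caveat with the concat/duplicated approach: a df1 row is also kept when its (a, b, c) values merely repeat another df1 row, even if df2 doesn't contain them (here R1 and R4 are both (1, 2, 0), so they would survive even without df2). A sketch that avoids this, assuming the same sample frames as in the question, is a left merge on the matching columns with indicator=True, keeping only the rows found in both frames:

```python
import pandas as pd

# Sample data from the question; "rowname" is assumed to be a regular column.
df1 = pd.DataFrame({
    "rowname": ["R1", "R2", "R3", "R4"],
    "a": [1, 2, 0, 1],
    "b": [2, 2, 2, 2],
    "c": [0, 0, 0, 0],
    "d": [1, 1, 0, 1],
})
df2 = pd.DataFrame({
    "rowname": ["R1", "R2", "R4"],
    "a": [1, 2, 1],
    "b": [2, 2, 2],
    "c": [0, 0, 0],
})

match_cols = ["a", "b", "c"]

# Unique (a, b, c) combinations present in df2, so each df1 row matches at most once.
keys = df2[match_cols].drop_duplicates()

# Left merge adds a "_merge" column saying whether each df1 row was found in keys.
merged = df1.merge(keys, on=match_cols, how="left", indicator=True)

# Keep rows present in both frames and drop the helper column.
result = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")
print(result)
```

Because df2 contributes only key columns, no NaN is introduced and column d keeps its integer dtype without a cast.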