Map two dataframes using pandas - python

Question

I have two dataframes:

import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])

df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]
                    ], columns=["name", "cell", "marks", "passwd"])

Input

df1

    name    cell        marks
0   tom       2         11111

df2

    name    cell    marks   passwd
0   tomm    2      11111     2548
1   matt    2      158416    2483
2   tonmmm  2      11111     2549

I want to map the two dataframes, which have similar columns.

Get the rows from df2 that match df1 in at least 2 columns. Here cell and marks match df1, so those rows have 2 matching values.

Expected output:

    name    cell    marks   passwd
0   tomm    2      11111     2548
1   tonmmm  2      11111     2549

Answer 1

You could try this:

import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])

df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]
                    ], columns=["name", "cell", "marks", "passwd"])

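# Values from the single row of df1 to compare against.
ref_values = list(df1.iloc[0, :])
# For each df2 row (ignoring the last column, passwd), count how many values
# also appear in df1's row; keep the rows with at least 2 such values.
temp = [sum(v in ref_values for v in row) >= 2
        for row in df2.iloc[:, :-1].itertuples(index=False)]
newdf = df2[temp]
print(newdf)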

Output:

     name  cell  marks  passwd
0    tomm     2  11111    2548
2  tonmmm     2  11111    2549

Edit: If you want to sort the result by the number of matches, you could try:

import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tom', 2, 11111, 2549]
                    ], columns=["name", "cell", "marks", "passwd"])

# Count how many values in each df2 row (ignoring passwd) also appear in df1's row.
ref_values = list(df1.iloc[0, :])
temp = [sum(v in ref_values for v in row)
        for row in df2.iloc[:, :-1].itertuples(index=False)]
# Sort by match count (highest first), then keep rows with at least 2 matches.
newdf = df2.assign(val=temp).sort_values(by='val', ascending=False)
newdf = newdf[newdf['val'] >= 2].drop(columns='val').reset_index(drop=True)
print(newdf)

Output:

   name  cell  marks  passwd
0   tom     2  11111    2549
1  tomm     2  11111    2548
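
Note that the snippets above check whether a value appears anywhere in df1's row, not whether it sits in the same column. If the match is meant to be column by column (cell against cell, marks against marks), a vectorized variant might look like the sketch below. It assumes df1 has a single row and that only the columns shared by both frames are compared; the names common, match_count and result are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])

# Columns present in both frames (assumption: only these are compared).
common = [c for c in df2.columns if c in df1.columns]

# Compare every df2 row with df1's single row, column by column,
# then count the matching columns per row.
match_count = df2[common].eq(df1.iloc[0][common], axis=1).sum(axis=1)

# Keep the rows where at least two columns match.
result = df2[match_count >= 2]
print(result)
```

With the sample data this keeps the rows for tomm and tonmmm (original index 0 and 2), the same rows as the first snippet, because only cell and marks agree with df1.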

Solution 1
1 accepted 2020-06-30 17:19:41