Compare two data-frames with different column names and update first data-frame with the column from second data-frame
I am working with two data-frames that have different column names and dimensions.
The first data-frame, "df1", contains a single column, "name", holding the names that need to be located in the second data-frame. If a name is matched, the value from the first column of df2 (df2[0]) needs to be returned and added to result_df.
The second data-frame, "df2", has multiple columns and no header. It contains all the possible diminutive names and full names. Any of the columns can hold the "name" that needs to be matched.
Goal: locate each name from "df1" in "df2" and, if it is matched, return the value from the first column of df2 and add it to the respective row of df1.
df1

| name |
|---|
| ab |
| alex |
| bob |
| robert |
| bill |
df2

| 0 | 1 | 2 | 3 |
|---|---|---|---|
| abram | ab | | |
| robert | rob | bob | robbie |
| alexander | alex | al | |
| william | bill | | |
result_df

| name | matched_name |
|---|---|
| ab | abram |
| alex | alexander |
| bob | robert |
| robert | robert |
| bill | william |
The code I have written so far gives an error. I need it to be efficient, as it will check millions of entries in df1 against df2:
'''
result_df = process_name(df1, df2)
def process_name(df1, df2):
    for elem in df2.values:
        if elem in df1['name']:
            df1["matched_name"] = df2[0]
'''
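One thing that may be tripping up the attempt above is that `in` applied to a pandas Series tests the index labels, not the stored values. The sketch below uses a small made-up Series (`names` is illustrative and not part of the original post) to show the difference.

```python
import pandas as pd

# Illustrative data only; not taken from the original post.
names = pd.Series(["ab", "alex", "bob"], name="name")

# `in` on a Series checks the index labels (0, 1, 2 here), not the values.
label_hit = 0 in names        # 0 is an index label
value_miss = "ab" in names    # "ab" is a value, not a label

# To test membership among the values, use the vectorised .isin() instead.
value_hit = names.isin(["ab"]).any()
```

Here `label_hit` is true and `value_miss` is false, while `value_hit` is true, so a value-based check such as `.isin()` is probably what the loop intended.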
Try the concat(), merge(), drop(), rename() and reset_index() methods:
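# Assumes `import pandas as pd` and that df2's columns are labelled '0'..'3' as strings
# (with integer labels, as produced by header=None, use 0, 1, 2, 3 instead).
df=(pd.concat((df1.merge(df2,left_on='name',right_on=x) for x in df2.columns))
      .drop(columns=['1','2','3'])
      .rename(columns={'0':'matched_name'})
      .reset_index(drop=True))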
Output of df:
name matched_name
0 robert robert
1 ab abram
2 alex alexander
3 bill william
4 bob robert
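The rows in the output above come back in merge order rather than in the original order of df1. If keeping df1's row order matters, one alternative is to build a single variant-to-full-name lookup from df2 and apply it with `Series.map`. This is a minimal sketch, not the answer's method: the sample frames are rebuilt from the tables in the question, df2 is assumed to carry integer column labels, and the names `long` and `lookup` are hypothetical helpers. It also assumes that when a variant appears under more than one full name, keeping the first occurrence is acceptable.

```python
import pandas as pd

# Sample frames rebuilt from the question (df2 has no header, so integer labels).
df1 = pd.DataFrame({"name": ["ab", "alex", "bob", "robert", "bill"]})
df2 = pd.DataFrame([
    ["abram", "ab", None, None],
    ["robert", "rob", "bob", "robbie"],
    ["alexander", "alex", "al", None],
    ["william", "bill", None, None],
])

# Long form: one row per (variant, full name) pair, including the full name itself.
long = pd.concat(
    [pd.DataFrame({"variant": df2[col], "matched_name": df2[0]}) for col in df2.columns],
    ignore_index=True,
)

# Drop empty cells; assumption: the first occurrence wins if a variant is ambiguous.
long = long.dropna(subset=["variant"]).drop_duplicates(subset="variant")

# A Series indexed by variant acts as the lookup table for map().
lookup = long.set_index("variant")["matched_name"]

# Vectorised lookup that preserves df1's row order; unmatched names become NaN.
result_df = df1.assign(matched_name=df1["name"].map(lookup))
print(result_df)
```

The lookup is built once from df2, so each name in df1 is resolved through a single index lookup instead of one merge per df2 column.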