Pandas: Merge values from one dataframe to another based on condition

Using fuzzy logic and the fuzzywuzzy module, I am able to match Name (from one dataframe) with Short Name (from another dataframe). Both dataframes also contain an ISIN column.

This is the dataframe I get after the logic is applied.

ISIN                                      Name Currency         Value  % Weight  Asset Type Comments/ Assumptions          matches
236   NaN            Partnerre Ltd 4.875% Perp Sr:J      USD  1.684069e+05    0.0004         NaN                   NaN
237   NaN  Berkley (Wr) Corporation 5.700% 03/30/58      USD  6.955837e+04    0.0002         NaN                   NaN
238   NaN             Tc Energy Corp Flt Perp Sr:11      USD  6.380262e+04    0.0001         NaN                   NaN   TC ENERGY CORP
239   NaN                      Cash and Equivalents      USD  2.166579e+07    0.0499         NaN                   NaN
240   NaN                                       AUM      NaN  4.338766e+08    0.9999         NaN                   NaN  AUM IND BARC US

A new column 'matches' is created; it holds the Short Name from the 2nd dataframe that matches the Name from the 1st dataframe.

ISIN in dataframe1 is empty and ISIN in dataframe2 is present. Wherever there is a match (Name from the 1st dataframe and Short Name from the 2nd dataframe), I want to add the relevant ISIN from the 2nd dataframe to the 1st dataframe.

How do I get the ISIN from the 2nd dataframe into the 1st dataframe so that my final output looks like this?

ISIN                                      Name Currency         Value  % Weight  Asset Type Comments/ Assumptions          matches
236   NaN            Partnerre Ltd 4.875% Perp Sr:J      USD  1.684069e+05    0.0004         NaN                   NaN
237   NaN  Berkley (Wr) Corporation 5.700% 03/30/58      USD  6.955837e+04    0.0002         NaN                   NaN
238   78s9             Tc Energy Corp Flt Perp Sr:11      USD  6.380262e+04    0.0001         NaN                   NaN   TC ENERGY CORP
239   NaN                      Cash and Equivalents      USD  2.166579e+07    0.0499         NaN                   NaN
240   123e                                       AUM      NaN  4.338766e+08    0.9999         NaN                   NaN  AUM IND BARC US

EDIT: the dataframes in their original form

df1

ISIN                                 Name Currency       Value  % Weight  Asset Type                              Comments/ Assumptions
0   NaN     Transcanada Trust 5.875 08/15/76      USD  7616765.00    0.0176         NaN  https://assets.cohenandsteers.com/assets/conte...
1   NaN      Bp Capital Markets Plc Flt Perp      USD  7348570.50    0.0169         NaN  Holding value for each constituent is derived ...
2   NaN       Transcanada Trust Flt 09/15/79      USD  7341250.00    0.0169         NaN                                                NaN
3   NaN      Bp Capital Markets Plc Flt Perp      USD  6734022.32    0.0155         NaN                                                NaN
4   NaN  Prudential Financial 5.375% 5/15/45      USD  6508290.68    0.0150         NaN                                                NaN
(241, 7)

df2

Short Name          ISIN
0  ABU DHABI COMMER  AEA000201011
1  ABU DHABI NATION  AEA002401015
2  ABU DHABI NATION  AEA006101017
3  ADNOC DRILLING C  AEA007301012
4  ALPHA DHABI HOLD  AEA007601015
(66987, 2)

EDIT 2: the fuzzy logic used to get matches from the dataframes

import pandas as pd
from fuzzywuzzy import fuzz, process

df1 = pd.read_excel('file.xlsx', sheet_name=1, usecols=[1, 2, 3, 4, 5, 6, 8], header=1)
df2 = pd.read_excel("Excel files/file2.xlsx", sheet_name=0, usecols=[1, 2], header=1)

# empty lists for storing the matches
# later
mat1 = []
mat2 = []
p = []

# converting dataframe column
# to list of elements
# to do fuzzy matching
list1 = df1['Name'].tolist()
list2 = df2['Short Name'].tolist()

# taking the threshold as 93
threshold = 93

# iterating through list1 to extract
# its closest match from list2
for i in list1:
    mat1.append(process.extractOne(i, list2, scorer=fuzz.token_set_ratio))
df1['matches'] = mat1

# iterating through the closest matches
# to filter out the maximum closest match
for j in df1['matches']:
    if j[1] >= threshold:
        p.append(j[0])
    mat2.append(",".join(p))
    p = []

# storing the resultant matches back
# to df1
df1['matches'] = mat2
print("\nDataFrame after Fuzzy matching using token_set_ratio():")
#print(df1.to_csv('todays-result1.csv'))
print(df1.head(20))
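
The second loop above can also be written as a single pass over the (match, score) pairs. This is only a sketch: it assumes `process.extractOne` returned `(match, score)` tuples as in the code above, the tuples and scores below are invented for illustration, and `keep_if_close` is a hypothetical helper name.

```python
import pandas as pd

threshold = 93

# Illustrative stand-in for the (match, score) tuples returned by
# process.extractOne; these scores are invented, not real fuzzywuzzy output.
scored = pd.Series([
    ("PARTNERRE LTD", 70),
    ("TC ENERGY CORP", 100),
    ("AUM IND BARC US", 93),
])

def keep_if_close(pair, cutoff=threshold):
    """Return the matched name when its score reaches the cutoff, else ''."""
    name, score = pair
    return name if score >= cutoff else ""

# One pass replaces the inner list / join / reset bookkeeping.
matches = scored.map(keep_if_close)
print(matches.tolist())
```

Rows below the threshold end up with an empty string, which is the same value the original loop stores for them.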

Assuming every ISIN in your first dataframe is null, a simple merge will do what you need. If you need to preserve the non-null ISINs in the first dataframe, then you need to use a boolean mask:

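import pandas as pd

df1 = pd.DataFrame(
    [[None, "Apple", "appl"],
     [None, "Google", "ggl"],
     [None, "Amazon", "amzn"]],
    columns=["ISIN", "Name", "matches"]
)

df2 = pd.DataFrame(
    [["ISIN1", "appl"],
     ["ISIN2", "ggl"]],
    columns=["ISIN", "Short Name"]
)

# Rows of df1 that still need an ISIN
missing_isin = df1['ISIN'].isnull()

# Look up the ISIN for each missing row via its fuzzy match.
# merge() returns a fresh 0..n-1 index, so .to_numpy() assigns by position;
# this assumes each Short Name appears only once in df2.
df1.loc[missing_isin, 'ISIN'] = df1.loc[missing_isin, ['matches']].merge(
    df2[['ISIN', 'Short Name']],
    how='left',
    left_on='matches',
    right_on='Short Name'
)['ISIN'].to_numpy()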
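
For the simpler case mentioned at the start of this answer, where every ISIN in the first dataframe is null, a plain left merge is enough. A minimal sketch using the same toy frames (redefined here so the block runs on its own); the variable name `merged` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[None, "Apple", "appl"],
     [None, "Google", "ggl"],
     [None, "Amazon", "amzn"]],
    columns=["ISIN", "Name", "matches"],
)
df2 = pd.DataFrame(
    [["ISIN1", "appl"],
     ["ISIN2", "ggl"]],
    columns=["ISIN", "Short Name"],
)

# Drop the all-null ISIN column so the merge does not produce ISIN_x / ISIN_y.
merged = df1.drop(columns="ISIN").merge(
    df2,
    how="left",
    left_on="matches",
    right_on="Short Name",
)

# Rows without a match ("amzn") get a missing value in the ISIN column.
print(merged[["Name", "matches", "ISIN"]])
```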

left_on / right_on :- Column names to match the dataframes on

how='left' :- (In simple terms) Keeps every row of the left dataframe, in its original order; rows with no match get NaN. The result gets a new default index rather than the left dataframe's index. Check out the docs for more info.
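
An alternative that does not depend on row order or index alignment, and that tolerates repeated Short Names in the second dataframe, is to build a lookup and use `Series.map`. This is a sketch, not the only way to do it: it assumes that keeping the first ISIN for a repeated Short Name is acceptable, and the frames below are toy data with made-up values.

```python
import pandas as pd

# Toy data: a non-default index, one ISIN already present, and a
# duplicated Short Name in df2 (all values are made up).
df1 = pd.DataFrame(
    {
        "ISIN": [None, "KEEP1", None],
        "Name": ["Apple", "Google", "Amazon"],
        "matches": ["appl", "ggl", "amzn"],
    },
    index=[10, 11, 12],
)
df2 = pd.DataFrame(
    {
        "ISIN": ["ISIN1", "ISIN1b", "ISIN2"],
        "Short Name": ["appl", "appl", "ggl"],
    }
)

# One ISIN per Short Name: keep the first occurrence (an assumption).
lookup = df2.drop_duplicates("Short Name").set_index("Short Name")["ISIN"]

# Fill only the rows whose ISIN is missing; map() keeps df1's index,
# so the assignment lines up label by label.
missing_isin = df1["ISIN"].isnull()
df1.loc[missing_isin, "ISIN"] = df1.loc[missing_isin, "matches"].map(lookup)

print(df1)
```

Rows whose `matches` value has no entry in the lookup (including an empty string for "no fuzzy match") simply stay null.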
