
How to loop over a pandas Series (list type) and check if a string in the list matches another Series in another df?

I have two Data Frames:

df1 -- data1 = {'Company':['ADIDAS','NIKE','PUMA','NEW BALANCE','UNDER ARMOUR'],
   'Keywords':['COPA, PREDATOR, ORIGINALS, SPEEDFLOW','MERCURIAL, SUPERSTAR, VAPOR','ULTRA, FUTURE, RAPIDO','FURON','TEKELA'],
   'Suppliers':['', '', '[STADIUM, JD]', '', '']}


| Company    |Keywords                              | Suppliers   |
| ---------- |--------------------------------------|-------------|
| ADIDAS     |[COPA, PREDATOR, ORIGINALS, SPEEDFLOW]| <NA>        |
| NIKE       |[MERCURIAL, SUPERSTAR, VAPOR]         | <NA>        |
| PUMA       |[ULTRA, FUTURE, RAPIDO]               |[STADIUM, JD]|
|NEW BALANCE |[FURON]                               | <NA>        |
|UNDER ARMOUR|[TEKELA]                              | <NA>        |
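
Note that the `data1` dictionary stores `Keywords` as comma-separated strings and `Suppliers` as either an empty string or a bracketed string, while the table shows lists and `<NA>`. A minimal sketch of one way to normalize the raw values follows. It assumes that an empty string means "no value", and `parse_list` is a hypothetical helper written for this illustration, not a pandas function:

```python
def parse_list(raw):
    """Turn 'A, B' or '[A, B]' into ['A', 'B']; return None for an empty string.

    Hypothetical helper for illustration only; not part of pandas.
    """
    text = raw.strip().strip("[]")  # drop surrounding whitespace and brackets
    if not text:
        return None
    return [part.strip() for part in text.split(",") if part.strip()]


# Raw values copied from data1 above
keywords_raw = ['COPA, PREDATOR, ORIGINALS, SPEEDFLOW', 'MERCURIAL, SUPERSTAR, VAPOR',
                'ULTRA, FUTURE, RAPIDO', 'FURON', 'TEKELA']
suppliers_raw = ['', '', '[STADIUM, JD]', '', '']

keywords = [parse_list(k) for k in keywords_raw]
suppliers = [parse_list(s) for s in suppliers_raw]
```

After this step every keyword and supplier can be compared as a whole value instead of as part of a longer string.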

df2 -- data2 = {'Names':['ADIDAS PREDATOR 17.1','NIKE MERCURIAL 2020','NIKE VAPOR 2021','NEW BALANCE FURON','PUMA RAPIDO 21.3', 'PUMA RAPIDO 21.4'],
   'Supplier':['ADIDAS','NIKE','NIKE','JD','STADIUM', 'JD'], 'Company.1': ['', '', '', '', '', '']}

| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | <NA>      |
| NIKE MERCURIAL 2020  |NIKE      | <NA>      |
| NIKE VAPOR 2021      |NIKE      | <NA>      |
| NEW BALANCE FURON    |JD        | <NA>      |
| PUMA RAPIDO 21.3     |STADIUM   | <NA>      |
| PUMA RAPIDO 21.4     |JD        | <NA>      |

The goal is to check whether df2[Names] contains any word from df1[Keywords]. If it does, check whether df2[Supplier] matches df1[Suppliers]; if that also holds, set df2[Company.1] to df1[Company]. (If df1[Suppliers] is empty, there is no need to check the supplier.)

Here is some code I've tried. (The print statement is just a reference for me.)

for i in range(len(df1["Keywords"])):
    for j in range(len(df1["Keywords"][i])):
        for name_index in range(len(df2["Names"])):
            # Substring check: is this keyword contained in the product name?
            if df1["Keywords"][i][j].strip() in df2["Names"][name_index]:
                print("YES " + df1["Keywords"][i][j] + " in " + df2["Names"][name_index])

                # Now need to check if suppliers are same

Expected Output:

| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | ADIDAS    |
| NIKE MERCURIAL 2020  |NIKE      | NIKE      |
| NIKE VAPOR 2021      |NIKE      | NIKE      |
| NEW BALANCE FURON    |JD        | NEW BALANCE|
| PUMA RAPIDO 21.3     |STADIUM   | PUMA      |

How can I add the company name to Company.1 when the condition is satisfied?

Is Suppliers in data1 supposed to have <NA> values, except for [STADIUM, JD]? If so, I'm unsure how you got the Company.1 values in your expected output. None of the values in data2's Supplier are <NA>, and for the one row in data1 that is not <NA>, the Keywords do not match the Names in data2.

Regardless, I believe I have the gist of what you're looking for.

import pandas as pd

# data1 and data2 are the dictionaries from the question (with equal row counts)
keywords: str = "Keywords"
names: str = "Names"

# 1 - Compare the values of data2.Names to data1.Keywords (row by row, by position)
data1[keywords] = [i.split(", ") for i in data1.get(keywords)]
data2[names] = [i.split() for i in data2.get(names)]
data2["_match_keywords"] = [any(i in name for i in keyword) for name, keyword in zip(data2.get(names), data1.get(keywords))]

# Out - data2
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], ['NIKE', 'VAPOR', '2021'],
#            ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
#  'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', '', '', '', ''],
#  '_match_keywords': [True, True, False, True, False]}

# 2 - Compare data2.Supplier to data1.Suppliers
data2["_match_supplier"] = [any(i in s1 for i in s2) for s2, s1 in zip(data2.get("Supplier"), data1.get("Suppliers"))]

# Out
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], ['NIKE', 'VAPOR', '2021'],
#            ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
#  'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', '', '', '', ''],
#  '_match_keywords': [True, True, False, True, False], '_match_supplier': [False, False, True, False, False]}

# 3 - If both the keyword and the supplier match, assign data1.Company to data2.Company.1
for idx, (kw_match, sup_match) in enumerate(zip(data2.get("_match_keywords"), data2.get("_match_supplier"))):
    if kw_match and sup_match:
        data2["Company.1"][idx] = data1["Company"][idx]

# 4 - Make the frames and drop the helper columns (_match_keywords, _match_supplier)
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2 = df2[[col for col in df2.columns if not col.startswith("_")]]

df1

| Company      | Keywords                                        | Suppliers     |
|--------------|-------------------------------------------------|---------------|
| ADIDAS       | ['COPA', 'PREDATOR', 'ORIGINALS', 'SPEEDFLOW']  |               |
| NIKE         | ['MERCURIAL', 'SUPERSTAR', 'VAPOR']             |               |
| PUMA         | ['ULTRA', 'FUTURE', 'RAPIDO']                   | [STADIUM, JD] |
| NEW BALANCE  | ['FURON']                                       |               |
| UNDER ARMOUR | ['TEKELA']                                      |               |

df2 w/conditions

| Names                          | Supplier | Company.1 | _match_keywords | _match_supplier |
|--------------------------------|----------|-----------|-----------------|-----------------|
| ['ADIDAS', 'PREDATOR', '17.1'] | ADIDAS   |           | True            | False           |
| ['NIKE', 'MERCURIAL', '2020']  | NIKE     |           | True            | False           |
| ['NIKE', 'VAPOR', '2021']      | NIKE     |           | False           | True            |
| ['NEW', 'BALANCE', 'FURON']    | JD       |           | True            | False           |
| ['PUMA', 'RAPIDO', '21.3']     | STADIUM  |           | False           | False           |

df2 w/helpers dropped

| Names                          | Supplier   | Company.1   |
|--------------------------------|------------|-------------|
| ['ADIDAS', 'PREDATOR', '17.1'] | ADIDAS     |             |
| ['NIKE', 'MERCURIAL', '2020']  | NIKE       |             |
| ['NIKE', 'VAPOR', '2021']      | NIKE       |             |
| ['NEW', 'BALANCE', 'FURON']    | JD         |             |
| ['PUMA', 'RAPIDO', '21.3']     | STADIUM    |             |
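
One thing to watch in step 2 of the code above: `Supplier` and `Suppliers` are plain strings there, so `any(i in s1 for i in s2)` iterates over the characters of `s2` and tests each one as a substring of `s1`. That appears to be why the NIKE VAPOR row gets `_match_supplier == True`. A small illustration with values from the data (the two helper names are only for this sketch):

```python
def char_level_match(s2, s1):
    # Same expression as step 2: iterating over a string yields single characters
    return any(ch in s1 for ch in s2)


def whole_value_match(supplier, suppliers):
    # Compare the whole supplier name against an already parsed list of suppliers
    return supplier in suppliers


print(char_level_match("NIKE", "[STADIUM, JD]"))      # True: 'I' occurs in 'STADIUM'
print(whole_value_match("NIKE", ["STADIUM", "JD"]))   # False
print(whole_value_match("JD", ["STADIUM", "JD"]))     # True
```

Comparing whole values against a list avoids this kind of accidental match.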

Here is the code that is working for me. If anybody has any suggestions for improving my code, I would appreciate it!

# Replace missing suppliers once, before looping
df1["Suppliers"] = df1["Suppliers"].fillna("Empty")

for i in range(len(df1["Keywords"])):
    for j in range(len(df1["Keywords"][i])):
        for name_index in range(len(df2["Names"])):
            if df1["Keywords"][i][j].strip() in df2["Names"][name_index]:
                print("YES," + df1["Keywords"][i][j] + " in " + df2["Names"][name_index])

                print("---Checking Supplier-------")
                if df1["Suppliers"][i] == "Empty":
                    print("---Supplier empty so adding brand name")
                    # .loc avoids chained assignment on df2
                    df2.loc[name_index, "Company.1"] = df1["Company"][i]
                    print("---Brand added--")
                else:
                    print("---Supplier not empty so looking for match")
                    for suppliers in df1["Suppliers"][i]:
                        if df2["Supplier"][name_index] in suppliers:
                            print("Supplier matched", end=" ")
                            print(df2["Supplier"][name_index], suppliers)
                            df2.loc[name_index, "Company.1"] = df1["Company"][i]
                            print("Brand added")
                            break
                        else:
                            df2.loc[name_index, "Company.1"] = "Unmapped"
                            print("Supplier not matched so unmapped.", end=" ")
                            print(df2["Supplier"][name_index], suppliers)
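
As one possible simplification of the nested loops above, the rule can be moved into a small function and applied once per row of df2. This is only a sketch under a few assumptions: `Keywords` and `Suppliers` already hold Python lists (with `None` where no supplier is given), keywords are matched as whole words rather than substrings, and `find_company` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Company": ["ADIDAS", "NIKE", "PUMA", "NEW BALANCE", "UNDER ARMOUR"],
    "Keywords": [["COPA", "PREDATOR", "ORIGINALS", "SPEEDFLOW"],
                 ["MERCURIAL", "SUPERSTAR", "VAPOR"],
                 ["ULTRA", "FUTURE", "RAPIDO"],
                 ["FURON"],
                 ["TEKELA"]],
    # Assumption: missing suppliers are None, present ones are lists
    "Suppliers": [None, None, ["STADIUM", "JD"], None, None],
})
df2 = pd.DataFrame({
    "Names": ["ADIDAS PREDATOR 17.1", "NIKE MERCURIAL 2020", "NIKE VAPOR 2021",
              "NEW BALANCE FURON", "PUMA RAPIDO 21.3", "PUMA RAPIDO 21.4"],
    "Supplier": ["ADIDAS", "NIKE", "NIKE", "JD", "STADIUM", "JD"],
})


def find_company(name, supplier):
    """Return the company for one df2 row, "Unmapped", or None (hypothetical helper).

    The first df1 row with a keyword among the words of `name` decides the result.
    """
    words = set(name.split())
    for company, keywords, suppliers in zip(df1["Company"], df1["Keywords"], df1["Suppliers"]):
        if not words.intersection(keywords):
            continue  # no keyword of this company appears in the name
        if not isinstance(suppliers, list) or supplier in suppliers:
            return company  # no supplier restriction, or the supplier is listed
        return "Unmapped"  # keyword matched, but the supplier is not listed
    return None  # no keyword matched at all


df2["Company.1"] = [find_company(n, s) for n, s in zip(df2["Names"], df2["Supplier"])]
print(df2)
```

With the sample data, both PUMA RAPIDO rows are assigned PUMA because STADIUM and JD are both listed suppliers. A row whose keyword matches but whose supplier is not listed gets "Unmapped", mirroring the loop version.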
