How to loop through a pandas Series (of lists) and check whether the strings in each list match another Series in another df?

Question

I have two Data Frames:

**df1**

data1 = {'Company': ['ADIDAS', 'NIKE', 'PUMA', 'NEW BALANCE', 'UNDER ARMOUR'],
         'Keywords': ['COPA, PREDATOR, ORIGINALS, SPEEDFLOW', 'MERCURIAL, SUPERSTAR, VAPOR', 'ULTRA, FUTURE, RAPIDO', 'FURON', 'TEKELA'],
         'Suppliers': ['', '', '[STADIUM, JD]', '', '']}


| Company    |Keywords                              | Suppliers   |
| ---------- |--------------------------------------|-------------|
| ADIDAS     |[COPA, PREDATOR, ORIGINALS, SPEEDFLOW]| <NA>        |
| NIKE       |[MERCURIAL, SUPERSTAR, VAPOR]         | <NA>        |
| PUMA       |[ULTRA, FUTURE, RAPIDO]               |[STADIUM, JD]|
|NEW BALANCE |[FURON]                               | <NA>        |
|UNDER ARMOUR|[TEKELA]                              | <NA>        |

**df2**

data2 = {'Names': ['ADIDAS PREDATOR 17.1', 'NIKE MERCURIAL 2020', 'NIKE VAPOR 2021', 'NEW BALANCE FURON', 'PUMA RAPIDO 21.3', 'PUMA RAPIDO 21.4'],
         'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM', 'JD'],
         'Company.1': ['', '', '', '', '', '']}

| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | <NA>      |
| NIKE MERCURIAL 2020  |NIKE      | <NA>      |
| NIKE VAPOR 2021      |NIKE      | <NA>      |
| NEW BALANCE FURON    |JD        | <NA>      |
| PUMA RAPIDO 21.3     |STADIUM   | <NA>      |
| PUMA RAPIDO 21.4     |JD        | <NA>      |

The goal is to check whether df2[Names] contains any word from df1[Keywords]. If it does, check whether df1[Suppliers] and df2[Supplier] match, and if they do, set df2[Company.1] to df1[Company]. (If df1[Suppliers] is empty, there is no need to check the supplier.)

Here is some code I've tried. (The print statement is just for my own reference.)

for i in range(len(df1["Keywords"])):
    for j in range(len(df1["Keywords"][i])):
        for name_index in range(len(df2["Names"])):
            if df1["Keywords"][i][j].strip() in df2["Names"][name_index]:
                print("YES " + df1["Keywords"][i][j] + " in " + df2["Names"][name_index])

                # Now need to check if suppliers are the same

Expected output:

| Names                | Supplier | Company.1 |
| -------------------- | ---------|-----------|
| ADIDAS PREDATOR 17.1 |ADIDAS    | ADIDAS    |
| NIKE MERCURIAL 2020  |NIKE      | NIKE      |
| NIKE VAPOR 2021      |NIKE      | NIKE      |
| NEW BALANCE FURON    |JD        | NEW BALANCE|
| PUMA RAPIDO 21.3     |STADIUM   | PUMA      |

How can I add the company name to Company.1 when the condition is satisfied?

Answer 1

Are the Suppliers in data1 supposed to be <NA>, except for [STADIUM, JD]? If so, I'm unsure how you got the Company.1 values in your expected output. None of the values in data2's Supplier are <NA>, and for the one row in data1 that is not <NA>, the Keywords do not match the Names in data2.
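For reference, the raw strings in data1 can be turned into the lists shown in the question's tables, with an empty list standing in for <NA>. This is only a minimal sketch: `to_list` is a hypothetical helper, and it assumes the values are always comma-separated and optionally wrapped in square brackets.

```python
import pandas as pd

data1 = {
    "Company": ["ADIDAS", "NIKE", "PUMA", "NEW BALANCE", "UNDER ARMOUR"],
    "Keywords": ["COPA, PREDATOR, ORIGINALS, SPEEDFLOW", "MERCURIAL, SUPERSTAR, VAPOR",
                 "ULTRA, FUTURE, RAPIDO", "FURON", "TEKELA"],
    "Suppliers": ["", "", "[STADIUM, JD]", "", ""],
}


def to_list(text):
    """Hypothetical helper: 'A, B' or '[A, B]' -> ['A', 'B']; '' -> []."""
    items = text.strip("[] ").split(",")
    # Drop the empty piece that splitting an empty string leaves behind
    return [item.strip() for item in items if item.strip()]


df1 = pd.DataFrame(data1)
df1["Keywords"] = df1["Keywords"].map(to_list)
df1["Suppliers"] = df1["Suppliers"].map(to_list)
```

With this shape, "no supplier restriction" is simply an empty list, which is easy to test with `not suppliers`.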

Regardless, I believe I have the gist of what you're looking for.

import pandas as pd

keywords: str = "Keywords"
names: str = "Names"

# 1 - Compare the values of data2.Names to data1.Keywords
data1[keywords] = [i.split(", ") for i in data1.get(keywords)]
data2[names] = [i.split() for i in data2.get(names)]
data2["_match_keywords"] = [any(i in name for i in keyword) for name, keyword in zip(data2.get(names), data1.get(keywords))]

# Out - data2
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], ['NIKE', 'VAPOR', '2021'],
#            ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
#  'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', '', '', '', ''],
#  '_match_keywords': [True, True, False, True, False]}

# 2 - Compare data2.Supplier to data1.Suppliers
data2["_match_supplier"] = [any(i in s1 for i in s2) for s2, s1 in zip(data2.get("Supplier"), data1.get("Suppliers"))]

# Out
# {'Names': [['ADIDAS', 'PREDATOR', '17.1'], ['NIKE', 'MERCURIAL', '2020'], ['NIKE', 'VAPOR', '2021'],
#            ['NEW', 'BALANCE', 'FURON'], ['PUMA', 'RAPIDO', '21.3']],
#  'Supplier': ['ADIDAS', 'NIKE', 'NIKE', 'JD', 'STADIUM'], 'Company.1': ['', '', '', '', ''],
#  '_match_keywords': [True, True, False, True, False], '_match_supplier': [False, False, True, False, False]}

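# 3 - Where both Keyword and Supplier matched, assign data1.Company to data2.Company.1
# (built row by row; assigning a single value would overwrite the whole column)
data2["Company.1"] = [
    company if kw_match and sup_match else ""
    for company, kw_match, sup_match in zip(
        data1.get("Company"), data2.get("_match_keywords"), data2.get("_match_supplier")
    )
]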

# 4 - Make the frames and drop the helper columns (_match_keywords, _match_supplier)
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2 = df2[[col for col in df2.columns if not col.startswith("_")]]

df1

| Company      | Keywords                                        | Suppliers     |
|--------------|-------------------------------------------------|---------------|
| ADIDAS       | ['COPA', 'PREDATOR', 'ORIGINALS', 'SPEEDFLOW']  |               |
| NIKE         | ['MERCURIAL', 'SUPERSTAR', 'VAPOR']             |               |
| PUMA         | ['ULTRA', 'FUTURE', 'RAPIDO']                   | [STADIUM, JD] |
| NEW BALANCE  | ['FURON']                                       |               |
| UNDER ARMOUR | ['TEKELA']                                      |               |

df2 w/conditions

| Names                          | Supplier | _match_keywords | _match_supplier |
|--------------------------------|----------|-----------------|-----------------|
| ['ADIDAS', 'PREDATOR', '17.1'] | ADIDAS   | True            | False           |
| ['NIKE', 'MERCURIAL', '2020']  | NIKE     | True            | False           |
| ['NIKE', 'VAPOR', '2021']      | NIKE     | False           | True            |
| ['NEW', 'BALANCE', 'FURON']    | JD       | True            | False           |
| ['PUMA', 'RAPIDO', '21.3']     | STADIUM  | False           | False           |

df2 w/helpers dropped

| Names                          | Supplier   | Company.1   |
|--------------------------------|------------|-------------|
| ['ADIDAS', 'PREDATOR', '17.1'] | ADIDAS     |             |
| ['NIKE', 'MERCURIAL', '2020']  | NIKE       |             |
| ['NIKE', 'VAPOR', '2021']      | NIKE       |             |
| ['NEW', 'BALANCE', 'FURON']    | JD         |             |
| ['PUMA', 'RAPIDO', '21.3']     | STADIUM    |             |
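Because the comparison above pairs rows by position, a product is only ever checked against the company on the same row. A row-wise lookup that checks each product against every company may be closer to the stated goal. The following is a rough sketch, not a definitive implementation: `find_company` is a hypothetical helper, Keywords and Suppliers are assumed to already be lists (an empty list meaning no supplier restriction), and keywords are matched as whole words of the product name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Company": ["ADIDAS", "NIKE", "PUMA", "NEW BALANCE", "UNDER ARMOUR"],
    "Keywords": [["COPA", "PREDATOR", "ORIGINALS", "SPEEDFLOW"],
                 ["MERCURIAL", "SUPERSTAR", "VAPOR"],
                 ["ULTRA", "FUTURE", "RAPIDO"],
                 ["FURON"],
                 ["TEKELA"]],
    # Assumption: an empty list means "no supplier restriction"
    "Suppliers": [[], [], ["STADIUM", "JD"], [], []],
})
df2 = pd.DataFrame({
    "Names": ["ADIDAS PREDATOR 17.1", "NIKE MERCURIAL 2020", "NIKE VAPOR 2021",
              "NEW BALANCE FURON", "PUMA RAPIDO 21.3", "PUMA RAPIDO 21.4"],
    "Supplier": ["ADIDAS", "NIKE", "NIKE", "JD", "STADIUM", "JD"],
})


def find_company(name, supplier):
    """Hypothetical helper: first company with a keyword in `name` and a compatible supplier."""
    words = name.split()
    for company, keywords, suppliers in zip(df1["Company"], df1["Keywords"], df1["Suppliers"]):
        keyword_hit = any(keyword in words for keyword in keywords)
        supplier_ok = not suppliers or supplier in suppliers
        if keyword_hit and supplier_ok:
            return company
    return None  # no company satisfied both conditions


df2["Company.1"] = [find_company(n, s) for n, s in zip(df2["Names"], df2["Supplier"])]
print(df2)
```

Returning the first hit means that if two companies share a keyword, the one listed earlier in df1 wins. Whether that is the desired tie-break is not stated in the question.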

Answer 2

Here is the code that is working for me. If anybody has any suggestions for improving my code, I would appreciate it!

# Mark missing supplier lists once, before looping
df1["Suppliers"] = df1["Suppliers"].fillna("Empty")

for i in range(len(df1["Keywords"])):
    for j in range(len(df1["Keywords"][i])):
        for name_index in range(len(df2["Names"])):
            if df1["Keywords"][i][j].strip() in df2["Names"][name_index]:
                print("YES, " + df1["Keywords"][i][j] + " in " + df2["Names"][name_index])

                print("---Checking supplier---")
                if df1["Suppliers"][i] == "Empty":
                    print("---Supplier empty, so adding brand name")
                    # .loc avoids chained assignment, which may write to a copy
                    df2.loc[name_index, "Company.1"] = df1["Company"][i]
                    print("---Brand added---")
                else:
                    print("---Supplier not empty, so looking for a match")
                    for suppliers in df1["Suppliers"][i]:
                        if df2["Supplier"][name_index] in suppliers:
                            print("Supplier matched", end=" ")
                            print(df2["Supplier"][name_index], suppliers)
                            df2.loc[name_index, "Company.1"] = df1["Company"][i]
                            print("Brand added")
                            break
                        else:
                            df2.loc[name_index, "Company.1"] = "Unmapped"
                            print("Supplier not matched, so unmapped.", end=" ")
                            print(df2["Supplier"][name_index], suppliers)
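One possible way to avoid the nested loops is to explode the keyword lists and cross-join them with the products, then filter the pairs. This is a sketch under assumptions rather than a drop-in replacement: it assumes pandas 1.2 or later (for `how="cross"`), that Keywords and Suppliers are lists with an empty list meaning no restriction, and it keeps the substring test and the "Unmapped" label from the loop above. The intermediate names (`pairs`, `matches`, `first_match`) are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Company": ["ADIDAS", "NIKE", "PUMA", "NEW BALANCE", "UNDER ARMOUR"],
    "Keywords": [["COPA", "PREDATOR", "ORIGINALS", "SPEEDFLOW"],
                 ["MERCURIAL", "SUPERSTAR", "VAPOR"],
                 ["ULTRA", "FUTURE", "RAPIDO"],
                 ["FURON"],
                 ["TEKELA"]],
    # Assumption: an empty list means "no supplier restriction"
    "Suppliers": [[], [], ["STADIUM", "JD"], [], []],
})
df2 = pd.DataFrame({
    "Names": ["ADIDAS PREDATOR 17.1", "NIKE MERCURIAL 2020", "NIKE VAPOR 2021",
              "NEW BALANCE FURON", "PUMA RAPIDO 21.3", "PUMA RAPIDO 21.4"],
    "Supplier": ["ADIDAS", "NIKE", "NIKE", "JD", "STADIUM", "JD"],
})

# One row per (company, keyword); the Suppliers list is repeated on each row
keywords = df1.explode("Keywords").rename(columns={"Keywords": "Keyword"})

# Pair every product with every keyword, keeping df2's row label in "index"
pairs = df2.reset_index().merge(keywords, how="cross")

# Same tests as the nested loops: keyword is a substring of the name,
# and the supplier list is either empty or contains the product's supplier
pairs["keyword_hit"] = [kw in name for kw, name in zip(pairs["Keyword"], pairs["Names"])]
pairs["supplier_ok"] = [
    len(allowed) == 0 or supplier in allowed
    for allowed, supplier in zip(pairs["Suppliers"], pairs["Supplier"])
]
matches = pairs[pairs["keyword_hit"] & pairs["supplier_ok"]]

# Keep the first matching company per product; products with no match become "Unmapped"
first_match = matches.drop_duplicates("index").set_index("index")["Company"]
df2["Company.1"] = first_match.reindex(df2.index).fillna("Unmapped")
print(df2)
```

The cross join builds len(df2) × (total number of keywords) rows, so for very large frames the explicit loop or a chunked variant may use less memory.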

Answer 1 posted 2022-08-14 18:57:21
Answer 2 posted 2022-08-15 06:22:24