
Looking for words in a string that match the values in a dictionary, then returning the key in a new column

I've been trying to iterate through strings in a pandas dataframe to look for a certain set of words and here I've been successful.

However, I realised that I didn't just want to find words but also look at the semantics of a word and group together a set of words that bear the same meaning as my main keyword.

I stumbled upon the question "How to return key if a given string matches the keys value in a dictionary", which is exactly what I want to do, but unfortunately I can't get it to work in a pandas dataframe.

Below is one of the solutions which can be found in the link:

my_dict = {"color": ("red", "blue", "green"), "someothercolor":("orange", "blue", "white")}

solutions = []

my_color = 'blue'

for key, value in my_dict.items():
    if my_color in value:
        solutions.append(key)

Since 'blue' appears in both tuples, solutions ends up as:

['color', 'someothercolor']

My data frame:

Now I have a data frame where I would like to iterate through df['Name'] to find a value, and then add the matching key to a new column. In this example it would be df['Colour'].

+---+----------+--------------------------+-----------------------------+----------+--------+
|   |   SKU    |           Name           |         Description         | Category | Colour |
+---+----------+--------------------------+-----------------------------+----------+--------+
| 0 | 7E+10    | Red Lace Midi Dress      | Red Lace Midi D...          | Dresses  |        |
| 1 | 7E+10    | Long Armed Sweater Azure | Long Armed Sweater Azure... | Sweaters |        |
| 2 | 2,01E+08 | High Top Ruby Sneakers   | High Top Ruby Sneakers...   | Shoes    |        |
| 3 | 4,87E+10 | Tight Indigo Jeans       | Tight Indigo Jeans...       | Denim    |        |
| 4 | 2,2E+09  | T-Shirt Navy             | T-Shirt Navy...             | T-Shirts |        |
+---+----------+--------------------------+-----------------------------+----------+--------+

Expected result:

+---+----------+--------------------------+-----------------------------+----------+--------+
|   |   SKU    |           Name           |         Description         | Category | Colour |
+---+----------+--------------------------+-----------------------------+----------+--------+
| 0 | 7E+10    | Red Lace Midi Dress      | Red Lace Midi D...          | Dresses  | red    |
| 1 | 7E+10    | Long Armed Sweater Azure | Long Armed Sweater Azure... | Sweaters | blue   |
| 2 | 2,01E+08 | High Top Ruby Sneakers   | High Top Ruby Sneakers...   | Shoes    | red    |
| 3 | 4,87E+10 | Tight Indigo Jeans       | Tight Indigo Jeans...       | Denim    | blue   |
| 4 | 2,2E+09  | T-Shirt Navy             | T-Shirt Navy...             | T-Shirts | blue   |
+---+----------+--------------------------+-----------------------------+----------+--------+

My code:

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    for key, value in colour.items():
            if value in x:
                return key
            else:
                return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

I get the following error:

TypeError: 'in <string>' requires string as left operand, not tuple

I can't run a tuple against string. How would I approach this?

You need to loop through each word in the tuple stored under each dictionary key.

As per the error message, you cannot check whether a tuple exists in a str type.

In addition, make sure your else clause belongs to the outer for loop rather than to the if statement, so that all keys are tested before you return the default value.

Finally, make sure you check against str.lower(), since string matching is case-sensitive in Python.

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Red Lace Midi Dress', 'Long Armed Sweater Azure',
                            'High Top Ruby Sneakers', 'Tight Indigo Jeans',
                            'T-Shirt Navy']})

colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetchColours(x):
    # Test every word of every key against the lower-cased name
    for key, values in colour.items():
        for value in values:
            if value in x.lower():
                return key
    else:
        # Reached only when the outer loop finishes without returning a key
        return np.nan

df['Colour'] = df['Name'].apply(fetchColours)

Result:

                       Name Colour
0       Red Lace Midi Dress    red
1  Long Armed Sweater Azure   blue
2    High Top Ruby Sneakers    red
3        Tight Indigo Jeans   blue
4              T-Shirt Navy   blue
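
To see why the position of the default return matters, here is a small pandas-free sketch comparing the two placements. The function names fetch_early_else and fetch_late_default are made up for this illustration, and None stands in for np.nan so the snippet needs no third-party imports.

```python
colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetch_early_else(name):
    # The else belongs to the if, so the function returns right after
    # testing the very first word of the very first key.
    for key, values in colour.items():
        for value in values:
            if value in name.lower():
                return key
            else:
                return None

def fetch_late_default(name):
    # The default is returned only after every word of every key was tested.
    for key, values in colour.items():
        for value in values:
            if value in name.lower():
                return key
    return None

print(fetch_early_else('T-Shirt Navy'))    # None
print(fetch_late_default('T-Shirt Navy'))  # blue
```

The first version gives up as soon as 'red' fails to match, so it never reaches the 'blue' tuple at all.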

You are trying to search a tuple of words inside a string, while I guess you want to check if any word of the tuple is in the string.

BTW, string comparisons are case-sensitive in Python.

You could replace:

if value in x:

by

if any(word in x.lower() for word in value):
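
Put together, the any() check might look like the sketch below. The name fetch_colour and its default parameter are my own choices for the illustration, and float('nan') is used in place of np.nan so the snippet runs without NumPy; the default return also has to sit after the loop, as in the other answer.

```python
colour = {'red': ('red', 'rose', 'ruby'), 'blue': ('azure', 'indigo', 'navy')}

def fetch_colour(name, default=float('nan')):
    lowered = name.lower()  # lower-case once, since matching is case-sensitive
    for key, words in colour.items():
        # True as soon as any word of this key occurs in the name
        if any(word in lowered for word in words):
            return key
    return default  # no key matched

names = ['Red Lace Midi Dress', 'Long Armed Sweater Azure',
         'High Top Ruby Sneakers', 'Tight Indigo Jeans', 'T-Shirt Navy']
print([fetch_colour(n) for n in names])
# ['red', 'blue', 'red', 'blue', 'blue']
```

With the data frame from the question this would be applied as df['Colour'] = df['Name'].apply(fetch_colour). Note that this is still substring matching, so 'red' would also match inside a word such as 'bored'.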
