
Update one column's value based on another column's value in Pandas using regular expression

Suppose I have a DataFrame like the one below:

>>> import pandas as pd
>>> df = pd.DataFrame({'Category':['Personal Care', 'Home Care', 'Pharma', 'Pet'], 'SubCategory':['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})
>>> df
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2         Pharma   Veterinary
3            Pet  Animal Feed

I'd like to update the value in the 'Category' column whenever the 'SubCategory' column's value contains either 'Veterinary' or 'Animal' (case-insensitive). To do that, I wrote a function like the one below:

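def update_col1_values_based_on_values_in_col2_using_regex_mappings(
        df,
        col1_name: str,
        col2_name: str,
        dictionary_of_regex_mappings: dict):
    # For each (pattern, new value) pair, overwrite col1 on the rows whose
    # col2 matches the pattern; if several patterns match a row, the last one wins.
    for pattern, new_str_value in dictionary_of_regex_mappings.items():
        # na=False treats missing values in col2 as non-matches
        mask = df[col2_name].str.contains(pattern, na=False)
        df.loc[mask, col1_name] = new_str_value

    # Note: df is modified in place and also returned
    return df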

This function works as expected:

>>> df1 = update_col1_values_based_on_values_in_col2_using_regex_mappings(df, 'Category', 'SubCategory', {"(?i).*Veterinary.*": "Pet Related", "(?i).*Animal.*": "Pet Related"})

>>> df1
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
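For a long mapping, one vectorized alternative, sketched here rather than offered as the canonical approach, is to build one boolean mask per pattern and let numpy.select pick the value. The helper name apply_regex_mappings is hypothetical, and unlike the function above it returns a copy instead of modifying df in place. Note one behavioral difference: np.select takes the first condition that matches, whereas the loop above lets the last matching pattern win, so the order of the dict matters in opposite ways.

```python
import numpy as np
import pandas as pd


def apply_regex_mappings(df, target_col, source_col, mappings):
    """Hypothetical helper: return a copy of df in which target_col is set to
    the value of the FIRST regex in `mappings` that matches source_col."""
    out = df.copy()
    if not mappings:  # np.select rejects an empty condition list
        return out
    # One boolean mask per pattern; missing values count as non-matches.
    conditions = [
        out[source_col].str.contains(pattern, regex=True, na=False).to_numpy(dtype=bool)
        for pattern in mappings
    ]
    choices = list(mappings.values())
    # Rows that match no pattern keep their current target_col value.
    out[target_col] = np.select(conditions, choices, default=out[target_col].to_numpy())
    return out


df = pd.DataFrame({'Category': ['Personal Care', 'Home Care', 'Pharma', 'Pet'],
                   'SubCategory': ['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})
result = apply_regex_mappings(df, 'Category', 'SubCategory',
                              {"(?i).*Veterinary.*": "Pet Related",
                               "(?i).*Animal.*": "Pet Related"})
print(result)
```

Because all masks are computed up front, the cost is still one str.contains pass per pattern; the gain is mainly that the assignment happens once and the original frame is left untouched.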

In practice, there will be many more patterns to map from than 'Veterinary' and 'Animal Feed', so some of the suggestions below, although elegant, are not going to be practical for the actual use case. In other words, please assume that the mapping is going to look more like this:

{
"(?i).*Veterinary.*": "Pet Related",
"(?i).*Animal.*": "Pet Related",
"(?i).*Pharma.*": "Pharmaceutical",
"(?i).*Diary.*": "Other",
... # lots and lots more mappings here
}

I'm wondering if there's a more elegant (Pandas-ish) way to accomplish this. Thank you in advance for your suggestions!

EDIT: I didn't make clear at the outset that the mapping between the 'Category' and 'SubCategory' columns won't be restricted to just 'Veterinary' and 'Animal'.
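Since such a mapping will hold many patterns that often share a target value, another option is to group the patterns by target value and join each group into a single alternation, so str.contains runs once per target value instead of once per pattern. This is a minimal sketch under an explicit assumption: the patterns are plain regexes without inline flags such as (?i), because recent Python versions reject global inline flags that are not at the start of the pattern, and here each pattern is wrapped in (?:...). Case-insensitivity is therefore passed through flags=re.IGNORECASE. The helper name apply_grouped_mappings is hypothetical.

```python
import re
from collections import defaultdict

import pandas as pd


def apply_grouped_mappings(df, target_col, source_col, mappings):
    """Hypothetical helper. Assumes patterns carry no inline flags like (?i)."""
    # Group patterns by the value they map to, preserving first-seen order.
    grouped = defaultdict(list)
    for pattern, value in mappings.items():
        grouped[value].append(f"(?:{pattern})")

    out = df.copy()
    for value, patterns in grouped.items():
        combined = "|".join(patterns)  # e.g. "(?:Veterinary)|(?:Animal)"
        mask = out[source_col].str.contains(combined, flags=re.IGNORECASE, na=False)
        # Later target values overwrite earlier ones on rows matching both.
        out.loc[mask, target_col] = value
    return out


df = pd.DataFrame({'Category': ['Personal Care', 'Home Care', 'Pharma', 'Pet'],
                   'SubCategory': ['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})
mappings = {
    "Veterinary": "Pet Related",
    "Animal": "Pet Related",
    "Pharma": "Pharmaceutical",
}
print(apply_grouped_mappings(df, 'Category', 'SubCategory', mappings))
```

Precedence here is by target value (in order of first appearance in the dict), not by individual pattern, which is usually acceptable when each row is expected to match only one category.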

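You can use Series.map with a lambda, which is intuitive. The lambda only flags the matching rows (lower-casing makes the test case-insensitive), so all other rows keep their original Category:

is_pet = df['SubCategory'].map(lambda x: "animal" in x.lower() or "veterinary" in x.lower())
df.loc[is_pet, 'Category'] = "Pet Related"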
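The map-based idea can also be stretched to a long mapping: compile every pattern once, return the value of the first pattern that matches, and fall back to the existing Category where nothing matches. This is a minimal sketch; make_lookup is a hypothetical helper name, and matching is forced to be case-insensitive with re.IGNORECASE (an inline (?i) in a pattern is harmless alongside it).

```python
import re

import pandas as pd


def make_lookup(mappings):
    """Hypothetical helper: compile each pattern once (case-insensitively) and
    return a function giving the value of the first pattern that matches."""
    compiled = [(re.compile(p, re.IGNORECASE), v) for p, v in mappings.items()]

    def lookup(text):
        if not isinstance(text, str):  # NaN / None never match
            return None
        for regex, value in compiled:
            if regex.search(text):
                return value
        return None  # no pattern matched

    return lookup


df = pd.DataFrame({'Category': ['Personal Care', 'Home Care', 'Pharma', 'Pet'],
                   'SubCategory': ['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})
lookup = make_lookup({"Veterinary": "Pet Related", "Animal": "Pet Related"})
# map() yields None for unmatched rows; fillna() restores their old Category.
df['Category'] = df['SubCategory'].map(lookup).fillna(df['Category'])
print(df)
```

This runs Python code per row, so it is not vectorized, but it touches each row once regardless of how many patterns the mapping holds, and it stops at the first match.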

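You could do it with pd.Series.where, using re to pass the case-insensitive flag. Since where() keeps the original values wherever the condition is True, the condition is negated with ~, and the result is assigned back to the column:

import re
df['Category'] = df['Category'].where(~df['SubCategory'].str.contains('Veterinary|Animal', flags=re.IGNORECASE), 'Pet Related')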

Output:

        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
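An equivalent formulation, offered as a sketch, uses Series.mask, which replaces values where the condition is True (the opposite of where), so no negation is needed. Passing na=False is an added assumption that treats missing SubCategory values as non-matches.

```python
import re

import pandas as pd

df = pd.DataFrame({'Category': ['Personal Care', 'Home Care', 'Pharma', 'Pet'],
                   'SubCategory': ['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})

# Boolean mask of rows whose SubCategory mentions either keyword, in any case.
pet_rows = df['SubCategory'].str.contains('Veterinary|Animal', flags=re.IGNORECASE, na=False)

# mask() swaps in 'Pet Related' where pet_rows is True; assigning the result
# back to the column avoids relying on an inplace update of a column view.
df['Category'] = df['Category'].mask(pet_rows, 'Pet Related')
print(df)
```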

Not sure if this is the best way, but you can do this:

df.loc[df.SubCategory.str.contains('Veterinary|Animal'), 'Category']='Pet Related'

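str.contains() treats the pattern as a regular expression by default (regex=True), so you can also embed an inline flag such as (?i) to make the match case-insensitive: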

pattern = r'(?i)veterinary|animal'
df.loc[df.SubCategory.str.contains(pattern, regex=True), 'Category']='Pet Related'

And this is the result:

In [3]: df
Out[3]:
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
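If you prefer not to embed (?i) in the pattern, str.contains also accepts case=False, which makes the regex match case-insensitive. A minimal sketch on the same data (na=False is an added assumption so missing values count as non-matches):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['Personal Care', 'Home Care', 'Pharma', 'Pet'],
                   'SubCategory': ['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})

# case=False gives case-insensitive matching without an inline (?i) flag.
pet_rows = df['SubCategory'].str.contains('veterinary|animal', case=False, na=False)
df.loc[pet_rows, 'Category'] = 'Pet Related'
print(df)
```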

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM