Suppose I have a dataframe like below:
>>> df = pd.DataFrame({'Category':['Personal Care', 'Home Care', 'Pharma', 'Pet'], 'SubCategory':['Shampoo', 'Floor Wipe', 'Veterinary', 'Animal Feed']})
>>> df
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2         Pharma   Veterinary
3            Pet  Animal Feed
I'd like to update the value in the 'Category' column whenever the 'SubCategory' column's value contains either 'Veterinary' or 'Animal' (case-insensitive). To do that, I wrote the function below:
def update_col1_values_based_on_values_in_col2_using_regex_mappings(
        df,
        col1_name: str,
        col2_name: str,
        dictionary_of_regex_mappings: dict):
    for pattern, new_str_value in dictionary_of_regex_mappings.items():
        mask = df[col2_name].str.contains(pattern)
        df.loc[mask, col1_name] = new_str_value
    return df
This function works as expected, as shown below:
>>> df1 = update_col1_values_based_on_values_in_col2_using_regex_mappings(df, 'Category', 'SubCategory', {"(?i).*Veterinary.*": "Pet Related", "(?i).*Animal.*": "Pet Related"})
>>> df1
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
In practice, there will be more than 'Veterinary' and 'Animal Feed' to map from, so some of the suggestions below, although they read elegantly, are not going to be practical for the actual use case. In other words, please assume that the mapping is going to be more like this:
{
    "(?i).*Veterinary.*": "Pet Related",
    "(?i).*Animal.*": "Pet Related",
    "(?i).*Pharma.*": "Pharmaceutical",
    "(?i).*Diary.*": "Other",
    ... # lots and lots more mapping here
}
I'm wondering if there's a more elegant (Pandas-ish) way to accomplish this. Thank you in advance for your suggestions!
EDIT: I didn't clarify at the beginning that the mapping between the 'Category' and 'SubCategory' columns wouldn't be restricted to just 'Veterinary' and 'Animal'.
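You can build a boolean mask with a plain lambda and assign through it, which is intuitive. Rows that don't match keep their existing 'Category' value:
mask = df['SubCategory'].map(lambda x: "animal" in x.lower() or "veterinary" in x.lower())
df.loc[mask, 'Category'] = "Pet Related"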
You could do it with pd.Series.where, using re to add the case-insensitive flag:
import re
df['Category'] = df['Category'].where(~df['SubCategory'].str.contains('Veterinary|Animal', flags=re.IGNORECASE), 'Pet Related')
Output:
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
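As a variation on the same idea, str.contains also accepts case=False and na=False, so the re import is optional. The sketch below is only an illustration: the sample frame is altered (mixed case and a missing value, which are not in the original data) to show both parameters, and Series.mask is used as the inverse of where so the condition does not need negating.

```python
import pandas as pd

# Sample data modified from the question for illustration:
# mixed case and a missing SubCategory are assumptions, not original data.
df = pd.DataFrame({
    "Category": ["Personal Care", "Home Care", "Pharma", "Pet"],
    "SubCategory": ["Shampoo", "floor wipe", "VETERINARY", None],
})

# case=False makes the match case-insensitive;
# na=False treats missing values as non-matches instead of propagating NaN.
is_pet = df["SubCategory"].str.contains("veterinary|animal", case=False, na=False)

# Series.mask replaces values where the condition is True (the inverse of where).
df["Category"] = df["Category"].mask(is_pet, "Pet Related")
```

Without na=False, a missing 'SubCategory' would yield a missing value in the mask, which cannot be used safely for boolean selection.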
Not sure if this is the best way, but you can do this:
df.loc[df.SubCategory.str.contains('Veterinary|Animal'), 'Category']='Pet Related'
str.contains() already treats the pattern as a regex by default, so if you need case-insensitive matching you can add an inline flag:
pattern = r'(?i)veterinary|animal'
df.loc[df.SubCategory.str.contains(pattern, regex=True), 'Category']='Pet Related'
And this is the result
In [3]: df
Out[3]:
        Category  SubCategory
0  Personal Care      Shampoo
1      Home Care   Floor Wipe
2    Pet Related   Veterinary
3    Pet Related  Animal Feed
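For the larger mapping mentioned in the question's edit, one possible approach is to avoid looping over patterns: combine the keywords into a single alternation, pull out the matched keyword with str.extract, and translate it with map. This is a rough sketch under some assumptions: the mapping uses plain lowercase keywords rather than the "(?i).*keyword.*" patterns from the question, the name keyword_to_category and the "wipe" entry are made up for illustration, and when a value contains several keywords the leftmost match wins rather than the last dictionary entry.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Category": ["Personal Care", "Home Care", "Pharma", "Pet"],
    "SubCategory": ["Shampoo", "Floor Wipe", "Veterinary", "Animal Feed"],
})

# Hypothetical mapping of plain lowercase keywords (not full regexes) to categories.
keyword_to_category = {
    "veterinary": "Pet Related",
    "animal": "Pet Related",
    "wipe": "Cleaning",  # made-up entry to show a second target category
}

# One alternation inside a single capture group; re.escape keeps keywords literal.
pattern = "(" + "|".join(re.escape(k) for k in keyword_to_category) + ")"

# Extract the first matching keyword per row (NaN where nothing matches).
matched = df["SubCategory"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Translate the keyword to its category; unmatched rows keep their old value.
new_category = matched.str.lower().map(keyword_to_category)
df["Category"] = new_category.fillna(df["Category"])
```

This scans the column once regardless of how many keywords there are, but it only works when each mapping key is a literal keyword; arbitrary regex keys would still need the per-pattern loop from the question.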