
Replace value in Pandas DataFrame column, based on a condition (contains a string)

I'm currently working with a pandas dataset (US startups) and am trying to aggregate sectors by keywords. In other words, I need to loop through a column and if a value contains a given string, replace the whole value with a new string.

I've already tried some simple "if" statement loops, but can't seem to get the syntax right. I've also tried .loc, but all I manage to do is replace every value in the column with one string.

Thanks!
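The .loc attempt described above overwrites every row when the row selector is omitted (or is ":"). Passing a boolean mask as the row selector restricts the assignment to matching rows. The following is a minimal sketch, assuming a string column; the column name "sector", the sample values, and the keyword "Games" are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; "sector" is an assumed column name.
df = pd.DataFrame({"sector": ["Mobile Games", "Online Games", "Biotech", "Health Care"]})

# Boolean mask: True where the value contains the literal substring "Games".
# regex=False treats the pattern literally; na=False treats missing values as non-matches.
mask = df["sector"].str.contains("Games", regex=False, na=False)

# Using the mask as the row selector limits the assignment to matching rows.
# df.loc[:, "sector"] = "Gaming" would instead overwrite every row.
df.loc[mask, "sector"] = "Gaming"

print(df["sector"].tolist())
```

To group several keywords, repeat the mask-and-assign step once per keyword.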

A simple way to do this is to store the mapping of sectors to sector categories in a dictionary, and then apply a function that looks each value up in that mapping.

import pandas as pd

data = pd.DataFrame(["chunky spam", "watery spam", "hard-boiled", "scrambled"])

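# Map each sector to its category; built once rather than on every call.
mapping_dict = {"chunky spam": "spam",
                "watery spam": "spam",
                "hard-boiled": "eggs",
                "scrambled": "eggs"}

def mapping(sector):
    # Fall back to the original value for sectors missing from the dict
    # (a plain mapping_dict[sector] would raise KeyError).
    return mapping_dict.get(sector, sector)

# apply() returns a new Series; assign it back to replace the column's values.
data[0] = data[0].apply(mapping)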
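The dictionary approach above matches whole values exactly, while the question asks about values that contain a keyword. One way to bridge the two, sketched here under the assumption that every value is a string (non-strings are passed through unchanged), is to store keywords instead of full values and return the category of the first keyword found. The keyword table, the column name "sector", and the function name categorize are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a named column.
data = pd.DataFrame({"sector": ["chunky spam", "watery spam",
                                "hard-boiled egg", "scrambled egg", "toast"]})

# Hypothetical keyword -> category table; dicts keep insertion order,
# so the first keyword listed wins when several match.
keyword_map = {"spam": "spam", "egg": "eggs"}

def categorize(sector):
    """Return the category of the first keyword contained in sector, else sector unchanged."""
    if not isinstance(sector, str):
        return sector  # leave NaN/None and other non-strings untouched
    for keyword, category in keyword_map.items():
        if keyword in sector:  # substring test, i.e. "contains"
            return category
    return sector

data["category"] = data["sector"].apply(categorize)
print(data["category"].tolist())
```

Writing the result to a new column keeps the original sector text for checking; assign to data["sector"] instead to overwrite it.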

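You can accomplish this using where() (pd.DataFrame.where / pd.Series.where), applied to the column and assigned back:

df["column_name"] = df["column_name"].where(df["column_name"] != "str", "replace")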

where() keeps each value for which the condition is True and replaces the values for which it is False. That is why the condition uses the negated != when looking for "str" in the column: every value equal to "str" is replaced with the string "replace". Note that != is an exact comparison; to catch values that merely contain "str", build the condition with str.contains() instead.
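A minimal sketch of the "contains" variant, assuming a string column named column_name and a literal (non-regex) search; the sample values are hypothetical. Because where() replaces values where the condition is False, the str.contains() mask is negated with ~.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column_name": ["some str here", "other", "str", "none"]})

# True where the value contains the literal substring "str"; missing values count as False.
contains = df["column_name"].str.contains("str", regex=False, na=False)

# where() keeps values where its condition is True, so negate the mask
# to replace exactly the values that DO contain "str".
df["column_name"] = df["column_name"].where(~contains, "replace")

print(df["column_name"].tolist())
```

Series.mask(contains, "replace") is the mirror image of where() and expresses the same replacement without the negation.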
