
Conditional If Statement: If value in row contains string ... set another column equal to string


I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.

The Activity_2 column shows the desired result. Essentially, I want to label what type of activity is occurring in each row.

I tried to do this with the code below, but it won't run (the error is shown below). Any help is greatly appreciated!


    for i in df2['Activity']:
        if i contains 'email':
            df2['Activity_2'] = 'email'
        elif i contains 'conference'
            df2['Activity_2'] = 'conference'
        elif i contains 'call'
            df2['Activity_2'] = 'call'
        else:
            df2['Activity_2'] = 'task'

Error: if i contains 'email':
SyntaxError: invalid syntax

I assume you are using pandas. Then you can use numpy.where, which is a vectorized version of if/else, with the condition constructed by str.contains:

import numpy as np
import pandas as pd

df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))


#   Activity            Activity_2
#0  email personA       email
#1  attend conference   conference
#2  send email          email
#3  call Sam            call
#4  random text         task
#5  random text         task
#6  lwantto call        call
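To try the nested np.where version end to end, here is a self-contained sketch that rebuilds a DataFrame from the sample rows in the output above (the row texts are copied from it) and imports np.where directly from NumPy. The nesting order sets the priority: a row that contained both "email" and "call" would be labelled "email".

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output table above
df = pd.DataFrame({'Activity': ['email personA', 'attend conference', 'send email',
                                'call Sam', 'random text', 'random text',
                                'lwantto call']})

# Nested np.where: the first matching condition wins, 'task' is the fallback
df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))

print(df)
```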

This also works:

df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
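With the .loc approach, rows that match none of the patterns are left as NaN, and when a row matches several patterns the last assignment wins. A minimal sketch, assuming you want 'task' as the fallback as in the question, initialises the column first and orders the assignments from lowest to highest priority (the sample rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                'call Sam', 'random text']})

# Start every row with the fallback label so unmatched rows are not NaN
df['Activity_2'] = 'task'

# Later assignments overwrite earlier ones, so the highest-priority pattern goes last
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'

print(df)
```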

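The solution above behaves wrongly if your DataFrame contains NaN values, because str.contains returns NaN rather than True or False for missing entries. In that case I recommend the following code, which worked for me:

import numpy as np

# Replace missing values with the placeholder "0" so str.contains never sees NaN
temp = df.Activity.fillna("0")
df['Activity_2'] = np.where(temp.str.contains("0"), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))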
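An alternative sketch swaps in the na parameter of Series.str.contains instead of a "0" placeholder: na=False makes missing values count as "no match", so those rows fall through to the default. Unlike the placeholder approach, it does not label as "None" every activity whose text happens to contain a "0". Sending missing rows to 'task' is an assumption here; the sample rows are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Activity': ['email personA', np.nan, 'call Sam', 'room 101']})

# na=False treats missing values as "does not contain the pattern"
email = df.Activity.str.contains("email", na=False)
conference = df.Activity.str.contains("conference", na=False)
call = df.Activity.str.contains("call", na=False)

df['Activity_2'] = np.where(email, "email",
                   np.where(conference, "conference",
                   np.where(call, "call", "task")))
print(df)
```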

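You have invalid syntax for checking strings: Python has no contains operator, and a substring test uses the in operator instead. Also, assigning to df2['Activity_2'] inside the loop overwrites the whole column on every iteration, so assign per row with .loc.

Try using:

for idx, i in df2['Activity'].items():
    if 'email' in i:
        df2.loc[idx, 'Activity_2'] = 'email'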
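For completeness, here is a sketch of the question's whole loop rewritten with the in operator, colons on every branch, and 'task' as the final else. It collects the labels in a plain Python list and assigns them once, which avoids writing the entire column inside the loop; df2 and its rows are placeholders.

```python
import pandas as pd

df2 = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                 'call Sam', 'random text']})

labels = []
for i in df2['Activity']:
    # Substring tests use `in`; every if/elif/else line ends with a colon
    if 'email' in i:
        labels.append('email')
    elif 'conference' in i:
        labels.append('conference')
    elif 'call' in i:
        labels.append('call')
    else:
        labels.append('task')

# Assign the collected labels in one step, one label per row
df2['Activity_2'] = labels
print(df2)
```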

Another solution can be found in a post by @unutbu, which also works well for creating conditional columns. To match your question, I changed the condition in that post's example from df['Set'] == Z to df['Activity'].str.contains('yourtext'). See the example below:

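import numpy as np
import pandas as pd

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})

conditions = [
    df['Activity'].str.contains('email'),
    df['Activity'].str.contains('conference'),
    df['Activity'].str.contains('call')]

values = ['email', 'conference', 'call']

df['Activity_2'] = np.select(conditions, values, default='task')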


You can find the original post here: Pandas conditional creation of a series/dataframe column
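Because the conditions and values lists line up one to one, a sketch like the following builds the conditions from the values list, so adding a new activity only means adding one string. np.select takes, for each row, the value of the first condition that is true, so the list order sets the priority. The extra 'write report' row is hypothetical, added to show the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly',
                                'write report'],
                   'Colleague': ['Knor', 'Koen', 'Hedge', 'Knor']})

values = ['email', 'conference', 'call']
# One boolean Series per keyword, in the same order as `values`
conditions = [df['Activity'].str.contains(v) for v in values]

# Each row gets the value of its first true condition, else the default
df['Activity_2'] = np.select(conditions, values, default='task')
print(df)
```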

  1. Your code had bugs: there were no colons on the elif lines.
  2. You didn't mention you were using pandas, but that's the assumption I'm going with.
  3. My answer handles defaults, follows Python conventions, and is easy to adapt for additional activities.


import pandas as pd

# Label used when no activity keyword is found (the question's 'task' default)
DEFAULT_ACTIVITY = 'task'

def assign_activity(todo_item):
    """Assign activity to raw text TODOs."""
    activities = ['email', 'conference', 'call']

    for activity in activities:
        if activity in todo_item:
            return activity
    # Default value, reached only when no keyword matched
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})

# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
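A quick usage check of the apply approach, with one hypothetical extension: an isinstance guard so that missing cells (None or NaN) get the default instead of raising a TypeError on the in test. Note that apply calls the Python function once per row, so on large frames the vectorized str.contains approaches above are usually faster.

```python
import pandas as pd

DEFAULT_ACTIVITY = 'task'  # hypothetical fallback label

def assign_activity(todo_item):
    """Return the first activity keyword found in the text, else the default."""
    # Hypothetical guard: non-string cells (e.g. None, NaN) cannot contain a keyword
    if not isinstance(todo_item, str):
        return DEFAULT_ACTIVITY
    for activity in ['email', 'conference', 'call']:
        if activity in todo_item:
            return activity
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly', None]})
df['Activity_2'] = df['Activity'].apply(assign_activity)
print(df)
```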

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM