Conditional If Statement: If value contains string then set another column equal to string

I am writing a Python 3 script.


I have a column 'original_title' containing various film titles, including all the Star Wars films (plus the name of the episode) and all the Star Trek films (plus the name of the episode). I want to create a new column that shows only 'star trek' (without the episode name), 'star wars', or 'na'.

This is my code for the new column:

df['Trek_Wars'] = pd.np.where(df.original_title.str.contains("Star Wars"), "star_wars", 
              pd.np.where(df.original_title.str.contains("Star Trek"), "star_trek"))

However, it doesn't work:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-5472b36a2193> in <module>()
      1 df['Trek_Wars'] = pd.np.where(df.original_title.str.contains("Star Wars"), "star_wars",
----> 2                    pd.np.where(df.original_title.str.contains("Star Trek"), "star_trek"))

ValueError: either both or neither of x and y should be given

What should I do?

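I assume you are using Pandas. pd.np is only an alias for NumPy (deprecated in pandas 1.0 and later removed), so it is better to import NumPy yourself and call np.where directly. The error occurs because your inner np.where call receives a condition and a value for matching rows, but no value for non-matching rows. A single-condition version looks like this: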

df['Trek_Wars'] = np.where(df['original_title'].str.contains('Star Wars'),
                           'star_wars', 'na')

Notice that we have to provide a value for when the condition is met and another for when it is not. For multiple conditions, you can use pd.DataFrame.loc:

# set default value
df['Trek_Wars'] = 'na'

# update according to conditions
df.loc[df['original_title'].str.contains('Star Wars'), 'Trek_Wars'] = 'star_wars'
df.loc[df['original_title'].str.contains('Star Trek'), 'Trek_Wars'] = 'star_trek'
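
If you would rather keep everything in one expression, np.select may be an option: it takes a list of conditions, a matching list of values, and a default for rows that match nothing. The following is only a sketch; the sample titles are made up for illustration, and na=False is my own addition so that missing titles count as non-matches.

```python
import numpy as np
import pandas as pd

# hypothetical sample data, for illustration only
df = pd.DataFrame({'original_title': [
    'Star Wars: A New Hope',
    'Star Trek: First Contact',
    'Alien',
]})

title = df['original_title']

# conditions are checked in order; the first one that is True wins
conditions = [
    title.str.contains('Star Wars', regex=False, na=False),
    title.str.contains('Star Trek', regex=False, na=False),
]
choices = ['star_wars', 'star_trek']

# rows that match no condition receive the default value
df['Trek_Wars'] = np.select(conditions, choices, default='na')
```

A nested np.where works too, as long as the innermost call is given a final fallback value such as 'na'.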

You can simplify your logic further with a dictionary mapping:

# map search string to update string
mapping = {'Star Wars': 'star_wars', 'Star Trek': 'star_trek'}

# iterate mapping items
for k, v in mapping.items():
    df.loc[df['original_title'].str.contains(k), 'Trek_Wars'] = v
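
As a possible alternative to the loop, the same mapping can drive a single pass with str.extract and map. This is a sketch under assumptions: the sample titles are invented, and the regular expression is built from the mapping keys. Unlike the loop, it uses the first series name found in each title.

```python
import re

import pandas as pd

# hypothetical sample data, for illustration only
df = pd.DataFrame({'original_title': [
    'Star Wars: A New Hope',
    'Star Trek: First Contact',
    'Alien',
]})

# map search string to update string
mapping = {'Star Wars': 'star_wars', 'Star Trek': 'star_trek'}

# one capture group that matches any of the mapping keys
pattern = '(' + '|'.join(re.escape(k) for k in mapping) + ')'

# extract the matched key (NaN where nothing matched), then translate it
found = df['original_title'].str.extract(pattern, expand=False)
df['Trek_Wars'] = found.map(mapping).fillna('na')
```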

Since both values in your example, "Star Wars" and "Star Trek", have the same number of characters (9), you can simply take the first 9 characters of each title. This assumes every title begins with the series name; for finer-grained parsing of that column you will need a better method.

# keep only the first 9 characters of each title, e.g. 'Star Wars' or 'Star Trek'
# (vectorised string slicing, so no explicit loop is needed)
df['Film_Series'] = df['original_title'].str[:9]
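
Note that slicing alone leaves unrelated titles with whatever their first nine characters happen to be, not 'na'. A rough way to guard against that, assuming the sample data and the list of known prefixes below (both are illustrative), is to keep the prefix only when it is one of the expected series names:

```python
import pandas as pd

# hypothetical sample data, for illustration only
df = pd.DataFrame({'original_title': [
    'Star Wars: A New Hope',
    'Star Trek: First Contact',
    'Alien',
]})

known = ['Star Wars', 'Star Trek']

# first 9 characters of every title
prefix = df['original_title'].str[:9]

# keep the prefix where it is a known series name, otherwise use 'na'
df['Film_Series'] = prefix.where(prefix.isin(known), 'na')
```

This still only works when the series name is at the very start of the title.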
