简体   繁体   中英

Replace pandas Dataframe column values with the list if it matches a word

I have a list of colors like this:

color = ['green', 'blue', 'red']

I have a Dataframe like this:

df:
col1        col2
 A        dark green
 B        sea blue
 C          blue
 D       exclusive red
 E          green
 F       pale red

I want to match col2 with the color list. If any word of col2 matches the element of the color list, replace it with the lists value.

The result data frame will be

 col1          col2
  A            green
  B            blue
  C            blue
  D            red
  E            green
  F            red

What is the most efficient way to do it using pandas?

Use Series.str.extract with joined values by | for regex OR , last add fillna for replace non matched values ( NaN s) by original column:

print (df)
  col1           col2
0    A     dark green
1    B       sea blue
2    C           blue
3    D  exclusive red
4    E          green
5    F           pale <- not matched value

color=['green','blue','red']

pat = r'({})'.format('|'.join(color))
df['col2'] = df['col2'].str.extract(pat, expand=False).fillna(df['col2'])

print (df)
  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F   pale

Use str.extract :

df['col2'] = df.col2.str.extract(f"({'|'.join(color)})", expand=False)
df

  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F    red

For better performance, you can use a list comprehension that uses a precompiled regex pattern to perform re.search :

import re

p = re.compile(rf"({'|'.join(color)})")
def try_extract(s):
    try:
        return p.search(s).group(1)
    except (TypeError, AttributeError):
        return s

df['col2'] = [try_extract(s) for s in df['col2']
df

  col1   col2
0    A  green
1    B   blue
2    C   blue
3    D    red
4    E  green
5    F    red

If the color doesn't match how to keep keep the original color? I don't want nan values there.

This is automatically handled by try_except :

df2 = df.append(pd.Series(
    {'col1': 'G', 'col2': 'something else'}), ignore_index=True)
df2['col2'] = [try_extract(s) for s in df2['col2']]
df2

  col1            col2
0    A           green
1    B            blue
2    C            blue
3    D             red
4    E           green
5    F             red
6    G  something else   # other values are preserved.

For more information on why list comprehensions should be considered a competitive alternative, you can check For loops with pandas - When should I care? .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM