I have a list of colors like this:
color = ['green', 'blue', 'red']
I have a Dataframe like this:
df:
col1 col2
A dark green
B sea blue
C blue
D exclusive red
E green
F pale red
I want to match col2
with the color
list. If any word of col2
matches the element of the color
list, replace it with the lists value.
The result data frame will be
col1 col2
A green
B blue
C blue
D red
E green
F red
What is the most efficient way to do it using pandas?
Use Series.str.extract
with joined values by |
for regex OR
, last add fillna
for replace non matched values ( NaN
s) by original column:
print (df)
col1 col2
0 A dark green
1 B sea blue
2 C blue
3 D exclusive red
4 E green
5 F pale <- not matched value
color=['green','blue','red']
pat = r'({})'.format('|'.join(color))
df['col2'] = df['col2'].str.extract(pat, expand=False).fillna(df['col2'])
print (df)
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F pale
Use str.extract
:
df['col2'] = df.col2.str.extract(f"({'|'.join(color)})", expand=False)
df
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
For better performance, you can use a list comprehension that uses a precompiled regex pattern to perform re.search
:
import re
p = re.compile(rf"({'|'.join(color)})")
def try_extract(s):
try:
return p.search(s).group(1)
except (TypeError, AttributeError):
return s
df['col2'] = [try_extract(s) for s in df['col2']
df
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
If the color doesn't match how to keep keep the original color? I don't want nan values there.
This is automatically handled by try_except
:
df2 = df.append(pd.Series(
{'col1': 'G', 'col2': 'something else'}), ignore_index=True)
df2['col2'] = [try_extract(s) for s in df2['col2']]
df2
col1 col2
0 A green
1 B blue
2 C blue
3 D red
4 E green
5 F red
6 G something else # other values are preserved.
For more information on why list comprehensions should be considered a competitive alternative, you can check For loops with pandas - When should I care? .
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.