Replace column values using regex in pandas data frame
I have a column in a pandas data frame like the one below. The column name is ABC.
ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube
Now I want to replace the values in this column using a regex, so that the result looks like this:
ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube
How can we do this type of string matching in a pandas data frame?
In [63]: df.ABC.str.replace(r'_\d+.*', '', regex=True)
Out[63]:
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
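For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample column from the question and passes `regex=True` explicitly, since recent pandas versions treat the pattern as a literal string by default. The variable name `cleaned` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ABC": ["Fuel", "FUEL", "Fuel_12_ab", "Fuel_1",
            "Lube", "Lube_1", "Lube_12_a", "cat_Lube"]
})

# Remove an underscore followed by digits, plus everything after it.
# "cat_Lube" is untouched because its underscore is not followed by a digit.
cleaned = df["ABC"].str.replace(r"_\d+.*", "", regex=True)
print(cleaned)
```

Assign the result back with `df["ABC"] = cleaned` if the column should be overwritten.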
Alternative with str.extract:
df.ABC.str.extract(r'^(.*?)(?=_\d|$)', expand=False)
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
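To see why this pattern works, it may help to run it on a few individual strings with the plain `re` module. This is only an illustrative sketch; `samples` and `results` are made-up names. The lazy group `(.*?)` grows one character at a time until the lookahead `(?=_\d|$)` sees either an underscore followed by a digit or the end of the string.

```python
import re

# Lazy capture that stops just before "_<digit>" or at the end of the string.
pat = re.compile(r"^(.*?)(?=_\d|$)")

samples = ["Fuel_12_ab", "Lube_1", "cat_Lube", "FUEL"]
results = [pat.search(s).group(1) for s in samples]
print(results)
```

In `cat_Lube` the underscore is followed by a letter, so the lookahead only succeeds at the end of the string and the whole value is kept.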
Extension courtesy of piRSquared:
import re
df.ABC.str.extract(r'(.*(?<=lube|fuel)).*', flags=re.IGNORECASE, expand=False)
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube
Name: ABC, dtype: object
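The keyword-based pattern and the digit-based pattern agree on the sample data, but they behave differently on other inputs. The sketch below uses two hypothetical values that are not in the question (`Fuel_extra` and `Oil_3`) to show the difference: the digit pattern only trims a suffix that starts with `_` plus digits, while the lookbehind pattern keeps everything up to the last `fuel` or `lube` and returns NaN when neither word is present.

```python
import re
import pandas as pd

# "Fuel_extra" and "Oil_3" are hypothetical values, added only for contrast.
s = pd.Series(["Fuel_12_ab", "Fuel_extra", "Oil_3"], name="ABC")

# Digit-based: trims "_<digits>..." and leaves other suffixes alone.
by_digits = s.str.replace(r"_\d+.*", "", regex=True)

# Keyword-based: keeps text up to "fuel"/"lube"; NaN when no keyword matches.
by_keyword = s.str.extract(r"(.*(?<=lube|fuel)).*", flags=re.IGNORECASE, expand=False)

print(pd.DataFrame({"by_digits": by_digits, "by_keyword": by_keyword}))
```

Which one is appropriate depends on whether the suffixes to drop always start with digits or whether the set of keywords is known in advance.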
Use a positive lookbehind for lube or fuel while ignoring case.
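import re
import pandas as pd
# Match an underscore only when it directly follows "lube" or "fuel" (any case).
pat = re.compile(r'(?<=lube|fuel)_', re.IGNORECASE)
# Split at most once and keep the part before the underscore.
df.assign(ABC=[pat.split(x, maxsplit=1)[0] for x in df.ABC])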
ABC
0 Fuel
1 FUEL
2 Fuel
3 Fuel
4 Lube
5 Lube
6 Lube
7 cat_Lube