Replace column values using regex in pandas data frame

I have a column in a pandas data frame like the one below. The column name is ABC.

ABC
Fuel
FUEL
Fuel_12_ab
Fuel_1
Lube
Lube_1
Lube_12_a
cat_Lube

Now I want to replace the values in this column using regex so that the result looks like this:

ABC
Fuel
FUEL
Fuel
Fuel
Lube
Lube
Lube
cat_Lube

How can we do this type of string matching in a pandas data frame?

In [63]: df.ABC.str.replace(r'_\d+.*', '', regex=True)
Out[63]:
0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
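For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample column from the question (the variable names `df` and `cleaned` are only illustrative) and passes `regex=True` explicitly, because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ABC': ['Fuel', 'FUEL', 'Fuel_12_ab', 'Fuel_1',
            'Lube', 'Lube_1', 'Lube_12_a', 'cat_Lube']
})

# Remove everything from the first "_<digits>" to the end of the string.
# 'cat_Lube' is untouched because its underscore is not followed by a digit.
cleaned = df['ABC'].str.replace(r'_\d+.*', '', regex=True)

# Write the result back to the column if desired.
df['ABC'] = cleaned
print(df)
```

Note that this pattern relies on the unwanted suffix starting with an underscore and a digit, which holds for the sample data but may not hold for other columns.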

Alternative with str.extract:

df.ABC.str.extract(r'^(.*?)(?=_\d|$)', expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object
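To see why the lookahead pattern keeps `cat_Lube` intact, it can help to run the same regex on the raw strings with the standard `re` module. This is only a sketch for inspection, not part of the answer above; the names `pattern`, `values`, and `extracted` are made up for the illustration.

```python
import re

# Lazily capture from the start until just before "_<digit>" or the end of the string.
pattern = re.compile(r'^(.*?)(?=_\d|$)')

# Sample values copied from the question.
values = ['Fuel', 'FUEL', 'Fuel_12_ab', 'Fuel_1',
          'Lube', 'Lube_1', 'Lube_12_a', 'cat_Lube']

# group(1) is the captured prefix; the lookahead itself consumes nothing.
extracted = [pattern.match(v).group(1) for v in values]

for before, after in zip(values, extracted):
    print(f'{before!r:>14} -> {after!r}')
```

The lazy `.*?` stops at the first position where the lookahead succeeds, so an underscore followed by a letter (as in `cat_Lube`) is simply skipped over.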

Extension courtesy of piRSquared:

import re
df.ABC.str.extract('(.*(?<=lube|fuel)).*', flags=re.IGNORECASE, expand=False)

0        Fuel
1        FUEL
2        Fuel
3        Fuel
4        Lube
5        Lube
6        Lube
7    cat_Lube
Name: ABC, dtype: object

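Use a positive lookbehind for lube or fuel, ignoring case, and split on the underscore that follows it.

import re
import pandas as pd

# Match an underscore only when it directly follows "lube" or "fuel" (any case).
pat = re.compile('(?<=lube|fuel)_', re.IGNORECASE)

# Split once on that underscore and keep the part before it.
df.assign(ABC=[pat.split(x, maxsplit=1)[0] for x in df.ABC])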

        ABC
0      Fuel
1      FUEL
2      Fuel
3      Fuel
4      Lube
5      Lube
6      Lube
7  cat_Lube
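The digit-based and keyword-based patterns agree on the sample data, but they can differ on other inputs. The sketch below compares them on a few strings; `'Fuel_ab'` is a hypothetical value that does not appear in the question and is included only to show the difference. The variable names are illustrative.

```python
import re

# Pattern from the first answer: strip from "_<digits>" onward.
digit_suffix = re.compile(r'_\d+.*')

# Lookbehind pattern: an underscore that directly follows "lube" or "fuel".
after_keyword = re.compile(r'(?<=lube|fuel)_', re.IGNORECASE)

# 'Fuel_ab' is a hypothetical extra value, not taken from the question.
samples = ['Fuel_12_ab', 'cat_Lube', 'Fuel_ab']

by_digit = [digit_suffix.sub('', s) for s in samples]
by_keyword = [after_keyword.split(s, maxsplit=1)[0] for s in samples]

print(by_digit)    # ['Fuel', 'cat_Lube', 'Fuel_ab']
print(by_keyword)  # ['Fuel', 'cat_Lube', 'Fuel']
```

Which one fits depends on the data: the lookbehind version needs the list of keywords up front, while the digit version assumes every suffix begins with a number.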
