Regular expressions in a pandas DataFrame

Question

I have a DataFrame with a column like this:

COL1      
RED[10%(INC)]
RED[12%(INC)]

and I want to create COL2 like this:

COL2
10
12

Could you help me find the right regex? I tried this:

RED\[(\d+\.\d+) %INC\]

but it doesn't work.

Answer 1

If you want to keep your regex and extract the number only in that specific context, you can use

df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+(?:\.\d+)?)%\(INC\)\]', expand=False)

Details

RED\[ - a literal RED[ string
(\d+(?:\.\d+)?) - Capturing group 1: one or more digits, optionally followed by a dot and one or more digits
%\(INC\)\] - a literal %(INC)] string.
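
As a minimal, self-contained sketch of this approach, the snippet below runs str.extract on the two sample rows exactly as posted in the question (RED[10%(INC)]), so the parentheses around INC are escaped in the pattern. The COL2_num column and the pd.to_numeric call are an optional extra step, not part of the answer: str.extract returns strings, and a numeric column is often what you want afterwards.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]"]})

# Assumption: every row ends with the literal text "%(INC)]", as in the sample.
pattern = r"RED\[(\d+(?:\.\d+)?)%\(INC\)\]"

# With a single capture group, expand=False returns a Series, not a DataFrame.
df["COL2"] = df["COL1"].str.extract(pattern, expand=False)

# Optional (not in the original answer): convert the extracted strings to numbers.
df["COL2_num"] = pd.to_numeric(df["COL2"])

print(df)
```

Rows that do not match the pattern come back as missing values in COL2 rather than raising an error.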

You could also explore other options:

Extracting the number followed by a percent sign: df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False)
Splitting on [, taking the second item and keeping the part before %: df['COL1'].str.split("[").str[1].str.split("%").str[0]
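
To compare the two alternatives side by side, here is a small sketch on the sample rows from the question. One caveat for this particular data: after splitting on [, the second item is 10%(INC)], so removing only the % would leave the (INC)] suffix behind. The sketch therefore splits once more on % and keeps the first piece. The variable names by_regex and by_split are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]"]})

# Option 1: capture any number that is directly followed by a percent sign.
by_regex = df["COL1"].str.extract(r"(\d+(?:\.\d+)?)%", expand=False)

# Option 2: plain string splitting, keeping the text between "[" and "%".
by_split = df["COL1"].str.split("[").str[1].str.split("%").str[0]

print(by_regex.tolist())
print(by_split.tolist())
```

The regex variant is more tolerant of other text around the number, while the split variant depends on [ and % appearing exactly where they do in the sample.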

Answer 2

This solution uses re.findall:

Modules and data:

import re
import pandas as pd
df = pd.DataFrame({'COL1':['RED[10%(INC)]','RED[12%(INC)]']})

Solution:

# re.findall returns a list of every run of digits in the string
df['COL2'] = df['COL1'].apply(lambda x: re.findall(r'[0-9]+', x))
# keep the first match from each list
df['COL2'] = df['COL2'].str[0]
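
Because re.findall returns a list of all digit runs per row, the first element still has to be picked out, and a row without any digits yields an empty list. A hedged sketch of one way to handle both in a single step follows; first_number is a hypothetical helper, and the extra "BLUE" row is made up purely to show the no-match case.

```python
import re

import pandas as pd

# The first two rows are from the question; "BLUE" is an invented no-match row.
df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]", "BLUE"]})


def first_number(text):
    """Hypothetical helper: return the first run of digits in text, or None."""
    found = re.findall(r"[0-9]+", text)
    return found[0] if found else None


df["COL2"] = df["COL1"].apply(first_number)

print(df)
```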

Answer 3

df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+)', expand=False)