
Regex in pandas dataframe

I have a dataframe with a column like this:

COL1      
RED[10%(INC)]
RED[12%(INC)]

and I want to create COL2 like this:

COL2
10
12

Could you help me find the right regex? I tried this:

RED\[(\d+\.\d+) %INC\]

but it doesn't work.

If you want to keep the structure of your regex and only extract numbers in the specified context, you can use

df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+(?:\.\d+)?)%\(INC\)]', expand=False)

See the regex demo.

Details

  • RED\[ - a literal RED[ string
  • (\d+(?:\.\d+)?) - Capturing group 1: one or more digits, optionally followed by a dot and one or more digits
  • %\(INC\)] - a literal %(INC)] string.
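
As a minimal sketch of how a context-anchored pattern behaves, assuming values shaped like RED[10%(INC)] as in the question: the sample frame (including the decimal and the non-matching row), the COL2_num column, and the pd.to_numeric conversion are additions for illustration only.

```python
import pandas as pd

# Sample data shaped like the question's COL1 (extra rows are assumed for illustration)
df = pd.DataFrame({'COL1': ['RED[10%(INC)]', 'RED[12%(INC)]',
                            'RED[7.5%(INC)]', 'BLUE[3%(INC)]']})

# Capture the number only when it sits between "RED[" and "%(INC)]"
pattern = r'RED\[(\d+(?:\.\d+)?)%\(INC\)]'
df['COL2'] = df['COL1'].str.extract(pattern, expand=False)

# str.extract yields strings (and a missing value where nothing matched);
# convert explicitly if numeric values are needed
df['COL2_num'] = pd.to_numeric(df['COL2'])

print(df)
```

With expand=False and a single capturing group, str.extract returns a Series, so it can be assigned straight to a column. Rows that do not match the full context, such as the BLUE row here, end up as missing values.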

You could also explore other options:

  • Extracting the number followed by a percent sign: df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False)
  • Splitting on [, taking the second item, and keeping the part before %: df['COL1'].str.split("[").str[1].str.split("%").str[0]
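
The two looser options can be compared side by side. This is only a sketch on the question's sample values; the variable names by_percent and by_split are made up for illustration. For data shaped like RED[10%(INC)], the text after [ still contains (INC)], so the split variant here keeps only the part before the percent sign.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'COL1': ['RED[10%(INC)]', 'RED[12%(INC)]']})

# Option 1: any number immediately followed by a percent sign
by_percent = df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False)

# Option 2: plain string splitting, no regex involved.
# A single-character separator is treated as a literal string by str.split.
after_bracket = df['COL1'].str.split('[').str[1]   # e.g. "10%(INC)]"
by_split = after_bracket.str.split('%').str[0]     # e.g. "10"

print(by_percent.tolist())
print(by_split.tolist())
```

The percent-based pattern ignores the RED[ prefix entirely, so it would also pick up numbers from rows with a different prefix. The split approach depends on every row containing both [ and %.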

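This solution uses re.findall:

Modules and data:

import re
import pandas as pd
df = pd.DataFrame({'COL1': ['RED[10%(INC)]', 'RED[12%(INC)]']})

Solution:

# Collect every run of digits in each string (one list per row)
df['COL2'] = df['COL1'].apply(lambda x: re.findall('[0-9]+', x))
# Expand the lists into a frame; each row holds one match, so a single column results
df['COL2'] = pd.DataFrame(df['COL2'].tolist())
# Alternatively, extract the digits that follow "RED[" directly
df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+)', expand=False)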
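
If the re module is preferred over the pandas string methods, a small helper can avoid the intermediate lists. This is a sketch only: first_number and DIGITS are hypothetical names, and the third sample row is an assumption added to show what happens when a string contains no digits.

```python
import re
import pandas as pd

# Sample data; the last row is an assumed edge case without any digits
df = pd.DataFrame({'COL1': ['RED[10%(INC)]', 'RED[12%(INC)]', 'no number here']})

# Compile once; matches the first run of digits anywhere in the string
DIGITS = re.compile(r'[0-9]+')

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    match = DIGITS.search(text)
    return match.group(0) if match else None

# Apply the helper row by row
df['COL2'] = df['COL1'].apply(first_number)

print(df)
```

Like re.findall with '[0-9]+', this treats a decimal such as 7.5 as two separate digit runs and would return only the 7, so it fits integer percentages best.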


 