Regex in pandas dataframe
I have a dataframe with a column like this:
COL1
RED[10%(INC)]
RED[12%(INC)]
and I want to create COL2 like this:
COL2
10
12
Could you help me find the right regex? I tried this:
RED\[(\d+\.\d+) %INC\]
but it doesn't work.
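A quick way to see why the attempt fails is to test it with Python's re module against the two sample values. In this small sketch, \d+\.\d+ requires a decimal point, " %INC\]" requires a space before % and INC] immediately after it, and the sample values have none of these, so every search returns None.

```python
import re

# Sample values from the question
values = ["RED[10%(INC)]", "RED[12%(INC)]"]

# The pattern from the question
attempt = r"RED\[(\d+\.\d+) %INC\]"

for v in values:
    # \d+\.\d+ needs a decimal point; " %INC\]" needs a space before "%"
    # and "INC]" right after it, so "RED[10%(INC)]" cannot match
    print(v, re.search(attempt, v))
```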
If you want to use your regex and extract numbers only in the specified context, you can use
df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+(?:\.\d+)?)%\(INC\)]', expand=False)
See the regex demo.
Details
RED\[ - a RED[ string
(\d+(?:\.\d+)?) - Capturing group 1: one or more digits followed by an optional sequence of a dot and one or more digits
%\(INC\)] - a %(INC)] literal string.
You could also explore other options:
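Here is a minimal runnable sketch of the str.extract approach. It assumes the values look exactly like the sample, RED[10%(INC)], with INC in parentheses. If the real data uses [INC] instead, the tail of the pattern has to change to match. With expand=False and a single capturing group, str.extract returns a Series of strings, and rows that do not match become NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]"]})

# Group 1 captures an integer or decimal number between "RED[" and "%(INC)]"
pattern = r"RED\[(\d+(?:\.\d+)?)%\(INC\)]"
df["COL2"] = df["COL1"].str.extract(pattern, expand=False)

print(df["COL2"].tolist())  # ['10', '12']
```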
df['COL1'].str.extract(r'(\d+(?:\.\d+)?)%', expand=False) - extracting the number that is directly followed by %
Splitting with [, getting the second item and removing % and everything after it: df['COL1'].str.split("[").str[1].str.replace(r"%.*", "", regex=True)
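The two alternatives can be compared side by side on the sample data. This is a sketch that assumes the (INC) form shown in the question. With that form, splitting on [ leaves 10%(INC)], so removing only % would still leave (INC)] behind. For that reason the split variant here drops % and everything after it using a regex replace.

```python
import pandas as pd

df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]"]})

# Option 1: capture the number that is directly followed by "%"
by_percent = df["COL1"].str.extract(r"(\d+(?:\.\d+)?)%", expand=False)

# Option 2: take the text after "[" and drop "%" plus everything after it
by_split = df["COL1"].str.split("[").str[1].str.replace(r"%.*", "", regex=True)

print(by_percent.tolist())  # ['10', '12']
print(by_split.tolist())    # ['10', '12']
```

The regex option does not depend on where the brackets sit. The split option is easier to read, but it breaks if the prefix itself contains a [.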
This solution uses re.findall:
Modules and data:
import re
import pandas as pd
df = pd.DataFrame({'COL1': ['RED[10%(INC)]', 'RED[12%(INC)]']})
Solution:
# re.findall returns a list of every run of digits in each string
df['COL2'] = df['COL1'].apply(lambda x: re.findall(r'[0-9]+', x))
# keep only the first match from each list
df['COL2'] = df['COL2'].str[0]
df['COL2'] = df['COL1'].str.extract(r'RED\[(\d+)', expand=False)
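Both re.findall and str.extract return text, so COL2 ends up holding strings like '10' rather than numbers. If numeric values are needed, one option is pd.to_numeric. This step is not part of the original answers. The sketch below assumes the first run of digits in each value is the wanted number. Rows without digits become None and then NaN.

```python
import re
import pandas as pd

df = pd.DataFrame({"COL1": ["RED[10%(INC)]", "RED[12%(INC)]"]})

# First run of digits per row, or None when the row contains no digits
first_number = df["COL1"].apply(lambda s: (re.findall(r"[0-9]+", s) or [None])[0])

# Convert the matched strings to numbers (NaN where nothing matched)
df["COL2"] = pd.to_numeric(first_number)

print(df["COL2"].tolist())  # [10, 12]
```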