Python string literal to regex object

I have a function that returns the string "r'^A Plat'", which is stored in a text file:

get_Pat(file)
    #process text file and now returns "r'^A Plat'"

Originally, I had the pattern hard-coded:

pat = r'^A Plat'
use(pat)

Now:

pat = get_Pat(file)
use(pat)

But it's complaining, I suppose because it's a string instead of a regex object.

I have tried

re.escape(get_Pat(file))

and

re.compile(get_Pat(file))

but neither of them works.

How do I convert a string literal into a regex object?

Is r'^A Plat' equivalent to simply re.compile("A Plat")? Dumb question, maybe.

It would work if it were use("^A Plat").
It doesn't work if it is use("r'^A Plat'") <--- what get_Pat(file) is spitting out

I suppose my task is simply transforming the string r'^A Plat' into ^A Plat.
But I feel like that's just a cheap hack.

r'^A Plat' is identical to '^A Plat' without the r. The r stands for raw, not regex. It lets you write strings with special characters like \ without having to escape them.

>>> r'^A Plat'
'^A Plat'
>>> r'/ is slash, \ is backslash'
'/ is slash, \\ is backslash'
>>> r'write \t for tab, \n for newline, \" for double quote'
'write \\t for tab, \\n for newline, \\" for double quote'

Raw strings are commonly used when writing regexes, since regexes often contain backslashes that would otherwise need to be escaped. r does not create regex objects, though.

From the Python manual:

§ 2.4.1. String literals

String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences.

...

Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
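
To make the distinction concrete, here is a minimal sketch (the variable names are only for illustration): the r prefix yields an ordinary str, and it is re.compile() that produces a regex object.

```python
import re

raw = r'^A Plat'      # raw string literal: still an ordinary str
plain = '^A Plat'     # same characters, written without the prefix

same_value = (raw == plain)        # the r prefix changes nothing here
raw_type = type(raw).__name__      # the type name is 'str', not a regex type

compiled = re.compile(raw)         # this call is what creates a regex object
matched = compiled.match('A Plat wins') is not None
```

The prefix only matters once the literal contains backslashes, as in the examples above.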

Not sure what you mean by 'none of them works', but re.compile() is what you're looking for:

>>> import re
>>> def getPat():
...     return r'^A Plat'
...
>>> getPat()
'^A Plat'
>>> reObj = re.compile(getPat())
>>> reObj
<_sre.SRE_Pattern object at 0x16cfa18>
>>> reObj.match("A Plat")
<_sre.SRE_Match object at 0x16c3058>
>>> reObj.match("foo")

Edit:

You can get rid of the extra r' ' cruft after it's returned with this code:

>>> s = "r'^A Plat'"
>>> s = s[1:].strip("'")
>>> s
'^A Plat'
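
If you'd rather not rely on slicing plus strip (which would also eat several quotes in a row), a slightly stricter version could look like the sketch below. strip_raw_literal is a hypothetical helper name, and it assumes the text is either a bare pattern or a single r'...' / r"..." literal.

```python
def strip_raw_literal(text):
    """Hypothetical helper: drop a leading r/R and one pair of matching quotes."""
    text = text.strip()
    if (len(text) >= 3 and text[0] in 'rR'
            and text[1] in '\'"' and text[-1] == text[1]):
        # Keep only what sits between the opening and closing quote.
        return text[2:-1]
    # Not wrapped in r'...': return the text unchanged.
    return text

cleaned = strip_raw_literal("r'^A Plat'")
untouched = strip_raw_literal("^A Plat")
```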

According to the comment in your get_Pat function, it's returning:

"r'^A Plat'"

Which is not what you thought you were getting:

>>> x = re.compile("r'^A Plat'")
>>> y = "A Plat wins"
>>> x.findall(y)
[]
>>> x = re.compile("^A Plat")
>>> x.findall(y)
['A Plat']
>>>

So the regex you're using isn't r'^A Plat', it's "r'^A Plat'". r'^A Plat' is fine:

>>> x = re.compile(r'^A Plat')
>>> x.findall(y)
['A Plat']

To fix this, I would have to understand how you were getting the string "r'^A Plat'" in the first place.

Do

from ast import literal_eval
pat = literal_eval(get_Pat(file))
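
A self-contained sketch of the same idea, with the text hard-coded as a stand-in for what get_Pat(file) returns (text_from_file is just an illustrative name):

```python
import re
from ast import literal_eval

text_from_file = "r'^A Plat'"        # stand-in for the value of get_Pat(file)

# literal_eval parses the text as a Python literal, so the r prefix and the
# quotes are interpreted instead of being treated as part of the pattern.
pat = literal_eval(text_from_file)

regex = re.compile(pat)
found = regex.findall('A Plat wins')
```

literal_eval only accepts literals, so it will raise an error rather than execute arbitrary code if the file contains something else.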


EDIT

aelon,

As you wrote in a comment that you can't import literal_eval(), my solution above is useless for you. Besides, though they provide interesting information, the other answers didn't bring another solution.
So I propose a new one that does not use literal_eval().

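import re

# Matches r'...' or r"..." followed only by optional blanks or tabs.
detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

with open('your_file.txt') as f:
    pat = f.readline()

if detect.match(pat):
    # Raw-string literal found: compile only the text between the quotes.
    r = re.compile(detect.match(pat).group(2))
else:
    # No r'...' wrapper: compile the line itself, minus its line ending.
    r = re.compile(pat.rstrip('\r\n'))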

Explanations:

Suppose the succession of characters r'^Six o\'clock\nJim' is written as the first line of your_file.

Opening the file and reading its first line creates an object pat:
- its TYPE is <type 'str'> in Python 2 and <class 'str'> in Python 3
- its REPRESENTATION is "r'^Six o\\'clock\\nJim'"
- its VALUE is r'^Six o\'clock\nJim', that is to say the succession of characters r, ', ^, S, i, x, space, o, \, ', c, l, o, c, k, \, n, J, i, m, '
There may also be the "character" \n at the end if there is a second line in the file. And there may also be blanks or tabs, who knows, between the end of r'^Six o\'clock\nJim' written in the file and the end of its line. That's why I end the regex pattern that defines detect with [ \t]*$.
So we may get additional blanks, tabs and a newline after the characters of interest, and if we then do print(tuple(pat)) we'll obtain, for example:

('r', "'", '^', 'S', 'i', 'x', ' ', 'o', '\\', "'", 'c', 'l', 'o', 'c', 'k', '\\', 'n', 'J', 'i', 'm', "'", ' ', ' ', ' ', '\t', '\n')


Now, let us consider the object obtained with the expression detect.match(pat).group(2).
Its value is ^Six o\'clock\nJim, composed of 18 characters, with \ and ' and n being three distinct characters among them; it contains neither an escaped character \' nor an escaped character \n.
This value is exactly the same as the one we would obtain for an object named rawS by writing the instruction rawS = r'^Six o\'clock\nJim'.
So we can obtain the regex whose pattern is written in a file in the form r'....' by writing directly r = re.compile(detect.match(pat).group(2)).
In my example, there are only the sequences \' and \n in the series of characters written in the file. But all of the above is valid for any of the escape sequences of the language.

In other words, we don't have to look for a function that would produce, from the STRING "r'^Six o\\'clock\\nJim'" of value r'^Six o\'clock\nJim', the same result as the EXPRESSION r'^Six o\'clock\nJim':
we directly have the result of r'^Six o\'clock\nJim' as the value of the string captured by detect.match(pat).group(2).
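
The claim can be checked without a file. In this sketch the line is built in memory (line is an illustrative stand-in for what f.readline() would return), with three blanks, a tab and a newline after the closing quote:

```python
import re

detect = re.compile("r(['\"])(.*?)\\1[ \t]*$")

# The characters r'^Six o\'clock\nJim' followed by blanks, a tab and a newline.
line = "r'^Six o\\'clock\\nJim'   \t\n"

inner = detect.match(line).group(2)   # text between the quotes
rawS = r'^Six o\'clock\nJim'          # the same text written as a raw literal

regex = re.compile(inner)
hit = regex.match("Six o'clock\nJim")  # \' matches ', \n matches a newline
```

Here inner and rawS hold the same 18 characters, and the compiled regex treats \' and \n as regex escapes.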


Nota Bene

In Python 2, the type <type 'str'> is the type of a limited repertoire of characters.
It is the type of the content read from a file, whether it is opened with mode 'r' or with mode 'rb'.

In Python 3, the type <class 'str'> covers the Unicode characters.
Unlike in Python 2, the content read from a file opened with mode 'r' is of type <class 'str'>,
while it is of type <class 'bytes'> if the file is opened with mode 'rb'.

So I think the above code works as well in Python 3 as in Python 2, provided the file is opened with mode 'r'.

If the file has to be opened with 'rb', the regex pattern should be changed to b"r(['\"])(.*?)\\1[ \t]*\r?\n".
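
A small sketch of the bytes variant, using an in-memory line instead of a file opened with 'rb' (line and detect_bytes are illustrative names). Note that this pattern requires the line to end with a newline, so it assumes the pattern is not on an unterminated last line.

```python
import re

detect_bytes = re.compile(b"r(['\"])(.*?)\\1[ \t]*\r?\n")

# What a readline() on a file opened with 'rb' might return (Windows line ending).
line = b"r'^A Plat'  \r\n"

m = detect_bytes.match(line)
pattern_bytes = m.group(2)            # bytes between the quotes

# A bytes pattern can only be matched against bytes, not against str.
regex = re.compile(pattern_bytes)
hit = regex.match(b'A Plat wins')
```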

AFAIHU
