Python Regex: why doesn't Python accept my pattern?

I want to write a Python regex that takes a string of the form:

"u'Johns's Place',"

and returns:

Johns's Place

It should locate the character 'u' and the apostrophe that follows it, then the apostrophe that comes before the comma, and return whatever lies between these two apostrophes.

Therefore, I wrote the following code:

import re

title = "u'Johns's Place',"
print(re.sub(r"u'([^\"']*)',", r"\"\1\"", title))

However, I still got the entire string

"u'Johns's Place',"

with no filtering.

Do you know how this can be resolved?

Your pattern doesn't match because of the middle ' in "Johns's". That apostrophe isn't followed by a comma, as your pattern requires. The match also cannot carry on to the next ', because [^\"']* only allows characters that aren't " or '.
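A minimal sketch of this explanation, written for Python 3 and reusing the title value from the question. The names strict and relaxed are illustrative only, and the relaxed pattern is just one possible way to let the capture include apostrophes:

```python
import re

title = "u'Johns's Place',"

# The question's pattern: the character class excludes both " and ',
# so the capture has to stop at the apostrophe inside "Johns's".
strict = re.compile(r"u'([^\"']*)',")
strict_match = strict.search(title)
print(strict_match)  # None, because that apostrophe is followed by "s", not ","

# Illustrative alternative: allow any character in the capture and let
# the literal ', at the end decide where the match stops.
relaxed = re.compile(r"u'(.*)',")
relaxed_match = relaxed.search(title)
print(relaxed_match.group(1))  # Johns's Place
```

Since re.sub returns the input unchanged when nothing matches, the first pattern explains why the whole string came back.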

If you want to parse JSON with Python, use the json package rather than regexes applied to escaped unicode strings.
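As a rough illustration of that suggestion, assuming the data is available as JSON text in the first place (the raw document below is made up for the example):

```python
import json

# Hypothetical JSON document, invented for illustration.
raw = '{"title": "Johns\'s Place"}'

# json.loads turns the JSON text into ordinary Python objects.
data = json.loads(raw)
print(data["title"])  # Johns's Place
```

The value comes back as a plain string, so there is no u'...' wrapper to strip with a regex.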

I don't use Python much, but this regex should solve your problem:

^u'(.*)',$

Starting from the beginning, match the u and the single quote, then capture everything after that up to the single quote and comma at the end:

print(re.sub(r"^u'(.*)',$", r"\"\1\"", title))

Remove ^ and $ if your string contains more than the part being replaced (in other words, if there is any surrounding context).
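A small sketch of the unanchored case, assuming a made-up input line with surrounding context. The lazy quantifier .*? is an addition here, not part of the answer above. It is shown because a greedy .* runs to the last ', when a line holds several entries:

```python
import re

# Hypothetical input with surrounding context, invented for illustration.
line = "titles = [u'Johns's Place', u'Cafe',]"

# Without ^ and $ the pattern may match in the middle of the string.
# Greedy .* extends to the last ', in the line:
greedy = re.findall(r"u'(.*)',", line)
print(greedy)  # ["Johns's Place', u'Cafe"]

# Lazy .*? stops at the first ', that follows each u':
lazy = re.findall(r"u'(.*?)',", line)
print(lazy)    # ["Johns's Place", 'Cafe']
```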

After doing more research I found this package: https://simplejson.readthedocs.io/en/latest/

It lets you read a JSON file without u'..' being put in front of every string.

import simplejson as json
import requests

url = "<url-address>"  # placeholder: replace with the actual URL
response_json = requests.get(url)
current_json = json.loads(response_json.content)

current_json will not have the character 'u' at the beginning of every string.

This only partially answers my question, because the keys and values it returns are delimited by single quotes (') rather than the double quotes (") that the JSON format requires.
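One possible way to get double-quoted output, sketched with the standard library json module and a made-up dictionary standing in for current_json. The single quotes appear when Python prints the parsed dictionary itself, whereas json.dumps serializes it back to JSON text:

```python
import json

# Hypothetical parsed data standing in for current_json.
current_json = {"title": "Johns's Place"}

# Printing the dict shows Python's own representation of it.
print(current_json)      # {'title': "Johns's Place"}

# json.dumps produces JSON text, where keys and strings use double quotes.
json_text = json.dumps(current_json)
print(json_text)         # {"title": "Johns's Place"}
```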
