Using Regex to replace double quotes in the middle of a word

I have a string that looks like this:

my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

I'd like to replace the double quotes that are in the middle of a word with single quotes, without changing the ones at the beginning or the end of the word. That is, I'd like my_string to become:

'''[u"column1" : u"abcd", u"column2" : u"te'st"]'''

Right now I'm using a workaround: it replaces a double quote in the middle of a word only if the character before it is not the letter u. Here is what it looks like:

import re
# a letter/digit other than u, then a double quote, then a letter/digit
unusual = re.findall(r'([a-tv-zA-TV-Z0-9]"[a-zA-Z0-9])', my_string)
for un in unusual:
    my_string = my_string.replace(un, un.replace('"', "'"))

This works for now, but I'd like to improve it: if a u sits in the middle of the word right before the double quote, it no longer works. For example:

my_string='''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
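To make the failure concrete, here is a small sketch that wraps the workaround above in a function (the name `workaround` is hypothetical, used only for illustration) and runs it on both sample strings. The first gets fixed; the second comes back unchanged because the character before the inner quote is `u`, which the class `[a-tv-zA-TV-Z0-9]` deliberately excludes.

```python
import re

def workaround(s):
    # hypothetical wrapper around the asker's approach:
    # find <letter/digit other than u>"<letter/digit> and swap that quote
    for un in re.findall(r'([a-tv-zA-TV-Z0-9]"[a-zA-Z0-9])', s):
        s = s.replace(un, un.replace('"', "'"))
    return s

ok = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
broken = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''

print(workaround(ok))      # [u"column1" : u"abcd", u"column2" : u"te'st"]
print(workaround(broken))  # unchanged: the quote after "teu" is skipped
```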

Can anyone help with this? I'm running out of ideas :)

PS: I'm using Python 2.7.

You could try using lookarounds (not 100% perfect):

(?<=\w)(?<![\[\s:]u)"(?=\w)

and replace these occurrences with ' (see a demo on regex101.com).


Broken down, this says:

(?<=\w)        # require a word character immediately before
(?<![\[\s:]u)  # not preceded by [u, :u or whitespace followed by u
"              # a double quote
(?=\w)         # require a word character afterwards
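The same breakdown can live inside the pattern itself via `re.VERBOSE`, which ignores whitespace and `#` comments in the regex. A minimal sketch, assuming you prefer the annotated form; it also shows that this pattern handles the `teu"st` case from the question, since the lookbehind only rejects a `u` that follows `[`, `:` or whitespace.

```python
import re

# The lookaround pattern from this answer, annotated with re.VERBOSE
rx = re.compile(r'''
    (?<=\w)         # a word character immediately before the quote
    (?<![\[\s:]u)   # ...unless those two characters are [u, :u or whitespace + u
    "               # the double quote itself
    (?=\w)          # a word character immediately after
''', re.VERBOSE)

before = '''[u"column1" : u"abcd", u"column2" : u"teu"st"]'''
after = rx.sub("'", before)
print(after)  # [u"column1" : u"abcd", u"column2" : u"teu'st"]
```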


In Python:

import re

my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

rx = re.compile(r'(?<=\w)(?<![\[\s:]u)"(?=\w)')
new_string = rx.sub("'", my_string)
print(new_string)
# [u"column1" : u"abcd", u"column2" : u"te'st"]

Better yet: fix the string at its source.
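As an illustration of fixing it at the source: the question doesn't say how the string is produced, so this sketch assumes (hypothetically) that the original values were available as a Python dict. A real serializer such as `json.dumps` escapes the embedded quote, so the output never needs regex repair afterwards.

```python
import json

# Hypothetical: the original data, before it was turned into text
data = {"column1": "abcd", "column2": 'te"st'}

# sort_keys makes the key order deterministic across Python versions
serialized = json.dumps(data, sort_keys=True)
print(serialized)  # {"column1": "abcd", "column2": "te\"st"}

# The inner quote was escaped, so the text parses back to the same data
assert json.loads(serialized) == data
```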

>>> import re
>>> my_string='''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
>>> print(re.sub(r'("\w+)(")(\w+")', r"\1'\3", my_string))
[u"column1" : u"abcd", u"column2" : u"te'st"]

Explanation:

("\w+) matches a double quote followed by one or more word characters; the parentheses define a capturing group. In your case it matches "te (group 1).

(") matches the quote that comes right after that word, i.e. the " after "te in your case (group 2).

(\w+") matches one or more word characters followed by a double quote, i.e. st" in your case (group 3).
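To see exactly what each group captures, a quick check with `re.search` on the question's string prints the three groups described above:

```python
import re

my_string = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''

# Only the quoted word with a quote inside it satisfies all three groups
m = re.search(r'("\w+)(")(\w+")', my_string)
print(m.groups())  # ('"te', '"', 'st"')
```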

In the replacement string of re.sub(), you can refer to the captured groups to keep parts of the match:

\1 keeps the characters matched by ("\w+) unchanged.

\3 keeps the characters matched by (\w+") unchanged.

\2 is the quote " between those two groups; since the replacement does not reference it, you can write any character(s) in its place (here ').
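The same idea can be expressed with a callable instead of a template string: `re.sub` passes each match object to the function, and whatever it returns replaces the whole match. This sketch just confirms the two forms agree; the callable form is handy if the replacement ever needs logic beyond a fixed character.

```python
import re

my_string = '''[u"column1" : u"abcd", u"column2" : u"te"st"]'''
pattern = re.compile(r'("\w+)(")(\w+")')

# Template form: \1 and \3 are copied back, group 2 is simply not referenced
via_template = pattern.sub(r"\1'\3", my_string)

# Callable form: build the replacement from the match object
via_function = pattern.sub(lambda m: m.group(1) + "'" + m.group(3), my_string)

print(via_template == via_function)  # True
```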

Be careful with your input string and the approach you choose for the task. You could search for:

"(?<![[ :]u.)(?=[a-zA-Z\d])

and replace with '.


If you are fine with _ counting as a word character, the regex above can be shorter:

"(?<![[ :]u.)(?=\w)

Breakdown:

  • " Match a double quotation mark
  • (?<![[ :]u.) That is not preceded by a u that itself follows one of the delimiters :, space or [ (the . is the quote just matched)
  • (?=\w) And is followed by a word character
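A small sketch comparing the two variants above. Because the lookbehind is placed after the `"`, its three characters end with the quote itself, which is why `[u"`, ` u"` and `:u"` are skipped while `teu"st` is still fixed. The only difference between the variants shows up when the inner quote is followed by `_` (the `[u"te"_st"]` sample is made up for illustration). Here `[` is written as `\[` inside the class, which means the same thing but avoids the "Possible nested set" FutureWarning that Python 3.7+ emits for an unescaped `[[`.

```python
import re

letters_digits = re.compile(r'"(?<![\[ :]u.)(?=[a-zA-Z\d])')
word_chars = re.compile(r'"(?<![\[ :]u.)(?=\w)')

samples = [
    '''[u"column1" : u"abcd", u"column2" : u"teu"st"]''',
    '''[u"te"_st"]''',  # made-up sample: inner quote followed by an underscore
]
for s in samples:
    print(letters_digits.sub("'", s))  # the _ sample stays unchanged here
    print(word_chars.sub("'", s))      # \w also accepts _, so it is replaced
```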

Python code:

import re
new_string = re.sub(r'"(?<![[ :]u.)(?=\w)', "'", my_string)

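Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM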