Find and replace both quotation styles in a Python unicode string
I'm trying to replace strings marked in both quotation mark styles (“...” and "...") in a Python string.
I've already written a regex to replace the standard quotation marks:
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)
When I try to do the same for the literary (?) ones, it doesn't replace anything.
return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
In fact, as I have it right now, I can't even make a conditional check:
quote_list = ['“', '”']
if all(character in self.title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)
EDIT: Further context: it's an object
class Entry(models.Model):
    title = models.CharField(max_length=200)

    def render_title(self):
        """
        This function wraps italics around quotation marks
        """
        quote_list = ['“', '”']
        if all(character in self.title for character in quote_list):
            print "It has literary quotes"
            return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
        return re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)
I am not well-versed in regex. What am I doing wrong?
EDIT2: One step closer to the problem! It lies in the fact that I'm dealing with unicode strings. I'm still stumped as to how I can solve this. Any help is appreciated!
>>> title = u"sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs “ asd” asd
>>> title = "sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs <em>“ asd”</em> asd
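A minimal sketch of what the session above appears to show, assuming the terminal and source are UTF-8: a curly quote is one character in a unicode string but three bytes in a byte string, so a byte-string pattern has nothing to match against unicode text. The variable names are illustrative only, and the quotes are written as \u201c and \u201d escapes to sidestep source-encoding issues.

```python
# -*- coding: utf-8 -*-
import re

left = u'\u201c'                   # left curly quote as a single text character
left_bytes = left.encode('utf-8')  # the same quote as UTF-8 bytes

print(len(left))        # 1
print(len(left_bytes))  # 3

# When pattern, replacement and text are all unicode, the substitution works.
title = u'sdsfgsdfgsdgfsdgs \u201c asd\u201d asd'
pattern = u'\u201c(.+?)\u201d'
result = re.sub(pattern, u'<em>\u201c\\1\u201d</em>', title)
print(result)
```

The key point is that the pattern and the text should be the same kind of string.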
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
quote_list = ['“', '”']
title = "“...”"
if all(character in title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
all() returns True when every element is truthy.
Make sure that wherever you compare strings or apply a regular expression, both sides use the same encoding. Python supports using a unicode regexp pattern against a unicode string:
quote_list = [u'“', u'”']
title = u"“...”"
if all(character in title for character in quote_list):
    print "It has literary quotes"
    # ur'' keeps the backslash in \1 so re.sub treats it as a group reference
    print re.sub(ur'\“(.+?)\”', ur'<em>“\1”</em>', title)
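One detail worth checking in replacement strings like the one above: without a raw-string prefix, Python turns \1 into the control character U+0001 before re.sub ever sees it, so the captured text is silently dropped. A small sketch of the difference, using a made-up input string:

```python
import re

plain = '<em>\1</em>'   # '\1' is parsed by Python as the character U+0001
raw = r'<em>\1</em>'    # backslash is kept, so re.sub sees a group reference

with_plain = re.sub(r'"(.+?)"', plain, 'say "hi"')
with_raw = re.sub(r'"(.+?)"', raw, 'say "hi"')

print(repr(with_plain))
print(repr(with_raw))
```

Only the raw version puts the captured group back into the output.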
I finally found an answer. After printing the variable as suggested by @interjay, I found out that the string was unicode.
Comparing it with a plain byte string didn't work, so I removed the conditional and used this answer to simply make a unicode regex string that handles both simple and "literary" quotes.
title = re.sub(ur'\“(.+?)\”', ur'“<em>\1</em>”', self.title) # notice the ur
title = re.sub(ur'\"(.+?)\"', ur'"<em>\1</em>"', title)
I saw in a comment here (unfortunately now deleted) how one could merge the two statements above into one, but for now this works.
Thank you very much for your help!
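For reference, one possible way to merge the two substitutions into a single pass. This is only a sketch, not the deleted comment's approach (which is not recoverable here): it assumes a single alternation pattern with a replacement function, and the names QUOTED and emphasize_quoted are hypothetical.

```python
# -*- coding: utf-8 -*-
import re

# Two alternatives in one pattern: curly quotes (group 1) or straight quotes (group 2).
QUOTED = re.compile(u'\u201c(.+?)\u201d|"(.+?)"')

def emphasize_quoted(text):
    """Wrap the text inside either quotation style in <em> tags."""
    def wrap(match):
        if match.group(1) is not None:
            # Matched the curly-quote alternative.
            return u'\u201c<em>' + match.group(1) + u'</em>\u201d'
        # Otherwise the straight-quote alternative matched.
        return u'"<em>' + match.group(2) + u'</em>"'
    return QUOTED.sub(wrap, text)

print(emphasize_quoted(u'a \u201cb\u201d c "d"'))
```

Because each alternative carries its own closing quote, a curly opening quote is never paired with a straight closing one.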