Find and replace both quotation styles in a Python Unicode string

I'm trying to replace substrings enclosed in either quotation mark style (“...” and "...") in a Python string.

I've already written a regex that handles the standard (straight) quotation marks:

print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

When I try the same thing for the "literary" (?) ones, nothing gets replaced:

return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)

In fact, as things stand, I can't even get a conditional check to work:

quote_list = ['“', '”']

if all(character in self.title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

EDIT: Further context: it's an object (a Django model):

class Entry(models.Model):
    title = models.CharField(max_length=200)

def render_title(self):
    """
    This function wraps italics around quotation marks
    """
    quote_list = ['“', '”']

    if all(character in self.title for character in quote_list):
        print "It has literary quotes"
        return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
    return re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

I'm not well-versed in regular expressions. What am I doing wrong?

EDIT2: One step closer to the problem! It has to do with the fact that I'm dealing with Unicode strings. I'm still stumped as to how I can solve this. Any help is appreciated!

>>> title = u"sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs “ asd” asd
>>> title = "sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs <em>“ asd”</em> asd
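The session above fails because it mixes Python 2 byte strings (`"..."`) with Unicode strings (`u"..."`): in a UTF-8 source file, the literal `'“'` is three bytes, not the single character U+201C, so a byte pattern never matches the Unicode title. Python 3 makes the distinction explicit. The minimal sketch below is Python 3, not the asker's Python 2 setup: it shows that the curly quote is one character but three UTF-8 bytes, that applying a bytes pattern to a `str` raises `TypeError` instead of silently matching nothing, and that the substitution works once pattern and subject are the same type.

```python
import re

curly_open = '“'  # U+201C LEFT DOUBLE QUOTATION MARK

# One Unicode code point, but three bytes once encoded as UTF-8,
# which is what a Python 2 "..." literal in a UTF-8 source file holds.
print(len(curly_open), len(curly_open.encode('utf-8')))

title = 'sdsfgsdfgsdgfsdgs “ asd” asd'
byte_pattern = '“(.+?)”'.encode('utf-8')

try:
    re.sub(byte_pattern, b'<em>\\1</em>', title)
    mixed_types_rejected = False
except TypeError as exc:
    # Python 3 refuses to apply a bytes pattern to a str subject.
    mixed_types_rejected = True
    print('TypeError:', exc)

# With pattern, replacement and subject all of type str, the match succeeds.
fixed = re.sub(r'“(.+?)”', r'<em>“\1”</em>', title)
print(fixed)
```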
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
quote_list = ['“', '”']
title = "“...”"

if all(character in title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
  1. Please check that your source encoding supports the characters you are using. Here I'm using UTF-8, which supports the quotes you used, and everything worked well.
  2. Your `if` condition might never be true; check whether it can be true at all. `all()` returns `True` only when every element is truthy.
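Point 2 can be checked in isolation: `all()` over `quote_list` is only `True` when *both* curly quotes occur in the title, so a title with just an opening “ (or only straight quotes) skips the branch. A small Python 3 sketch; `has_literary_quotes` is a hypothetical helper name, not something from the question.

```python
# Hypothetical helper mirroring the question's condition.
QUOTE_LIST = ['“', '”']


def has_literary_quotes(title: str) -> bool:
    # True only if every character in QUOTE_LIST occurs somewhere in title.
    return all(character in title for character in QUOTE_LIST)


for sample in ['a “b” c', 'a “b c', 'a "b" c']:
    print(repr(sample), has_literary_quotes(sample))
```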

Wherever you compare strings or apply a regular expression, make sure both sides use the same string type: use a Unicode regex pattern against a Unicode string.

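# -*- coding: utf-8 -*-
import re

quote_list = [u'“', u'”']
title = u"“...”"

if all(character in title for character in quote_list):
    print "It has literary quotes"
    # ur'' keeps \1 as a backreference; in a plain u'' literal it is chr(1)
    print re.sub(ur'“(.+?)”', ur'<em>“\1”</em>', title)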
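A pitfall with the replacement string: in a non-raw literal, `\1` is an octal escape for the control character U+0001, not a backreference, so `re.sub` inserts that character instead of the captured text. A Python 3 sketch of the difference (plain `str` literals are already Unicode in Python 3, and the `r` prefix keeps the backslash):

```python
import re

title = '“...”'

# Non-raw literal: '\1' is the octal escape for chr(1), not a backreference.
not_raw = re.sub('“(.+?)”', '<em>“\1”</em>', title)

# Raw literal: the backslash survives, so re sees the backreference \1.
raw = re.sub(r'“(.+?)”', r'<em>“\1”</em>', title)

print(repr(not_raw))
print(repr(raw))
```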

I finally found an answer. After printing the variable as suggested by @interjay, I found out that the string was a Unicode string.

Comparing it with a plain (byte) string didn't work, so I removed the conditional and, following this answer, simply used raw Unicode regex strings (the `ur` prefix) to handle both plain and "literary" quotes.

title = re.sub(ur'\“(.+?)\”', ur'“<em>\1</em>”', self.title)  # notice the ur
title = re.sub(ur'\"(.+?)\"', ur'"<em>\1</em>"', title)
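The `ur` prefix only exists in Python 2; in Python 3 it is a syntax error, and plain `str` literals are already Unicode, so `r` alone is enough. A sketch of the same two substitutions for Python 3, wrapped in a hypothetical `render_title` function that takes the title as an argument instead of reading `self.title`:

```python
import re


def render_title(title: str) -> str:
    # Curly quotes first, then straight quotes; the quotes stay outside <em>.
    title = re.sub(r'“(.+?)”', r'“<em>\1</em>”', title)
    title = re.sub(r'"(.+?)"', r'"<em>\1</em>"', title)
    return title


print(render_title('A “curly” and a "straight" title'))
```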

I saw in a comment (unfortunately now deleted) how the two statements above could be merged into one, but for now this works.
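The deleted comment is not recoverable, so this is only one possible way to merge the two calls, not necessarily the one it described: an alternation that matches either a curly pair or a straight pair, plus a replacement function that re-emits whichever quote style matched. Python 3; `emphasize_quotes` is a hypothetical name.

```python
import re

# Either a curly pair “...” (group 1) or a straight pair "..." (group 2).
QUOTED = re.compile(r'“(.+?)”|"(.+?)"')


def emphasize_quotes(text: str) -> str:
    def wrap(m: re.Match) -> str:
        # Exactly one alternative matched; the other group is None.
        if m.group(1) is not None:
            return f'“<em>{m.group(1)}</em>”'
        return f'"<em>{m.group(2)}</em>"'

    return QUOTED.sub(wrap, text)


print(emphasize_quotes('say “hi” and "bye"'))
```

A single character class such as `[“"](.+?)[”"]` would be shorter, but it would also accept mismatched pairs like `“..."`, which the alternation avoids.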

Thank you very much for your help!


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM