
Tornado request handler mapping to international characters

I want to be able to match URL requests for some internationalized characters, like /Comisión. This is my setup:

import tornado.web

class Application(tornado.web.Application):
    def __init__(self):
        handlers = [
            # some handlers, and then this:
            (r"/([\w\:\,]+)", InternationalizedHandler),
        ]
        # `settings` is a dict defined elsewhere in the module
        tornado.web.Application.__init__(self, handlers, **settings)

But setting locales in Tornado doesn't seem to be the right solution. How is it possible to set up the regex to catch characters such as é, å, µ etc.? Will changing the re mode in Python do?

TL;DR: It's impossible to do with Tornado's built-in router.

Tornado buries the regexp compiling for handler patterns pretty deep, so @stema's suggestion to use the re.UNICODE flag is difficult to apply, because it's not immediately clear where to pass in the flag. There are two ways to tackle that particular problem: subclass URLSpec and override the __init__ function, or put a flag prefix in the pattern.

The first option is a lot of work. The second option takes advantage of a feature in Python's re module in which patterns may specify (?u) at the beginning of the pattern instead of passing in the re.UNICODE flag as a parameter.
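A minimal sketch of the inline-flag behaviour, using only the standard re module (no Tornado involved). The sample path and the trailing $ anchor are assumptions added for illustration. Note that in Python 3, str patterns are already Unicode-aware, so (?u) is redundant there; re.ASCII is used below to reproduce the ASCII-only matching that Python 2 applied by default.

```python
import re

path = "/Comisión"  # sample decoded path (assumption for illustration)

# Inline flag: equivalent to passing re.UNICODE as a parameter.
unicode_pattern = re.compile(r"(?u)/([\w\:\,]+)$")

# ASCII-only \w, which is how Python 2 behaved without the flag.
ascii_pattern = re.compile(r"/([\w\:\,]+)$", re.ASCII)

unicode_match = unicode_pattern.match(path)
ascii_match = ascii_pattern.match(path)

# With Unicode matching, \w covers "ó" and the whole segment is captured.
print(unicode_match.group(1) if unicode_match else None)

# With ASCII matching, "ó" is not a word character, so the match fails.
print(ascii_match)
```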

Unfortunately, neither option will work, since Tornado matches patterns against the request URL before percent-decoding it into a Unicode string. Therefore, compiling the pattern with the Unicode flag has no effect, since you're matching against percent-encoded ASCII URLs, not Unicode strings.
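To illustrate the explanation above, here is a small standard-library sketch of what the pattern ends up being matched against. It assumes the request path arrives percent-encoded as UTF-8, and the sample path and $ anchor are added for illustration; nothing here calls Tornado itself.

```python
import re
from urllib.parse import quote, unquote

pattern = re.compile(r"(?u)/([\w\:\,]+)$")

decoded_path = "/Comisión"          # what the user typed (assumed sample)
encoded_path = quote(decoded_path)  # percent-encodes the UTF-8 bytes of "ó"

# The encoded form contains "%", which is not in the character class,
# so the Unicode flag makes no difference here.
encoded_match = pattern.match(encoded_path)

# Only after percent-decoding does the pattern see the real character.
decoded_match = pattern.match(unquote(encoded_path))

print(encoded_path)
print(encoded_match)
print(decoded_match.group(1) if decoded_match else None)
```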

If you look here you see what your expression "means": http://regex101.com/r/zO9zC8

If you want to match é, å, µ, you need to match the inverse of a-zA-Z0-9, which would be [^a-zA-Z0-9]. Seeing as you used \w before, you may as well use \W, which is the same as [^\w].

Good luck!

Edit: Re-reading your question, I suggest you follow @stema's answer instead.
