
preg_match UTF-8 problem: unknown symbols instead of Cyrillic

My script works great, but today, after checking the logs, I found some garbled words. After analysing them I understood that something is wrong with UTF-8: the files are parsed and the title is extracted, but instead of Russian words the result contains unknown symbols (Сериалы ТУТ! СериÐ).

I use:

$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $match);

//$match[1] = Сериалы ТУТ! СериРsda

When I try to add the pattern modifier /u, nothing changes; I get the same garbled words. Please help.

Maybe the problem is with PHP?

It is not a PHP or a regex problem, but an HTML problem. To obtain a correct display, you must add <meta charset="UTF-8"/> inside the <head> of your HTML code.
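To make that concrete, here is a minimal sketch, assuming the script file itself is saved as UTF-8. It checks that the captured title is still valid UTF-8 after preg_match, then wraps it in a page that declares the charset. The helper name renderTitlePage is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper (not from the original post): wraps an extracted
// title in a minimal page whose <head> declares UTF-8.
function renderTitlePage(string $title): string
{
    // Escape the title for HTML; the third argument keeps it treated as UTF-8.
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\"/>\n"
         . "<title>{$safe}</title>\n</head>\n<body>{$safe}</body>\n</html>\n";
}

// Assumption: this source file is saved as UTF-8, so the literal is UTF-8 too.
$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);

// preg_match copies bytes unchanged. An empty pattern with the u modifier
// matches only if the subject is valid UTF-8, so this checks the capture.
$isUtf8 = preg_match('//u', $match[1]) === 1;

$page = renderTitlePage($match[1]);
echo $page;
```

If $isUtf8 is true but the browser still shows unknown symbols, the bytes are fine and only the declared encoding of the output is wrong, which is the situation the answer describes.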

As an aside, the U modifier is unnecessary: it only inverts the greediness of the quantifiers, so you can drop it and write a lazy quantifier (.*?) explicitly:

preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);
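As a quick check, the following sketch runs the pattern from the question and the pattern without U on the same input and compares the captures. The variable names $withU and $withoutU are introduced here for illustration only.

```php
<?php
// Assumption: this source file is saved as UTF-8.
$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";

// Pattern from the question: U inverts greediness, so (.*) behaves lazily
// and [^>]*? behaves greedily.
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $withU);

// Pattern without U: the laziness is written explicitly as (.*?).
preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $withoutU);

// Both patterns capture the same title text.
var_dump($withU[1] === $withoutU[1]);
```

Writing the lazy quantifier explicitly keeps the pattern readable for anyone who does not remember that U swaps the meaning of * and *?.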


 