preg_match UTF-8 problem: unknown symbols instead of Cyrillic

Question

My script works well, but today, after checking the logs, I found some garbled words. After analysing them I understood that the problem has something to do with UTF-8: the files are parsed and the title is extracted, but instead of Russian words the result is unknown symbols (Ð¡ÐµÑ€Ð¸Ð°Ð»Ñ‹ Ð¢Ð£Ð¢! Ð¡ÐµÑ€Ð¸Ð).

I use:

$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";
preg_match("'<title[^>]*?>(.*)</title>'siU", $cont, $match);

//$match[1] = Ð¡ÐµÑ€Ð¸Ð°Ð»Ñ‹ Ð¢Ð£Ð¢! Ð¡ÐµÑ€Ð¸Ð sda

When I try to add the pattern modifier /u, nothing changes: I get the same garbled words. Please help.

Maybe something is wrong with PHP?

Answer 1

It is not a PHP or a regex problem, but an HTML problem. The output looks like UTF-8 bytes being displayed in a single-byte encoding. To have the text displayed correctly, you must add <meta charset="UTF-8"/> to the head of your HTML code.
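A minimal sketch of that point, assuming the PHP source file itself is saved as UTF-8. The helpers extract_title() and render_page() are hypothetical names introduced only for illustration. The idea is to check that the extracted title is still valid UTF-8 (so the regex did nothing wrong) and then to emit it inside a page that declares its encoding:

```php
<?php
// Assumption: this source file is saved as UTF-8, so the literal below is UTF-8 bytes.
$cont = "dasdas<title>Сериалы ТУТ! Сериалы онлайн sda</title>";

// Hypothetical helper: return the <title> text, or null when none is found.
function extract_title(string $html): ?string
{
    if (preg_match('~<title[^>]*>(.*?)</title>~si', $html, $match) === 1) {
        return $match[1];
    }
    return null;
}

// Hypothetical helper: wrap the title in a page that declares its encoding.
function render_page(string $title): string
{
    $safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\"/>\n"
         . "<title>{$safe}</title>\n</head>\n<body>\n<p>{$safe}</p>\n</body>\n</html>\n";
}

$title = extract_title($cont);

// preg_match() copies bytes; it does not re-encode them. The first letter "С"
// is still the UTF-8 byte pair D0 A1, which a single-byte decoder shows as "Ð¡".
$firstTwoBytes = bin2hex(substr((string) $title, 0, 2));

// An empty pattern with the u modifier matches only when the subject is valid UTF-8.
$isValidUtf8 = preg_match('//u', (string) $title) === 1;

// Also declare the charset in the HTTP header when running under a web server.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
$page = render_page((string) $title);
```

Calling echo $page; would send the page. If $isValidUtf8 is true and the text still looks garbled, the viewer (browser, terminal, or log reader) is most likely decoding the bytes with the wrong charset.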

As an aside, the U modifier is unnecessary; a lazy quantifier does the same job:

preg_match('~<title[^>]*>(.*?)</title>~si', $cont, $match);
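To make the aside concrete, here is a small comparison on a made-up input with two title elements (the $html string and the variable names are illustrative assumptions, not part of the original script). It runs the original U-modifier pattern, the suggested lazy pattern, and a plain greedy pattern for contrast:

```php
<?php
// Illustrative input: two <title> elements, so greedy and lazy matching differ.
$html = '<title>first</title><title>second</title>';

// Original pattern: U inverts greediness, so (.*) is lazy and [^>]*? is greedy.
preg_match("'<title[^>]*?>(.*)</title>'siU", $html, $withU);

// Suggested pattern: explicit lazy quantifier, no U modifier.
preg_match('~<title[^>]*>(.*?)</title>~si', $html, $lazy);

// For contrast: greedy (.*) without U runs on to the last </title>.
preg_match('~<title[^>]*>(.*)</title>~si', $html, $greedy);
```

Here $withU[1] and $lazy[1] both hold the first title only, while $greedy[1] spans from the first title through to the second. The explicit (.*?) form is usually easier to read because each quantifier means what it says.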

Answer 1
2 Accepted 2014-04-23 00:17:54
