Problems with encoding in PHP functions
I am trying to build a URL from a string that I get from a web crawler. I have managed to create the crawler, but I can't build the URL string. I have found out that the PHP function preg_match_all messes up my result. This is what I have:
preg_match_all('/"([^"]+)"/', $str, $matches);
foreach ($matches[1] as $value) {
    $termsArray[] = $this->createUrl($value);
}
The preg_match_all function returns the correct string but, I guess, the encoding is wrong. And the createUrl function looks like this:
private function createLikitUrl($term)
{
    $ltSymbolsArray = array(
        'a1' => 'ą',
        'c2' => 'č',
        'e1' => 'ę',
        'e2' => 'ė',
        'i1' => 'į',
        's2' => 'š',
        'u1' => 'ų',
        'u2' => 'ū',
        'z2' => 'ž',
        '_' => ' '
    );
    $chars = preg_split("//u", $term, -1, PREG_SPLIT_NO_EMPTY);
    $urlStr = '';
    foreach ($chars as $value) {
        foreach ($ltSymbolsArray as $key => $replacement) {
            if ($value == $replacement) {
                $value = $key;
            }
        }
        $urlStr .= $value;
    }
}
The problem is that, when the string comes from preg_match_all and contains multibyte characters, preg_split returns the same string that I pass in as the $term variable. If I pass a string to the createUrl function without using preg_match_all, it works perfectly. My guess is that I am missing the Unicode modifier in the preg_match_all pattern, but I have a hard time writing regular expressions.
Any help would be appreciated.
Not sure I understand your request, but I tried your script and it works fine, except that you need to add a return $urlStr; at the end of the function. And it has to be renamed createUrl.
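For reference, here is a minimal sketch of the function with both changes applied (the rename and the return value). It is written as a standalone function rather than a private method so it can be run on its own, and it swaps the nested loops for strtr(), which is a substitution of mine and not the asker's code. The sample input is made up for illustration.

```php
<?php
// Hypothetical standalone version of the asker's method (a private method in the original class).
function createUrl(string $term): string
{
    // Same pairs as in the question, flipped so the Lithuanian letter is the key.
    $replacements = array(
        'ą' => 'a1',
        'č' => 'c2',
        'ę' => 'e1',
        'ė' => 'e2',
        'į' => 'i1',
        'š' => 's2',
        'ų' => 'u1',
        'ū' => 'u2',
        'ž' => 'z2',
        ' ' => '_',
    );

    // strtr() with an array replaces every key in a single pass and
    // never rescans text it has already replaced.
    return strtr($term, $replacements);
}

// Made-up sample input, assuming this file is saved as UTF-8.
echo createUrl('ąžuolas šaka'), "\n"; // prints "a1z2uolas_s2aka"
```

strtr() works on bytes, but that is safe for UTF-8 input because a multibyte character can never match in the middle of another one. It still assumes $term is valid UTF-8, just like the original preg_split("//u", ...) approach.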
I finally found a solution. If anyone comes across this problem, this should help. As I thought, there was a problem with encoding. I just added one line of code before the preg_match_all call, and that solved the problem:
$str = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-13');
Cheers! :)
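To illustrate why the conversion helps, here is a small sketch. It assumes the crawled page really is ISO-8859-13 (Baltic), as the answer above implies, and that the converted string is the one passed to preg_match_all. The sample text and the $raw variable are invented for the demo, and the mbstring extension is required.

```php
<?php
// Assumption: the crawler delivers ISO-8859-13 bytes. Build a sample by
// converting a UTF-8 literal, so this file itself can stay UTF-8.
$raw = mb_convert_encoding('"ąžuolas" "šaka"', 'ISO-8859-13', 'UTF-8');

// The raw bytes are not valid UTF-8, so a /u pattern refuses to process them.
$isUtf8 = mb_check_encoding($raw, 'UTF-8');
$split  = preg_split('//u', $raw, -1, PREG_SPLIT_NO_EMPTY);
$error  = preg_last_error();

var_dump($isUtf8);                        // bool(false)
var_dump($split);                         // bool(false)
var_dump($error === PREG_BAD_UTF8_ERROR); // bool(true)

// Convert once, right after fetching, then match as before.
// The /u modifier is optional for this pattern; it makes the call fail
// instead of silently returning non-UTF-8 bytes if the input is still wrong.
$str = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-13');
preg_match_all('/"([^"]+)"/u', $str, $matches);

print_r($matches[1]); // the two quoted words, now valid UTF-8
```

Without the conversion, a pattern with no /u modifier still matches byte by byte and hands back ISO-8859-13 bytes, which look right in some viewers but never compare equal to the UTF-8 letters in the replacement table.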