
Problems with encoding in PHP functions

I am trying to build a URL from a string that I get from a web crawler. I have managed to create the crawler, but I can't build the URL string. I have found that the PHP function preg_match_all messes up my result. This is what I have:

preg_match_all('/"([^"]+)"/', $str, $matches); 
foreach ($matches[1] as $value) {
     $termsArray[] = $this->createUrl($value);
}

The preg_match_all function returns the correct string but, I guess, the encoding is wrong. The createUrl function looks like this:

private function createLikitUrl($term)
{
    $ltSymbolsArray = array(
        'a1' => 'ą',
        'c2' => 'č',
        'e1' => 'ę',
        'e2' => 'ė',
        'i1' => 'į',
        's2' => 'š',
        'u1' => 'ų',
        'u2' => 'ū',
        'z2' => 'ž',
        '_' => ' '
    );
    $chars = preg_split("//u", $term, -1, PREG_SPLIT_NO_EMPTY);
    $urlStr = '';
    foreach ($chars as $value) {
        foreach ($ltSymbolsArray as $key => $replacement) {
            if ($value == $replacement) {
                $value = $key;
            }
        }
        $urlStr .= $value;
    }
}

The problem is that preg_split returns the same string that I pass as the $term variable when preg_match_all is used with a string that has multibyte symbols. If I pass a string to the createUrl function without using preg_match_all, it works perfectly. My guess is that I am missing the Unicode modifier in the preg_match_all pattern, but I have a hard time writing regular expressions.

Any help would be appreciated.

Not sure I understand your request, but I tried your script and it works fine, except that you need to add return $urlStr; at the end of the function.
It also has to be renamed to createUrl.
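For reference, here is a minimal sketch of the function with those two changes applied. It is written as a standalone function rather than a private method, because the surrounding class is not shown in the question; that is an assumption made only so the snippet runs on its own. It also assumes the source file is saved as UTF-8 and that the input string is valid UTF-8.

```php
<?php
// Sketch only: the question's method as a standalone function,
// renamed to createUrl and returning the string it builds.
function createUrl($term)
{
    // URL token => Lithuanian letter, copied from the question.
    $ltSymbolsArray = array(
        'a1' => 'ą',
        'c2' => 'č',
        'e1' => 'ę',
        'e2' => 'ė',
        'i1' => 'į',
        's2' => 'š',
        'u1' => 'ų',
        'u2' => 'ū',
        'z2' => 'ž',
        '_' => ' '
    );
    // Split into characters; the /u modifier requires valid UTF-8 input.
    $chars = preg_split("//u", $term, -1, PREG_SPLIT_NO_EMPTY);
    $urlStr = '';
    foreach ($chars as $value) {
        foreach ($ltSymbolsArray as $key => $replacement) {
            if ($value == $replacement) {
                $value = $key;
            }
        }
        $urlStr .= $value;
    }
    return $urlStr; // the line missing from the original
}

echo createUrl('žąsų ūkis'), "\n"; // prints z2a1su1_u2kis
```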

I finally found a solution. If anyone comes across this problem, this should help. As I thought, there was a problem with encoding. I just added one line of code before the preg_match_all call, which solved the problem:

$str = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-13');
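To illustrate why the conversion helps, here is a small sketch, assuming the mbstring extension is loaded and the crawled page really is ISO-8859-13. The sample text and the $raw variable are made up for the example. The /u modifier on the pattern is an optional addition along the lines of the guess in the question, not part of the fix itself.

```php
<?php
// Hypothetical crawler input: simulate ISO-8859-13 bytes by converting
// a UTF-8 literal (assumes this file is saved as UTF-8).
$raw = mb_convert_encoding('"šuo" "žuvis"', 'ISO-8859-13', 'UTF-8');

// The raw bytes are not valid UTF-8, so functions that expect UTF-8
// (such as preg_split with the /u modifier) cannot handle them.
var_dump(mb_check_encoding($raw, 'UTF-8')); // bool(false)

// Convert to UTF-8 first, then match the quoted terms.
$str = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-13');
preg_match_all('/"([^"]+)"/u', $str, $matches);

print_r($matches[1]); // the two terms: šuo, žuvis
```

If the source encoding is not known in advance, the 'ISO-8859-13' argument would need to come from the page's headers or meta tags rather than being hard-coded.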

Cheers! :)
