
PHP regex not matching utf-8 decoded string

I am having trouble with a regex. I'm not sure why it fails, but I think it may have something to do with character encoding.

I am using curl to fetch the page content from a website. I then run a DOMXPath query to get a certain element, read that element's content, and run a regex against that content. However, the regex does not match and I don't know why.

This is what I receive from the element:

X: asdasdfgdgdrrY: dfgdfgfgZ: ukuykyukjghj
  a B 7dd. 

When I try to match it with this pattern, it does not match:

/X: (?P<x>.*)Y: (?P<y>.*)Z: (?P<z>.*)\s*(?P<a>[a-zA-Z]+) (?P<b>[a-zA-Z]+) (?P<c>[0-9]+)dd/

I have tested this in Dreamweaver and it matches there, so I have no idea why it doesn't match online.

Also, the page I am receiving is encoded as UTF-8.

I attempt to convert the content to remove the UTF-8 characters by using:

iconv('utf-8', 'ISO-8859-1//IGNORE', $td->item(0)->nodeValue);

If I don't remove the UTF-8 characters, weird Á symbols appear after the 'a', 'b' and 'c' values.
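One way to see what those stray characters actually are is to dump the bytes of the string. The sketch below is only an illustration: `reveal_non_ascii` is a hypothetical helper, and the sample string assumes the invisible character is a UTF-8 non-breaking space (U+00A0, bytes C2 A0), which the post does not confirm.

```php
<?php
// Hypothetical helper (not from the original post): rewrites every byte
// outside printable ASCII as a visible \xNN escape.
function reveal_non_ascii(string $s): string
{
    return preg_replace_callback(
        '/[^\x20-\x7E]/',            // no /u flag, so this works byte by byte
        function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $s
    );
}

// Assumed sample: "a" and "B" separated by a UTF-8 non-breaking space.
$sample = "a\xC2\xA0B";

echo reveal_non_ascii($sample), "\n"; // prints a\xC2\xA0B
echo bin2hex($sample), "\n";          // prints 61c2a042
```

If the culprit does turn out to be U+00A0, note that ISO-8859-1 also contains that character (as the single byte A0), so converting with iconv and `//IGNORE` would keep it rather than drop it.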

OK, I figured it out. All I had to do to get rid of these invisible invalid characters was:

$value = preg_replace("/[^a-zA-Z0-9 %():\$.\/-]/",' ',$value);

Pretty much, just replace any character that isn't valid with a space or an empty string. In my case I used a space because it appeared that some of the spaces themselves were invalid.
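A possible alternative to the whitelist, sketched here under the assumption that the invisible characters are UTF-8 non-breaking spaces (U+00A0), is to normalise whitespace with the `u` modifier and then run the original pattern unchanged. The sample input is a guess at what the element contained, not data from the post.

```php
<?php
// Assumed input: the element text from the question, with U+00A0
// (bytes C2 A0) standing in for the "invisible" characters.
$raw = "X: asdasdfgdgdrrY: dfgdfgfgZ: ukuykyukjghj\n a\xC2\xA0B\xC2\xA07dd. ";

// The pattern from the question, unchanged.
$pattern = '/X: (?P<x>.*)Y: (?P<y>.*)Z: (?P<z>.*)\s*'
         . '(?P<a>[a-zA-Z]+) (?P<b>[a-zA-Z]+) (?P<c>[0-9]+)dd/';

// Against the raw text the literal spaces in the pattern do not match
// the non-breaking spaces, so preg_match() returns 0.
$before = preg_match($pattern, $raw);

// Collapse every run of whitespace, including U+00A0, into one ASCII space.
// The /u modifier makes PCRE read the subject and pattern as UTF-8.
$value = preg_replace('/[\s\x{00A0}]+/u', ' ', $raw);

$after = preg_match($pattern, $value, $m);

var_dump($before, $after);
echo $m['x'], ' | ', $m['y'], ' | ', $m['a'], ' ', $m['b'], ' ', $m['c'], "\n";
```

The whitelist approach above works on bytes because it has no `u` modifier. If it is applied directly to UTF-8 text, a two-byte character such as U+00A0 becomes two spaces, which would break a pattern that expects exactly one space. After the iconv conversion to ISO-8859-1 the same character is a single byte, so it becomes a single space.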

