
String lookup in HTML content of unknown charset

I'm using strpos to look for a string in web page bodies. About 50% of the time it fails, even though the search string is present. I have tried applying strtolower to both the search string and the searched content, with the same results. The problem probably arises when dealing with different charsets...

Assuming:
- the search string's charset is unknown
- the searched content's charset is unknown
- the charset could be any ISO-xx, UTF-8, or Shift-JIS

Is there a bulletproof function to find a substring?

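You can try using mb_detect_encoding to detect the encoding first, then convert to the encoding you want to work in (using iconv or mb_convert_encoding) and search for the pattern in that encoding.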
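A minimal sketch of that detect-then-convert approach, under some assumptions: the helper names to_utf8 and contains_text are hypothetical, the candidate list is only an example, and ISO-8859-1 is used as a last-resort fallback. Detection is a heuristic and can guess wrong on short strings, so this is not bulletproof. Note that it uses mb_strtolower rather than strtolower, since the latter is not multibyte-aware.

```php
<?php
// Hypothetical helper: normalize a string of unknown charset to UTF-8.
// The candidate list is an assumption; adjust it to the charsets you expect.
function to_utf8(string $text, array $candidates = ['SJIS', 'EUC-JP', 'ISO-8859-1']): string
{
    // Valid UTF-8 is accepted as-is; this check is deterministic.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Strict mode (third argument) rejects candidates for which the bytes are invalid.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        // Assumption: treat undetectable input as ISO-8859-1, which accepts any byte.
        $detected = 'ISO-8859-1';
    }
    return mb_convert_encoding($text, 'UTF-8', $detected);
}

// Hypothetical helper: case-insensitive substring test after normalizing both sides.
function contains_text(string $haystack, string $needle): bool
{
    $h = mb_strtolower(to_utf8($haystack), 'UTF-8');
    $n = mb_strtolower(to_utf8($needle), 'UTF-8');
    return $n !== '' && mb_strpos($h, $n, 0, 'UTF-8') !== false;
}
```

For example, contains_text($pageBody, $searchString) compares both values in UTF-8 regardless of how each one arrived, provided detection picked the right source charset.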

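Yup, first convert the HTML to UTF-8 / Latin-1: get the content encoding from the Content-Type header or the meta tag, then use iconv to convert to UTF-8 / Latin-1, and stop worrying about it after that.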
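A rough sketch of the declared-charset approach, with some assumptions: the function names charset_from_response and html_to_utf8 are hypothetical, the regular expressions only cover the common header and meta tag forms, and UTF-8 is assumed when nothing is declared. Pages sometimes declare the wrong charset, so the result is only as reliable as the declaration.

```php
<?php
// Hypothetical helper: read the declared charset from the Content-Type header,
// falling back to a <meta> tag in the markup. Returns null if none is declared.
function charset_from_response(?string $contentTypeHeader, string $html): ?string
{
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentTypeHeader, $m)) {
        return $m[1];
    }
    // Matches both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert a page body to UTF-8 using its declared charset.
function html_to_utf8(string $html, ?string $contentTypeHeader = null): string
{
    // Assumption: undeclared pages are treated as UTF-8.
    $charset = charset_from_response($contentTypeHeader, $html) ?? 'UTF-8';
    // iconv returns false (with a warning) on unknown charsets or invalid input.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}
```

Once the body is in UTF-8, a search string that is also in UTF-8 can be located with mb_strpos or mb_stripos. If the conversion fails, this sketch returns the original bytes unchanged, which a caller may prefer to treat as an error.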

