
setting output from exec to UTF-8

I am writing a PHP script that uses PhantomJS to execute the JavaScript on a site and then return the page content to the PHP script. My problem is that the returned output is obviously not in UTF-8. I tried setlocale, iconv and even utf8_encode, but none of them works. Here is the code:

inspectOffer.php

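<?php

$url = $argv[1];
$locale = 'cs_CZ.UTF-8';
setlocale(LC_ALL, $locale);
putenv('LC_ALL=' . $locale);

$phantom_script = __DIR__ . '/inspectOffer.js';
// Quote both arguments so shell metacharacters in the URL (e.g. "&") are not interpreted.
exec('phantomjs ' . escapeshellarg($phantom_script) . ' ' . escapeshellarg($url), $out);

// Join the captured lines; initialize $output to avoid an undefined-variable notice.
$output = '';
foreach ($out as $value) {
    $output .= $value;
}

// Attempted conversions (these did not help).
$output = iconv(mb_detect_encoding($output, mb_detect_order(), true), 'UTF-8', $output);
$output = utf8_encode($output);

var_dump($output);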

inspectOffer.js

var webPage = require('webpage');
var page = webPage.create();

var system = require('system');
var args = system.args;
var url=args[1];

page.open(url, function(status) {
    console.log(page.content);
    phantom.exit();
});

Text like this on the page:

V blízkosti Rezidence se nachází veškerá občanská vybavenost.

looks like this in the output:

V bl├şzkosti Rezidence se nach├íz├ş ve┼íker├í ob─Źansk├í vybavenost.

I run the script from cmd on Windows 10:

php inspectOffer.php https://www.sreality.cz/detail/prodej/byt/2+kk/karlovy-vary-dvory-/398053724

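I think the output is already in UTF-8, but the console displays it using a DOS code page. The garbled characters match CP 852 (the Central European relative of CP 437), so each two-byte UTF-8 character is shown as two separate symbols.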
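This explanation can be checked against the garbled line from the question. A minimal sketch, assuming the console uses CP 852 and that the iconv extension is available: mapping the garbled characters back to their CP 852 bytes should restore the original UTF-8 sequences, which would show that PHP received correct UTF-8 and only the console rendering was wrong.

```php
<?php
// The garbled line exactly as the console printed it in the question.
$garbled = 'V bl├şzkosti Rezidence se nach├íz├ş ve┼íker├í ob─Źansk├í vybavenost.';

// Assumption: the console rendered each UTF-8 byte as one CP 852 character
// (e.g. "í" = bytes C3 AD, shown as "├" and "ş").
// Converting those characters back to CP 852 yields the original bytes,
// which are the UTF-8 encoding of the real text.
$recovered = iconv('UTF-8', 'CP852', $garbled);

echo $recovered, PHP_EOL;
```

If the recovered line matches the page text, no conversion is needed in PHP; only the place where the output is viewed has to read it as UTF-8.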

To confirm, save the output to a file and reopen it in your editor with UTF-8 encoding.
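A minimal sketch of that check. The file name `offer-output.html` is hypothetical, and `$out` is a hard-coded stand-in for the array that `exec()` fills in the question, so the snippet runs on its own. It also leaves out the `iconv()`/`utf8_encode()` step: `utf8_encode()` assumes ISO-8859-1 input, so applying it to text that is already UTF-8 encodes it a second time.

```php
<?php
// Hypothetical stand-in for the lines exec() collected from phantomjs.
$out = ['<p>V blízkosti Rezidence se nachází veškerá občanská vybavenost.</p>'];

// Join the lines unchanged; no re-encoding of text that is already UTF-8.
$output = implode(PHP_EOL, $out);

// preg_match() with the /u modifier returns false for invalid UTF-8,
// so it works as a validity check without needing mbstring.
$isUtf8 = preg_match('//u', $output) === 1;

// Write to a (hypothetical) file in the temp directory; open it in an editor as UTF-8.
$file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'offer-output.html';
file_put_contents($file, $output);

echo ($isUtf8 ? 'Valid UTF-8, saved to ' : 'Not valid UTF-8, saved to ') . $file . PHP_EOL;
```

If the editor shows the Czech text correctly, the data is fine and only the console display is at fault.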

Edit: you can also try putting this tag in your HTML document:

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

exec() does not support this.

If the script's output is served through a web server, you can set the charset with the header() function as follows:

header('Content-type: text/plain; charset=utf-8');

Use passthru instead of exec; the output should then stay unchanged.
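A minimal sketch of that approach. `runRaw()` is a hypothetical helper: it runs a command with `passthru()`, which forwards the command's raw output without splitting it into lines or stripping trailing whitespace as `exec()` does, and captures it with output buffering. To keep the demo runnable without PhantomJS, it runs a small PHP one-liner instead and assumes a POSIX shell for the quoting; the commented line shows how the question's command might look.

```php
<?php
/**
 * Hypothetical helper: run $command via passthru() and return its raw output.
 * passthru() writes through PHP's output layer, so ob_start() can capture it.
 */
function runRaw(string $command, ?int &$exitCode = null): string
{
    ob_start();
    passthru($command, $exitCode);
    return (string) ob_get_clean();
}

// In the question this might be (assuming phantomjs is on PATH):
// $html = runRaw('phantomjs ' . escapeshellarg(__DIR__ . '/inspectOffer.js') . ' ' . escapeshellarg($argv[1]));

// Demo: a child PHP process prints "občanská" as UTF-8 (written with \u{} escapes
// so the shell argument itself stays ASCII).
$html = runRaw(escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg('echo "ob\u{10D}ansk\u{E1}\n";'), $code);

var_dump($code, $html);
```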

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 