I am calling a Python script from PHP using the following command:
$url = "https://www.digitalocean.com/community/tutorials/how-to-secure-your-redis-installation-on-ubuntu-14-04";
$output = shell_exec('python PythonScripts/readable.py '.$url);
echo($output);
When I run the file from the command line
python extractor.py https://www.digitalocean.com/community/tutorials/how-to-secure-your-redis-installation-on-ubuntu-14-04
I am getting the desired output.
The contents of the Python file are:
import sys
from readability.readability import Document
from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11)'
myopener = MyOpener()
request = myopener.open(sys.argv[1])
html = request.read()
readable_article = Document(html).summary()
readable_title = Document(html).short_title()
print readable_article #If I use readable_title then it's getting printed in php
The problem is that readable_article is not captured by PHP. But when I use the command line, both readable_article and readable_title are printed out.
What could the issue be? I also tried exec() and system(), still with no luck.
There can be multiple issues:
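You could use proc_open instead of shell_exec to make debugging easier (popen only gives you a single one-way pipe). With proc_open you get the numeric return value, stdout, and stderr, so an error raised by the script is no longer silently lost.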
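To illustrate why looking at all three channels helps, here is a minimal sketch in Python 3 rather than PHP. The child command is a hypothetical stand-in for the real script: it simply fails before printing anything, which is one way a caller that only reads stdout ends up with an empty result.

```python
import subprocess
import sys

# Hypothetical stand-in for the real script: it raises before printing anything.
child_code = "raise ValueError('simulated failure inside the script')"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # collect stdout and stderr separately
    text=True,            # decode both streams to str
)

# stdout is empty, so a caller that only reads stdout sees nothing at all.
print("return code:", result.returncode)
print("stdout:", repr(result.stdout))
# The last line of the traceback on stderr names the actual problem.
print("stderr:", result.stderr.strip().splitlines()[-1])
```

A non-zero return code combined with empty stdout is the signal to read stderr.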
I finally found the issue. The Python script was raising this error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 3102: ordinal not in range(128)
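This explains the difference between the two runs: on a terminal, Python 2 encodes printed unicode text with the terminal's encoding, but when stdout is a pipe (as it is under shell_exec) it falls back to ASCII, which cannot represent the em dash in the article. The title presumably contained only ASCII characters, so it printed fine. I fixed the issue by using either of the lines below: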
readable_article = Document(html).summary().encode('utf8')
readable_article = Document(html).summary().encode('ascii', 'replace')
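The difference between the two fixes can be sketched as follows. This is written in Python 3 syntax (the script above is Python 2), and the sample string is an assumption standing in for the article text, which contains an em dash (U+2014).

```python
# Sample text with an em dash (U+2014); a stand-in for the real article summary.
text = "Redis \u2014 secured"

# Strict ASCII encoding fails, which is the error the script hit.
try:
    text.encode("ascii")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Fix 1: UTF-8 keeps the character, encoded as three bytes.
utf8_bytes = text.encode("utf8")

# Fix 2: ASCII with 'replace' swaps each unencodable character for '?'.
replaced_bytes = text.encode("ascii", "replace")

print(type(strict_error).__name__)  # UnicodeEncodeError
print(utf8_bytes)                   # b'Redis \xe2\x80\x94 secured'
print(replaced_bytes)               # b'Redis ? secured'
```

The UTF-8 variant preserves the original text, so it is usually the better choice when the PHP side can handle UTF-8. The 'replace' variant is lossy.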