PHP: edit text from a webpage
At the moment I have this:
<?php
$stran = file_get_contents("http://meteo.arso.gov.si/uploads/probase/www/fproduct/text/sl/fcast_si_text.html");
$stran = str_replace("<h2>","\n",$stran);
$stran = str_replace("</h2>","\n",$stran);
$stran = str_replace("<h1>","\n",$stran);
$stran = str_replace("</h1>","\n",$stran);
$stran = strip_tags($stran);
echo $stran;
?>
Now this gives me some empty lines at the top. I also want to delete all the text after "Vir: Državna meteorološka služba RS (meteo.si - ARSO)", including the empty lines before this string.
I've tried some regular expressions, but they all delete all of the text. How do I do it?
This can be done using regular expressions.
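// $stran holds the page fetched with file_get_contents(), as in the question
// Convert h1/h2 opening/closing tags to newlines, ignoring case
$stran = preg_replace('/<\/?h[12]>/i', "\n", $stran);
$stran = strip_tags($stran);
// Remove all leading whitespace (the empty lines at the top)
$stran = preg_replace('/^\s+/', '', $stran);
// Collapse the empty lines before "Vir: ..." into one newline and remove everything after it
$stran = preg_replace('/\s*(Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s', "\n\$1", $stran);
echo $stran;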
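The regex steps above can be tried offline on a small sample string instead of the live page. This is a sketch: `cleanForecast` and the sample HTML are made up for illustration, and it assumes both the PHP source file and the fetched page are UTF-8, so that the literal "Državna meteorološka" matches the page byte for byte.

```php
<?php
// Hypothetical helper wrapping the regex steps so they can be tested on sample input.
function cleanForecast(string $html): string
{
    // Turn <h1>, </h1>, <h2>, </h2> into newlines (case-insensitive), then drop the remaining tags
    $text = strip_tags(preg_replace('/<\/?h[12]>/i', "\n", $html));
    // Drop the empty lines at the top
    $text = preg_replace('/^\s+/', '', $text);
    // Replace the whitespace before the source line with one newline and cut everything after it
    $text = preg_replace(
        '/\s*(Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s',
        "\n\$1",
        $text
    );
    return $text;
}

// Made-up sample imitating the page: a heading, blank lines, the source line and a footer
$sample = "<h1>Napoved</h1>\n<p>Sončno.</p>\n\n\n<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>\n<p>Footer</p>";
echo cleanForecast($sample);
```

The `\s*` in front of the capture group is what removes the empty lines before the source line; writing `\$1` keeps PHP from treating `$1` as anything but a literal backreference in the double-quoted replacement.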
Generally speaking, I would recommend actually parsing the HTML to extract the information. Have a look at http://php.net/manual/en/class.domdocument.php
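A rough sketch of the DOMDocument approach, assuming the forecast text sits in `<h1>`, `<h2>` and `<p>` elements (the live page markup was not checked, so the XPath query may need adjusting). `extractForecast` is a hypothetical name; the code uses `DOMDocument::loadHTML`, `DOMXPath::query` and `mb_encode_numericentity`, so the dom and mbstring extensions must be enabled.

```php
<?php
// Sketch only: assumes headings and paragraphs hold the forecast text.
function extractForecast(string $html): string
{
    // Encode non-ASCII characters (ž, č, š) as numeric entities so the HTML
    // parser cannot misread UTF-8 input as ISO-8859-1
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about sloppy real-world HTML
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $lines = [];
    // A single location step returns the matching elements in document order
    foreach ($xpath->query('//*[self::h1 or self::h2 or self::p]') as $node) {
        $text = trim($node->textContent);
        if ($text === '') {
            continue; // skip empty elements, so no blank lines are produced
        }
        $lines[] = $text;
        if (strpos($text, 'Vir: Državna meteorološka služba RS') === 0) {
            break; // keep the source line, drop everything after it
        }
    }
    return implode("\n", $lines);
}
```

Usage would be along the lines of `echo extractForecast(file_get_contents($url));` with the URL from the question. Because the text is taken from elements rather than from stripped markup, there are no stray empty lines to trim afterwards.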