
PHP edit text from webpage

At the moment I have this:

<?php
$stran = file_get_contents("http://meteo.arso.gov.si/uploads/probase/www/fproduct/text/sl/fcast_si_text.html");
$stran = str_replace("<h2>","\n",$stran);
$stran = str_replace("</h2>","\n",$stran);
$stran = str_replace("<h1>","\n",$stran);
$stran = str_replace("</h1>","\n",$stran);
$stran = strip_tags($stran);

echo $stran;
?>

Now this gives me some empty lines at the top. I also want to delete all text after "Vir: Državna meteorološka služba RS (meteo.si - ARSO)", including the empty lines before this string.

I've tried some regular expressions, but they all delete all of the text. How do I do it?

This can be done using regular expressions:

// Convert h1/h2 opening/closing tags to new line, ignore case
$stran = preg_replace('/<\/?h[12]>/i', "\n", $stran);

$stran = strip_tags($stran);

// Remove all leading whitespace
$stran = preg_replace('/^\s+/', '', $stran);

// Remove everything after "Vir: ..."
// (the dot in "meteo.si" is escaped so it matches a literal ".")
$stran = preg_replace('/(?<=Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s', '', $stran);
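If regular expressions keep deleting too much, the same cleanup can also be sketched with plain string functions. The helper name `trimForecast` and the sample text below are hypothetical; the sketch assumes `$text` is the output of `strip_tags()` and that the "Vir:" marker appears at most once. Unlike the regex version, it also removes the blank lines directly before the marker, as the question asked.

```php
<?php
// Hypothetical helper (not from the original post): trims scraped forecast text.
// Assumes $text is already stripped of tags and $marker occurs at most once.
function trimForecast(string $text, string $marker): string
{
    // Drop the empty lines (and any other whitespace) at the top.
    $text = ltrim($text);

    $pos = strpos($text, $marker);
    if ($pos === false) {
        // Marker not found: only trim, do not cut anything.
        return rtrim($text);
    }

    // Text before the marker, without the blank lines that precede it.
    $before = rtrim(substr($text, 0, $pos));

    // Keep the marker on its own line and discard whatever follows it.
    return $before . "\n" . $marker;
}

$marker = 'Vir: Državna meteorološka služba RS (meteo.si - ARSO)';
$sample = "\n\n  Napoved\nSončno.\n\n\n" . $marker . "\nFooter text";
echo trimForecast($sample, $marker);
```

Because `strpos()` compares raw bytes, the UTF-8 characters in the marker need no special handling as long as the page and the PHP source use the same encoding.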

Generally speaking, I would recommend actually parsing the HTML to extract the information. Have a look at http://php.net/manual/en/class.domdocument.php
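A minimal sketch of the DOMDocument approach follows. It assumes, without having checked the actual page, that the useful text sits in `<h1>`, `<h2>` and `<p>` elements and that the source line begins with "Vir:". The function name `extractForecast` is hypothetical. The `<?xml encoding="UTF-8">` prefix is a common workaround that makes `loadHTML()` treat the input as UTF-8; it assumes the fetched page really is UTF-8.

```php
<?php
// Hypothetical sketch: collect heading/paragraph text up to the "Vir:" line.
// Assumes the forecast is in <h1>, <h2> and <p> elements and the input is UTF-8.
function extractForecast(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about imperfect HTML
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $lines = [];
    // The union query returns the matching elements in document order.
    foreach ($xpath->query('//h1 | //h2 | //p') as $node) {
        $text = trim($node->textContent);
        if ($text === '') {
            continue; // skip empty elements instead of emitting blank lines
        }
        $lines[] = $text;
        if (strpos($text, 'Vir:') === 0) {
            break; // stop after the source line and ignore everything below it
        }
    }
    return $lines;
}

$html = '<html><body><h1>Napoved</h1><p></p><h2>Danes</h2>'
      . '<p>Sončno.</p><p>Vir: ARSO</p><p>Footer</p></body></html>';
echo implode("\n", extractForecast($html));
```

With the real page, `$html` would come from `file_get_contents()` on the URL in the question. Working on elements rather than on the flattened string means there are no leading blank lines to strip in the first place.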
