PHP: editing text from a web page

Question

At the moment I have this:

<?php
$stran = file_get_contents("http://meteo.arso.gov.si/uploads/probase/www/fproduct/text/sl/fcast_si_text.html");
$stran = str_replace("<h2>","\n",$stran);
$stran = str_replace("</h2>","\n",$stran);
$stran = str_replace("<h1>","\n",$stran);
$stran = str_replace("</h1>","\n",$stran);
$stran = strip_tags($stran);

echo $stran;
?>

Now this gives me some empty lines at the top. I also want to delete all text after "Vir: Državna meteorološka služba RS (meteo.si - ARSO)", including the empty lines before this string.

I've tried some regular expressions, but they all delete all of the text. How do I do it?

Answer 1

This can be done using regex.

// Convert h1/h2 opening/closing tags to new line, ignore case
$stran = preg_replace('/<\/?h[12]>/i', "\n", $stran);

$stran = strip_tags($stran);

// Remove all leading whitespace
$stran = preg_replace('/^\s+/', '', $stran);

// Remove everything after "Vir: ..."
$stran = preg_replace('/(?<=Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s', '', $stran);
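
The steps above can be wrapped in a small function and tried on an inline sample, without fetching the live page. This is only a sketch: the function name `clean_forecast` and the sample markup are made up for illustration, and the cut at the marker uses `strpos`/`substr` as a swapped-in alternative to the lookbehind regex, not the method shown above.

```php
<?php
// Hypothetical helper; the name and the sample markup below are assumptions.
function clean_forecast(string $html, string $marker): string
{
    // Turn <h1>/<h2> opening and closing tags into newlines, case-insensitively.
    $text = preg_replace('/<\/?h[12]>/i', "\n", $html);

    // Drop all remaining tags.
    $text = strip_tags($text);

    // Remove leading whitespace, including the empty lines at the top.
    $text = ltrim($text);

    // Keep everything up to and including the marker; drop what follows.
    $pos = strpos($text, $marker);
    if ($pos !== false) {
        $text = substr($text, 0, $pos + strlen($marker));
    }

    return $text;
}

$marker = 'Vir: Državna meteorološka služba RS (meteo.si - ARSO)';
$sample = "<html><body>\n<h1>Napoved</h1><p>Danes bo jasno.</p>\n"
        . "<p>$marker</p>\n<p>Footer</p></body></html>";

echo clean_forecast($sample, $marker), "\n";
```

Matching the marker as a plain string avoids having to escape the parentheses and the dot, and if the marker is missing the text is simply returned untrimmed at the end.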

Generally speaking, I would recommend actually parsing the HTML to extract the information. Have a look at http://php.net/manual/en/class.domdocument.php
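
A minimal sketch of what the DOMDocument route might look like. The sample markup, the choice of `h1`/`h2`/`p` elements, and the "starts with `Vir:`" stop condition are all assumptions; the real page's structure may differ and the XPath expression would need adjusting.

```php
<?php
// Sample markup is an assumption, not the real page.
$html = '<html><body><h1>Napoved</h1><p>Danes bo jasno.</p>'
      . '<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>'
      . '<div>Footer</div></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// The XML prefix is a common hint so that libxml treats the input as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$lines = [];
$xpath = new DOMXPath($doc);

// Visit headings and paragraphs in document order.
foreach ($xpath->query('//*[self::h1 or self::h2 or self::p]') as $node) {
    $line = trim($node->textContent);
    if ($line === '') {
        continue; // skip empty elements, so no blank lines are produced
    }
    $lines[] = $line;

    // Stop after the source line; the prefix check is an assumption.
    if (strncmp($line, 'Vir:', 4) === 0) {
        break;
    }
}

echo implode("\n", $lines), "\n";
```

Because only non-empty element text is collected, there are no leading empty lines to trim, and anything after the "Vir:" line is never read.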

Answer 1
1 2016-03-17 17:00:18
