
How to extract an HTML element from a source file

I need to use PHP to replace an HTML section, identified by a tag id, in source code that is a combination of HTML and PHP. If it were pure HTML, a DOM parser could be used; if there were no DIV nested inside the DIV, I can imagine how to use preg_match. This is what I am trying to do. I have code (loaded into a string) like:

<div>
  <img >
</div>

<? include(); ?>

<div id="mydiv">
   <div>
      <div>
        <img >
      </div>
   </div>
</div>

and my task is to replace the content of the "mydiv" DIV with new content, e.g.

<div id="newdiv">
  some text
</div>

so the string will look like this after the change:

<div>
  <img >
</div>

<? include(); ?>

<div id="mydiv">
  <div id="newdiv">
    some text
  </div>
</div>

I have already tried:

1) Parsing the code using DOMDocument's loadHTML: it produces a lot of errors when PHP code is included.

2) I played around a bit with regexes like preg_match_all('/<div id="myid"([^<]*)<\\/div>/', $src, $matches), which fails when child divs are included.

The best approach I have found so far is:

1) find the id="mydiv" string

2) search for '<' and '>' chars and count them, like '<' = 1 and '>' = -1 (not exactly, but it gives the idea)

3) once I get sum == 0, I should be at the position of the closing tag, so I know which portion of the string I should replace

This is quite a "heavy" solution, and it can stop working when the code is different (e.g. the on-page PHP code contains those chars as well, instead of just a simple "include"). So I am looking for a better solution.
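The counting idea in the steps above can be narrowed to count only `<div ...>` and `</div>` tags instead of every '<' and '>' character, which sidesteps the `<?` and `?>` delimiters. The following is a minimal sketch, not a definitive implementation. It assumes the div tags are balanced, that the id attribute is double-quoted, and that no PHP block, comment, or attribute value contains the text `<div` or `</div>`. The function name `replaceDivContent` is hypothetical and does not come from the original post.

```php
<?php
// Hypothetical helper: replaces the inner content of <div id="$id"> in $src.
// Assumes balanced div tags and no "<div" / "</div>" text inside PHP blocks,
// comments, or attribute values. Returns null when nothing can be replaced.
function replaceDivContent(string $src, string $id, string $newInner): ?string
{
    // Locate the opening tag of the target div.
    $openPattern = '/<div\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>/i';
    if (!preg_match($openPattern, $src, $m, PREG_OFFSET_CAPTURE)) {
        return null; // target div not found
    }
    $innerStart = $m[0][1] + strlen($m[0][0]);

    // Walk over every later <div ...> or </div> and track the nesting depth.
    $depth  = 1;
    $offset = $innerStart;
    while (preg_match('/<(\/?)div\b[^>]*>/i', $src, $t, PREG_OFFSET_CAPTURE, $offset)) {
        $depth += ($t[1][0] === '/') ? -1 : 1;
        if ($depth === 0) {
            // $t[0][1] is the position where the matching </div> starts.
            return substr($src, 0, $innerStart) . $newInner . substr($src, $t[0][1]);
        }
        $offset = $t[0][1] + strlen($t[0][0]);
    }
    return null; // unbalanced markup: no matching </div> found
}
```

Called as `replaceDivContent($src, 'mydiv', $newHtml)`, it leaves everything outside the target div untouched, including the `<? include(); ?>` line, because only div tags change the depth counter.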

You could try something like this:

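$file = 'filename.php';
$content = file_get_contents($file);
// Split at the opening tag; $array_one[1] is everything after it.
$array_one = explode( '<div id="mydiv">' , $content );
// Keep the text up to the first </div> (nested divs are cut short here).
$my_div_content = explode("</div>" , $array_one[1] )[0];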

Or use preg_match like you said:

preg_match('/<div id="mydiv"(.*?)<\/div>/s', $content, $matches);
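Note that the non-greedy `(.*?)` in the pattern above stops at the first `</div>`, so nested divs are cut short. PCRE supports recursive subpatterns, which can match balanced nested tags. The following is a hedged sketch of that technique rather than part of the original answer. It assumes balanced div tags and a double-quoted id attribute, and the function name `replaceDivContentRegex` is hypothetical. On very large inputs the pattern may hit PCRE's backtracking limit, in which case `preg_replace_callback` returns null.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumes balanced div tags and a double-quoted id attribute.
function replaceDivContentRegex(string $src, string $id, string $newInner): ?string
{
    // A run of characters that does not start a <div ...> or </div> tag.
    $text = '(?:(?!</?div\b).)++';
    // Group 2: one balanced nested div; (?2) recurses into this same group.
    $nested = '(<div\b[^>]*>(?:' . $text . '|(?2))*</div>)';
    $pattern = '~(<div\b[^>]*\bid="' . preg_quote($id, '~') . '"[^>]*>)' // group 1: opening tag
             . '(?:' . $text . '|' . $nested . ')*'                      // inner content
             . '(</div>)~is';                                            // group 3: matching close

    // A callback is used so that "$" or "\" in $newInner is inserted literally.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($newInner) {
            return $m[1] . $newInner . $m[3];
        },
        $src,
        1 // replace only the first match
    );
}
```

If no div with the given id is found, the input string is returned unchanged.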

Yes, there is. First you need to use a function that will get the content of the file. Let's call the file homepage.php:

$homepageString = file_get_contents('homepage.php');

Now you have a string with all the content. The next thing you would do is use the preg_replace() function to take out the part of the code that you want to remove:

$newHomepageString = preg_replace('/id="mydiv"/',"", $homepageString);

Now you overwrite the existing homepage.php file with the new source code:

file_put_contents("homepage.php", $newHomepageString);

Let me know if it worked for you! :)
