
PHP - Extracting text from HTML

I have a long string of HTML that contains

<p>
<img>
<span>

and a bunch of other tags.

Is there any way of extracting ONLY the text within the tags from this string?

If you want to extract all of the text within any tags, the simplest way is to strip the tags with strip_tags().
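A minimal sketch of the strip_tags() approach, using a made-up input string. strip_tags() leaves HTML entities such as &amp; untouched, so this example also runs the result through html_entity_decode(); that extra step is an addition, not part of the answer above. Also note that strip_tags() does not insert spaces where tags were, so text from adjacent elements can run together.

```php
<?php
$html = '<p>Fish &amp; <b>chips</b></p><img src="a.png">';

// Remove every tag, then turn entities like &amp; back into plain characters.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $text, PHP_EOL;
```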

If you want to remove only specific tags, maybe this SO question helps.
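When it is easier to name the few tags you want to keep, strip_tags() also accepts a second argument listing allowed tags (an allowlist rather than a blocklist): every other tag is removed while its text stays. A small sketch with a made-up input:

```php
<?php
$html = '<p>Hello <span>world</span><img src="a.png"></p>';

// Keep <p>; strip every other tag. The text inside <span> survives, the <img> disappears.
$kept = strip_tags($html, '<p>');

echo $kept, PHP_EOL;
```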

I know I'll take a lot of flak for this, but for a simple task like this I'd use a regular expression.

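preg_match_all('~<span>(.*?)</span>~s', $html, $matches);

$matches[0] will contain all the span tags and their contents; $matches[1] contains only the contents. The s modifier lets . match newlines, so spans whose content covers several lines are found too. This pattern only matches bare <span> tags (no attributes) and does not handle nested spans.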
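A runnable version of the regex approach with a made-up input. As an extension beyond the answer, the pattern here adds \b[^>]* so that <span> tags carrying attributes (such as class) also match, and the i modifier to ignore tag-name case. It still cannot cope with nested spans.

```php
<?php
$html = "<p>Intro</p><span>first</span><img src=\"a.png\"><span class=\"note\">second\nline</span>";

// \b[^>]* allows attributes on the opening tag; (.*?) is non-greedy, so each match
// stops at the first </span>; s lets . cross newlines; i ignores tag-name case.
preg_match_all('~<span\b[^>]*>(.*?)</span>~is', $html, $matches);

print_r($matches[0]); // full <span>...</span> matches
print_r($matches[1]); // inner text only
```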

For more complicated stuff you might want to take a look at PHP Simple HTML DOM Parser or something similar:

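require_once 'simple_html_dom.php';

// Create a DOM from a string (use file_get_html() to load from a URL or file)
$dom = str_get_html($html);

// Find all images
foreach ($dom->find('img') as $element) {
   echo $element->src . '<br>';
}

// Text content of the whole document, with tags removed
echo $dom->plaintext;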

Etc.
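PHP also ships a DOM parser of its own (DOMDocument in the DOM extension), which avoids the third-party dependency. The sketch below, with a made-up input, does the same kind of lookups with it; it assumes the dom extension is enabled, which is common but not guaranteed on every install.

```php
<?php
$html = '<div><p>Hello <b>there</b></p><img src="a.png"><span>world</span></div>';

$doc = new DOMDocument();
// Real-world HTML is often not well-formed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// src attribute of every <img>
$srcs = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
}

// Text inside every <span>
$spanTexts = [];
foreach ($doc->getElementsByTagName('span') as $span) {
    $spanTexts[] = $span->textContent;
}

// Text of the whole document with all tags removed
$allText = $doc->documentElement->textContent;

print_r($srcs);
print_r($spanTexts);
echo $allText, PHP_EOL;
```

textContent simply concatenates the text nodes, so "there" and "world" run together here; add separators yourself if you need word boundaries.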
