
How to extract only text from an HTML string with PHP?

I want to extract only the text from a PHP string. The string contains HTML markup such as tags, and I need just the plain text from it.

This is the actual string:

<div class="devblog-index-content battlelog-wordpress">
<p><strong>The celebration of the Recon class in our second </strong><a href="http://blogs.battlefield.com/2014/10/bf4-class-week-recon/" target="_blank">BF4 Class Week</a><strong> continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years&hellip;</strong></p>

<p>&nbsp;</p>

<p style="text-align:center"><a href="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37"><img alt="bf4-history-of-recon-1" class="aligncenter" src="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37" style="width:619px" /></a></p>

I want to show this from the string:

The celebration of the Recon class in our second BF4 Class Week continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…

This text will actually be placed in a meta description tag, so I don't want any HTML in it. How can I do this? Any ideas or thoughts on the technique?

You may try:

echo(strip_tags($your_string));

More info here: http://php.net/manual/en/function.strip-tags.php
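Note that strip_tags leaves HTML entities such as &hellip; and &nbsp; untouched, which matters for the string in the question. Below is a minimal sketch of one way to handle that for a meta description, assuming UTF-8 input. The helper name html_to_meta_text is hypothetical and not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper (an illustration, not from the original answer):
// turns an HTML fragment into plain text for a meta description.
function html_to_meta_text(string $html): string
{
    // Remove the tags but keep the text between them.
    $text = strip_tags($html);

    // Decode entities such as &hellip; and &nbsp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace, including non-breaking spaces, into one space.
    // If preg_replace fails (for example on invalid UTF-8), keep the text as is.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;

    return trim($text);
}

$html = '<p><strong>Learn more about the Recon&hellip;</strong></p><p>&nbsp;</p>';
echo html_to_meta_text($html), "\n"; // Learn more about the Recon…
```

When printing the result inside the content attribute of the meta tag, it is probably worth passing it through htmlspecialchars($text, ENT_QUOTES, 'UTF-8') so that quotes in the text cannot break the attribute.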

Another option is to use Html2Text. It will do a much better job than strip_tags, especially if you want to parse complicated HTML code.
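To illustrate the kind of input where strip_tags on its own can fall short, here is a small sketch using only the built-in function. The sample strings are made up for the illustration.

```php
<?php
// Block-level tags are removed without leaving any separator behind,
// so adjacent paragraphs run together.
$paragraphs = '<p>First paragraph.</p><p>Second paragraph.</p>';
echo strip_tags($paragraphs), "\n"; // First paragraph.Second paragraph.

// Only the tags are removed: the contents of a <script> element
// stay in the output as if they were ordinary text.
$withScript = '<p>Hello</p><script>alert("hi");</script>';
echo strip_tags($withScript), "\n"; // Helloalert("hi");
```

Cases like these are where a dedicated HTML-to-text converter may give more readable output.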

Extracting text from HTML is tricky, so your best bet is to use a library built for this purpose.

https://github.com/mtibben/html2text

Install using Composer:

composer require html2text/html2text

Basic usage:

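require __DIR__ . '/vendor/autoload.php'; // Composer autoloader (path assumes the default vendor directory)

$html = new \Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"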

Adding another option for anyone else who may need this: the Stringizer library might work for you; see Strip Tags.

Full disclosure: I'm the owner of the project.
