Copy text from a web page

Question

Let's say we have a website, speedywap.com.

When I open the website in my browser, copy the page to the clipboard, and paste it into Notepad (Windows), only the text remains. All the code is removed except for the text that was in links etc. (i.e. what is displayed on the screen).

I want to do something similar with PHP because I am trying to create a keyword density analyser. So I want something that keeps only the text from a web page that is displayed on the screen.

My server is running Apache, PHP, CentOS and MySQL.

Answer 1

<?php
$content = file_get_contents('http://speedywap.com');
echo $content;
?>

You can use strip_tags to strip the tags from it; then you are left with just the text.

Answer 2

For a very naïve start, you can use this:

<?php

echo strip_tags(file_get_contents('http://speedywap.com'));

?>

Answer 3

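function curl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    $result = curl_exec($ch);                    // false on failure
    curl_close($ch);                             // release the handle before returning
    return $result;
}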

$html = curl('http://speedywap.com');

cURL is many times faster than file_get_contents (fgc). You can use strip_tags, but that doesn't guarantee anything; the only way is to parse the page manually, using str_replace, preg_replace, etc.

This is what you get using strip_tags: http://pokit.etf.ba/get/47a07bd62ea42dd3d447f060c01ccfb5.png
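To make the "manual parsing" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the main problem with plain strip_tags is that the contents of script and style elements survive as text. The function name visible_text and the sample HTML are made up for illustration.

```php
<?php
// Hypothetical helper: reduce an HTML string to roughly the text a browser would display.
function visible_text(string $html): string
{
    // Drop <script> and <style> elements together with their contents;
    // strip_tags() alone removes the tags but keeps the code inside them.
    // preg_replace() returns null on error, so fall back to the input in that case.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html) ?? $html;

    // Put a space before every tag so words in adjacent elements do not fuse together.
    $html = str_replace('<', ' <', $html);

    $text = strip_tags($html);

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

$sample = '<html><head><style>p { color: red; }</style></head>'
        . '<body><p>Hello</p><script>var x = 1;</script>'
        . '<a href="/a">link text</a></body></html>';

echo visible_text($sample), "\n";
```

For a live page you would pass in the string returned by file_get_contents or the curl function above. Regular expressions are fragile on malformed HTML, so treat this as a starting point only.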

Answer 4

Develop your code from this -> http://www.barattalo.it/2010/03/01/php-curl-bot-to-update-facebook-status/

Answer 5

Use file_get_contents, or curl if you want to get fancy.

<?php
$content = file_get_contents('http://speedywap.com');
echo $content; // or analyze, or whatever

Answer 6

You can use file_get_contents('http://www.speedywap.com/'); to get the page source and then use some filters/regular expressions to get the text you need.
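Since the goal in the question is a keyword density analyser, here is a rough sketch of what the regular-expression step could feed into once the tags are gone. The function name keyword_density and the sample sentence are assumptions for illustration. Lowercasing uses strtolower, which only folds ASCII letters on current PHP versions.

```php
<?php
// Hypothetical helper: percentage share of each word in a plain-text string.
function keyword_density(string $text): array
{
    // Split on anything that is not a letter or a digit (Unicode-aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return []; // empty input, or input that is not valid UTF-8
    }

    $total  = count($words);
    $counts = array_count_values($words);
    arsort($counts); // most frequent words first

    $density = [];
    foreach ($counts as $word => $n) {
        $density[$word] = round(100 * $n / $total, 1);
    }
    return $density;
}

print_r(keyword_density('Fast WAP site. Fast downloads, fast!'));
```

The input here is already plain text; for a real page you would first run the fetched HTML through strip_tags or a similar filter.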

Answer 7

You can also use strip_tags: http://php.net/manual/en/function.strip-tags.php

Answer 8

strip_tags will not remove or replace things like the HTML space (&nbsp;), &pound;, &ndash;, etc. in the content, as you would need in order to match, as you say, a browser copy (Ctrl + A, Ctrl + C) and paste into Notepad. You will have to write specific code to replace each one, like:

$mytext = str_replace('&nbsp;', ' ', $mytext);  // str_replace returns the new string
$mytext = str_replace('&ndash;', '-', $mytext);

etc. to handle these. I needed to convert content created by users within TinyMCE, which allows formatting text, to plain text for a client. A PHP command that goes beyond strip_tags to do this would be great to have, but I can't find one.
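One built-in that may cover much of this is html_entity_decode, which turns named and numeric entities back into characters. The sketch below is an assumption about how it could be combined with strip_tags, not a tested drop-in for TinyMCE output. The function name html_to_plain and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: turn simple formatted HTML into plain UTF-8 text.
function html_to_plain(string $html): string
{
    // Strip tags first, so that encoded markup such as &lt;b&gt; is not
    // decoded into a real tag and then removed.
    $text = strip_tags($html);

    // Decode entities (&pound;, &ndash;, &#163;, ...) into UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // &nbsp; decodes to U+00A0 (non-breaking space), not an ordinary space,
    // so map it explicitly.
    return str_replace("\u{00A0}", ' ', $text);
}

echo html_to_plain('<p>Price:&nbsp;<b>&pound;5</b> &ndash; &pound;10</p>'), "\n";
```

Unlike the str_replace approach, this keeps the en dash as a real en dash rather than converting it to a hyphen; add a str_replace for it if plain ASCII is needed.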

8 answers

Answer 1: score 5, accepted, 2011-03-01 21:14:32
Answer 2: score 2, 2010-12-27 21:42:56
Answer 3: score 1, 2010-12-27 21:41:03
Answer 4: score 1, 2011-02-01 21:10:17
Answer 5: score 0, 2010-12-27 21:37:00
Answer 6: score 0, 2010-12-27 21:38:55
Answer 7: score 0, 2010-12-27 21:42:20
Answer 8: score 0, 2020-03-18 03:34:57
