

Is there a simple way in Linux to strip a website down to its text from the command line?

I've been searching for a command-line tool that turns HTML code into just the text that would appear on the site. In other words, it would be equivalent to selecting everything in a web browser and pasting it into a text editor.

Does anyone know of something in Ubuntu that would do this? I'm trying to write a script to parse some web pages, but I would prefer not to deal with the HTML and just parse the text that appears on the website.

Thanks,

Dan

lynx -dump http://example.com/

If you already have the HTML file:

lynx -dump file.html > file.txt

Otherwise, use @Ignacio's answer.
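As an illustration of the two cases in this answer (a file on disk vs. HTML arriving from elsewhere), here is a minimal sketch that wraps lynx in a small shell function. It assumes lynx is installed. The function name html_to_text is hypothetical. It adds three real lynx options not used above: -nolist drops the numbered list of link URLs that -dump appends by default, -force_html treats the input as HTML even when the file name has no .html extension, and -stdin reads the document from standard input.

```shell
#!/bin/sh
# html_to_text: hypothetical helper name. Renders HTML to plain text with lynx.
# Usage: html_to_text [FILE]   (reads standard input when FILE is omitted)
html_to_text() {
  if [ "$#" -gt 0 ]; then
    # -force_html: treat the file as HTML even if its name does not end in .html
    # -nolist: omit the numbered list of link URLs that -dump appends at the end
    lynx -dump -nolist -force_html "$1"
  else
    # -stdin: read the HTML document from standard input instead of a file/URL
    lynx -dump -nolist -force_html -stdin
  fi
}
```

Usage, assuming the function has been sourced into the current shell:

html_to_text file.html > file.txt
curl -s http://example.com/ | html_to_text > page.txt

lynx wraps dumped text at about 80 columns by default. Its -width option changes that if long lines matter for your parsing.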

I think you need lynx:

lynx -dump http://stackoverflow.com > file
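The question's goal is a script that parses several pages. Here is a minimal sketch of that workflow: render each page with lynx -dump, then work on the plain text with ordinary text tools such as grep. It assumes lynx is installed. The function name grep_pages and its "source: line" output format are hypothetical choices made for illustration. Each argument can be a URL or a local HTML file.

```shell
#!/bin/sh
# grep_pages PATTERN SOURCE... : hypothetical helper. For every SOURCE (URL or
# local HTML file), prints "SOURCE: line" for each line of the rendered page text
# that matches the grep pattern PATTERN.
grep_pages() {
  pattern=$1
  shift
  for src in "$@"; do
    # -nolist keeps the trailing link index out of the text being searched
    lynx -dump -nolist "$src" | grep -e "$pattern" | while IFS= read -r line; do
      # printf avoids problems with special characters in $src or $line
      printf '%s: %s\n' "$src" "$line"
    done
  done
}
```

For example, grep_pages 'price' http://example.com/a http://example.com/b would list the matching lines from both pages. Because lynx has already done the rendering, the pattern only ever sees the visible text, never the markup.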

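Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.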

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM