C#: extracting only the HTML

Basically, I have a web page with embedded CSS and JavaScript. What I want to do is extract only the HTML itself: text, tables, images and so on.

So far I have the whole web page stored in a string called "html". For example, the page could be the Facebook homepage, but as you will see it is full of scripts and other embedded content that I don't want.

   // HTMLEdit holds the web page I chose (presumably a WebBrowser control)
   string html = HTMLEdit.DocumentText;
   // result should contain only the <head>, <body> and footer markup,
   // with no JavaScript or CSS
   string result = "";

I am only interested in displaying a result that contains just the HTML. I don't want the JavaScript, the CSS or any other embedded content.
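To make the goal concrete, here is a rough, dependency-free sketch of the transformation described above, using System.Text.RegularExpressions to delete script and style blocks and link tags. The class name HtmlStripper and the method StripScriptsAndStyles are made up for illustration. Regular expressions are not an HTML parser: this leaves inline style attributes and onclick-style handlers in place and can be fooled by unusual markup, so a real parser such as the HTML Agility Pack is the more robust choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Case-insensitive; Singleline lets '.' match newlines inside multi-line blocks.
    private const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Removes <script>...</script> and <style>...</style> blocks (tags and contents)
    // and <link ...> tags. A quick approximation, not a full HTML parser.
    public static string StripScriptsAndStyles(string html)
    {
        string result = Regex.Replace(html, @"<script\b[^>]*>.*?</script\s*>", string.Empty, Opts);
        result = Regex.Replace(result, @"<style\b[^>]*>.*?</style\s*>", string.Empty, Opts);
        result = Regex.Replace(result, @"<link\b[^>]*>", string.Empty, Opts);
        return result;
    }

    public static void Main()
    {
        // In the question's code this string would come from HTMLEdit.DocumentText.
        string html = "<html><head><style>p{color:red}</style>"
                    + "<link rel=\"stylesheet\" href=\"a.css\"></head>"
                    + "<body><p>Hi</p><script>alert(1);</script></body></html>";
        // prints <html><head></head><body><p>Hi</p></body></html>
        Console.WriteLine(StripScriptsAndStyles(html));
    }
}
```

The non-greedy .*? stops at the first closing tag, so several script blocks on one page are removed individually rather than swallowing the content between them.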

I have looked at the HTML Agility Pack, but I couldn't find anything in the documentation on its website about how to do this. This is also my first ever C# project, so excuse my ignorance if I don't make sense.

See this question: HTML Agility Pack strip tags NOT IN whitelist

You could adapt that answer so that it drops the link and script tags.
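A minimal sketch of that idea with the HTML Agility Pack, assuming the HtmlAgilityPack NuGet package is referenced. Rather than reproducing the whitelist code from the linked answer, it takes the simpler route of selecting the unwanted elements with an XPath union and removing them. Besides the link and script tags, it also drops style blocks, since the question asks for embedded CSS to go as well. The class name AgilityStripper is hypothetical; HtmlDocument, LoadHtml, SelectNodes, Remove and OuterHtml are part of the library.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class AgilityStripper
{
    // XPath union matching every <script>, <style> and <link> element in the document.
    private const string UnwantedTags = "//script|//style|//link";

    public static string StripScriptsAndStyles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(UnwantedTags);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                node.Remove(); // detaches the element and everything inside it
            }
        }

        // Serialize what is left: the page markup without scripts or stylesheets.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // In the question's code this string would come from HTMLEdit.DocumentText.
        string html = "<html><head><style>p{color:red}</style>"
                    + "<link rel=\"stylesheet\" href=\"a.css\"></head>"
                    + "<body><p>Hi</p><script>alert(1);</script></body></html>";
        Console.WriteLine(StripScriptsAndStyles(html));
    }
}
```

Iterating over the collection while removing nodes is fine here because SelectNodes returns a separate collection rather than a live view of the tree. Inline style attributes and event-handler attributes such as onclick are not touched; removing those would need an extra pass over each element's Attributes.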
