
SyndicationFeed - item summary (RSS description) - extract only text from it

I'm using the SyndicationFeed class to consume some RSS feeds for articles. How can I get only the text from an item's Summary field, without the HTML tags? Sometimes (not always) it contains HTML tags such as div, img, h and p, for example fragments like </a></div> or <img src='http...

I want to get rid of all the tags. Also, I'm not sure Summary returns the full description from the RSS feed.

Should I use a regular expression for this, or is there a better method?

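// Requires: using System.ServiceModel.Syndication; using System.Xml;
XmlReader reader = XmlReader.Create(response.GetResponseStream());
SyndicationFeed feed = SyndicationFeed.Load(reader);

foreach (SyndicationItem item in feed.Items)
{
    // Summary is a TextSyndicationContent (null if the item has no description);
    // its Text property holds the raw string, which contains tags and not only the article text
    string description = item.Summary?.Text;
}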

Yes, I suppose a regex is the easiest built-in way to achieve this: strip the tags first, then decode the HTML entities.

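// Requires: using System.Net; using System.Text.RegularExpressions;

// Get rid of the tags ([^>] also matches newlines, so tags that span lines are removed too)
description = Regex.Replace(description, @"<[^>]+>", String.Empty);

// Then decode the HTML entities (e.g. &amp; becomes &)
description = WebUtility.HtmlDecode(description);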
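
For reuse across several feeds, the two steps above could be wrapped in a small helper. This is only a minimal sketch: the class name HtmlText and the method name StripTags are made up for illustration, and it makes one choice the snippet above does not, namely replacing each tag with a space (so adjacent paragraphs do not run together) and then collapsing whitespace. A regex is not a real HTML parser, so unusual markup such as a literal ">" inside an attribute value, or script and style blocks, will not be handled correctly.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SyndicationFeed or any library.
public static class HtmlText
{
    // Anything that looks like a tag; [^>] also matches newlines.
    private static readonly Regex TagPattern = new Regex(@"<[^>]+>");

    // Runs of whitespace left behind after the tags are removed.
    private static readonly Regex WhitespacePattern = new Regex(@"\s+");

    public static string StripTags(string html)
    {
        // A missing description (null) is treated as empty text.
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Assumption: a space is substituted for each tag so that
        // "<p>One</p><p>Two</p>" does not collapse into "OneTwo".
        string text = TagPattern.Replace(html, " ");

        // Decode entities only after the tags are gone, so that an
        // encoded "&lt;" is kept as text rather than treated as a tag.
        text = WebUtility.HtmlDecode(text);

        // Collapse the leftover whitespace and trim the ends.
        return WhitespacePattern.Replace(text, " ").Trim();
    }
}
```

Inside the foreach loop from the question it could then be called as HtmlText.StripTags(item.Summary?.Text). The trade-off of substituting a space is that inline markup in the middle of a word gets split; passing String.Empty instead reproduces the behaviour of the snippet above.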


 