
How to read the sitemap URL from a robots.txt file

I want to read the contents of a robots.txt file (www.abcd.com/robots.txt), which contains the sitemap URL, from my C# application. I need an if/else check that raises an alert: if a sitemap URL is present in robots.txt it should display "yes", and if there is no sitemap URL it should display "no".

The robots.txt file looks like this:

# Crawlers Setup
User-agent: *
Disallow:
Crawl-delay: 10

# Website Sitemap
Sitemap: http://www.abcd.com/sitemap.xml

How can I read the sitemap line from robots.txt, given that robots.txt is served from a URL (www.abcd.com/robots.txt) rather than being a local text file?

You can use a library like RobotsTxt (disclaimer: project owner here). Example:

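// Download robots.txt as a string (WebClient lives in System.Net).
string contentsOfRobotsTxtFile = new WebClient().DownloadString("http://www.abcd.com/robots.txt");
// Parse the downloaded text; pass the same variable that holds it.
Robots robots = Robots.Load(contentsOfRobotsTxtFile);
var sitemaps = robots.Sitemaps;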

It's also available on NuGet: http://www.nuget.org/packages/RobotsTxt/

You can simply read the file, like this:

string contentOfRobotTxt = new WebClient().DownloadString("http://www.abcd.com/robots.txt");
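
Once the text has been downloaded, the yes/no check the question asks for can be done without any library. The following is a minimal sketch, not a definitive implementation: the class name SitemapChecker and the method ExtractSitemapUrls are hypothetical names chosen for illustration, and it assumes that a sitemap is declared on a line that starts with "Sitemap:" (matched case-insensitively). When no URL is passed on the command line, it falls back to the sample robots.txt text from the question so that it can run offline.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not part of any library.
public static class SitemapChecker
{
    // Returns every value found on a "Sitemap:" line of the given robots.txt text.
    public static List<string> ExtractSitemapUrls(string robotsTxt)
    {
        const string field = "sitemap:";
        var urls = new List<string>();

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            // Trim() also removes the '\r' left over from Windows line endings.
            string line = rawLine.Trim();

            if (line.StartsWith(field, StringComparison.OrdinalIgnoreCase))
            {
                string url = line.Substring(field.Length).Trim();
                if (url.Length > 0)
                {
                    urls.Add(url);
                }
            }
        }

        return urls;
    }

    public static void Main(string[] args)
    {
        string robotsTxt;

        if (args.Length > 0)
        {
            // Fetch robots.txt from the URL given as the first argument,
            // e.g. http://www.abcd.com/robots.txt
            using (var client = new WebClient())
            {
                robotsTxt = client.DownloadString(args[0]);
            }
        }
        else
        {
            // Sample text taken from the question, used when no URL is supplied.
            robotsTxt = "# Crawlers Setup\nUser-agent: *\nDisallow:\nCrawl-delay: 10\n\n"
                      + "# Website Sitemap\nSitemap: http://www.abcd.com/sitemap.xml\n";
        }

        List<string> sitemaps = ExtractSitemapUrls(robotsTxt);

        if (sitemaps.Count > 0)
        {
            Console.WriteLine("yes");
        }
        else
        {
            Console.WriteLine("no");
        }
    }
}
```

WebClient is used here only to stay consistent with the answers above; it is marked obsolete in newer .NET versions, where HttpClient is the usual replacement. In a desktop application the Console.WriteLine calls could be swapped for whatever alert mechanism the UI uses.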

