
How to check if URL contains XML file or not?

I have a project that fetches XML files from URLs, scrapes them, pulls out the data, and then processes it. The URLs are built from user input, so I need to check whether a given URL actually points to an XML file before scraping it. How can I check whether a URL contains an XML file?

Ways to know whether a GET request to a URL will retrieve XML:

Before retrieving the file

  • Have an out-of-band guarantee, such as documentation from the provider stating that the URL serves XML.
  • Inspect the Content-Type HTTP header of the response to a HEAD request (see note 1).
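
The HEAD check could be sketched in Python roughly as follows, using only the standard library. The function names `media_type_from_header`, `head_media_type`, and `url_claims_xml` are made up for this illustration, and the 10-second timeout is an arbitrary assumption. Keep in mind that some servers reject HEAD requests or send an inaccurate Content-Type, so a positive result is only a hint.

```python
import urllib.request


def media_type_from_header(content_type: str) -> str:
    """Return the bare media type, e.g. 'text/xml' from 'text/xml; charset=utf-8'."""
    return content_type.split(";", 1)[0].strip().lower()


def head_media_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the media type the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A missing Content-Type header yields an empty string.
        return media_type_from_header(response.headers.get("Content-Type", ""))


def url_claims_xml(url: str) -> bool:
    """True if the server declares one of the two generic XML media types."""
    return head_media_type(url) in {"application/xml", "text/xml"}
```

Calling `url_claims_xml` on a user-supplied URL can raise `urllib.error.URLError` (or `ValueError` for a malformed URL), so a real caller would probably want to catch those and treat them as "not XML".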

After retrieving the file

Note: Only parsing the content with a conforming XML parser can determine with certainty that it is XML.
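
As a minimal sketch of the parser-based check, assuming the response body has already been downloaded into a `bytes` object, Python's standard `xml.etree.ElementTree` module can be used. The helper name `is_well_formed_xml` is hypothetical. This tests well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(data: bytes) -> bool:
    """Return True if data parses as a well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        # Raised for empty input, mismatched tags, stray text, and so on.
        return False
    return True
```

For example, `is_well_formed_xml(b"<root><a/></root>")` returns `True`, while typical HTML with unclosed tags such as `b"<html><br></html>"` returns `False`. For content from untrusted URLs, it may be worth reading the security notes in the Python XML documentation before parsing.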


1. MIME assignments for XML data:

  • application/xml (RFC 7303, previously RFC 3023)
  • text/xml (RFC 7303, previously RFC 3023)
  • Other MIME assignments used with XML applications.
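
One possible way to recognize these media types in Python is sketched below. It accepts the two generic types and, as an assumption about what "other MIME assignments" should cover, any type using the `+xml` structured syntax suffix described in RFC 7303 (for example `application/rss+xml` or `image/svg+xml`). The function name `is_xml_media_type` is made up for this example.

```python
GENERIC_XML_TYPES = {"application/xml", "text/xml"}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if a Content-Type value names an XML media type."""
    # Drop parameters such as '; charset=utf-8' and normalize case.
    bare = content_type.split(";", 1)[0].strip().lower()
    # Assumption: any '+xml' suffixed type counts as XML for scraping purposes.
    return bare in GENERIC_XML_TYPES or bare.endswith("+xml")
```

Whether `+xml` types should be accepted depends on the project: an RSS or SVG document is XML, but it may not have the structure the scraper expects.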


 