Getting the real link from an RSS feed link

I am experimenting with scraping certain pages from an RSS feed using cURL and PHP. Scraping worked fine when I used the pages' actual links rather than the links from the RSS feeds. However, I now realize that links in RSS feeds are usually just redirects to the actual page (at least, that is how it seems), because when I scrape a page using the RSS link, I don't get the information I am looking for.

Has anyone encountered this and found a workaround? Is there any way to see where the RSS link redirects to and capture that URL?

I think you might need to use the -L switch to tell curl to follow redirects. I'm not sure whether you can do this directly from PHP or whether you need to follow this approach: http://php.net/manual/en/function.curl-setopt.php#95027 . It is also possible that the site you are scraping blocks requests by user agent or something similar. Try opening one of the links in a browser while running Fiddler (or a similar HTTP debugging proxy) to see whether any redirection is actually taking place.
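To make the "see where the link redirects to" part concrete, here is a minimal sketch that follows HTTP redirects one hop at a time and records every URL in the chain. It is written in Python rather than PHP because the thread contains no code. The function name `resolve_redirects`, the hop limit, and the User-Agent string are assumptions made for this example. It only handles HTTP 3xx responses that carry a `Location` header. It will not detect redirects done with an HTML meta refresh or JavaScript.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# HTTP status codes that carry a Location header pointing at the next URL.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(url, max_hops=10,
                      user_agent="Mozilla/5.0 (compatible; feed-resolver)"):
    """Hypothetical helper: follow HTTP redirects manually, one hop at a time.

    Returns the list of URLs visited; the last element is the first URL
    that did not answer with a redirect (i.e. the page you want to scrape).
    """
    chain = [url]
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        try:
            # Send a browser-like User-Agent (an assumption; some sites block
            # unknown agents). GET is used instead of HEAD because some
            # servers handle HEAD badly; the body is never read.
            conn.request("GET", path, headers={"User-Agent": user_agent})
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            return chain
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        chain.append(url)
    raise RuntimeError(f"too many redirects (>{max_hops}) from {chain[0]}")
```

Calling `resolve_redirects(link)[-1]` on a feed item's link would give the final URL to scrape, and the full list shows each intermediate hop. If you don't need the intermediate hops, `urllib.request.urlopen` follows redirects automatically and the response's `geturl()` returns the final URL. In PHP's cURL extension, the corresponding settings are the `CURLOPT_FOLLOWLOCATION` option and `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)`.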
