
Facebook debugger won't scrape my site

I'm creating the site http://Meer.li , and when I run it through the Facebook debugger - http://developers.facebook.com/tools/debug/og/object?q=meer.li - it can't find my meta tags.

When I look at the source of what Facebook scrapes, it shows a stripped-down version of my site, where the doctype has changed and there are no meta tags - http://developers.facebook.com/tools/debug/og/echo?q=http%3A%2F%2Fmeer.li%2F .

What am I doing wrong here?

I'm running Rails 3.2 and Ruby 1.9.3, and the whole thing is hosted on Heroku with a Mongo database.

Edit

It seems that I do get the right Accept header in my app... if I put this in my different views:

<%= request.headers["Accept"] %>

I get:

text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Why can we scrape the whole site with curl -H and the right headers? Why doesn't Facebook scrape my site?

Trying your URL in the debugger, it says that the response status code is 206, which means "Partial Content".

I tried to curl the URL, and indeed the response I got is partial: it does not include the html, head, and body tags (or their closing tags), and looks like a JSONP-style response of HTML wrapped in

$("#designs_content").append

I'm not sure why that happens; maybe your server checks the user agent string of the request and responds according to that?
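
For what it's worth, here is a minimal sketch of how a Rails 3.2 controller could end up producing exactly that kind of output; the controller and template names are hypothetical, since I don't know your actual code:

# Hypothetical Rails 3.2 controller: if the format negotiated from the
# request resolves to js instead of html, the client only gets the script
# fragment rendered by index.js.erb, with no doctype, head, or meta tags.
class DesignsController < ApplicationController
  def index
    @designs = Design.all
    respond_to do |format|
      format.html  # index.html.erb, full layout including the og: meta tags
      format.js    # index.js.erb, e.g. $("#designs_content").append("...");
    end
  end
end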


Edit

I'm not sure if this has anything to do with Heroku; I've never worked with it. Also, I know nothing about Rails, so I can't help with that.

Wget has nothing to do with this; it's the response that your web server returns based on the headers of the HTTP request. When you make a request using a browser, it adds some headers to the request to help the server figure out a few things. You can view the sent headers if you open Firebug or the developer tools in Chrome (Safari, etc.), in the Network tab (they all have one), or by using a network sniffer.
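
If you want to see exactly which headers Facebook's scraper sends, one option (just a sketch, assuming a fairly standard Rails 3.2 app) is to log the interesting headers from a before_filter and watch your application logs while you run the debugger:

class ApplicationController < ActionController::Base
  before_filter :log_negotiation_headers

  private

  # Log the two headers most likely to affect content negotiation
  def log_negotiation_headers
    Rails.logger.info "Accept: #{request.headers['Accept'].inspect}, " \
                      "User-Agent: #{request.headers['User-Agent'].inspect}"
  end
end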

To make life easier for you, I checked which header causes this problem for you... try this:

curl "http://meer.li/"

And you'll see that the response is JSONP-style and not the entire HTML page. Now try this:

curl -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" "http://meer.li/"

And you'll get the full HTML version of your page.

Since Facebook, when scraping your page, does not send the Accept header, the response is not what you see when you view the source using the browser.

I have no idea how you can solve this, since it's surely something about your specific setup, but now at least you know what the problem is.
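
That said, one possible direction is to force the HTML format whenever a client sends no Accept header at all, so a scraper gets the full page; this is just a sketch of the idea, not tested against your setup:

class ApplicationController < ActionController::Base
  before_filter :force_html_for_headerless_clients

  private

  # If a client (like Facebook's scraper) sends no Accept header,
  # serve the full HTML page instead of whatever format Rails negotiates.
  def force_html_for_headerless_clients
    request.format = :html if request.headers["Accept"].blank?
  end
end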
