
Are AJAX sites crawlable by search engines?

I had always assumed that AJAX-driven content was invisible to search engines.

(i.e. content inserted into the DOM via XMLHttpRequest)
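
For concreteness, this is the kind of pattern I mean (the /api/content endpoint and #main container are just placeholders):

```javascript
// Content fetched with XMLHttpRequest and inserted into the DOM
// after the page has loaded -- blank if JavaScript is turned off.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/content');            // placeholder endpoint
xhr.onload = function () {
  document.getElementById('main').innerHTML = xhr.responseText;
};
xhr.send();
```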

For example, on this site, the main content is loaded by the browser via an AJAX request:

http://www.trustedsource.org/query/terra.cl

...if you view this page with JavaScript disabled, the main content area is blank.

However, Google's cache shows the full content after the AJAX load:

http://74.125.155.132/search?q=cache:JqcT6EVDHBoJ:www.trustedsource.org/query/terra.cl+http://www.trustedsource.org/query/terra.cl&cd=1&hl=en&ct=clnk&gl=us

So, apparently search engines do index content loaded by AJAX.

Questions:

  • Is this a new feature in search engines? Most postings on the web indicate that you have to publish duplicate static HTML content for search engines to find it.
  • Are there any tricks to get AJAX-driven content crawled by search engines (besides creating duplicate static HTML content)?
  • Will the AJAX-driven content be indexed if it is loaded from a separate subdomain? How about a separate domain?

By following this guide from Google, AJAX sites can be made crawlable:

http://code.google.com/intl/sv-SE/web/ajaxcrawling/docs/getting-started.html
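
Roughly, that scheme works like this (a minimal sketch, assuming an Express-style Node server; renderSnapshot() stands in for whatever server-side rendering you have available, while the #! URL convention and the _escaped_fragment_ parameter are the parts defined by Google):

```javascript
// Sketch of Google's AJAX crawling scheme.
// The app uses "hash bang" URLs such as   http://example.com/#!query=terra.cl
// The crawler rewrites them to            http://example.com/?_escaped_fragment_=query=terra.cl
// and the server answers that request with a pre-rendered HTML snapshot.
const express = require('express');
const app = express();

app.get('/', (req, res) => {
  const fragment = req.query._escaped_fragment_;
  if (fragment !== undefined) {
    // Crawler request: serve a static HTML snapshot of the AJAX state.
    // renderSnapshot() is a hypothetical server-side rendering helper.
    res.send(renderSnapshot(fragment));
  } else {
    // Normal browser request: serve the AJAX-driven page.
    res.sendFile(__dirname + '/index.html');
  }
});

app.listen(3000);
```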

AJAX-driven content is not crawled by search engines (or at least, not by Google).

The reason you can see the page in the Google cache is that the cache contains the full page, including the .js file. So when you view the cached page, your browser runs the Google-cached .js file.

I don't think there is any trick to get it crawled by search engines, other than serving static .html.

Edit (April 27th, 2010): Google has published a way to make AJAX crawlable.

Google Webmaster Tools might help.

Search engines could run the JavaScript needed to index Ajax content, but it would be difficult and computationally expensive — I'm not aware of any that actually do.

A well-written site that uses Ajax will use it according to the principles of progressive enhancement: any key functionality is still available without needing to run the JavaScript.

On the other hand, sites that use JavaScript to reinvent frames (and don't use progressive enhancement) suffer from all the usual problems of frames, except that they trade orphan pages for search-engine invisibility.
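
As a rough sketch of that principle (the a.enhance class, the #content container and the URLs are made up for the example): every link points at a real, crawlable page, and JavaScript merely intercepts the click to load the same content in place.

```javascript
// Progressive enhancement: each link has a real, crawlable href.
// Without JavaScript the browser simply navigates to that page; with
// JavaScript, we intercept the click and load the content in place.
//
// Markup (simplified):
//   <a class="enhance" href="/query/terra.cl">terra.cl report</a>
//   <div id="content">...</div>
document.querySelectorAll('a.enhance').forEach((link) => {
  link.addEventListener('click', async (event) => {
    event.preventDefault();
    const response = await fetch(link.href);   // same URL the crawler sees
    document.querySelector('#content').innerHTML = await response.text();
    history.pushState(null, '', link.href);    // keep the address bar honest
  });
});
```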

I have NoScript installed and active. Both links show the same content (give or take the Google header bar). Therefore, the Google cache shows only what is statically there.

If you're using something like jQuery tabs, even if you're linking to HTML files within the same directory, it degrades nicely without the JavaScript, and the tabs simply become links to the actual pages. It's ugly, but it works, and you can style those versions too.
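
For instance, with jQuery UI tabs (a sketch; overview.html and details.html are placeholder pages), the markup is just a list of ordinary links, so crawlers and script-less browsers get plain pages while the plugin turns them into Ajax-loaded panels:

```javascript
// Markup (simplified) -- each tab is an ordinary link to a real page:
//
//   <div id="tabs">
//     <ul>
//       <li><a href="overview.html">Overview</a></li>
//       <li><a href="details.html">Details</a></li>
//     </ul>
//   </div>
//
// With JavaScript enabled, jQuery UI loads each page into a tab panel
// via Ajax; with it disabled, the links simply navigate to the pages.
$(function () {
  $('#tabs').tabs();
});
```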

Content that gets loaded immediately (say, via a secondary HTTP request fired right after the initial page load, as in your example) is usually visible to the search-engine crawler.

However, content that is loaded via AJAX only after a user action, e.g. clicking a tab or a button, won't be seen or indexed. It will only be seen and indexed if it is reachable through 'real' anchor links.
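
To illustrate the difference (a sketch only; loadTab(), the /tabs/pricing URL and the #tab-content container are made up):

```javascript
// A hypothetical helper that pulls a tab's content into the page.
async function loadTab(url) {
  const response = await fetch(url);
  document.querySelector('#tab-content').innerHTML = await response.text();
}

// Not crawlable: nothing but a script hook, no URL for the crawler to follow.
//   <span onclick="loadTab('/tabs/pricing')">Pricing</span>
//
// Crawlable: a real anchor the crawler can index; JavaScript still
// intercepts the click for the in-page experience.
//   <a href="/tabs/pricing" onclick="loadTab(this.href); return false;">Pricing</a>
```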

Google just made their crawlers run JavaScript, without any changes needed from developers!

http://googlewebmastercentral.blogspot.com/2015/10/deprecating-our-ajax-crawling-scheme.html

They state:

Today, as long as you're not blocking Googlebot from crawling your JavaScript or CSS files, we are generally able to render and understand your web pages like modern browsers.
