
Ignore public folder files (robots.txt and sitemap.xml) for specific domain

I am running a Rails app on Heroku with a custom domain. Let's call the Heroku app myapp.herokuapp.com and the custom domain www.myapp.com. I have accidentally let Google index myapp.herokuapp.com (some 700-3000 indexed pages), causing duplicate content across the two hosts.

I recently discovered this and added a 301 redirect in a before_filter in the application controller, like this:

  def forward_from_heroku
    redirect_to "http://www.myapp.com#{request.path}", :status => 301 if request.host.include?('herokuapp')
  end
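If it helps to unit-test the redirect decision, the host check can be pulled out into a plain method. This is just a sketch; the method name and constant are illustrative, not part of the original app:

```ruby
# Canonical host all traffic should end up on (assumption: the custom
# domain from the question).
CANONICAL_HOST = "www.myapp.com"

# Returns the URL to 301 to, or nil when the request is already on the
# custom domain and no redirect is needed.
def canonical_redirect_target(host, path)
  return nil unless host.include?("herokuapp")
  "http://#{CANONICAL_HOST}#{path}"
end
```

The before_filter can then call this helper and redirect whenever it returns a URL.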

This successfully redirects (almost) all traffic from myapp.herokuapp.com to www.myapp.com. I have also requested an address change to www.myapp.com in Google Webmaster Tools.

This works fine, except for files in the public folder (obviously). The problem is that Googlebot still accesses robots.txt and sitemap.xml, the latter of which points to an external sitemap (hosted on AWS). I can see how Googlebot would interpret this as meaning there is still content to be crawled on myapp.herokuapp.com (even though everything is 301'd).

What I would like to do is add code to the app so that if Google accesses the site through myapp.herokuapp.com it gets one sitemap.xml/robots.txt, and another if the site is accessed through www.myapp.com.

How can I code this in my config.rb or elsewhere? Basically, I need to bypass the public folder for myapp.herokuapp.com.

You can constrain routes based on the domain:

scope constraints: {host: /^regex-matching-your-domain/} do

Then just return a 404 for robots.txt and sitemap.xml within that scope:

scope constraints: { host: /herokuapp\.com$/ } do
  get '/robots.txt' => Proc.new { |env|
    [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
  }
end
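The Rack endpoint above is just an object responding to `call`, so it can be shared by both paths and exercised without booting Rails. A sketch (the 404 body and headers mirror the answer; the routes comment assumes the same scope as above):

```ruby
# Plain Rack endpoint returning 404; reusable for both /robots.txt and
# /sitemap.xml inside the host-constrained scope.
NOT_FOUND = Proc.new do |env|
  [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
end

# In routes.rb (sketch):
#   scope constraints: { host: /herokuapp\.com$/ } do
#     get '/robots.txt'  => NOT_FOUND
#     get '/sitemap.xml' => NOT_FOUND
#   end
```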

Also: you may want to consider using canonical URLs (https://support.google.com/webmasters/answer/139066?hl=en). It may be a more effective solution for SEO, though I'm not sure.
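A canonical link tag points search engines at the custom domain regardless of which host served the page. A minimal helper for building that URL (the helper name is hypothetical, assuming the domain from the question):

```ruby
# Builds the canonical URL for a given request path, always on the custom
# domain, so pages served from myapp.herokuapp.com still declare
# www.myapp.com as canonical.
def canonical_url(path)
  "http://www.myapp.com#{path}"
end
```

In the layout's `<head>` this would be used as `<link rel="canonical" href="<%= canonical_url(request.path) %>" />`.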

This is what I did; it's not elegant, but it works. I removed sitemap.xml and robots.txt from the public folder and put them in the config folder. Then:

routes.rb

  get '/robots.txt' => 'home#robots'
  get '/sitemap.xml' => 'home#sitemaps'

home_controller.rb

  def robots
    if request.host.eql?('myapp.herokuapp.com')
      # Without this branch the action would fall through to Rails' default
      # rendering; return an explicit 404 on the Heroku host instead.
      render :nothing => true, :status => 404
    else
      robots = File.read(Rails.root + "config/robots.txt")
      render :text => robots, :layout => false, :content_type => "text/plain"
    end
  end

  def sitemaps
    if request.host.eql?('myapp.herokuapp.com')
      render :nothing => true, :status => 404
    else
      sitemaps = File.read(Rails.root + "config/sitemap.xml")
      render :text => sitemaps, :layout => false, :content_type => "text/xml"
    end
  end
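Since the two actions differ only in file name and content type, they could be collapsed into one action driven by a lookup table. A sketch of that mapping (the names here are mine, not from the answer):

```ruby
# Maps each served file name to its content type; a single controller
# action could use this instead of two near-identical ones.
CONTENT_TYPES = {
  "robots.txt"  => "text/plain",
  "sitemap.xml" => "text/xml",
}.freeze

# Falls back to text/plain for anything not in the table.
def content_type_for(filename)
  CONTENT_TYPES.fetch(filename, "text/plain")
end
```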
