
How do I bypass robots.txt AND sitemap.xml in my Varnish configuration?

Google is having a hard time fetching my robots.txt file due to Varnish. When I try to visit the robots.txt file directly, I get a 503 Service Unavailable page.

I currently bypass the cache for my sitemap in the following manner:

# Bypass sitemap
if (req.url ~ "/sitemap.xml") {
    return (pass);
}

Is the following the appropriate syntax to bypass both items:

# Bypass sitemap and robots.txt
if (req.url ~ "/sitemap.xml" || req.url ~ "/robots.txt") {
    return (pass);
}

The syntax is indeed correct. You could also combine the two checks into a single regular expression and match the patterns more precisely.

Here's an example:

sub vcl_recv {
    if (req.url ~ "^/(sitemap\.xml|robots\.txt)(\?.*)?$") {
        return (pass);
    }
}
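
Note that the dots are escaped as \. so they match a literal dot rather than any character, the expression is anchored with ^ and $ so it cannot match longer URLs by accident, and the optional (\?.*)? group still allows query string variants such as /sitemap.xml?page=2.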

However, the fact that you get an HTTP 503 error means that Varnish cannot successfully fetch the content from the backend for these requests. In that case the problem has nothing to do with your VCL code.
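
As a quick sanity check, you can request both files directly from the backend, bypassing Varnish entirely. This is a minimal sketch that assumes the backend listens on 127.0.0.1:8080; substitute the host and port from the backend definition in your VCL:

# Talk to the backend directly, not to Varnish
curl -I http://127.0.0.1:8080/robots.txt
curl -I http://127.0.0.1:8080/sitemap.xml

If these requests fail as well, the problem lies with the backend web server or application rather than with Varnish.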

As described in https://www.varnish-software.com/developers/tutorials/troubleshooting-varnish/#backend-errors, you can run the following varnishlog command to figure out why these errors are returned:

sudo varnishlog -g request -q "VCL_call eq 'BACKEND_ERROR'"

You can also tailor the command to match the /sitemap.xml and /robots.txt URLs:

sudo varnishlog -g request -q "ReqUrl ~ '^/(sitemap\.xml|robots\.txt)(\?.*)?$'"
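
If the output is too noisy, you can limit it to the tags that matter most here. The following sketch uses the ReqUrl, BerespStatus and FetchError tags; FetchError is usually the one that states why the backend fetch failed, for example a refused connection or a timeout:

sudo varnishlog -g request -q "ReqUrl ~ '^/(sitemap\.xml|robots\.txt)'" -i ReqUrl,BerespStatus,FetchError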

If you still need help interpreting the varnishlog output, don't hesitate to add the relevant log transactions to your original question and I'll help you figure it out.
