
Python Logical Operation

I'm pretty new to Python and I'm working on a web scraping project using the Scrapy library. I'm not using the built-in domain restriction because I want to check whether any of the links to pages outside the domain are dead. However, I still want to treat pages within the domain differently from those outside it, so I am trying to manually determine whether a page is within the domain before parsing the response.

Response URL:

http://www.siteSection1.domainName.com

If Statement:

if 'domainName.com' and ('siteSection1' or 'siteSection2' or 'siteSection3') in response.url:
    parsePageInDomain()

The above condition is true (the page is parsed) when 'siteSection1' is the first item in the chain of ors, but the page is not parsed if the response URL is the same and the if statement is written as follows:

if 'domainName.com' and ('siteSection2' or 'siteSection1' or 'siteSection3') in response.url:
    parsePageInDomain()

What am I doing wrong here? I haven't been able to think through what is going on with the logical operators very clearly and any guidance would be greatly appreciated. Thanks!

The or operator doesn't work that way. Try any() instead:

if 'domainName.com' in response.url and any(name in response.url for name in ('siteSection1', 'siteSection2', 'siteSection3')):
    parsePageInDomain()

What's going on here is that or does not build a list of alternatives; it returns one of its operands. x or y returns x if x is truthy (which for a string means it is not empty), and y otherwise. So ('siteSection1' or 'siteSection2' or 'siteSection3') evaluates to 'siteSection1', because 'siteSection1' is a non-empty string and is therefore truthy.
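To see this for yourself, here is a minimal sketch you can paste into an interpreter. The variable names (sections, fallback) are made up for illustration and are not part of the question.

```python
# `or` returns the first truthy operand, not a collection of alternatives.
sections = ('siteSection1' or 'siteSection2' or 'siteSection3')
print(sections)  # siteSection1

# An empty string is falsy, so `or` moves on to the next operand.
fallback = '' or 'siteSection2'
print(fallback)  # siteSection2
```

Whichever non-empty string comes first wins, and the other two are never looked at.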

Moreover, you're also using and to combine your criteria. and returns its first argument if that argument is falsy, or its second argument if the first is truthy. Therefore, if x and y in z does not test whether both x and y are in z. in has higher precedence than and (I had to look that up), so it tests x and (y in z). Again, 'domainName.com' is truthy, so the expression reduces to just y in z. Your whole condition therefore only checks whether the first name in the parentheses is in response.url, which is why it passes with 'siteSection1' first and fails with 'siteSection2' first.
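As a rough illustration, the snippet below evaluates the second if statement from the question step by step and compares it with what was presumably intended. The names url, chosen, original and intended are only for this sketch; url stands in for response.url.

```python
url = 'http://www.siteSection1.domainName.com'  # stand-in for response.url

# Step 1: the parenthesised `or` chain collapses to its first operand.
chosen = ('siteSection2' or 'siteSection1' or 'siteSection3')

# Step 2: `in` binds tighter than `and`, so this is x and (y in z).
# 'domainName.com' is truthy, so the result is simply (chosen in url).
original = 'domainName.com' and (chosen in url)

# What was presumably meant: every membership test spelled out.
intended = 'domainName.com' in url and (
    'siteSection2' in url or 'siteSection1' in url or 'siteSection3' in url
)

print(chosen)    # siteSection2
print(original)  # False
print(intended)  # True
```

Note that the original form never checks whether 'domainName.com' is in the URL at all.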

any, by contrast, is a built-in function that takes an iterable and returns True if at least one of its elements is truthy, and False otherwise. It stops as soon as it hits a truthy value, so it's efficient. I'm using a generator expression to check each of your three possible strings in turn to see whether any of them is in your response URL.
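If you need the check in more than one place, one option is to wrap it in a small helper. This is only a sketch: is_in_domain and SECTIONS are hypothetical names, and the function takes a plain URL string so you would call it as is_in_domain(response.url).

```python
# Hypothetical constant: the section names taken from the question.
SECTIONS = ('siteSection1', 'siteSection2', 'siteSection3')


def is_in_domain(url, domain='domainName.com', sections=SECTIONS):
    """Return True if url contains the domain and at least one section name."""
    # any() stops at the first section name found in the URL.
    return domain in url and any(name in url for name in sections)


print(is_in_domain('http://www.siteSection1.domainName.com'))  # True
print(is_in_domain('http://www.siteSection2.domainName.com'))  # True
print(is_in_domain('http://www.siteSection1.example.org'))     # False
```

The order of the names in SECTIONS no longer affects the result, only how many membership tests run before any() can stop.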


 