Checking if any of multiple substrings is contained in a string - Python

I have a blacklist of banned substrings, and I need an if statement that checks whether ANY of them is contained in a given URL. If the URL contains none of them, I want it to do A. If the URL contains at least one banned substring, I want it to do B (only once, not once for each banned substring that matches).

black_list = ['linkedin.com', 'yellowpages.com', 'facebook.com', 'bizapedia.com', 'manta.com',
              'yelp.com', 'nextdoor.com', 'industrynet.com', 'twitter.com', 'zoominfo.com', 
              'google.com', 'yellow-listings.com', 'kompass.com', 'dnb.com', 'tripadvisor.com']

Here are two simple example URLs that I'm using to check whether it works. url1 contains a banned substring, while url2 doesn't.

url1 = 'https://www.dnb.com/'
url2 = 'https://www.ok/'

I tried the code below, which works, but I was wondering if there is a better (more computationally efficient) way of doing it. I have a data frame of 100k+ URLs, so I'm worried that this will be very slow.

url = url1  # the URL being checked
mask = []
for banned in black_list:
    if banned in url:
        mask.append(True)
    else:
        mask.append(False)

if any(mask):
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")      

Does anybody know a more efficient way of doing this?

Here is a possible one-line solution:

print('there is a banned substring inside'
      if any(banned_str in url for banned_str in black_list)
      else 'no banned substrings inside')

If you prefer a more explicit if/else statement:

if any(banned_str in url for banned_str in black_list):
    print('there is a banned substring inside')
else:
    print('no banned substrings inside')
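Since the question mentions 100k+ URLs, another option worth trying is to compile the whole blacklist into a single regular expression and search each URL once. This is only a sketch: the shortened `black_list` and the names `banned_pattern` and `contains_banned` are made up for illustration, and whether it is actually faster than `any()` on your data is something to measure (for example with `timeit`) rather than assume.

```python
import re

# Shortened copy of the question's list, for illustration only.
black_list = ['linkedin.com', 'dnb.com', 'yelp.com']

# re.escape keeps the dots literal; '|' joins the entries into one alternation.
banned_pattern = re.compile('|'.join(re.escape(banned) for banned in black_list))

def contains_banned(url):
    """Return True if any banned substring occurs anywhere in url."""
    return banned_pattern.search(url) is not None

print(contains_banned('https://www.dnb.com/'))  # True
print(contains_banned('https://www.ok/'))       # False
```

The pattern is compiled once and reused for every URL, so the per-URL work is a single `search` call.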

You can set a flag inside the loop and then perform either A or B depending on its value.

ban_flag = False
for banned in black_list:
    if banned in url:
        ban_flag = True
        break  # one match is enough, so stop checking
if ban_flag:
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
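Stopping at the first match is what saves work on a long blacklist. `any()` with a generator expression does this on its own, which the small sketch below tries to show by recording which entries actually get tested. The helper `is_in_url`, the `checked` list, and the example URL are hypothetical and exist only for the demonstration.

```python
# Shortened copy of the question's list, for illustration only.
black_list = ['linkedin.com', 'dnb.com', 'yelp.com', 'google.com']
url = 'https://www.linkedin.com/in/someone'

checked = []  # records which banned strings were actually tested

def is_in_url(banned):
    """Log the check, then report whether banned occurs in url."""
    checked.append(banned)
    return banned in url

# The generator is consumed lazily, so any() stops at the first True.
found = any(is_in_url(banned) for banned in black_list)

print(found)    # True
print(checked)  # ['linkedin.com']
```

Building a full list of True/False values first, as in the question, always tests every entry, even when the first one already matches.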

Code:

black_list = ['linkedin.com', 'yellowpages.com', 'facebook.com', 'bizapedia.com', 'manta.com',
              'yelp.com', 'nextdoor.com', 'industrynet.com', 'twitter.com', 'zoominfo.com', 
              'google.com', 'yellow-listings.com', 'kompass.com', 'dnb.com', 'tripadvisor.com']

def is_url_banned(url):
    for banned in black_list:
        if banned in url:
            print("there is a banned substring inside")
            return
    print("no banned substrings inside")

is_url_banned('https://www.dnb.com/')
is_url_banned('https://www.ok/')

Result:

there is a banned substring inside
no banned substrings inside
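For the many-URLs case from the question, it may be more convenient to have the function return a boolean instead of printing, so the result can be used as a mask. The following is a minimal sketch with a hypothetical `is_banned` helper, a shortened blacklist, and made-up URLs; with pandas, the same function could presumably be passed to `Series.apply` on the URL column.

```python
# Shortened copy of the question's list, for illustration only.
black_list = ['linkedin.com', 'dnb.com', 'yelp.com']

def is_banned(url):
    """Return True if url contains at least one banned substring."""
    return any(banned in url for banned in black_list)

urls = ['https://www.dnb.com/', 'https://www.ok/', 'https://www.yelp.com/biz/x']

# One boolean per URL, in the same order as urls.
mask = [is_banned(url) for url in urls]
print(mask)  # [True, False, True]

# Keep only the URLs that contain no banned substring.
allowed = [url for url, banned in zip(urls, mask) if not banned]
print(allowed)  # ['https://www.ok/']
```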
