How to exclude all titles with find?
I have a function that gets all the titles from my website. I don't want the titles of certain products: those containing the words "OLP NL", "Arcserve", "LicSAPk" or "symantec". Is this the right way to do it?
def get_title(u):
    html = requests.get(u)
    bsObj = BeautifulSoup(html.content, 'xml')
    title = str(bsObj.title).replace('<title>', '').replace('</title>', '')
    if (title.find('Arcserve') or title.find('OLP NL') or title.find('LicSAPk') or title.find('Symantec') is not -1):
        return 'null'
    else:
        return title

if (title != 'null'):
    ws1['B1'] = title
    meta_desc = get_metaDesc(u)
    ws1['C1'] = meta_desc
    meta_keyWrds = get_metaKeyWrds(u)
    ws1['D1'] = meta_keyWrds
    print("writing product no." + str(i))
else:
    print("skipped product no. " + str(i))
    continue
The problem is that the program excludes all my products, and all I see is "skipped product no.". Why? Not all of them contain these words.
You can change the condition of the if statement to

(title.find('Arcserve') != -1 or title.find('OLP NL') != -1 or title.find('LicSAPk') != -1 or title.find('Symantec') != -1)

or you can create a function that checks for the terms you want to find:
def TermFind(Title):
    terms = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']
    disc = False
    for val in terms:
        if Title.find(val) != -1:
            disc = True
            break
    return disc
When I used the original if statement, it always evaluated to True regardless of the title value. I couldn't find an explanation for this behavior, but you can try checking the questions "Python != operation vs "is not"" and "nested "and/or" if statements". Hope it helps.
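A likely explanation for the always-True result: str.find returns -1 when the term is absent, and -1 is a non-zero integer, so it counts as true in a boolean context. In addition, comparison operators bind tighter than or, so the "is not -1" in the question applies only to the last find call. The sketch below illustrates this with a made-up title. It writes != -1 in place of is not -1, which gives the same result here without relying on object identity.

```python
# str.find returns the index of the first match, or -1 when there is no match.
title = "Microsoft Office 2019"  # hypothetical title containing none of the terms

print(title.find('Arcserve'))        # -1
print(bool(title.find('Arcserve')))  # True, because -1 is a non-zero integer

# The condition from the question parses as: a or b or c or (d != -1)
original = bool(
    title.find('Arcserve') or title.find('OLP NL')
    or title.find('LicSAPk') or (title.find('Symantec') != -1)
)

# Comparing every result against -1 gives the intended test.
fixed = (
    title.find('Arcserve') != -1 or title.find('OLP NL') != -1
    or title.find('LicSAPk') != -1 or title.find('Symantec') != -1
)

print(original, fixed)  # True False
```

The unparenthesized form could only be False if the first three find calls all returned 0, meaning the title would have to start with three different terms at once, so in practice every product is skipped.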
A similar idea using any:
import requests
from bs4 import BeautifulSoup

url = 'https://www.cdsoft.co.il/index.php?id_product=300610&controller=product'
html = requests.get(url)
bsObj = BeautifulSoup(html.content, 'lxml')
title = str(bsObj.title).replace('<title>', '').replace('</title>', '')
items = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']
if not any(item in title for item in items):
    print(title)
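One thing to watch for: the question spells "symantec" in lowercase while the code searches for 'Symantec', and both in and str.find are case-sensitive. If the titles on the site vary in capitalization, a case-insensitive check may be safer. The following is a minimal sketch under that assumption. The helper name is_excluded and the sample titles are hypothetical, not taken from the site.

```python
def is_excluded(title, terms):
    """Return True if any term occurs in title, ignoring case."""
    lowered = title.lower()
    return any(term.lower() in lowered for term in terms)


items = ['Arcserve', 'OLP NL', 'LicSAPk', 'Symantec']

# Hypothetical titles, used only to illustrate the check.
print(is_excluded('symantec Endpoint Protection', items))  # True
print(is_excluded('Microsoft Office 2019', items))         # False
```

With a helper like this, the condition in the snippet above would become "if not is_excluded(title, items):".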