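Remove from list in python 3
I have a list that was scraped from a website. I want to remove the anchor links to the site's own pages, such as '/about/'. There are a lot of them. Is there a way to write code that looks at each string and, if "http" is in the text, adds it to a list? (It should match plain "http", not only "https" as in the data below, in case the "s" is not there.) My list data looks like this: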
['mailto:info@yourdomain.com', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', '/events/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'mailto:Info@yourdomain.com', '/about/', '/events/', '/news/', '/contact/', 'https://youtechassociates.com/', '/privacy-policy', '/terms-of-use', '/disclosure/']
You can use a list comprehension with a regular expression to filter out the links that do not start with a protocol:
import re
[link for link in links if re.match(r'https?://', link)]
This gives:
['https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'https://youtechassociates.com/']
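As a small illustration of why the regex approach above drops the relative links, here is a minimal sketch. It relies on re.match only matching at the start of the string. The sample list is shortened from the question's data, and the 'http://example.com/' and '/redirect?to=...' entries are assumptions added for illustration, not part of the original list.

```python
import re

# Shortened sample; the last two entries are hypothetical additions.
links = [
    'mailto:info@yourdomain.com',
    'https://www.demodms.com/annuity/',
    '/about/',
    'http://example.com/',                  # assumed plain-http link
    '/redirect?to=https://example.com/',    # assumed relative link containing a URL
]

# 's?' makes the "s" optional, so both http:// and https:// are accepted.
pattern = re.compile(r'https?://')

# re.match anchors at the beginning of the string, so a URL that appears
# later in the string (as in the redirect entry) is not matched.
full_links = [link for link in links if pattern.match(link)]

print(full_links)
# ['https://www.demodms.com/annuity/', 'http://example.com/']
```

Using re.search instead of re.match would also keep the redirect entry, which is probably not what is wanted here.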
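You can use filter for this. Note that the predicate below keeps the entries that do not contain 'http' (the relative links and mailto entries); drop the "not" to keep the http links instead.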
a = ['mailto:info@yourdomain.com', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', 'https://www.demodms.com/annuity/', 'https://www.demodms.com/annuity/about/', 'https://www.demodms.com/annuity/services/', 'https://www.demodms.com/annuity/educational-courses/', 'https://www.demodms.com/annuity/events/', 'https://www.demodms.com/annuity/articles-and-downloads/', 'https://www.demodms.com/annuity/videos/', 'https://www.demodms.com/annuity/calculators/', 'https://www.demodms.com/annuity/news/', 'https://www.demodms.com/annuity/contact/', '/events/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/tips-for-back-to-school-season/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-things-to-know-about-getting-life-insurance-for-your-child/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/5-signs-you-need-to-up-your-life-insurance-coverage/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'https://www.demodms.com/annuity/tips-for-summer-travel/', 'mailto:Info@yourdomain.com', '/about/', '/events/', '/news/', '/contact/', 'https://youtechassociates.com/', '/privacy-policy', '/terms-of-use', '/disclosure/']
b = filter(lambda x: 'http' not in x, a)
print(list(b))
Output:
['mailto:info@yourdomain.com', '/events/', 'mailto:Info@yourdomain.com', '/about/', '/events/', '/news/', '/contact/', '/privacy-policy', '/terms-of-use', '/disclosure/']
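If both groups are useful (the links that contain 'http' and the ones that do not), one possible sketch is to pair filter with itertools.filterfalse so the same predicate produces both lists. The name has_http and the shortened sample list are assumptions for illustration only.

```python
from itertools import filterfalse

# Shortened sample taken from the question's data.
sample = [
    'mailto:info@yourdomain.com',
    'https://www.demodms.com/annuity/',
    '/events/',
    'https://youtechassociates.com/',
    '/privacy-policy',
]

def has_http(x):
    """Return True when the substring 'http' appears anywhere in x."""
    return 'http' in x

# filter keeps items where the predicate is true;
# filterfalse keeps items where it is false.
kept = list(filter(has_http, sample))
dropped = list(filterfalse(has_http, sample))

print(kept)     # ['https://www.demodms.com/annuity/', 'https://youtechassociates.com/']
print(dropped)  # ['mailto:info@yourdomain.com', '/events/', '/privacy-policy']
```

In Python 3 both filter and filterfalse return lazy iterators, which is why they are wrapped in list() before printing.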
Here is a simple way to do it:
mlist = links  # links is your list, as specified above
newlist = []
for m in mlist:
    # keep only the entries that begin with 'http'
    if m.startswith('http'):
        newlist.append(m)
I would use a list comprehension and startswith():
full_links = [link for link in links if link.startswith('http://') or link.startswith('https://')]
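For a task this simple, I think this is clearer than a regular expression. Also, IMO you should explicitly require http:// and https://, because if you run into a relative link like http_stuff/foo.html, checking only for http could give you a false positive.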
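To make the false-positive point concrete, here is a minimal sketch comparing the loose and strict prefix checks. The sample list is an assumption built from the question's data plus the http_stuff/foo.html example mentioned above. It also uses the fact that str.startswith accepts a tuple of prefixes, which is a slightly shorter way to write the "or" condition.

```python
# Shortened sample; 'http_stuff/foo.html' is the hypothetical relative link
# from the explanation above.
links = [
    'https://www.demodms.com/annuity/',
    'http_stuff/foo.html',
    '/about/',
    'mailto:info@yourdomain.com',
]

# Loose check: any string beginning with 'http' passes, including the
# relative path 'http_stuff/foo.html'.
loose = [link for link in links if link.startswith('http')]

# Strict check: startswith accepts a tuple, so this requires the full
# 'http://' or 'https://' prefix.
strict = [link for link in links if link.startswith(('http://', 'https://'))]

print(loose)   # ['https://www.demodms.com/annuity/', 'http_stuff/foo.html']
print(strict)  # ['https://www.demodms.com/annuity/']
```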