

Search a list based on a keyword to append specific list contents

Context

I have a list of links I scraped from this site: https://www.ons.gov.uk/economy/economicoutputandproductivity/output/datasets/economicactivityfasterindicatorsuk

The list of links looks like this:

['https://twitter.com/ONS',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
 'https://www.facebook.com/ONS',
 'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
 'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx'...

I now want to use Helium/Selenium to visit each link and print it out. However, the list mixes links I don't need with the Excel documents I need to download. I want to append only the links that contain xlsx.

I tried this solution, but it did not work. I also tried the .remove method, but that is more time-consuming. I also tried collating a list of the links by slicing them, but again this is time-consuming.
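The question doesn't show the .remove attempt, so this is only a sketch under an assumption: if it looped over the list and called .remove on the same list, it hits a classic pitfall. Removing an item shifts the later items left, so the loop silently skips the element right after each removal. Iterating over a copy avoids the skip, although each .remove call still rescans the list, which is why building a new filtered list is usually simpler. The function names and sample strings below are hypothetical.

```python
def remove_in_place_buggy(links):
    """Remove non-.xlsx links while iterating over the same list (skips items)."""
    for link in links:
        if not link.endswith(".xlsx"):
            # Removing shifts later items left, so the loop skips the next one.
            links.remove(link)
    return links


def remove_in_place_safe(links):
    """Iterate over a shallow copy so removals from the original don't skip items."""
    for link in links[:]:
        if not link.endswith(".xlsx"):
            links.remove(link)
    return links


if __name__ == "__main__":
    sample = ["a.xlsx", "b.html", "c.html", "d.xlsx"]  # hypothetical data
    print(remove_in_place_buggy(list(sample)))  # ['a.xlsx', 'c.html', 'd.xlsx']
    print(remove_in_place_safe(list(sample)))   # ['a.xlsx', 'd.xlsx']
```

Note that "c.html" survives the buggy version because it moved into the slot the loop had already visited.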

Problem

Is there an easier way to find a string in the list of links so that I can append the matching links to a new list and loop through them with Selenium? (I can do the latter; I just need help with the append.)

Use a list comprehension.

linklist = ['https://twitter.com/ONS',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx',
 'https://www.facebook.com/ONS',
 'https://www.ons.gov.uk/peoplepopulationandcommunity/leisureandtourism',
 'https://www.ons.gov.uk/businessindustryandtrade/manufacturingandproductionindustry',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx',
 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']

# Keep only the links whose text contains ".xlsx"
relevant_links = [link for link in linklist if ".xlsx" in link]
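Since the question asks specifically about appending, here is the same filter written as an explicit loop with append. It is a sketch that is equivalent to the comprehension above; the function name filter_links and its keyword parameter are hypothetical.

```python
def filter_links(links, keyword=".xlsx"):
    """Return the links that contain keyword, built with an explicit loop + append."""
    relevant = []
    for link in links:
        # Same test as the comprehension: substring match anywhere in the URL.
        if keyword in link:
            relevant.append(link)
    return relevant


if __name__ == "__main__":
    sample = [
        "https://twitter.com/ONS",
        "https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx",
        "https://www.facebook.com/ONS",
    ]
    for link in filter_links(sample):
        print(link)
```

The comprehension is shorter and builds the list in one expression; the loop form is handy if you later want to do more per link (logging, deduplication) inside the body.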

relevant_links will then contain:

['https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fdecember2019/dataset1.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2ffebruary2020roadsdata/roadstables.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjuly2019/economicactivityfasterindicatorsukjuly2019dataset.xlsx', 'https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk%2fjanuary2020roadsdata/roadstables.xlsx']

Alternatively, check how the string ends:

# original_list is your scraped list of links (e.g. linklist above)
new_list = [link for link in original_list if link.endswith(".xlsx")]
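A slightly more forgiving variant of the endswith check, as a sketch: str.endswith accepts a tuple of suffixes, and lowercasing the link first makes the match case-insensitive. Accepting the older .xls extension is an assumption, not something the question asks for, and the name excel_links is hypothetical.

```python
# Assumption: legacy .xls files should be kept as well as .xlsx.
EXCEL_SUFFIXES = (".xlsx", ".xls")


def excel_links(links):
    """Keep links whose text ends with an Excel extension, ignoring case."""
    # str.endswith returns True if the string ends with any suffix in the tuple.
    return [link for link in links if link.lower().endswith(EXCEL_SUFFIXES)]


if __name__ == "__main__":
    sample = ["a/DATA.XLSX", "b/old.xls", "https://twitter.com/ONS", "c/notes.xlsx.html"]
    print(excel_links(sample))  # ['a/DATA.XLSX', 'b/old.xls']
```

The difference from the ".xlsx" in link test shows up on the last sample string: a substring check would keep "c/notes.xlsx.html", while endswith rejects it. Conversely, endswith would miss a link that had extra text (such as a query parameter) after the extension.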

Then you can open each link in new_list.
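One practical detail when downloading these files: two of the links above both end in roadstables.xlsx, so saving each file under its last path segment would overwrite one with the other. Also, on these ONS links the real file path lives in the uri query parameter (urlparse(link).path is just "/file"), which is why checking the whole URL string works. The hypothetical helper below builds a unique local name from the decoded uri parameter, assuming every file link follows that pattern; it falls back to the URL path otherwise.

```python
import posixpath
from urllib.parse import parse_qs, urlparse


def local_filename(link):
    """Build a name like 'december2019_dataset1.xlsx' from an ONS file link."""
    parsed = urlparse(link)
    # parse_qs percent-decodes values, so '%2f' in the uri parameter becomes '/'.
    # Assumption: file links carry their path in 'uri'; otherwise use the URL path.
    path = parse_qs(parsed.query).get("uri", [parsed.path])[0]
    folder, name = posixpath.split(path)
    # Prefix the parent folder (e.g. 'january2020roadsdata') to keep names unique.
    return f"{posixpath.basename(folder)}_{name}"


if __name__ == "__main__":
    link = ("https://www.ons.gov.uk/file?uri=%2feconomy%2feconomicoutputandproductivity"
            "%2foutput%2fdatasets%2feconomicactivityfasterindicatorsuk"
            "%2fjanuary2020roadsdata/roadstables.xlsx")
    print(local_filename(link))  # january2020roadsdata_roadstables.xlsx
```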
