
Selecting a range of line items using Beautiful Soup

I am trying to scrape images from a website that lists more than 2000 images. When I print the section of the page that links to the images, only about half of the output is visible in the console because too many lines are printed. I need to see the beginning of the output, which is cut off, so I would like to display only some of the entries. How do I show only a range (e.g. entries 1-10) of the 2000+ entries?

I am using this:

containers = page_soup.findAll("div", {"class": "image_list"})
containers[0]

You have several options here:

1. Do it inside your script

This prints the first 10 containers:

containers = page_soup.findAll("div", {"class": "image_list"})[0:10]
for c in containers:
    print(c)
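
If containers turns out to hold a single div whose children are the 2000+ entries (the containers[0] call in the question suggests this, though the page's real markup is not shown), slicing containers itself returns just that one div. Below is a minimal sketch of slicing the entries inside it instead. The HTML string, the img tags and the file names are made up for illustration, and find_all is the current Beautiful Soup name for findAll.

```python
from bs4 import BeautifulSoup

# Stand-in HTML (hypothetical): one "image_list" div holding 2000 <img> entries.
html = '<div class="image_list">' + "".join(
    f'<img src="img_{i}.jpg">' for i in range(1, 2001)
) + "</div>"
page_soup = BeautifulSoup(html, "html.parser")

containers = page_soup.find_all("div", {"class": "image_list"})

# Collect the entries inside the first container, then slice that list.
entries = containers[0].find_all("img")
first_ten = entries[0:10]

for entry in first_ten:
    print(entry["src"])
```

Changing the slice bounds, for example entries[100:110], selects a different range without touching the rest of the script.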

2. Do it in the shell

You can print all of your containers and use shell commands to filter only the lines you want to see. That way you can change the output without editing your code again and again.

Inside your script, print all containers:

containers = page_soup.findAll("div", {"class": "image_list"})
for c in containers:
    print(c)

In the shell:

This prints the first 10 lines (head shows 10 lines by default):

python name_of_my_script.py | head

This prints lines 5 to 10:

python name_of_my_script.py | sed -n '5,10p'

This prints the last 10 lines (tail shows 10 lines by default):

python name_of_my_script.py | tail

See the manual pages (man head, man sed, man tail) for additional information.
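
For comparison, the same three selections can be approximated with list slicing in Python when no shell is at hand. This is only a sketch: the lines list is made-up stand-in data, not output from the real script.

```python
# Stand-in for the script's printed output (hypothetical): 20 numbered lines.
lines = [f"line {i}" for i in range(1, 21)]

head = lines[:10]      # like `head`: the first 10 lines
middle = lines[4:10]   # like `sed -n '5,10p'`: lines 5 through 10 (1-based, inclusive)
tail = lines[-10:]     # like `tail`: the last 10 lines

for line in middle:
    print(line)
```

Note the off-by-one difference: sed counts lines from 1 and includes both ends, while Python slices count from 0 and exclude the end index.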

Use slicing:

containers = page_soup.findAll("div", {"class": "image_list"})[0:10]

This leaves containers holding only the first 10 elements, so at most 10 elements are printed.


 