Nested For Loop - Beautiful Soup Text

Question

I am trying to scrape company names from multiple pages on a site. I am using a for loop to move through each page and find the company names.

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # variable to store loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # inserts the page number into the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # renders the page in case the site relies on JavaScript
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows which page is being processed
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
    results.append(text)

print(results)

The code above stores only the last element of each page in the results.

RESULTS:

https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']

My understanding is that this happens because the nested for loop keeps only one element per page. What is the proper way to get the text of every element on all the pages?

Thanks in advance.

Answer 1

This is because the statement that appends each entry to the results list is outside the inner for loop. It therefore runs only once per page, after the inner loop has finished, when text holds the last company name on that page.
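
A minimal sketch of the same pattern, without any network access, may make the difference easier to see. The `pages` data below is made up for illustration and stands in for the `agencies` found on each page.

```python
# Hypothetical stand-in data: each inner list plays the role of the
# `agencies` found on one page.
pages = [["A1", "A2", "A3"], ["B1", "B2"]]

# append() dedented out of the inner loop: it runs once per page, after the
# inner loop has finished, so only the last name on each page is kept.
last_only = []
for page in pages:
    for name in page:
        text = name
    last_only.append(text)

# append() inside the inner loop: it runs once per name.
every_name = []
for page in pages:
    for name in page:
        text = name
        every_name.append(text)

print(last_only)   # ['A3', 'B2']
print(every_name)  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

The only difference between the two loops is the indentation of the `append` call.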

Try this:

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # variable to store loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # inserts the page number into the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # renders the page in case the site relies on JavaScript
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows which page is being processed
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
        results.append(text)  # inside the inner loop, so it runs once per company

print(results)
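
The sample output in the question shows each name with a leading newline and a trailing space. One possible follow-up, sketched here with plain strings standing in for the `a.text` values, is to strip that whitespace and add each page's names in one step with `extend`. The strings are taken from the question's output; their grouping into two pages is an assumption made for the demo.

```python
# Illustrative input: strings shaped like the `a.text` values in the
# question's output. The grouping into pages is an assumption for the demo.
raw_pages = [
    ["\nAgency Partner Interactive LLC ", "\nTEAM International "],
    ["\nAstute Technology Management ", "\nWP Tech Support "],
]

results = []
for raw_names in raw_pages:
    # extend() adds every cleaned name from this page at once
    results.extend(name.strip() for name in raw_names)

print(results)
```

In the scraping loop itself, `a.get_text(strip=True)` would return the stripped text directly from each BeautifulSoup element.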

Answer 1 accepted 2020-06-10 19:39:12
