Nested For Loops - Beautiful Soup Text

I am trying to scrape company names from multiple pages on a site. I am using a for loop to move through each page and find the company names.

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # variable to store loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # passes the number inside range through the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # renders the page in case the site relies on JavaScript
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows what page you are on as it is looping
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = (a.text)
    results.append(text)

print(results)

The code above stores only the last element of each page.

RESULTS:

https://clutch.co/it-services/msp?page=0
https://clutch.co/it-services/msp?page=1
https://clutch.co/it-services/msp?page=2
https://clutch.co/it-services/msp?page=3
['\nAgency Partner Interactive LLC ', '\nTEAM International ', '\nAstute Technology Management ', '\nWP Tech Support ']

My understanding is that this happens because the nested for loop only keeps one element. What is the proper way to get the text of every element on all the pages?

Thanks in advance.

This is because the statement that appends each entry to the results list sits outside the inner for loop. It runs once per page, after the inner loop has finished, so only the last value assigned to text on that page is stored.

Try this:
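The effect can be reproduced without any scraping. The sketch below is only an illustration: the `pages` list is hypothetical stand-in data, where each inner list plays the role of one page's `agencies` result.

```python
# Hypothetical stand-in data: each inner list represents the names found
# on one page, so no network access is needed.
pages = [["A1", "A2", "A3"], ["B1", "B2"]]

last_only = []
for page in pages:
    for name in page:
        text = name
    # Runs once per page, after the inner loop has finished,
    # so `text` holds only the last name of that page.
    last_only.append(text)

every_name = []
for page in pages:
    for name in page:
        # Runs once per name, so nothing is skipped.
        every_name.append(name)

print(last_only)   # ['A3', 'B2']
print(every_name)  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

The only difference between the two versions is the indentation of the `append` call.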

### CREATING LOOP TO GO THROUGH PAGES ###

from bs4 import BeautifulSoup
from requests_html import HTMLSession

results = []  # variable to store loop results
for i in range(4):  # goes through 4 pages (0-3)
    url = 'https://clutch.co/it-services/msp?page={}'.format(i)  # passes the number inside range through the {}
    session = HTMLSession()
    resp = session.get(url)
    resp.html.render()  # renders the page in case the site relies on JavaScript
    soup = BeautifulSoup(resp.html.html, features='lxml')
    print(url)  # shows what page you are on as it is looping
    agencies = soup.find_all(class_='company-name')
    for a in agencies:
        text = a.text
        results.append(text)  # inside the inner loop, so every name is stored

print(results)
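The names in the output shown in the question carry a leading newline and a trailing space. As an optional follow-up, and assuming clean names are wanted, the collected strings can be stripped afterwards. The sketch below uses two of the strings from that output as sample input; `raw_results` and `clean_results` are illustrative names.

```python
# Sample input copied from the output shown in the question.
raw_results = ['\nAgency Partner Interactive LLC ', '\nTEAM International ']

# str.strip() removes leading and trailing whitespace, including newlines.
clean_results = [name.strip() for name in raw_results]

print(clean_results)  # ['Agency Partner Interactive LLC', 'TEAM International']
```

Calling a.get_text(strip=True) instead of a.text inside the loop should give a similar result at extraction time.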
