beautifulsoup - Unable to get information from a list

Question

Total Python amateur here. I've taken a few classes that covered surface-level material, but I haven't come across a solution to this problem on a new website I'm trying to scrape (kijiji.ca, for anyone wondering). I'm trying to pull down information on rental housing for some PhD dissertation-related work. Inspecting a sample page, I found that several key pieces of information I need all share the same class. For example:

 <div class="titleAttributes-2381855425">
   <li class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-unittype"></use></svg><span class="noLabelValue-3861810455">Condo</span></li>
   <li class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-numberbedrooms"></use></svg><span class="noLabelValue-3861810455">Bedrooms: 2</span></li>
   <li class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-numberbathrooms"></use></svg><span class="noLabelValue-3861810455">Bathrooms: 1</span></li>
 </div>

I'm trying to get each piece of info, but when I run my code, it returns nothing.

 def getDetails(urls):
     urls = urls[10:]
     print(len(urls))
     i = 0
     try:
         for url in urls:
             print(url)
             listDetails = ""
             listDetailsTwo = []
             url = url.rstrip('\\n')
             response = requests.get(url)
             soup = BeautifulSoup(response.text, "html.parser")
             try:
                 infobar = soup.select_one("span[class*=noLabelValue-3861810455]").text
                 infobar.append(infobar)
                 print("Scraping listing : ",str(i))

(I know, my code must look like an absolute mess, but again, I'm a total amateur.) I know I probably need to use something other than soup.select_one, but after a few days of trying, I'm really getting nowhere. Any help would be much appreciated!

Thanks!

Answer 1

You probably want soup.find_all(), which returns every matching element rather than only the first:

L = soup.find_all('span', {'class':'noLabelValue-3861810455'})
for item in L:
    print(item.text)

Prints:

Condo
Bedrooms: 2
Bathrooms: 1
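
For comparison, here is a minimal sketch of why the original attempt came up empty and how the CSS-selector route can return all three values. select_one() returns only the first match, and .text on that match is a plain str, which has no append() method, so the infobar.append(infobar) line raises AttributeError (the question's except clause is not shown, so how that error was handled is unknown). The helper name extract_attributes and the shortened selector span[class*="noLabelValue"] are assumptions made for illustration; the partial match is used on the guess that the numeric class suffix may change between page builds. The sample markup is the question's HTML with the svg icons removed.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, with the <svg> icons removed for brevity.
HTML = """
<div class="titleAttributes-2381855425">
  <li class="noLabelAttribute-1492730675"><span class="noLabelValue-3861810455">Condo</span></li>
  <li class="noLabelAttribute-1492730675"><span class="noLabelValue-3861810455">Bedrooms: 2</span></li>
  <li class="noLabelAttribute-1492730675"><span class="noLabelValue-3861810455">Bathrooms: 1</span></li>
</div>
"""


def extract_attributes(html):
    """Return the text of every span whose class contains 'noLabelValue'.

    Hypothetical helper, not part of the original post.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list of all matches; select_one() would stop at the first.
    spans = soup.select('span[class*="noLabelValue"]')
    return [span.get_text(strip=True) for span in spans]


soup = BeautifulSoup(HTML, "html.parser")
# select_one() yields a single element, so only the first value is available.
first_text = soup.select_one('span[class*="noLabelValue"]').get_text(strip=True)
print(first_text)

# Collect every value in a list instead of calling append() on a string.
values = extract_attributes(HTML)
print(values)
```

In the scraping loop from the question, the same helper could be called with response.text for each URL, and the returned list appended to a per-listing results list.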

Answer 2

Try this.

from simplified_scrapy import SimplifiedDoc,req,utils
html='''
<div class="titleAttributes-2381855425">
  <li class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-unittype"></use></svg><span class="noLabelValue-3861810455">Condo</span></li>
  <li
    class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-numberbedrooms"></use></svg><span class="noLabelValue-3861810455">Bedrooms: 2</span></li>
    <li
      class="noLabelAttribute-1492730675"><svg class="icon-459822882 attributeIcon-1499443538 attributeIcon__condensed-4247835132" focusable="false" height="100%" role="img" width="100%"><use xlink:href="#icon-attributes-numberbathrooms"></use></svg><span class="noLabelValue-3861810455">Bathrooms: 1</span></li>
</div>'''
doc = SimplifiedDoc(html)
spans = doc.selects('span.noLabelValue-3861810455>text()')
print (spans)
spans = doc.selects('div.titleAttributes-2381855425>li.noLabelAttribute-1492730675').select('span.noLabelValue-3861810455>text()')
print (spans)
utils.save2csv('test.csv',[spans])

Result:

['Condo', 'Bedrooms: 2', 'Bathrooms: 1']
['Condo', 'Bedrooms: 2', 'Bathrooms: 1']
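
If one column per attribute is more convenient than a row of raw strings, the list above could be split into label/value pairs before saving. The following is a rough sketch using only the standard library; the function name attributes_to_record and the fallback key unit_type are assumptions (the key is borrowed from the icon-attributes-unittype reference in the markup), not anything the site or simplified_scrapy defines.

```python
import csv
import io


def attributes_to_record(values, unlabeled_key="unit_type"):
    """Split strings such as 'Bedrooms: 2' into a {label: value} dict.

    Hypothetical helper. Values without a colon are stored under
    unlabeled_key; a second unlabeled value would overwrite the first.
    """
    record = {}
    for value in values:
        label, sep, rest = value.partition(":")
        if sep:
            record[label.strip()] = rest.strip()
        else:
            record[unlabeled_key] = value.strip()
    return record


values = ["Condo", "Bedrooms: 2", "Bathrooms: 1"]
record = attributes_to_record(values)
print(record)

# Write to an in-memory buffer here; open("test.csv", "w", newline="")
# would be used in the same way for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
csv_text = buffer.getvalue()
print(csv_text)
```

All values stay as strings; converting the bedroom and bathroom counts to numbers is left out because listings may contain non-numeric entries.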


2 answers

Answer 1
0 2020-03-17 00:42:30

Answer 2
0 2020-03-17 05:46:23
