
Python Beautiful Soup scrape only if text match?

I just started learning Beautiful Soup; I've been watching videos and am getting the hang of it somewhat. But the examples provided seem to already have well-structured HTML and don't search for a specific word anywhere. What I'm trying to do is print only the information for a specific country when it is mentioned; if it isn't mentioned, nothing should be printed. Later on I'll extend it to append to a text file. I'd simply like to grab everyone who is from New Zealand, but to experiment I've been using United States because it's posted more frequently.

At the moment my code looks like this; it simply grabs all of them.

from bs4 import BeautifulSoup
import requests

source = requests.get('https://pogotrainer.club/?sort=worldwide').text
soup = BeautifulSoup(source, 'lxml')

trainer = soup.find('article')
for box in trainer.find_all('div', class_='media-body'):
    print(box.text)

In one tutorial I saw them use findNext, since the important part is the friend code listed anyway. So I tried this:

usa = box.find(text="United States").findNext(class_="TCLink")

However, printing it with print(usa) gives me an error:

AttributeError: 'NoneType' object has no attribute 'findNext'

Before that, I also tried things like

usa = soup.find(text="United")

But it prints

None

even though, looking at the page, the text is there. Does anyone have suggestions?

Thanks in advance

Let's break down AttributeError: 'NoneType' object has no attribute 'findNext':

  • The NoneType object is the value returned by box.find(text="United States"), which is None when nothing matches.
  • You access the attribute with .findNext (which is actually a method), but because the object is None, the statement makes no sense.

You're assuming that find returned a match rather than None, so you have to make sure what you're working with. You might want to try this:

for box in trainer.find_all('div', class_='media-body'):
    print(box)

Always try to know what you're working with, for example by printing it explicitly.
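
Beyond printing, the chained call can be guarded so that a missing match is skipped instead of raising. The following is a minimal sketch, not the page's real structure: the inline HTML, the trainer names, and the helper friend_code are hypothetical, and only the media-body and TCLink class names come from the question. It uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is not shown in the question.
html = """
<article>
  <div class="media-body"><h5>Ash</h5><span>United States</span><a class="TCLink">1111 2222 3333</a></div>
  <div class="media-body"><h5>Misty</h5><span>New Zealand</span><a class="TCLink">4444 5555 6666</a></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")


def friend_code(box, country):
    """Return the friend code in `box` if it mentions `country`, else None."""
    # find() returns None when no text node equals `country`.
    match = box.find(string=country)
    if match is None:
        return None
    # Search inside the same box so a neighbouring trainer's code is never picked up.
    link = box.find(class_="TCLink")
    return link.get_text(strip=True) if link is not None else None


codes = []
for box in soup.find_all("div", class_="media-body"):
    code = friend_code(box, "New Zealand")
    if code is not None:
        codes.append(code)

print(codes)
```

Checking for None before going further is what prevents the AttributeError. Looking up the link with box.find rather than findNext is a design choice here: findNext continues through the rest of the document, so it could return a link that belongs to a later box.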

That's one of Python's weaknesses (or strengths, depending on what you work on): it leaves this part of the debugging to the user.
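
As for soup.find(text="United") printing None: one likely cause is that a plain string passed to find is compared against each whole text node, so "United" does not match a node containing "United States", and surrounding whitespace also defeats an exact match. A compiled regular expression is searched within each node instead. The sketch below uses hypothetical one-line markup, and uses string=, the newer name for the text= argument.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; note the spaces around the country name.
html = '<div class="media-body"><span> United States </span></div>'
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the entire text node, so this finds nothing.
exact = soup.find(string="United")

# A compiled pattern is searched inside each text node, so this matches.
partial = soup.find(string=re.compile("United"))

print(exact)
print(partial.strip())
```

If the text still cannot be found in the real page, it is worth printing the downloaded source to confirm the country names are present in the HTML that requests received.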
