
Extracting a link from html using python and BeautifulSoup: 'NoneType' object has no attribute 'attrs'

Hi there, I am using Python 3 with BeautifulSoup to try to extract a link. It works most of the time, but every now and then it can't find the schema.

The code I have looks like this (part of a larger body):

self.schema = self.soup.find(['link:schemaRef', 'schemaRef']).get('xlink:href')

self.namespaces = {}

for k in self.soup.find('html').attrs:
    if k.startswith("xmlns") or ":" in k:
        self.namespaces[k] = self.soup.find('html')[k].split(" ")

It has no trouble finding the schema in markup like this:

    <link:schemaRef xlink:type="simple" xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" />

but it can't find xlink:href in markup like this:

    <schemaRef xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/>

The error I get is:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-da0992ab9ae8> in <module>
     97         with open(filename,encoding="utf8") as a:
---> 98             x = Parser(a)
     99             r = json.dumps(x.to_table(), indent=4)
    100             jsondata = json.loads(r)

~\OneDrive\Desktop\parser\core.py in __init__(self, f, raise_on_error)
     21         self.errors = []
---> 23         self._get_schema()
     25         self._get_contexts()

~\OneDrive\Desktop\parser\core.py in _get_schema(self)
     47         self.schema = self.soup.find(
---> 49             ['link:schemaRef', 'schemaRef']).get('xlink:href')
     51         self.namespaces = {}

AttributeError: 'NoneType' object has no attribute 'get'

Any help would be much appreciated.

Thank you.

From your error traceback, the call

self.soup.find(['link:schemaRef', 'schemaRef'])

is returning None. To protect against this, you should test the result before calling get, i.e.:

data = self.soup.find(['link:schemaRef', 'schemaRef'])
if data is not None:
    self.schema = data.get('xlink:href')
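The same guard can be wrapped in a small helper. This is only a sketch: find_schema_href and SCHEMA_REF_NAMES are hypothetical names, the example.com URL is a placeholder, and matching tag names case-insensitively is an assumption about what the parser in use might produce, not something confirmed for the original Parser class.

```python
from bs4 import BeautifulSoup

# Tag names to accept, compared in lowercase (hypothetical helper constant).
SCHEMA_REF_NAMES = {"link:schemaref", "schemaref"}


def find_schema_href(soup):
    """Return the xlink:href of the first schemaRef tag, or None if absent."""
    # A function passed to find() is called with each tag in the document.
    tag = soup.find(lambda t: t.name.lower() in SCHEMA_REF_NAMES)
    if tag is None:
        return None  # no schemaRef element at all
    return tag.get("xlink:href")


# Placeholder documents covering both spellings seen in the question.
prefixed = '<html><body><link:schemaRef xlink:type="simple" xlink:href="https://example.com/a.xsd" /></body></html>'
unprefixed = '<html><body><schemaRef xlink:href="https://example.com/a.xsd" xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/></body></html>'
missing = "<html><body><p>no schema here</p></body></html>"

for doc in (prefixed, unprefixed, missing):
    print(find_schema_href(BeautifulSoup(doc, "html.parser")))
```

Returning None instead of raising leaves it to the caller to decide whether a missing schema is an error or something to record and skip.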

@dspencer So this returns the correct schema.

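from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes
# (in a normal string, "\066" would be interpreted as an octal escape)
with open(r"F:\ErrorFolder\06647909.html", "r") as f:
    soup = BeautifulSoup(f, 'html.parser')

# html.parser lowercases tag names, which is why 'schemaref' is listed as well
resources = soup.find(['ix:references', 'references'])
if resources is not None:
    for s in resources.find_all(['link:schemaRef', 'schemaRef', 'schemaref']):
        x = s.get('xlink:href')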

So I just need to change a few things around. It seems the real issue might be schemaref vs. schemaRef.

Thank you so much, you've been really helpful.
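On the schemaref vs. schemaRef point: BeautifulSoup's 'html.parser' builder is built on the standard library's HTMLParser, which reports tag names in lowercase. The sketch below uses only the standard library to show that behaviour. TagNameCollector is a hypothetical class name and the URL is a placeholder. Whether this is what happens inside the original Parser class depends on which parser it passes to BeautifulSoup, which the question does not show.

```python
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Record each start tag name exactly as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # The tag name arrives already lowercased by HTMLParser.
        self.names.append(tag)


collector = TagNameCollector()
collector.feed('<schemaRef xlink:href="https://example.com/schema.xsd"/>')
print(collector.names)  # ['schemaref']
```

A search for 'schemaRef' therefore finds nothing in a tree built this way, while 'schemaref' matches.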
