Extracting a link from html using python and BeautifulSoup: 'NoneType' object has no attribute 'attrs'

Hi there, I am using Python 3 and BeautifulSoup to try to extract the link. It works most of the time, but every now and then it can't find the schema.

The code I have looks like this (part of a larger body):

self.schema = self.soup.find(['link:schemaRef', 'schemaRef']).get('xlink:href')

self.namespaces = {}

for k in self.soup.find('html').attrs:
    if k.startswith("xmlns") or ":" in k:
        self.namespaces[k] = self.soup.find('html')[k].split(" ")

It has no issue finding the schema in markup like this:

<ix:references>
    <link:schemaRef xlink:type="simple" xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" />
</ix:references>

but it can't find xlink:href in markup like this:

<references>
    <schemaRef xlink:href="https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd" xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/>
</references>

The error I get is:

AttributeError                            Traceback (most recent call last)
<ipython-input-8-da0992ab9ae8> in <module>
     96 
     97         with open(filename,encoding="utf8") as a:
---> 98             x = Parser(a)
     99             r = json.dumps(x.to_table(), indent=4)
    100             jsondata = json.loads(r)

~\OneDrive\Desktop\parser\core.py in __init__(self, f, raise_on_error)
     21         self.errors = []
     22 
---> 23         self._get_schema()
     24 
     25         self._get_contexts()

~\OneDrive\Desktop\parser\core.py in _get_schema(self)
     47         self.schema = self.soup.find(
     48 
---> 49             ['link:schemaRef', 'schemaRef']).get('xlink:href')
     50 
     51         self.namespaces = {}

AttributeError: 'NoneType' object has no attribute 'get'

Any help would be much appreciated.

Thank you.

From your traceback, the call

self.soup.find(['link:schemaRef', 'schemaRef'])

is returning None. To protect against this, you should test the result before calling get, i.e.:

data = self.soup.find(['link:schemaRef', 'schemaRef'])
if data is not None:
    self.schema = data.get('xlink:href')
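
One thing the guard above leaves open is what should happen when no element is found, since self.schema is then never set. A minimal sketch of one option follows, assuming you want the miss recorded rather than skipped silently. The function name extract_schema and the error message are hypothetical; the errors list and raise_on_error flag are modelled on the names visible in the traceback, not on the actual Parser class.

```python
from bs4 import BeautifulSoup


def extract_schema(soup, errors, raise_on_error=False):
    """Return the xlink:href of the schema reference, or None if it is absent.

    Hypothetical helper: `errors` and `raise_on_error` mirror names seen in
    the traceback, but this is not the original Parser implementation.
    """
    node = soup.find(['link:schemaRef', 'schemaRef'])
    if node is None:
        message = "no schemaRef element found"
        if raise_on_error:
            raise ValueError(message)
        # Record the problem so the caller can report it later.
        errors.append(message)
        return None
    return node.get('xlink:href')


# A document with no schema reference at all.
soup = BeautifulSoup("<html><body><p>no schema here</p></body></html>", "html.parser")
errors = []
schema = extract_schema(soup, errors)
```

With this shape the caller always gets either a string or None, and can decide whether a missing schema is fatal.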

@dspencer So this returns the correct schema:

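from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escape sequences.
with open(r"F:\ErrorFolder\06647909.html", "r") as f:
    soup = BeautifulSoup(f, 'html.parser')

resources = soup.find(['ix:references', 'references'])
# find() returns None when nothing matches, so check before searching inside it.
if resources is not None:
    # html.parser lowercases tag names, hence the all-lowercase 'schemaref' entry.
    for s in resources.find_all(['link:schemaRef', 'schemaRef', 'schemaref']):
        x = s.get('xlink:href')
        print(x)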

So it seems I just need to change things around; the real issue might be schemaref vs schemaRef.

Thank you so much, you've been really helpful.
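
If the casing is indeed the cause, one way to avoid listing every spelling is to pass a function to find() and compare the tag name case-insensitively. The sketch below assumes html.parser, which lowercases tag names while parsing; the helper names is_schema_ref and find_schema_href are made up for illustration, and the two samples are the snippets from the question.

```python
from bs4 import BeautifulSoup

SCHEMA_URL = "https://xbrl.frc.org.uk/FRS-102/2014-09-01/FRS-102-2014-09-01.xsd"

PREFIXED = f"""<ix:references>
    <link:schemaRef xlink:type="simple" xlink:href="{SCHEMA_URL}" />
</ix:references>"""

UNPREFIXED = f"""<references>
    <schemaRef xlink:href="{SCHEMA_URL}" xlink:type="simple" xmlns="http://www.xbrl.org/2003/linkbase"/>
</references>"""


def is_schema_ref(tag):
    # Drop an optional "prefix:" and ignore case, so link:schemaRef,
    # schemaRef and schemaref are all accepted.
    return tag.name.lower().split(":")[-1] == "schemaref"


def find_schema_href(markup, parser="html.parser"):
    soup = BeautifulSoup(markup, parser)
    node = soup.find(is_schema_ref)
    # find() returns None when no tag satisfies the filter.
    if node is None:
        return None
    return node.get("xlink:href")


prefixed_href = find_schema_href(PREFIXED)
unprefixed_href = find_schema_href(UNPREFIXED)
missing_href = find_schema_href("<p>no schema</p>")
```

Matching on the local name this way also means the lookup does not depend on which namespace prefix a given filing happens to use.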


