
Compare a Python list with strings extracted from HTML using BeautifulSoup (Python 3)

I am working on a Python project that uses BeautifulSoup to extract data from HTML. I have already extracted some data, but I am now stuck. I have the list category = ['author', 'pinglun', 'renqi', 'step', 'gongyi', 'nandu', 'renshu', 'kouwei', 'zbshijian', 'prshijian'], and in my HTML document there is a value associated with each element of that list. The code below only retrieves the value for one element, namely "author". I want to extract the values for all the elements in the category list.

for script in scripts:
    if "_BFD.BFD_INFO" in script.text:
        text = script.text
        m_text = text.split('=')
        m_text = m_text[2].split(":")
        m_text = m_text[1].split(',')
        encoded = m_text[0].encode('utf-8')
        print(encoded.decode('utf-8'))

category = ['author', 'pinglun', 'renqi', 'step', 'gongyi', 'nandu',
            'renshu', 'kouwei', 'zbshijian', 'prshijian']
for script in scripts:
    text = script.text
    m_text = text.split(',')
    for n in m_text:
        # Only 'author' is checked here, so only that segment is printed.
        if 'author' in n:
            print(n)

Just use a for loop to check each category against every text segment:

for script in scripts:
    if "_BFD.BFD_INFO" in script.text:
        text = script.text
        m_text = text.split('=')
        m_text = m_text[2].split(":")
        m_text = m_text[1].split(',')
        encoded = m_text[0].encode('utf-8')
        print(encoded.decode('utf-8'))

category = ['author', 'pinglun', 'renqi', 'step', 'gongyi', 'nandu',
            'renshu', 'kouwei', 'zbshijian', 'prshijian']
for script in scripts:
    text = script.text
    # Split the script text into comma-separated segments.
    m_text = text.split(',')
    for n in m_text:
        # Print the segment once if it mentions any of the categories.
        if any(c in n for c in category):
            print(n)
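
The substring test above prints whole segments and can also match a category name that appears inside a longer word. If the values are stored as quoted key/value pairs, a regular expression per category is one possible alternative. The following is a minimal sketch, not the original poster's page: the sample script text, the `"key" : "value"` layout, and the helper name `extract_values` are all assumptions made for illustration. In the real code, `script_texts` could be built as `[s.text for s in scripts]`.

```python
import re

# Hypothetical script text; the real contents of the page are not shown in the question.
script_texts = [
    'var unrelated = 1;',
    '_BFD.BFD_INFO = { "title" : "example", "author" : "alice", '
    '"pinglun" : "12", "renqi" : "3456" };',
]

category = ['author', 'pinglun', 'renqi', 'step']


def extract_values(texts, keys):
    """Return a dict mapping each key found in a _BFD.BFD_INFO script to its value."""
    found = {}
    for text in texts:
        if "_BFD.BFD_INFO" not in text:
            continue
        for key in keys:
            # Assumes the layout  "key" : "value"  with double-quoted strings.
            match = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(key), text)
            if match:
                found[key] = match.group(1)
    return found


values = extract_values(script_texts, category)
for key in category:
    # Keys that are missing from the script are reported as None.
    print(key, values.get(key))
```

Collecting the results in a dictionary keeps each value tied to its category, so later code can look up `values['author']` directly instead of re-parsing printed segments.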
