

(Beautiful Soup) Get data inside a button tag

I'm trying to scrape the imageId from inside a button tag, and I want to get this result:

"25511e1fd64e99acd991a22d6c2d6b6c".

When I try:

drawing_url = drawing_url.find_all('button', class_='inspectBut')['onclick'] 

it doesn't work. It gives this error:

TypeError: list indices must be integers or slices, not str

Input:

for article in soup.find_all('div', class_='dojoxGridRow'):
    drawing_url = article.find('td', class_='dojoxGridCell', idx='3')
    drawing_url = drawing_url.find_all('button', class_='inspectBut')
    if drawing_url:
        for e in drawing_url:
            print(e)

Output:

    <button class="inspectBut" href="#" 
        onclick="window.open('getImg?imageId=25511e1fd64e99acd991a22d6c2d6b6c&amp;
                 timestamp=1552011572288','_blank', 'toolbar=0, 
                 menubar=0, modal=yes, scrollbars=1, resizable=1, 
                 height='+$(window).height()+', width='+$(window).width())" 
         title="Open Image" type="button">
    </button>
... 
...

You should be searching for

button_list = soup.find_all('button', {'class': 'inspectBut'})

That gives you a list of buttons, and you can then get the URL field with

 [button['onclick'] for button in button_list]

You will still need to do some parsing, but I hope this gets you on the right track.
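A minimal sketch of that parsing step, assuming the onclick value has the window.open('getImg?imageId=...&timestamp=...', ...) shape shown in the question's output: a regex pulls out the first quoted URL, and urllib.parse splits its query string. The helper name extract_image_id is hypothetical, and the sample onclick string is shortened.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_image_id(onclick):
    """Return the imageId query parameter from a window.open(...) onclick string, or None.

    Hypothetical helper: assumes the URL is the first single-quoted argument.
    """
    # Capture everything between window.open(' and the next single quote
    match = re.search(r"window\.open\('([^']*)'", onclick)
    if not match:
        return None
    # BeautifulSoup has already decoded &amp; to &, so this is a normal query string
    query = parse_qs(urlparse(match.group(1)).query)
    values = query.get("imageId")
    return values[0] if values else None


# Shortened version of the onclick value from the question
onclick = ("window.open('getImg?imageId=25511e1fd64e99acd991a22d6c2d6b6c&"
           "timestamp=1552011572288','_blank', 'toolbar=0')")
print(extract_image_id(onclick))
```

Using parse_qs instead of a hand-written pattern for the ID means the other parameters (such as timestamp) are available from the same dictionary if you need them later.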

Your mistake here was indexing the result of find_all() with a string: find_all() returns a list of tags, so ['onclick'] has to be taken from each button, not from the list. The value you actually want is, ironically, inside the getImg?imageId=... URL in that attribute.
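To make the failure concrete: find_all() returns a ResultSet, which is a list of Tag objects, so it can only be indexed by position, while a string key such as 'onclick' only works on a single Tag. The sketch below uses a trimmed-down, hypothetical version of the question's markup and the built-in 'html.parser'.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the question's markup (onclick value shortened)
html = """
<td class="dojoxGridCell" idx="3">
  <button class="inspectBut" onclick="window.open('getImg?imageId=25511e1fd64e99acd991a22d6c2d6b6c&amp;timestamp=1552011572288','_blank')" type="button"></button>
</td>
"""
soup = BeautifulSoup(html, "html.parser")

buttons = soup.find_all("button", class_="inspectBut")
try:
    buttons["onclick"]  # a ResultSet is a list, so a string index raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

# Take the attribute from each Tag instead of from the list
onclicks = [button["onclick"] for button in buttons]
print(onclicks)

# When only one button is expected, find() returns a single Tag (or None)
first = soup.find("button", class_="inspectBut")
if first is not None:
    print(first["onclick"])
```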

Try this one.

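import re

# onclick value of every button that has one (get() returns None otherwise)
btn_onclick_list = [b.get('onclick') for b in soup.find_all('button') if b.get('onclick')]
for click in btn_onclick_list:
    # \w+ stops at the '&' that follows the ID
    match = re.search(r"imageId=(\w+)", click)
    if match:
        print(match.group(1))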

You first need to check whether the attribute is present. tag.attrs returns a dictionary of the attributes present on the current tag, so 'onclick' in btn.attrs tells you whether a button has an onclick attribute.

Consider the following code.

Code:

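from bs4 import BeautifulSoup

a = """
<td>
<button class='hi' onclick="This Data"></button>
<button class='hi' onclick="This Second"></button>
</td>"""
# 'html.parser' is built in; 'lxml' also works if it is installed
soup = BeautifulSoup(a, 'html.parser')
# Keep only buttons that actually carry an onclick attribute
print([btn['onclick'] for btn in soup.find_all('button', class_='hi') if 'onclick' in btn.attrs])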

Output:

['This Data', 'This Second']
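The same check can also be written with Tag.has_attr(), or with Tag.get(), which returns None instead of raising KeyError when the attribute is missing. A sketch on similar markup, with a third button (hypothetical) that has no onclick, to show that both forms skip it:

```python
from bs4 import BeautifulSoup

html = """
<td>
<button class='hi' onclick="This Data"></button>
<button class='hi' onclick="This Second"></button>
<button class='hi'></button>
</td>"""
soup = BeautifulSoup(html, "html.parser")
buttons = soup.find_all("button", class_="hi")

# has_attr() makes the presence test explicit
with_has_attr = [btn["onclick"] for btn in buttons if btn.has_attr("onclick")]

# get() returns None for a missing attribute; filtering on truthiness
# also drops buttons whose onclick is an empty string
with_get = [btn.get("onclick") for btn in buttons if btn.get("onclick")]

print(with_has_attr)
print(with_get)
```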

Or you can simply do this:

[btn['onclick'] for btn in soup.find_all('button', attrs={'class' : 'hi', 'onclick' : True})]
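If you prefer CSS selectors, soup.select() can express the same condition (class hi and an onclick attribute present) in one selector string. This relies on the soupsieve package, which current Beautiful Soup releases install as a dependency. A minimal sketch on the same kind of markup:

```python
from bs4 import BeautifulSoup

html = """
<td>
<button class='hi' onclick="This Data"></button>
<button class='hi' onclick="This Second"></button>
<button class='hi'></button>
</td>"""
soup = BeautifulSoup(html, "html.parser")

# button.hi = <button> with class "hi"; [onclick] = the attribute must be present
onclicks = [btn["onclick"] for btn in soup.select("button.hi[onclick]")]
print(onclicks)
```

For the question's markup the equivalent selector would be button.inspectBut[onclick].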
