
Get data from multiple tags with BeautifulSoup/Python

I am trying to get as much data as possible from this site, including the title, the time, and the event scheduled at each time.

For example, for the Anniversary Awards Ceremony, it should print that title, 8:30, and the name of the event that happens at 8:30.

The tags after the title (such as the time) repeat and change several times, so I cannot pull the data accurately. Is there a better way to approach this, so that all the data is pulled as it appears on the site?

Thanks

import requests
from bs4 import BeautifulSoup

url = 'https://www.sitcancer.org/2020/program/annual-meeting-schedule-2020'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Each 'HtmlContent' div is treated as one section of the page
productlist = soup.find_all('div', class_='HtmlContent')
for section in productlist:
    # The title span is matched by its exact inline style
    title = section.find('span', style="font-family: helvetica; color: #ef4136;")
    if title is not None:
        title = title.text
    else:
        title = 'No'
    print(title)

This should get you close enough to what you need, and you can take it from there:

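from bs4 import NavigableString

# Reuses the soup object built in the question's code
tabs = soup.select('div.HtmlContent table[style="height: 740px; width: 100%;"] tbody')
for t in tabs[0]:
    # Skip bare text nodes and rows with no visible text
    if not isinstance(t, NavigableString) and len(t.text.strip()) > 0:
        if 'Session' in t.text:
            # Row whose text mentions a session
            print(t.text)
        elif t.select_one('h3') is not None:
            # Row containing an h3 heading
            print(t.text.strip())
        else:
            # Any other row: print the bold text, then the black-colored spans
            for w in t.select('td span strong'):
                print(w.text.strip())
            for f in t.select('span[style="color: #000000;"]'):
                print(f.text.strip())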

Output:

9 a.m.–3:15 p.m. EST* 
*Dates, times and program schedule subject to change. 


Immunotherapy Resistance and Failure


Session I: Defining Immune Checkpoint Inhibitor Resistance 


9 a.m.
Introduction

9:05 a.m.
Definitions of Resistance and Gaps in Current Understanding
Ryan J. Sullivan, MD - Massachusetts General Hospital

etc.
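
If you want one record per time slot rather than a flat stream of printed lines, one option is to collect the text lines into a list and group them afterwards. The following is only a minimal sketch: `group_schedule` and `TIME_RE` are hypothetical names, and it assumes that times look like the ones in the output above ("9 a.m.", "9:05 a.m.") and that session headings start with "Session". Lines that appear before the first time slot are ignored.

```python
import re

# Assumption: a time slot is a line such as "9 a.m." or "9:05 a.m." on its own
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?\s*[ap]\.m\.$")


def group_schedule(lines):
    """Group a flat list of text lines into one dict per time slot."""
    records = []
    session = None   # most recent session heading seen
    current = None   # record for the time slot currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Session"):
            session = line
            current = None
        elif TIME_RE.match(line):
            current = {"session": session, "time": line, "details": []}
            records.append(current)
        elif current is not None:
            # Anything after a time line belongs to that slot
            current["details"].append(line)
    return records


sample = [
    "Session I: Defining Immune Checkpoint Inhibitor Resistance ",
    "",
    "9 a.m.",
    "Introduction",
    "",
    "9:05 a.m.",
    "Definitions of Resistance and Gaps in Current Understanding",
    "Ryan J. Sullivan, MD - Massachusetts General Hospital",
]

for rec in group_schedule(sample):
    print(rec["time"], "|", " / ".join(rec["details"]))
```

To use it with the loop above, append each stripped string to a list instead of printing it, then pass that list to `group_schedule`. The regular expression will likely need adjusting if the page formats times differently elsewhere.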
