
Extracting text from an HTML div tag using Python and bs4

<div class="available-date-component">Available 4/1 at 9:00 AM</div>

I am trying to extract the text from the HTML div above using the code below:

date = page.find('div', {'class': 'available-date-component'})
print(date.string) # this line needs to be fixed

Does anyone have any suggestions on how I can print only 'Available 4/1 at 9:00 AM' instead of the entire div?

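import re

def clean(raw_html):
    # Remove anything that looks like an HTML tag; raw_html must be a str,
    # e.g. str(date) for a BeautifulSoup tag.
    cleaner = re.compile('<.*?>')
    cleantext = cleaner.sub('', raw_html)
    return cleantext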

I ended up passing the HTML div through this function, which removes all of the tags and leaves the text that I wanted.
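As an alternative to stripping tags with a regex, BeautifulSoup can return the text of a tag directly through `get_text()`. The following is only a minimal sketch: it assumes `page` is a `BeautifulSoup` object, and the `html` string and the `text` variable are illustrative names that do not appear in the original post.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; in practice `page` would be
# the document you have already parsed (assumption for this sketch).
html = '<div class="available-date-component">Available 4/1 at 9:00 AM</div>'
page = BeautifulSoup(html, "html.parser")

date = page.find("div", {"class": "available-date-component"})

# find() returns None when nothing matches, so guard before reading the text.
text = date.get_text(strip=True) if date is not None else None
print(text)
```

Unlike a regex, this approach relies on the parser, so it should also cope with nested tags inside the div, where `.string` would return `None`.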

