
How to extract text after a specific sentence with BeautifulSoup?

I want to extract the text that comes after a specific sentence in a file.

Do you specifically require BeautifulSoup? If not, use the following:

To split the text right after a specific sentence, try this. Since I am not sure what exactly you want to extract after the sentence, I will assume you mean everything that follows it.

For example, if I had a file file.txt:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus congue mattis risus, sit amet elementum lorem gravida eu. Cras vitae ante vel erat feugiat scelerisque. Etiam nec urna sed enim blandit blandit non nec odio. Quisque lacinia tempus rhoncus. Mauris euismod leo ut velit lobortis feugiat. Phasellus ultrices nunc sit amet tortor pretium eu mollis neque condimentum. Fusce placerat bibendum diam eget euismod. Phasellus ultricies erat nibh, sed volutpat quam. Nunc quis mauris sed purus aliquet aliquam. Integer viverra rutrum arcu ac tempor.

And my sentence was: Mauris euismod leo ut velit lobortis feugiat.

You could do this:

separator = "Mauris euismod leo ut velit lobortis feugiat."  # the sentence to split on
text = ""
with open("file.txt") as file:  # open the file; it is closed automatically afterwards
    for line in file:
        if separator in line:  # skip lines that do not contain the sentence
            text = line.split(separator, 1)[1]  # keep everything after the sentence
            break
print(text.strip())

>>> Phasellus ultrices nunc sit amet tortor pretium eu mollis neque condimentum. Fusce placerat bibendum diam eget euismod. Phasellus ultricies erat nibh, sed volutpat quam. Nunc quis mauris sed purus aliquet aliquam. Integer viverra rutrum arcu ac tempor.

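If the text after the sentence can continue on later lines, or the sentence might be missing, it may be easier to work on the whole file contents at once. The following is only a sketch: the helper name text_after and the shortened sample string are made up for illustration, and str.partition is used in place of split.

```python
def text_after(content, sentence):
    """Return the text following the first occurrence of sentence, or None if it is absent."""
    # partition returns (before, sentence, after); the middle part is empty when nothing matched
    _, found, after = content.partition(sentence)
    if not found:
        return None
    return after.strip()


# Shortened sample standing in for the contents of file.txt
sample = (
    "Quisque lacinia tempus rhoncus. "
    "Mauris euismod leo ut velit lobortis feugiat. "
    "Phasellus ultrices nunc sit amet tortor."
)

print(text_after(sample, "Mauris euismod leo ut velit lobortis feugiat."))
print(text_after(sample, "This sentence is not in the text."))
```

With a real file you would pass file.read() as content. Returning None for a missing sentence lets the caller tell "not found" apart from "found, but nothing follows".
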
With BeautifulSoup you can extract all the text from an HTML document. If you need something more specific, let me know.

from bs4 import BeautifulSoup

html = """<html><body><div style="DISPLAY: block; TEXT-INDENT: 0pt"><br/></div> <div align="justify" style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Arial">Our Earnings are Significantly Affected by General Business and Economic Conditions</font></div></body></html>"""

# Parse the markup first; a plain string has no get_text() method
soup = BeautifulSoup(html, "html.parser")

print(soup.get_text())

Output:

 Our Earnings are Significantly Affected by General Business and Economic Conditions
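
The two ideas can be combined to get the text that follows a sentence inside an HTML document. This is a rough sketch, not a definitive solution: the function name text_after_sentence and the two extra <div> paragraphs in the sample markup are invented for illustration, and it assumes the sentence sits inside a single text node so that it survives get_text() unchanged.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the heading from the example above plus two made-up paragraphs
html = """<html><body>
<div><font>Our Earnings are Significantly Affected by General Business and Economic Conditions</font></div>
<div>First paragraph after the heading.</div>
<div>Second paragraph after the heading.</div>
</body></html>"""


def text_after_sentence(markup, sentence):
    """Return the visible text that follows sentence, or None if it is not found."""
    soup = BeautifulSoup(markup, "html.parser")
    # Join the stripped text nodes with single spaces
    full_text = soup.get_text(" ", strip=True)
    _, found, after = full_text.partition(sentence)
    return after.strip() if found else None


heading = "Our Earnings are Significantly Affected by General Business and Economic Conditions"
print(text_after_sentence(html, heading))
```

If you only need the text up to the next heading rather than everything that follows, you would have to walk the elements after the match instead of flattening the whole document.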

