How to print all lines between two phrases in a Beautiful Soup object?

Question

I have an HTML document converted into a soup object, and I am trying to print all the lines of text between two key phrases. I am using soup.find to search for the two phrases, but I don't know how to print all the lines between them. Here is my code so far:

import re
from bs4 import BeautifulSoup as BS

with open(r'PDFs/murrumbidgee/Murrumbidgee Unregulated River Water Sources 2012_20200815.html', 'r', encoding='utf8') as file:
    contents = file.read()

soup = BS(contents, 'lxml')

textStart = soup.find(text=re.compile("19  domestic and stock rights"))
textEnd = soup.find(text=re.compile('20  native title rights'))

print(textStart)
print(textEnd)

An example of the HTML is here:

 <br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:22061px; width:169px; height:11px;"><span style="font-family: Arial-BoldMT; font-size:11px">19  Domestic and stock rights 
<br>unsuitable for human consumption. Water from these water sources should not be 

<br>consumed without first being tested and if necessary, appropriately treated. Such testing 
<br>and treatment is the responsibility of the water user. 
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24038px; width:31px; height:8px;"><span style="font-family: ArialMT; font-size:8px">Page 27 
<br></span></div>

<div style="position:absolute; top:24131px;"></div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24202px; width:331px; height:19px;"><span style="font-family: ArialMT; font-size:9px"> 
<br>Water Sharing Plan for the Murrumbidgee Unregulated River Water Sources 2012  
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24231px; width:2px; height:22px;"><span style="font-family: ArialMT; font-size:9px"> 
<br></span><span style="font-family: TimesNewRomanPSMT; font-size:11px"> 
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24261px; width:120px; height:11px;"><span style="font-family: Arial-BoldMT; font-size:11px">20  Native title rights

Answer 1

You can use the re module to extract the text. For example:

import re
from bs4 import BeautifulSoup


txt = '''
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:22061px; width:169px; height:11px;"><span style="font-family: Arial-BoldMT; font-size:11px">19  Domestic and stock rights
<br>unsuitable for human consumption. Water from these water sources should not be

<br>consumed without first being tested and if necessary, appropriately treated. Such testing
<br>and treatment is the responsibility of the water user.
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24038px; width:31px; height:8px;"><span style="font-family: ArialMT; font-size:8px">Page 27
<br></span></div>

<div style="position:absolute; top:24131px;"></div>
<div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24202px; width:331px; height:19px;"><span style="font-family: ArialMT; font-size:9px">
<br>Water Sharing Plan for the Murrumbidgee Unregulated River Water Sources 2012
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24231px; width:2px; height:22px;"><span style="font-family: ArialMT; font-size:9px">
<br></span><span style="font-family: TimesNewRomanPSMT; font-size:11px">
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:72px; top:24261px; width:120px; height:11px;"><span style="font-family: Arial-BoldMT; font-size:11px">20  Native title rights
'''

soup = BeautifulSoup(txt, 'html.parser')
raw_text = soup.get_text(strip=True, separator='\n')
t = re.search(r'19\s+domestic and stock rights(.*?)20\s+native title rights', raw_text, flags=re.S|re.I).group(1)
print(t)

Prints:

unsuitable for human consumption. Water from these water sources should not be
consumed without first being tested and if necessary, appropriately treated. Such testing
and treatment is the responsibility of the water user.
Page 27
Water Sharing Plan for the Murrumbidgee Unregulated River Water Sources 2012
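If you would rather work line by line instead of running one regex over the whole text, a rough alternative is to walk soup.stripped_strings and collect everything between the two markers. This is only a sketch: lines_between and the tiny sample string are made-up names for illustration, and it assumes each marker phrase sits inside a single text node (as the headings do in the HTML above). Matching is case-insensitive because the headings are capitalised in the document.

```python
import re
from bs4 import BeautifulSoup


def lines_between(html, start_pattern, end_pattern):
    """Hypothetical helper: return the text lines found after the first
    match of start_pattern and before the next match of end_pattern.
    Returns None if either marker is missing."""
    soup = BeautifulSoup(html, "html.parser")
    start_re = re.compile(start_pattern, re.I)
    end_re = re.compile(end_pattern, re.I)

    collected = []
    inside = False
    # stripped_strings yields each non-empty text node with the
    # surrounding whitespace removed, in document order.
    for line in soup.stripped_strings:
        if not inside:
            if start_re.search(line):
                inside = True
            continue
        if end_re.search(line):
            return collected
        collected.append(line)
    # Either the start marker or the end marker was never found.
    return None


# Small stand-in document, not the real file from the question.
sample = (
    "<div><span>1  Intro<br>alpha<br>beta<br></span></div>"
    "<div><span>2  Next<br>gamma</span></div>"
)

result = lines_between(sample, r"1\s+intro", r"2\s+next")
if result is not None:
    print("\n".join(result))
```

Returning None when a marker is missing avoids the AttributeError you would get from calling .group(1) on a failed re.search. For the document in the question, the call would look like lines_between(contents, r"19\s+domestic and stock rights", r"20\s+native title rights").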

Answer 1 posted: 2020-08-28 07:06:04