BeautifulSoup: extracting text from HTML

Question

I am working with BeautifulSoup for the first time and am trying to extract a joke from a downloaded HTML file. Unfortunately, there are no classes I can use to locate the information.

The joke is delimited by the comment lines "begin of joke" and "end of joke", and what I want is the title as well as the text of the joke. My code and its output are below.

from bs4 import BeautifulSoup

with open('init1.html', 'r') as f:
    contents = f.read()
    soup = BeautifulSoup(contents, 'lxml')
    print(soup.prettify)  # no parentheses: prints the bound method's repr, not prettify()'s result

Output:
<bound method Tag.prettify of <html>
<head>
<title>Joke 1 of 25</title>
</head>
<body bgcolor="#fddf84" text="black">
<center>
<table cellpadding="0" cellspacing="0" width="620">
<td width="470">
<font size="+1"> <br/>
<!--begin of joke -->
A man visits the doctor. The doctor says "I have bad news for you.You have
cancer and Alzheimer's disease". <p>
The man replies "Well,thank God I don't have cancer!"
<!--end of joke -->
</p></font></td></table>
</center>
</body>
</html>
>

Answer 1

This is simple and worked:

soup.table.td.text.strip()
# -> 'A man visits the doctor. The doctor says "I have bad news for you.You have\ncancer and Alzheimer\'s disease". \nThe man replies "Well,thank God I don\'t have cancer!"'
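
The one-liner above returns the joke text but not the title, and it relies on the joke being the only content of the first table cell. If that does not hold for every file, one possible alternative is to use the "begin of joke" and "end of joke" comments themselves as delimiters. The following is a minimal sketch rather than a definitive solution: `extract_joke` is a hypothetical helper name, the sample markup is copied from the question's output, and `html.parser` is swapped in for `lxml` only so the example runs without an extra parser installed.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Sample markup copied from the output shown in the question.
HTML = """<html>
<head>
<title>Joke 1 of 25</title>
</head>
<body bgcolor="#fddf84" text="black">
<center>
<table cellpadding="0" cellspacing="0" width="620">
<td width="470">
<font size="+1"> <br/>
<!--begin of joke -->
A man visits the doctor. The doctor says "I have bad news for you.You have
cancer and Alzheimer's disease". <p>
The man replies "Well,thank God I don't have cancer!"
<!--end of joke -->
</p></font></td></table>
</center>
</body>
</html>"""


def extract_joke(html):
    """Return (title, joke_text); hypothetical helper, not part of bs4."""
    # Assumption: the built-in parser is good enough for this markup.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None

    # Locate the comment node that marks the start of the joke.
    start = soup.find(
        string=lambda s: isinstance(s, Comment) and "begin of joke" in s
    )
    if start is None:
        return title, None

    parts = []
    # Walk the nodes that follow the start comment in document order.
    for node in start.next_elements:
        if isinstance(node, Comment):
            if "end of joke" in node:
                break  # reached the closing marker
            continue  # skip any other comments
        if isinstance(node, NavigableString):
            parts.append(str(node))  # keep text nodes, ignore tags

    # Collapse newlines and repeated spaces into single spaces.
    text = " ".join(" ".join(parts).split())
    return title, text


title, joke = extract_joke(HTML)
print(title)
print(joke)
```

Walking `next_elements` keeps the extraction independent of which tags happen to wrap the joke, at the cost of a few more lines than `soup.table.td.text.strip()`.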

Answer 1 posted 2020-04-12 22:14:36
