Remove newlines between tags in HTML in Python 3

Question

I want to trim all the whitespace and newlines, turning the result from this

<title>

     Asian Case Research Journal (World Scientific)

</title>

to this:

<title>Asian Case Research Journal (World Scientific)</title>

My code:

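import requests
from bs4 import BeautifulSoup

# url_list is assumed to be a list of URL strings defined elsewhere
for link in url_list:
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)
    except Exception:
        print("No Title Found ")
        continue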

Answer 1

import bs4

html = '''<title>

     Asian Case Research Journal (World Scientific)

</title>'''
soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.title
title.string = title.get_text(strip=True)
print(str(title))

Output:

<title>Asian Case Research Journal (World Scientific)</title>

In bs4, a tag is an object with a string attribute. You can access or modify it with dot notation, and convert the tag object to a Python str object by using str(tag).

Documentation: modifying-string
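
The same idea can be wrapped in a small helper so it fits into a loop like the one in the question. This is only a sketch: the function name clean_title is hypothetical, it uses the built-in "html.parser" so that lxml is not required, and it assumes that returning None is an acceptable way to signal a missing title (soup.title is None in that case rather than raising an exception).

```python
from bs4 import BeautifulSoup


def clean_title(html):
    """Return the <title> tag as a string with surrounding whitespace removed,
    or None if the document has no <title>. (Hypothetical helper name.)"""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title
    if title is None:
        return None
    # Replace the tag's contents with its whitespace-stripped text
    title.string = title.get_text(strip=True)
    return str(title)


sample = """<title>

     Asian Case Research Journal (World Scientific)

</title>"""
print(clean_title(sample))
```

In the question's loop this could be called as clean_title(r.content), printing "No Title Found" when the result is None.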

Answer 2

Try this and modify it according to your use case.

# splitlines() handles \n as well as \r\n line endings
desired_string = ''.join([x.strip() for x in str(soup.title).splitlines()])
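
Because this approach works on the string form of the tag, it can be tried without BeautifulSoup. A minimal sketch, assuming the tag has already been converted with str(); the function name collapse_tag_lines is hypothetical. It relies on str.splitlines(), which splits on \n, \r\n and \r alike, whereas splitting on '\r\n' alone only matches Windows-style line endings.

```python
def collapse_tag_lines(markup: str) -> str:
    """Strip every line of the markup and join the pieces back together."""
    # splitlines() splits on \n, \r\n and \r alike
    return ''.join(line.strip() for line in markup.splitlines())


# Sample input from the question, with Unix-style (\n) line endings
raw = "<title>\n\n     Asian Case Research Journal (World Scientific)\n\n</title>"
print(collapse_tag_lines(raw))
```

Note that the lines are joined with no separator, so a title whose text spans several lines would have those lines run together.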

Answer 3

soup.title.text.strip() should do it.

3 answers

Answer 1
2 accepted 2017-02-17 07:00:04

Answer 2
1 2017-02-17 04:33:44

Answer 3
0 2017-02-17 05:39:21
