Removing newlines between tags in HTML in Python 3
I want to trim all the whitespace and newlines inside the tag and turn the result from
<title>
Asian Case Research Journal (World Scientific)
</title>
to this:
<title>Asian Case Research Journal (World Scientific)</title>
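My code:
import requests
from bs4 import BeautifulSoup

for link in url_list:  # url_list: the list of page URLs, defined elsewhere
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)
    except Exception:
        print("No Title Found")
        continue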
import bs4
html = '''<title>
Asian Case Research Journal (World Scientific)
</title>'''
soup = bs4.BeautifulSoup(html, 'lxml')
title = soup.title
title.string = title.get_text(strip=True)
print(str(title))
out:
<title>Asian Case Research Journal (World Scientific)</title>
In bs4, a tag is an object with a string attribute. You can access or modify it using . notation, and you can convert the tag object to a Python str object with str(tag).
Document: modifying-string
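A minimal sketch of the same approach wrapped in a reusable helper. The name compact_tag is hypothetical (it is not part of bs4), and html.parser is used here only so that lxml does not need to be installed. Note that assigning to .string replaces everything inside the tag, so this is only suitable for tags that contain plain text, such as <title>.

```python
from bs4 import BeautifulSoup


def compact_tag(tag):
    """Replace the tag's contents with its stripped text and return the markup.

    Hypothetical helper for illustration; any child tags would be discarded.
    """
    tag.string = tag.get_text(strip=True)
    return str(tag)


html = (
    "<html><head><title>\n"
    "  Asian Case Research Journal (World Scientific)\n"
    "</title></head></html>"
)
soup = BeautifulSoup(html, "html.parser")
compact = compact_tag(soup.title)
print(compact)
```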
Try this and modify it according to your use case.
desired_string = ''.join([x.strip() for x in str(soup.title).splitlines()])
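A minimal sketch of this line-joining idea on a plain string, so it can be tried without fetching a page. It assumes str.splitlines() is acceptable in place of an explicit separator, since it handles both \n and \r\n line endings. The sample markup is illustrative only.

```python
# Sample markup with Windows-style line endings (illustrative input).
html = "<title>\r\nAsian Case Research Journal (World Scientific)\r\n</title>"

# Strip each line and join the pieces with no separator.
compact = "".join(line.strip() for line in html.splitlines())
print(compact)
```

Because the pieces are joined with an empty string, text that wraps across several lines would lose the space between the last word of one line and the first word of the next, so this fits best when the text sits on a single line.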
soup.title.text.strip()
should do it.
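A minimal sketch of how that expression might be used. It returns only the text, so the tags are added back by hand if the full <title>...</title> markup is wanted. The helper name clean_title is hypothetical, and the None check is an assumption about how a page without a title should be handled.

```python
from bs4 import BeautifulSoup


def clean_title(html):
    """Return the stripped title text, or None if the page has no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.title is None:
        return None
    return soup.title.text.strip()


page = "<title>\nAsian Case Research Journal (World Scientific)\n</title>"
text = clean_title(page)
print(text)                      # text only, without the tags
print(f"<title>{text}</title>")  # tags re-added manually
missing = clean_title("<p>no title here</p>")
```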