Removing newlines between tags in HTML in Python 3

I want to strip all the whitespace and newlines inside the tag, turning this:

<title>

     Asian Case Research Journal (World Scientific)

</title>

into this:

<title>Asian Case Research Journal (World Scientific)</title>

My code:

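import requests
from bs4 import BeautifulSoup

for link in url_list:  # url_list: the asker's list of page URLs (not shown)
    try:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, "html.parser")
        print(soup.title)
    except Exception:
        print("No Title Found")
        continue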
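
import bs4

html = '''<title>

     Asian Case Research Journal (World Scientific)

</title>'''
soup = bs4.BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package
title = soup.title
# Replace the tag's contents with its own text, stripped of surrounding whitespace
title.string = title.get_text(strip=True)
# str() serializes the tag object back to markup
print(str(title))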

Output:

<title>Asian Case Research Journal (World Scientific)</title>

In bs4, a tag is an object with a string attribute, which you can read or modify using dot notation. You can convert the tag object to a Python str with str(tag).

Documentation: modifying-string
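The approach in this answer can be wrapped in a small helper. The following is a minimal sketch, not part of the original answer: the function name `tag_with_trimmed_text` and its `name` parameter are hypothetical, and it uses the built-in `html.parser` backend so that lxml is not required.

```python
from bs4 import BeautifulSoup


def tag_with_trimmed_text(html, name="title"):
    """Return the first `name` tag as a str with its text stripped, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(name)
    if tag is None:
        return None
    # Assigning to .string replaces ALL of the tag's contents with one string,
    # so any child tags would be dropped. That is fine for <title>.
    tag.string = tag.get_text(strip=True)
    return str(tag)


page = "<title>\n\n     Asian Case Research Journal (World Scientific)\n\n</title>"
print(tag_with_trimmed_text(page))
```

Returning None when the tag is missing lets the caller decide what to print, rather than relying on an exception.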

Try this and adapt it to your use case.

# splitlines() handles both '\r\n' and '\n' line endings
desired_string = ''.join([x.strip() for x in str(soup.title).splitlines()])
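The same string-based idea can also be sketched with regular expressions from the standard library. This is an assumption-laden variant, not the answerer's code: the function name `strip_space_around_tags` is hypothetical, and the pattern removes every run of whitespace that touches a tag boundary. That suits a lone `<title>` element, but it would also delete meaningful spaces between adjacent inline tags.

```python
import re


def strip_space_around_tags(markup: str) -> str:
    """Remove whitespace directly after '>' and directly before '<'.

    Hypothetical helper; only appropriate for simple fragments such as
    a single <title> element.
    """
    markup = re.sub(r">\s+", ">", markup)
    return re.sub(r"\s+<", "<", markup)


raw = "<title>\r\n\r\n     Asian Case Research Journal (World Scientific)\r\n\r\n</title>"
print(strip_space_around_tags(raw))
```

Whitespace inside the text itself is left alone, since only runs adjacent to `<` or `>` are matched.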

soup.title.text.strip() should do it. Note that this returns only the text, without the surrounding <title> tags.
