How to remove extra space or gaps between tags in Python
Hello, I am scraping the li tags inside a div on a website. The output I get contains a lot of extra whitespace. How can I remove the extra space from the tags? I am using Python 3.5.1 and BeautifulSoup for scraping.
My output:
[<li>
GUANGZHOU ADS AUDIO SCIENCE & TECHNOLOGY CO.,LTD.
</li>, <li>
SHIMA ADS INDUSTRIAL DISTRICT GUANGZHOU GUANGDONG CHINA
</li>, <li>
GUANGDONGGUANGZHOU
</li>, <li>
510440
</li>, <li>
http://www.adsaudio.cc
</li>]
[<li>
GUANGDONG TEXTILES IMPORT & EXPORT COMPANY LTD.
</li>, <li>
GUANGDONG ,NO.168 XIAO BEI RD.,GUANGZHOU
</li>, <li>
GUANGDONGGUANGZHOU
</li>, <li>
510045
</li>, <li>
http://www.gdtex.com
</li>]
I want the output to look like this:
GUANGZHOU ADS AUDIO SCIENCE & TECHNOLOGY CO.,LTD.
SHIMA ADS INDUSTRIAL DISTRICT GUANGZHOU GUANGDONG CHINA
GUANGDONG TEXTILES MANSION,NO.168 XIAO BEI RD.,GUANGZHOU
GUANGDONG ,NO.168 XIAO BEI RD.,GUANGZHOU
How can I remove the extra space or gap?
You can use the get_text method from BeautifulSoup, then strip the result:
items = soup.find_all("li")
for item in items:
    # get_text() returns the tag's text; strip() trims whitespace at both ends
    print(item.get_text().strip())
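As a variant of the loop above, get_text also accepts a strip=True argument that does the trimming for you. The sketch below is only an illustration: the html sample is made up to resemble the output in the question, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the output in the question.
html = """
<div>
  <li>
      GUANGZHOU ADS AUDIO SCIENCE &amp; TECHNOLOGY CO.,LTD.
  </li>
  <li>
      510440
  </li>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# strip=True trims leading and trailing whitespace from each text fragment.
lines = [li.get_text(strip=True) for li in soup.find_all("li")]

for line in lines:
    print(line)
```

Each li is printed on its own line with no surrounding blank lines or indentation.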
Try using strip on the text you get back from Beautiful Soup.
Let's say you are using something like text = soup.find('li').get_text() to extract the text from the li tag. Then add a call to strip() on that text, text.strip(), and that should remove the whitespace at both ends.
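from bs4 import BeautifulSoup

def get_li_texts(html):
    # Name the parser explicitly so results do not depend on what is installed
    soup = BeautifulSoup(html, 'html.parser')
    li_list = soup.find_all('li')
    li_texts = []
    for li in li_list:
        # Trim the newlines and spaces around each item's text
        text = li.get_text().strip()
        li_texts.append(text)
    return li_texts
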
html = '<li>\n\n GUANGZHOU ADS AUDIO SCIENCE & TECHNOLOGY CO.,LTD.\n\n </li>, <li>\n\n SHIMA ADS INDUSTRIAL DISTRICT GUANGZHOU GUANGDONG CHINA\n\n </li>, <li>\n\n GUANGDONGGUANGZHOU\n\n </li>, <li>\n\n 510440\n\n </li>, <li>\n\n http://www.adsaudio.cc\n\n </li>'
texts = get_li_texts(html)
>> ['GUANGZHOU ADS AUDIO SCIENCE & TECHNOLOGY CO.,LTD.',
>> 'SHIMA ADS INDUSTRIAL DISTRICT GUANGZHOU GUANGDONG CHINA',
>> 'GUANGDONGGUANGZHOU',
>> '510440',
>> 'http://www.adsaudio.cc']
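Note that strip() only trims the two ends of a string. If the scraped text also has runs of spaces or newlines in the middle, one common approach is to split on whitespace and re-join with single spaces. The following is a minimal sketch in plain Python; the clean helper and the raw sample string are made up for illustration and do not come from the page in the question.

```python
def clean(text):
    """Trim both ends and collapse internal whitespace runs to one space."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty pieces, so leading/trailing whitespace vanishes too.
    return " ".join(text.split())

# Hypothetical scraped text with whitespace at the ends and in the middle.
raw = "\n\n   GUANGDONG   TEXTILES\n IMPORT & EXPORT COMPANY LTD.\n\n "

print(repr(raw.strip()))  # ends trimmed, inner gaps and newline kept
print(repr(clean(raw)))   # ends trimmed and inner gaps collapsed
```

You could pass li.get_text() through a helper like this instead of calling strip() directly, if the inner gaps matter for your output.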