Remove whitespace and newlines - BeautifulSoup Python

Using BeautifulSoup, I am scraping the following web source:

<div>
<p class="introduction">    Manchester City&#039;s Fabian Delph limped off in     the first minute of England Euro 2016 qualifier against Switzerland with a suspected hamstring injury. </p>
<p>    The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday&#039;s game at Wembley. </p>
<p>    Delph was picked in Roy Hodgson&#039;s first XI having been left out of the starting line-up against San Marino on Saturday.</p>
<p>    Delph was making his eighth appearance for England.</p>
</div>

I use the following code:

for item in soup.find_all('div'):
    print item.find('p').text.replace('\n','')

This works, but the result looks like this (more like four separate values):

Manchester City's Fabian Delph limped off in the first minute of England's Euro 2016 qualifier against Switzerland with a suspected hamstring injury.

The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday's game at Wembley.

Delph was picked in Roy Hodgson's first XI having been left out of the starting line-up against San Marino on Saturday.

Delph was making his eighth appearance for England.

How can I get the output in the following format (more like a single value):

Manchester City's Fabian Delph limped off in the first minute of England's Euro 2016 qualifier against Switzerland with a suspected hamstring injury. The 25-year-old midfielder, who signed for City from Aston Villa in the summer, pulled up suddenly during Tuesday's game at Wembley. Delph was picked in Roy Hodgson's first XI having been left out of the starting line-up against San Marino on Saturday. Delph was making his eighth appearance for England.

Ultimately, I want to save this data in a CSV file. The text above should be treated as a single value in the CSV file (not four values).

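What you are doing is calling print once per item. print writes the string to the console and then writes a newline, which is why each paragraph ends up on its own line. You can build one big string instead, like this: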

big_string = ""
for item in soup.find_all('div'):
  big_string += item.find('p').text.replace('\n','')

You can try:

divs = soup.find_all('div')
result = ''.join([div.find('p').text.replace('\n','') for div in divs])
print(result)

The second line collects the paragraph text of every div into a list and joins the pieces into one string. See the documentation for str.join.

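This approach is generally faster than adding the strings together one at a time (which is also valid, correct, and good enough), because it does not create extra intermediate strings along the way.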
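Note that replace('\n','') only removes newlines; it leaves the leading spaces and the runs of spaces inside each paragraph untouched. A minimal sketch of one way to collapse all whitespace while joining, assuming Python 3. The helper name join_paragraphs and the sample strings are hypothetical and stand in for the text of each <p> element:

```python
def join_paragraphs(texts):
    """Join paragraph strings into one value, collapsing whitespace runs."""
    # str.split() with no arguments splits on any whitespace (spaces, tabs,
    # newlines) and drops empty pieces, so re-joining with " " normalizes it.
    cleaned = [" ".join(text.split()) for text in texts]
    # Skip paragraphs that are empty after cleaning, then join with one space.
    return " ".join(part for part in cleaned if part)


# Hypothetical sample input imitating the scraped paragraph text.
paragraphs = [
    "    Delph limped off in     the first minute. ",
    "    Delph was making his eighth\nappearance for England.",
]
single_value = join_paragraphs(paragraphs)
print(single_value)
```

With BeautifulSoup, the input list could be built with something like [p.get_text() for p in soup.find_all('p')], which also picks up every paragraph rather than only the first <p> in each div.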

You are calling the print statement four times, so the output appears on four lines.

Try this modification:

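single_string_answer = ''
for item in soup.find_all('div'):
    # str.replace returns a new string, so the result must be kept;
    # append the cleaned paragraph text rather than str(item), which is the div's HTML
    single_string_answer += item.find('p').text.replace('\n','')
print(single_string_answer)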
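For the CSV goal mentioned in the question, here is a minimal sketch of writing the combined text as a single field with the standard csv module, assuming Python 3. The helper name write_single_value and the sample text are hypothetical, and an in-memory buffer stands in for a real file:

```python
import csv
import io


def write_single_value(stream, value):
    """Write one row containing a single field to a text stream."""
    writer = csv.writer(stream)
    # Passing a one-element list makes the whole text one CSV field;
    # the writer quotes it automatically if it contains commas or quotes.
    writer.writerow([value])


# Hypothetical combined text; the comma shows that it still stays one field.
text = "The 25-year-old midfielder, who signed for City, pulled up suddenly."
buffer = io.StringIO(newline="")
write_single_value(buffer, text)
csv_text = buffer.getvalue()

# Reading the data back yields one row with one value.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(rows)
```

To write to disk instead, the same helper could be given a file opened with open(path, "w", newline=""), which is the mode the csv module documentation recommends.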
