How do I get clean output from html2text in Python?

Question

I have the following Python program:

import urllib.request as urllib2
import html2text

html = urllib2.urlopen("http://www.stern.de/")
page_source = html.read()

h = html2text.HTML2Text()
h.ignore_links = True
h.ignore_images = True

text = h.handle(str(page_source))

print (text)

The output is:

\n \n\n

    * \n Anmelden
\n\n

    * \n 

Sie haben noch keinen Account?

\n Kostenlos neu registrieren

\n \n

\n

How can I filter out the "\n"?

For example, I tried it this way, but it doesn't work:

wordList = text.split()

for word in wordList:
    if word != "\n":
        print (word)

This is the output after splitting:

\n\n
*
\n
Anmelden
\n\n
*
\n
Sie
haben
noch
keinen
Account?
\n
Kostenlos
neu
registrieren
\n
\n
\n

So my check doesn't work. How can I check for the \n line break?
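For illustration, here is a small offline sketch of why the `!=` check never matches. It assumes `text` contains what html2text produced in the question, where each line break shows up as the two characters backslash and "n"; the sample string is hypothetical and only imitates the output above. Since `split()` already discards real whitespace, no token can ever equal "\n"; the leftovers are literal runs such as "\\n\\n", so a per-token cleanup has to target the literal sequence instead.

```python
# Hypothetical sample imitating the question's html2text output:
# "\\n" is a literal backslash followed by "n", while "\n" is a real newline.
text = "\\n \\n\\n\n    * \\n Anmelden\n\\n\\n"

words = text.split()
# split() removes real whitespace, so no token is ever a real "\n".
print(any(w == "\n" for w in words))  # False

# Strip the literal backslash-n sequence from every token,
# then drop the tokens that end up empty.
kept = [w.replace("\\n", "") for w in words]
kept = [w for w in kept if w]
print(kept)  # ['*', 'Anmelden']
```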

Answer 1

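OK, I solved it this way. While debugging, I found that what prints as \n is actually \\n in debug mode, i.e. a literal backslash followed by "n" rather than a real line break: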

text = text.replace('\\n', '')  # remove the literal two-character sequence backslash + "n"
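The literal backslash-n most likely comes from `str(page_source)` in the question: `page_source` is a `bytes` object, and calling `str()` on bytes returns their repr (a string starting with `b'`), in which every newline byte is written as the two characters backslash and "n". The sketch below reproduces this offline with a hypothetical hard-coded byte string instead of the real stern.de response, and shows that decoding the bytes keeps real newlines. UTF-8 is an assumption here; for a live response, `html.headers.get_content_charset()` may report the page's actual charset.

```python
# Hypothetical stand-in for page_source = html.read() (a bytes object).
raw = b"<p>Anmelden</p>\n<p>Kostenlos neu registrieren</p>\n"

# str() on bytes gives the repr: "b'...'" with newlines escaped as backslash + "n".
as_repr = str(raw)

# Decoding produces real text; newlines stay actual line breaks.
# Assumption: the page is UTF-8 encoded.
decoded = raw.decode("utf-8")

print(as_repr.startswith("b'"))  # True
print("\\n" in as_repr)          # True: literal backslash-n present
print("\\n" in decoded)          # False
print("\n" in decoded)           # True: real newline present
```

With the decoded string passed to `h.handle(...)`, there should be no literal "\n" left to strip, so the `replace('\\n', '')` workaround becomes unnecessary.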

Answer 2

Have you tried replace?

text = text.replace('\n', '')  # str.replace returns a new string, so assign the result
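A short sketch of two details that matter for this suggestion. First, `str.replace` never modifies a string in place, so the result has to be assigned. Second, `'\n'` (a real newline) and `'\\n'` (a literal backslash followed by "n") are different strings, and the question's text contains the latter. The sample value below is hypothetical.

```python
# Hypothetical text containing literal backslash-n, as in the question's output.
text = "Anmelden\\n\\nKostenlos"

# Calling replace without assigning the result leaves text unchanged.
text.replace("\n", "")

# Even when assigned, replacing the real newline finds nothing here.
unchanged = text.replace("\n", "")
print(unchanged == text)  # True

# Replacing the literal two-character sequence does clean it up.
cleaned = text.replace("\\n", " ")
print(cleaned)  # Anmelden  Kostenlos
```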


Answer 1: score 2, 2015-08-28 16:25:40

Answer 2: score -2, 2015-08-28 15:59:57
