Python POST request encoding

Question

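Here is the situation: I send a POST request and try to read the response in Python. The problem is that non-Latin letters come out garbled. This does not happen when I fetch the same page through a direct link (without search results), but the POST request does not produce a link.

Here is what I do: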

import urllib
import urllib2
url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq' 
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
file = open("pagesource.txt", "w")
file.write(the_page)
file.close()

Whenever I try

thepage = the_page.encode('utf-8')

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type: text/html; charset=utf-8 by doing

response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

AttributeError: addinfourl instance has no attribute '__setitem__'

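My question: is it possible to edit or remove response or request headers? If not, is there another way to fix this besides copying the source into Notepad++ and repairing the encoding by hand?

I am new to Python and data mining, so please let me know if I am doing something wrong.

Thanks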

Answer 1

Why not try thepage = the_page.decode('utf-8') instead of encode? What you want is to go from UTF-8 encoded text to unicode, the encoding-agnostic internal string type.
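To illustrate the difference between the two calls, here is a minimal sketch. The byte string is a stand-in for what response.read() might return (it reuses the Lithuanian words from the question's form data), and the names raw, text and round_trip are made up for this example. In Python 2, calling encode on a byte string first decodes it implicitly as ASCII, which is probably where the UnicodeDecodeError in the question comes from.

```python
# Stand-in for response.read(): UTF-8 encoded bytes (assumed sample data).
raw = b'lygiagre\xc4\x8di\xc5\xb3 sakini\xc5\xb3'

# decode() interprets the UTF-8 bytes and produces a Unicode string.
text = raw.decode('utf-8')

# encode() goes the other way: Unicode string back to UTF-8 bytes.
round_trip = text.encode('utf-8')

# The round trip reproduces the original bytes exactly.
print(round_trip == raw)
```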

Answer 2

Two things. First, you don't want to encode the response, you want to decode it:

thepage = the_page.decode('utf-8')
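Once the page is decoded, saving it needs an explicit encoding as well. A rough sketch, assuming the page really is UTF-8: the sample bytes and the temporary directory are placeholders, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

# Stand-in for response.read() (assumed sample data, not the real page).
the_page = b'sakini\xc5\xb3'
thepage = the_page.decode('utf-8')

# Hypothetical output location; the question writes to "pagesource.txt".
path = os.path.join(tempfile.mkdtemp(), 'pagesource.txt')

# Write the Unicode text, letting io.open encode it as UTF-8.
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(thepage)

# Read it back with the same encoding to check nothing was mangled.
with io.open(path, 'r', encoding='utf-8') as src:
    saved = src.read()
```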

Second, you don't set headers on the response; you set them on the request, using the add_header method:

req.add_header('Content-Type', 'text/html;charset=utf-8')
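A small sketch of setting and inspecting a request header without sending anything. The shortened form body is for illustration only, and the header value used here is an assumption: a form body is usually labelled application/x-www-form-urlencoded rather than text/html. Note that a request header describes what you send; it does not change how the server encodes its response, so the response still needs to be decoded.

```python
try:
    from urllib2 import Request          # Python 2, as in the question
except ImportError:
    from urllib.request import Request   # Python 3 equivalent

url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = b'q=bus&ieskoti=true'  # shortened form body, illustration only

# Passing data makes this a POST request; nothing is sent yet.
req = Request(url, data)

# Headers are set on the request object, not on the response.
req.add_header('Content-Type',
               'application/x-www-form-urlencoded; charset=utf-8')

# urllib stores header names capitalized, hence 'Content-type' here.
content_type = req.get_header('Content-type')
method = req.get_method()
```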
