ValueError("No JSON object could be decoded") using Python 2.6 and utf-8
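I'm trying to write a set of mapper/reducer scripts for Hadoop to count the words in tweets, but I've run into a problem. My input is a JSON file of collected tweet information. I started by setting the default encoding to utf-8, but when I run the code I get the following error:
Traceback (most recent call last):
  File "./mapperworks2.py", line 211, in <module>
    my_json_dict = json.loads(line)
  File "/usr/lib/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.6/json/decoder.py", line 319, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.6/json/decoder.py", line 338, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
Here is the code for the program: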
#!/usr/bin/python
import sys
import json
import string
reload(sys)
sys.setdefaultencoding('utf8')
stop_words = ['a',
'about',
'above',
'after',
'again',
'against',
'all',
'am',
'an',
'and',
'any',
'are',
"aren't",
'as',
'at',
'be',
'because',
'been',
'before',
'being',
'below',
'between',
'both',
'but',
'by',
"can't",
'cannot',
'could',
"couldn't",
'did',
"didn't",
'do',
'does',
"doesn't",
'yourselves']
numbers = ["0","1","2","3","4","5","6","7","8","9"]
def clean_word(word):
    for c in string.punctuation:
        word = word.replace(c,"")
    for c in numbers:
        word = word.replace(c,"")
    return word
def dont_stop(word):
    if word in stop_words or word == "":
        return False
    else:
        return True
# input comes from STDIN (standard input)
for line in sys.stdin:
    my_json_dict = json.loads(line)
    line = my_json_dict['text'].lower()
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # increase counters
    for word in words:
        word = clean_word(word)
        # write the results to STDOUT (standard output);
        # what we output here will be the input for the
        # Reduce step, i.e. the input for reducer.py
        #
        # tab-delimited; the trivial word count is 1
        if dont_stop(word):
            print '%s\t%s' % (word, 1)
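When I don't switch the encoding (that is, with reload(sys) and sys.setdefaultencoding() commented out), I get the following error instead:
Traceback (most recent call last):
  File "./mapperworks2.py", line 236, in <module>
    print '%s\t%s' % (word, 1)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 3: ordinal not in range(128)
I'm not sure how to fix this; any help is appreciated.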
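See the discussion here: Setting the correct encoding when piping stdout in Python
Your error comes from trying to print a Unicode string to standard output. When stdout is a pipe rather than a terminal, Python 2 has no encoding to use for it and falls back to ASCII, so a character such as u'\u2026' (the ellipsis) cannot be encoded.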
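One way to act on this is to encode each output record to UTF-8 explicitly instead of relying on the default encoding. The sketch below is a minimal illustration, not the asker's final code: the helper names parse_tweet_text, format_pair and map_lines are hypothetical, the stop-word and punctuation cleaning from the question is left out for brevity, and skipping blank or undecodable lines is an assumption about the input (the answer above does not say what triggers the ValueError). It is written to run on both Python 2.6 and Python 3.

```python
import json
import sys


def parse_tweet_text(line):
    """Return the lowercased 'text' field of one JSON line, or None.

    Assumption: blank lines and lines that are not a JSON object with a
    'text' field are skipped rather than treated as fatal.
    """
    line = line.strip()
    if not line:
        return None
    try:
        tweet = json.loads(line)
    except ValueError:  # "No JSON object could be decoded"
        return None
    if not isinstance(tweet, dict) or tweet.get('text') is None:
        return None
    return tweet['text'].lower()


def format_pair(word, count):
    """Build the tab-delimited record and encode it to UTF-8 bytes."""
    return (u'%s\t%d\n' % (word, count)).encode('utf-8')


def map_lines(lines):
    """Yield one encoded (word, 1) record per word of each usable line."""
    for line in lines:
        text = parse_tweet_text(line)
        if text is None:
            continue
        for word in text.split():
            yield format_pair(word, 1)


# In-memory sample standing in for sys.stdin: one tweet, one blank line,
# one line that is not JSON.
sample = ['{"text": "Wait\\u2026 what"}', '', 'not json']
records = list(map_lines(sample))

# Write raw bytes: sys.stdout.buffer on Python 3, sys.stdout on Python 2.
out = getattr(sys.stdout, 'buffer', sys.stdout)
for record in records:
    out.write(record)
```

In the real mapper, map_lines(sys.stdin) would replace the sample list. Because the records are already bytes, the sys.setdefaultencoding() workaround is not needed for the output step.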