UnicodeDecodeError: 'charmap' codec can't decode byte 0x83 in position 7458: character maps to <undefined>
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7240: character maps to <undefined>
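I am a student working on my master's thesis, and as part of it I am using Python. I am reading a log file in .csv format and writing the extracted data, nicely formatted, to another .csv file. However, when reading the file I get this error: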
```
Traceback (most recent call last):
  File "C:\Users\SGADI\workspace\DAB_Trace\my_code\trace_parcer.py", line 19
    for row in reader:
  File "C:\Users\SGADI\Desktop\Python-32bit-3.4.3.2\python-3.4.3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7240: character maps to <undefined>
```
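The traceback shows the decoder in use is `encodings\cp1252.py`. When `open()` is called without an `encoding` argument, Python 3 falls back to the locale's preferred encoding, which on this Windows setup appears to be cp1252. A minimal sketch of that behaviour, assuming a throwaway file named `sample_log.csv` (a hypothetical name, not the real log): it prints the default, shows that byte 0x8d is undefined in cp1252, and then passes an encoding explicitly. latin-1 is used only because it maps every byte value from 0x00 to 0xFF and so can never raise `UnicodeDecodeError`; it is not known to be the log's real encoding.

```python
import csv
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding= is passed.
default_encoding = locale.getpreferredencoding(False)
print("locale default:", default_encoding)

# 0x8d is one of the few byte values cp1252 leaves undefined.
try:
    b"\x8d".decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print(exc)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical one-line sample log containing that byte.
    path = os.path.join(tmp, "sample_log.csv")
    with open(path, "wb") as raw:
        raw.write(b"00:00:01.000;RSSI=42 \x8d\n")

    # An explicit encoding overrides the locale default. latin-1 maps all 256
    # byte values, so this read cannot raise UnicodeDecodeError (the text may
    # still be decoded wrongly if the file really uses another encoding).
    with open(path, encoding="latin-1", newline="") as csvfile:
        rows = list(csv.reader(csvfile, delimiter=";"))

print(rows)
```

If passing `encoding='latin1'` still produces a `'charmap'` error, the argument is probably not reaching the `open()` call that is actually reading the file.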
```python
import csv
import re
#import matplotlib
#import matplotlib.pyplot as plt
import datetime
#import pandas
#from dateutil.parser import parse
#def parse_csv_file():
timestamp = datetime.datetime.strptime('00:00:00.000', '%H:%M:%S.%f')
timestamp_list = []
snr_list = []
freq_list = []
rssi_list = []
dab_present_list = []
counter = 0
f = open("output.txt","w")
with open('test_log_20150325_gps.csv') as csvfile:  # no encoding= given, so the locale default (cp1252 here) is used
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        #timestamp = datetime.datetime.strptime(row[0], '%M:%S.%f')
        #timestamp.split(" ",1)
        timestamp = row[0]
        timestamp_list.append(timestamp)
        #timestamp = row[0]
        details = row[-1]
        counter += 1
        print(counter)
        #if(counter > 25000):
        #    break
        #timestamp = datetime.datetime.strptime(row[0], '%M:%S.%f')
        #timestamp_list.append(float(timestamp))
        #search for SNRLevel=\d+
        snr = re.findall(r'SNRLevel=(\d+)', details)
        if snr == []:
            snr = 0
        else:
            snr = snr[0]
        snr_list.append(int(snr))
        #search for Frequency=09ABC (hexadecimal)
        freq = re.findall(r'Frequency=([0-9a-fA-F]+)', details)
        if freq == []:
            freq = 0
        else:
            freq = int(freq[0], 16)
        freq_list.append(int(freq))
        #search for RSSI=\d+
        rssi = re.findall(r'RSSI=(\d+)', details)
        if rssi == []:
            rssi = 0
        else:
            rssi = rssi[0]
        rssi_list.append(int(rssi))
        #search for DABSignalPresent=\d+
        dab_present = re.findall(r'DABSignalPresent=(\d+)', details)
        if dab_present == []:
            dab_present = 0
        else:
            dab_present = dab_present[0]
        dab_present_list.append(int(dab_present))
        f.write(str(timestamp) + "\t")
        f.write(str(freq) + "\t")
        f.write(str(snr) + "\t")
        f.write(str(rssi) + "\t")
        f.write(str(dab_present) + "\n")
        print(timestamp, freq, snr, rssi, dab_present)
        #print (index+1)
        #print(timestamp,freq,snr)
        #print (counter)
        #print(timestamp_list,freq_list,snr_list,rssi_list)
        '''if snr != []:
            if freq != []:
                timestamp_list.append(timestamp)
                snr_list.append(snr)
                freq_list.append(freq)
                f.write(str(timestamp_list) + "\t")
                f.write(str(freq_list) + "\t")
                f.write(str(snr_list) + "\n")
                print(timestamp_list,freq_list,snr_list)'''
f.close()
```
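The four `re.findall` blocks in the loop follow the same pattern: search `details` for `Key=value`, fall back to 0 when the key is absent, and convert to `int` (base 16 for `Frequency`). A minimal sketch that factors this out and writes the tab-separated line with `csv.writer` instead of five `f.write` calls; `extract_field` is a hypothetical helper name, and the sample `details` string and timestamp are made up for illustration, shaped after the comments in the code above rather than taken from the real log.

```python
import csv
import io
import re


def extract_field(details: str, key: str, base: int = 10) -> int:
    """Return the integer value of the first `key=value` in `details`, or 0 if absent."""
    digits = r"[0-9a-fA-F]+" if base == 16 else r"\d+"
    match = re.search(re.escape(key) + "=(" + digits + ")", details)
    return int(match.group(1), base) if match else 0


# Made-up example of a row's last column.
details = "Frequency=09ABC SNRLevel=17 RSSI=52 DABSignalPresent=1"
record = [
    "00:00:01.000",  # made-up timestamp standing in for row[0]
    extract_field(details, "Frequency", base=16),
    extract_field(details, "SNRLevel"),
    extract_field(details, "RSSI"),
    extract_field(details, "DABSignalPresent"),
]

# Stand-in for open("output.txt", "w", newline=""); lineterminator="\n"
# matches the "\n" the original loop writes at the end of each line.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerow(record)
print(out.getvalue(), end="")
```

`re.search` returns the first match, which is the same value the original `findall(...)[0]` picks.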
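I searched for this particular character in the file but could not find it. I searched the internet, and the suggestion was to change the encoding: I tried utf8, latin1 and a few others, but I still get this error. Can you help me solve this problem? I also tried pandas, but I still get the error. I even deleted the offending line from the log file, but then the error occurred on the next line.
Please help me find a solution, thanks.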
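A text editor may not show the byte at all, because it decodes the file with its own encoding first. One way to pin it down is to read the file in binary mode and decode the whole buffer in one call: the `start` attribute of the resulting `UnicodeDecodeError` is then an absolute file offset. (When reading through a text-mode file object, the reported position may be relative to the block being decoded at the time rather than to the start of the file.) A minimal sketch; `find_undecodable` and `all_offsets` are hypothetical helper names, and the byte string is a made-up stand-in for the real log.

```python
def find_undecodable(data: bytes, encoding: str, context: int = 16):
    """Return (offset, bad_bytes, nearby_bytes) for the first sequence that
    `encoding` cannot decode, or None if `data` decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[exc.start:exc.end], data[lo:exc.end + context]
    return None


def all_offsets(data: bytes, value: int) -> list:
    """Every offset at which the single byte `value` occurs."""
    return [i for i, b in enumerate(data) if b == value]


# Real use would be:
#     with open('test_log_20150325_gps.csv', 'rb') as fh:
#         data = fh.read()
# The made-up sample below stands in for the file contents.
data = b"time;details\n00:01;SNRLevel=12 \x8d RSSI=40\n"

print(find_undecodable(data, "cp1252"))
print(all_offsets(data, 0x8D))
```

If `all_offsets` reports many hits spread through the file, deleting individual lines will only move the error further down, which matches what the question describes.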
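I have solved this problem. We can use this code:

```python
import codecs

types_of_encoding = ["utf8", "cp1252"]
for encoding_type in types_of_encoding:
    # filename is the path to your CSV file.
    # errors='replace' turns undecodable bytes into U+FFFD instead of raising,
    # so decoding never fails here and the loop does not act as a fallback:
    # the body simply runs once for each encoding in the list.
    with codecs.open(filename, encoding=encoding_type, errors='replace') as csvfile:
        # your code
        ...
```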
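`replace` is only one of the standard error handlers that can be passed as `errors=`; built-in `open()`, `codecs.open()` and `bytes.decode()` all accept the same values. The sketch below applies four of them to a made-up line ending in byte 0x8d, the byte from the question, to show what each one puts in the decoded text.

```python
raw = b"RSSI=40 \x8d"

# The same undecodable byte under cp1252, handled four different ways.
results = {
    handler: raw.decode("cp1252", errors=handler)
    for handler in ("replace", "ignore", "backslashreplace", "surrogateescape")
}
for handler, text in results.items():
    print(f"{handler:>16}: {text!r}")
```

`replace` and `ignore` discard the original byte, while `backslashreplace` keeps it visible as text and `surrogateescape` keeps it recoverable, which can matter if the data is later written back out with the same encoding.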
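Another approach is to open the file in binary mode and try several encodings on each line:

```python
with open('input.tsv', 'rb') as f:
    for ln in f:
        decoded = False
        line = ''
        # utf-8 first: it is strict, so a successful decode is a good sign.
        # cp850 last: it maps every byte value, so it never fails and anything
        # listed after it would never be tried.
        for cp in ('utf-8', 'cp1252', 'cp850'):
            try:
                line = ln.decode(cp)
                decoded = True
                break
            except UnicodeDecodeError:
                pass
        if decoded:
            # use 'line' here
            pass
```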
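Because `csv.reader` accepts any iterable of strings, per-line decoding like this can feed it directly. A minimal sketch, assuming the log is mostly UTF-8 with occasional lines in a single-byte encoding; `decode_line` and `read_mixed_csv` are hypothetical helper names, and the in-memory sample stands in for the real file. UTF-8 is tried first because non-ASCII text in a single-byte encoding rarely forms valid UTF-8 sequences, and latin-1 is the last resort because it accepts every byte.

```python
import csv
import io

FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")  # latin-1 last: it never fails


def decode_line(raw: bytes, encodings=FALLBACK_ENCODINGS) -> str:
    """Decode one line with the first encoding that accepts it."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reachable if the caller passes a list without a catch-all encoding.
    return raw.decode(encodings[-1], errors="replace")


def read_mixed_csv(binary_file, delimiter=";"):
    """Yield parsed rows from a binary file whose lines may use different encodings."""
    lines = (decode_line(raw) for raw in binary_file)
    yield from csv.reader(lines, delimiter=delimiter)


# Stand-in for open('test_log_20150325_gps.csv', 'rb'): one UTF-8 line, one
# cp1252 line, and one line with 0x8d, which neither UTF-8 nor cp1252 decodes.
sample = io.BytesIO(
    "00:01;Stra\u00dfe\n".encode("utf-8")
    + "00:02;caf\u00e9\n".encode("cp1252")
    + b"00:03;RSSI=40 \x8d\n"
)
rows = list(read_mixed_csv(sample))
print(rows)
```

Splitting on raw newline bytes before decoding is safe for UTF-8 and single-byte encodings, but it would not be for UTF-16 files.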
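Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.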