Opening a .txt file in Python after skipping lines - encoding issue

Question

I am trying to open a .txt file in Python.

Before flagging this as a duplicate, please take a look at the code and the file below.

I have used this snippet to read similar files before, but it does not work for this particular batch of files.

location="sample/sample2/"
filename=location+"Detector_-3000um.txt"
skip=25 #Skip the first 25 lines

The code to open it is:

import numpy as np

f=open(filename)
num_lines = sum(1 for line in f)
print("Skipping the first "+str(skip)+" lines")
data=np.zeros((num_lines-skip+1,num_lines-skip+1))
f.close()
f=open(filename)
i=0
for _ in range(skip):  #skip unwanted rows
    next(f)
for line in f:
    data[i,:]=line.split()
    i+=1
f.close()

It's a 501x501 data set, with the first row and the first column being the row and column numbers, respectively.

The data set is attached here.

I also tried pandas with pd.read_csv(filename,skiprows), but it gives this error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 501

Answer 1

I think there is nothing wrong with your code; the problem is the file encoding.
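One way to check whether a file is in an unexpected encoding is to look for a byte-order mark (BOM) at its start. The thread does not say which encoding the attached file actually uses, so the following is only a rough sketch: the function name guess_encoding_from_bom is made up for illustration, and a file without a BOM simply yields None.

```python
import codecs

# Known byte-order marks. UTF-32 comes first because its little-endian
# BOM begins with the same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(path):
    """Return a codec name if the file starts with a known BOM, else None.

    Hypothetical helper for illustration; it cannot identify encodings
    that do not write a BOM (plain UTF-8, Latin-1, and so on).
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

If this returns a name, that name can be passed as the encoding when opening the file; if it returns None, the encoding has to be determined some other way.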

I converted your file's encoding to 'utf-8'; after that, both your code and read_csv() from pandas work properly.

pd.read_csv(myfile, skiprows=24, header=0, index_col=0,sep='\t')
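As an alternative to converting the file first, the encoding could be given when the file is opened. The sketch below is a plain-Python (Python 3) variant of the loop from the question. The name read_table and its parameters are hypothetical, and the right value for encoding depends on the file, which is not stated here.

```python
def read_table(path, skip, encoding):
    """Skip `skip` lines, then parse whitespace-separated numbers.

    Hypothetical helper: `encoding` must match the file's real encoding,
    which is an assumption the caller has to supply.
    """
    rows = []
    with open(path, encoding=encoding) as fh:
        for _ in range(skip):  # skip unwanted rows
            next(fh)
        for line in fh:
            fields = line.split()
            if fields:  # ignore blank lines
                rows.append([float(x) for x in fields])
    return rows
```

For example, read_table(filename, skip, "utf-16") would apply if the file turned out to be UTF-16; the rows come back as lists of floats and can be passed to np.array once they all have the same length.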

There are many ways to convert the encoding, for example with Notepad++ on Windows, which is what I did, or see here: How to convert a file to utf-8 in Python?
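For completeness, here is a minimal Python 3 sketch of doing the conversion in code rather than in an editor. The function name convert_to_utf8 is made up, and src_encoding is an assumption the caller has to supply, since the original encoding of the file is not given in this thread.

```python
def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file as UTF-8, one line at a time.

    Hypothetical helper: `src_encoding` must be the file's actual
    encoding. newline="" keeps the original line endings untouched.
    """
    with open(src_path, encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Writing to a separate dst_path leaves the original file untouched, so a wrong guess for src_encoding can be retried.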

Solution 1
1 2017-03-30 23:34:39
