Open .txt file in Python after skipping lines - Encoding issue

I am trying to open a .txt file in Python.

Before flagging this as a duplicate, please take a look at the code and the file below.

I have used this snippet to read similar files before, but it does not work for this particular batch of files.

location="sample/sample2/"
filename=location+"Detector_-3000um.txt"
skip=25 #Skip the first 25 lines

The code to open it is:

import numpy as np

f=open(filename)
num_lines = sum(1 for line in f)  #count the lines to size the array
print("Skipping the first "+str(skip)+" lines")
data=np.zeros((num_lines-skip+1,num_lines-skip+1))
f.close()
f=open(filename)
i=0
for _ in range(skip):  #skip unwanted rows
    next(f)
for line in f:
    data[i,:]=line.split()  #one whitespace-separated row per line
    i+=1
f.close()

It's a 501x501 data set, with the first row and column being the row and column numbers, respectively.

The data set is attached here.

I also tried using pandas, pd.read_csv(filename,skiprows), but it gives this error:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 501

I think there is nothing wrong with your code; the problem is the file encoding.
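One way to check whether the encoding is the culprit is to look at the first bytes of the file for a byte-order mark (BOM). The sketch below is only a heuristic and not part of the original answer: the helper name `sniff_bom` is hypothetical, and a file without a BOM can still be in some non-UTF-8 encoding.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(path):
    """Return a codec name suggested by the file's BOM, or None if there is no BOM."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None
```

If this returns a codec name, passing that name as the encoding when opening the file is a reasonable first thing to try.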

I converted your file's encoding to 'utf-8'; after that, both your code and read_csv() from pandas work properly.

pd.read_csv(myfile, skiprows=24, header=0, index_col=0,sep='\t')

There are many ways to convert the encoding. For example, you can use Notepad++ (Windows), which is what I did, or see here: How to convert a file to utf-8 in Python?
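For the Python route, a minimal sketch of such a conversion might look like the following. The function name `convert_to_utf8` is hypothetical, and the source encoding has to be supplied by the caller, since the post does not say what the original encoding was.

```python
import io

def convert_to_utf8(src_path, dst_path, src_encoding):
    """Re-encode a text file from src_encoding to UTF-8."""
    # io.open decodes while reading and works on both Python 2 and 3.
    with io.open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    # Write the decoded text back out as UTF-8.
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)

# Example call; "utf-16" and the output name are guesses, not stated in the post:
# convert_to_utf8("sample/sample2/Detector_-3000um.txt", "Detector_utf8.txt", "utf-16")
```

Writing to a separate output path keeps the original file intact in case the guessed source encoding turns out to be wrong.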
