Python: Read and write a file with a complex, repeating format
To begin with, sorry for my poor English. I have a file with a repeating format, such as:
326 Iteration: 0 #Bonds: 10
1 6 7 14 54 70 77 0 0 0 0 0 1 0.693 0.632 0.847 0.750 0.644 0.000 0.000 0.000 0.000 0.000 3.566 0.000 0.028
2 6 3 6 15 55 0 0 0 0 0 0 1 0.925 0.920 0.909 0.892 0.000 0.000 0.000 0.000 0.000 0.000 3.645 0.000 -0.040
3 6 2 8 10 52 0 0 0 0 0 0 1 0.925 0.910 0.920 0.898 0.000 0.000 0.000 0.000 0.000 0.000 3.653 0.000 0.000
...
324 8 323 0 0 0 0 0 0 0 0 0 100 0.871 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.871 3.000 -0.493
325 2 326 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 0.000 0.334
326 8 325 0 0 0 0 0 0 0 0 0 101 0.930 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.930 3.000 -0.611
637.916060425841 306.094529423257 1250.10511927236
6.782126993565285E-006
326 (repeating from here) Iteration: 100 #Bonds: 10
1 6 7 14 54 64 70 77 0 0 0 0 1 0.885 0.580 0.819 0.335 0.784 0.709 0.000 0.000 0.000 0.000 4.111 0.000 0.025
2 6 3 6 15 55 0 0 0 0 0 0 1 0.812 0.992 0.869 0.966 0.000 0.000 0.000 0.000 0.000 0.000 3.639 0.000 -0.034
3 6 2 8 10 52 0 0 0 0 0 0 1 0.812 0.966 0.989 0.926 0.000 0.000 0.000 0.000 0.000 0.000 3.692 0.000 0.004
I want to analyze the data in the 3rd through 12th columns of the 2nd through 327th lines of every repeating "frame" and print, for each line, the number of zeros and the number of non-zero values among those columns. The 1st, 2nd and 13th columns should be printed as well. So the expected output file looks like this:
326
1
1 6 5 5 1
2 6 4 6 1
...
325 2 1 9 101
326 8 1 9 101
326 (Next frame starts from here)
2
1 6 5 5 1
2 6 4 6 1
...
326
3
1 6 5 5 1
2 6 4 6 1
...
So the result file has 2 header lines followed by 326 lines of analyzed data, 328 lines in total per frame. The same format repeats for the next frame. Keeping this output format (5 spaces each) is recommended, because the file will be used for other purposes.
My current approach is to create 13 arrays for the 13 columns and fill them using a double for loop over each frame and each of its 328 lines. But I have no idea how to deal with the output.
Below is my trial code (unfinished, it only reads the input), but it has a lot of problems. linecache reads the whole line, not just the first number on the first line of each frame. Every frame has 326 + 3 = 329 lines, but my code does not seem to work properly frame by frame. I welcome any help with analyzing this data. Thank you very much in advance.
# Read the file
filename = raw_input("Enter the file name \n")
file = open(filename, 'r')
# Read the number of atom from header
import linecache
nnn = linecache.getline(filename, 1)
natoms = int(nnn)
singleframe = natoms + 3
# get number of frames
nlines = 0
for i1 in file:
    nlines = nlines +1
file.close()
nframes = nlines / singleframe
print 'no of lines are: ', nlines
print 'no of frames are: ', nframes
print 'no of atoms are:', natoms
# Create 1d string array
nrange = range(nlines)
data_lines = [None]*(nlines)
# Store whole input file into string array
file = open(filename, 'r')
i1=0
for i1 in nrange:
    data_lines[i1] = file.readline()
file.close()
# Create 1d array to store atomic data
at_index = [None]*natoms
at_type = [None]*natoms
n1 = [None]*natoms
n2 = [None]*natoms
n3 = [None]*natoms
n4 = [None]*natoms
n5 = [None]*natoms
n6 = [None]*natoms
n7 = [None]*natoms
n8 = [None]*natoms
n9 = [None]*natoms
n10 = [None]*natoms
molnr = [None]*natoms
nrange1= range(natoms)
nframe = range(nframes)
file = open('output_force','w')
print data_lines[9]
for j1 in nframe:
    start = j1*(natoms + 3) + 3
    for i1 in nrange1:
        line = data_lines[i1+start].split() #Split each line based on spaces
        at_index[i1] = int(line[0])
        at_type[i1] = int(line[1])
        n1[i1]= int(line[2])
        n2[i1]= int(line[3])
        n3[i1]= int(line[4])
        n4[i1]= int(line[5])
        n5[i1]= int(line[6])
        n6[i1]= int(line[7])
        n7[i1]= int(line[8])
        n8[i1]= int(line[9])
        n9[i1]= int(line[10])
        n10[i1]= int(line[11])
        molnr[i1]= int(line[12])
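The two problems described in the question (taking only the first number of a header line, and stepping through the file frame by frame) can be sketched roughly as follows. This is a minimal Python 3 sketch, not the asker's code: the function name `split_frames` and the parameter `trailer_lines` are hypothetical, and it assumes every frame is one header line, then as many data lines as the header's first number says, then two trailing lines, as in the sample above.

```python
def split_frames(lines, trailer_lines=2):
    """Split raw text lines into (header_tokens, data_rows) pairs, one per frame.

    Assumed layout per frame (taken from the sample in the question):
    1 header line, natoms data lines, then `trailer_lines` extra lines.
    """
    frames = []
    pos = 0
    while pos < len(lines):
        header = lines[pos].split()
        if not header:
            # Skip blank lines rather than failing on them
            pos += 1
            continue
        # Use only the first token of the header, not the whole line
        natoms = int(header[0])
        # Data rows follow immediately after the header line
        rows = [ln.split() for ln in lines[pos + 1:pos + 1 + natoms]]
        frames.append((header, rows))
        # Jump past header, data rows and trailing lines to the next frame
        pos += 1 + natoms + trailer_lines
    return frames
```

Reading the atom count from each header, instead of once at the top, means the frame offset never has to be computed by hand.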
When you are working with CSV files, you should look into the csv module. I wrote some code that should do the trick.
This code assumes "good data". If your data set may contain errors (such as fewer than 13 columns, or fewer than 326 data rows), some alterations will be needed.
(changed to comply with Python 2.6.6)
import csv
with open('mydata.csv') as in_file:
    with open('outfile.csv', 'wb') as out_file:
        csv_reader = csv.reader(in_file, delimiter=' ', skipinitialspace=True)
        csv_writer = csv.writer(out_file, delimiter = '\t')
        # Iterate over all rows in the file
        for i, header in enumerate(csv_reader):
            # Get the header data
            num = header[0]
            csv_writer.writerow([num])
            # Write frame number, starting with 1 (hence the +1 part)
            csv_writer.writerow([i+1])
            # Iterate over all data rows
            for _ in xrange(326):
                # Call next(csv_reader) to get the next row
                # Put inside a try ... except to avoid StopIteration exception
                # if end of file is found before reaching 326 lines
                try:
                    row = next(csv_reader)
                except StopIteration:
                    break
                # Use list comprehension to extract number of zeros
                zeros = sum([1 for x in row[2:12] if x.strip() == '0'])
                not_zeros = 10 - zeros
                # Write the data to output file
                out = [row[0].strip(), row[1].strip(),not_zeros, zeros, row[12].strip()]
                csv_writer.writerow(out)
            # If the loop finished without a break (all 326 rows were read)
            else:
                # Skip the last two lines of the frame
                next(csv_reader)
                next(csv_reader)
For the first three lines, this yields:
326
1
1 6 5 5 1
2 6 4 6 1
3 6 4 6 1
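The output above is tab-separated, while the question asks for a "5 spaces each" layout. As a rough Python 3 sketch of one way to get that, the helper below formats a single data row into right-aligned, 5-character-wide fields and raises an error on rows that are too short. The name `summarize_row` and its parameters are hypothetical, and reading "5 spaces each" as 5-character-wide columns is an assumption.

```python
def summarize_row(row, first=2, last=12, width=5):
    """Format one data row (a list of strings) as a fixed-width output line.

    Output fields: column 1, column 2, number of non-zeros and number of
    zeros in columns 3..12, and column 13.
    """
    if len(row) < 13:
        # Guard against the "bad data" case the answer mentions
        raise ValueError("expected at least 13 columns, got %d" % len(row))
    neighbours = row[first:last]
    # Compare numerically so that '0' and '00' both count as zero
    zeros = sum(1 for x in neighbours if int(x) == 0)
    non_zeros = len(neighbours) - zeros
    fields = [row[0], row[1], non_zeros, zeros, row[12]]
    # Right-align every field in a column `width` characters wide
    return "".join(str(f).rjust(width) for f in fields)
```

Each returned string could then be written to the output file followed by a newline, in place of `csv_writer.writerow(out)`.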