Take column average of one file, repeat for all .txt files, write all averages to one file in Python
I have approximately 40 *.txt files, each containing two space-delimited columns of data; the values were converted to str format by a previous script. I would like to take the average of the second column of each .txt file and then write all the averages to one output file. The files also need to be read in numerical order, e.g. file1.txt, file2.txt.
I currently have the following scripts: a) one that reads the last line of every .txt file, and b) one that takes the average of a single file. When I try to combine them, I get an error saying either that the string cannot be converted to a float or that the list index is out of range. I have also tried line.strip() to confirm that there are no blank lines in the .txt files, hoping to fix the latter problem, to no avail.
a) Code that reads the last line of all .txt files:
    import sys
    import os
    import re
    import glob

    numbers = re.compile(r'(\d+)')

    def numericalSort(value):
        parts = numbers.split(value)
        parts[1::2] = map(int, parts[1::2])
        return parts

    list_of_files = glob.glob("./*.txt")

    for file in sorted(list_of_files, key=numericalSort):
        infiles = open(file, "r")
        outfile = open("t-range","a")
        contents = [""]
        columns = []
        counter = 0

        for line in infiles:
            counter += 1
            contents.append(line)

        for line in contents:
            if line.startswith("20000"):
                columns.append(float(line.split()[1]))
                print columns

        counter1 = 0
        for line in columns:
            counter1 += 1
            outfile.write(','.join(map(str,columns))+"\n")

        infiles.close()
        outfile.close()
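For readers unfamiliar with the numericalSort key, here is a minimal sketch of what it does. The filenames are hypothetical, and the function is renamed numerical_sort only for this illustration. The key splits a name into alternating text and digit runs and converts the digit runs to int, so that 10 sorts after 2.

```python
import re

numbers = re.compile(r'(\d+)')

def numerical_sort(value):
    # "file10.txt" -> ['file', '10', '.txt'] -> ['file', 10, '.txt']
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts

# Hypothetical filenames, used only to show the difference in ordering.
names = ["file10.txt", "file2.txt", "file1.txt"]

plain_order = sorted(names)                        # lexicographic order
numeric_order = sorted(names, key=numerical_sort)  # order by embedded number

print(plain_order)    # ['file1.txt', 'file10.txt', 'file2.txt']
print(numeric_order)  # ['file1.txt', 'file2.txt', 'file10.txt']
```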
b) Script that takes the average value of one file:
data = open("file.txt","r").read().split()
s = sum ([float(i) for i in data])
average = s / len(data)
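Note that read().split() in script b) splits on all whitespace, so it averages every value in the file, from both columns. A minimal sketch of averaging only the second column follows. It assumes each data row has at least two whitespace-separated fields; the function name second_column_average and the sample rows are made up for illustration.

```python
def second_column_average(lines):
    """Average the second whitespace-separated field of each line."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Skip blank or short lines instead of raising IndexError.
            continue
        values.append(float(fields[1]))
    if not values:
        raise ValueError("no data rows found")
    return sum(values) / len(values)

# Hypothetical sample rows, including one blank line.
sample = ["1 2.0\n", "2 4.0\n", "\n", "3 6.0\n"]
average = second_column_average(sample)
print(average)  # 4.0
```

An open file object can be passed in place of the list, since iterating over it yields lines in the same way.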
c) Combined script:
    import sys
    import os
    import re
    import glob

    numbers = re.compile(r'(\d+)')

    def numericalSort(value):
        parts = numbers.split(value)
        parts[1::2] = map(int, parts[1::2])
        return parts

    list_of_files = glob.glob("./totalenergies*")

    for file in sorted(list_of_files, key=numericalSort):
        infiles = open(file, "r")
        outfile = open("t-range","a")
        contents = [""]
        columns = []
        counter = 0

        for line in infiles:
            counter += 1
            contents.append(float(line.split()[1]))

        contents = ([float(i) for i in contents])
        s = sum(contents)
        average = s / len(contents)
        columns.append(average)

        counter1 = 0
        for line in columns:
            counter1 += 1
            outfile.write("\n".join(map(str,columns)) + "\n")

        infiles.close()
        outfile.close()
This last part gives a "could not convert string to float" error; the traceback shows that the problem is with contents = ([float(i) for i in contents])
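Judging only from the script as posted, one plausible cause of that traceback is that contents starts out as [""], so the list comprehension ends up calling float("") on its first element. The other error mentioned, list index out of range, is what line.split()[1] raises on a blank or single-field line. A small sketch reproducing both, with made-up values:

```python
# The list is seeded with an empty string, as in the combined script.
contents = [""]
contents.append(2.5)

try:
    [float(i) for i in contents]
    error_message = None
except ValueError as exc:
    # float("") cannot be parsed, so the conversion fails on the seed value.
    error_message = str(exc)

try:
    "\n".split()[1]  # a blank line splits into an empty list
    index_error = False
except IndexError:
    index_error = True

print(error_message)
print(index_error)
```

Starting from an empty list (contents = []) and skipping lines with fewer than two fields would avoid both failures.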
This is the line that is giving you the error:
outfile.write("".join(map(str,columns)+"\n"))
When you read the traceback, the last part will usually show you the line number in your script that raised the error, so you should check that line first.
The way the line currently reads, + "\n" is part of the argument to the join function, when it should be part of the argument to the write method:
outfile.write("".join(map(str,columns)) + "\n")
If your intention was to write each average from columns on a new line, and also to add a newline at the end of the list, you need this:
outfile.write("\n".join(map(str,columns)) + "\n")
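The difference between the three forms can be checked without writing a file. The sketch below uses made-up values for columns and wraps map in list() so that it behaves the same under Python 2 and Python 3. With the parenthesis misplaced, Python tries to add a string to a list before join is ever called.

```python
columns = [1.5, 2.5]  # hypothetical averages

try:
    # Misplaced parenthesis: list + str is evaluated first and fails.
    "".join(list(map(str, columns)) + "\n")
    misplaced_error = None
except TypeError as exc:
    misplaced_error = type(exc).__name__

# Newline appended after joining, with no separator between values.
joined_no_separator = "".join(map(str, columns)) + "\n"

# One value per line, plus a trailing newline.
joined_with_newlines = "\n".join(map(str, columns)) + "\n"

print(misplaced_error)             # TypeError
print(repr(joined_no_separator))   # '1.52.5\n'
print(repr(joined_with_newlines))  # '1.5\n2.5\n'
```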
More: http://docs.python.org/2/library/stdtypes.html#str.join
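Putting the pieces together, here is one possible sketch of the whole task. It is not the asker's script. It assumes each file has two whitespace-separated columns and at least one data row, and the helper names column_average and write_averages are invented for this example. The averages are collected first and written once at the end, and the output file is opened with "w" so that repeated runs do not append duplicate lines.

```python
import glob
import re

numbers = re.compile(r'(\d+)')

def numerical_sort(value):
    # Split into text and digit runs so that file2 sorts before file10.
    parts = numbers.split(value)
    parts[1::2] = map(int, parts[1::2])
    return parts

def column_average(path, column=1):
    # Average one column of a whitespace-delimited file.
    values = []
    with open(path) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) > column:  # skips blank or short lines
                values.append(float(fields[column]))
    # Assumption: at least one data row; an empty file would divide by zero.
    return sum(values) / len(values)

def write_averages(pattern, out_path):
    # Process files in numerical order and write one average per line.
    files = sorted(glob.glob(pattern), key=numerical_sort)
    averages = [column_average(path) for path in files]
    with open(out_path, "w") as outfile:
        outfile.write("\n".join(map(str, averages)) + "\n")
    return averages
```

With the files from the question, the call would be something like write_averages("./*.txt", "t-range"). Keeping the output name free of the .txt suffix stops it from being picked up by the glob pattern on a later run.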