How to calculate the average of several .dat files using Python?
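I have 50-60 .dat files, each containing m rows and n columns of numbers. I need to take the average across all of the files and create a new file in the same format. I have to do this in Python. Can anyone help me with this?
I've written some code. I realize there are some incompatible types in it, but I can't think of an alternative, so I haven't changed anything yet.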
#! /usr/bin/python
import os

CC = 1.96

average = []
total = []
count = 0
os.chdir("./")
for files in os.listdir("."):
    if files.endswith(".dat"):
        infile = open(files)
        cur = []
        cur = infile.readlines()
        for i in xrange(0, len(cur)):
            cur[i] = cur[i].split()
        total += cur
        count += 1
average = [x/count for x in total]

#calculate uncertainty
uncert = []
for files in os.listdir("."):
    if files.endswith(".dat"):
        infile = open(files)
        cur = []
        cur = infile.readlines
        for i in xrange(0, len(cur)):
            cur[i] = cur[i].split()
        uncert += (cur - average)**2
uncert = uncert**.5
uncert = uncert*CC
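Here is a fairly time- and resource-efficient approach. It reads the values and computes the averages for all of the files in parallel, while reading only one line of each file at a time. It does, however, temporarily read the entire first .dat file into memory to determine how many rows and columns of numbers each file will contain.
You didn't say whether your "numbers" are integers, floats, or something else, so this reads them in as floats (which works even if they are integers). Either way, the averages are computed and output as floating-point numbers.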
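Update
I've modified my original answer to also calculate, as per your comment, the standard deviation (sigma) of the values at each row and column position. Note that the code divides by the number of files minus one (Bessel's correction), so it is the sample standard deviation. It does this right after computing the mean of those values, so a second pass over the data isn't needed. In addition, in response to a suggestion made in the comments, a context manager has been added to ensure that all of the input files get closed.
Note that the standard deviations are only printed and are not written to the output file, but writing them to the same file or a separate one should be easy to add.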
# Note: written for Python 2 (izip, xrange, print statements)
from contextlib import contextmanager
from itertools import izip
from glob import iglob
from math import sqrt
from sys import exit

@contextmanager
def multi_file_manager(files, mode='rt'):
    """Open multiple files and make sure they all get closed."""
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:  # also runs if the body raises or calls exit()
        for file in files:
            file.close()

# generator function to read, convert, and yield each value from a text file
def read_values(file, datatype=float):
    for line in file:
        for value in (datatype(word) for word in line.split()):
            yield value

# enumerate multiple equal length iterables simultaneously as (i, n0, n1, ...)
def multi_enumerate(*iterables, **kwds):
    start = kwds.get('start', 0)
    return ((n,)+t for n, t in enumerate(izip(*iterables), start))

DATA_FILE_PATTERN = 'data*.dat'
MIN_DATA_FILES = 2

with multi_file_manager(iglob(DATA_FILE_PATTERN)) as datfiles:
    num_files = len(datfiles)
    if num_files < MIN_DATA_FILES:
        print('Less than {} .dat files were found to process, '
              'terminating.'.format(MIN_DATA_FILES))
        exit(1)

    # determine number of rows and cols from first file
    temp = [line.split() for line in datfiles[0]]
    num_rows = len(temp)
    num_cols = len(temp[0])
    datfiles[0].seek(0)  # rewind first file
    del temp  # no longer needed
    print '{} .dat files found, each must have {} rows x {} cols\n'.format(
        num_files, num_rows, num_cols)

    means = []
    std_devs = []
    divisor = float(num_files-1)  # Bessel's correction for sample standard dev
    generators = [read_values(file) for file in datfiles]
    for _ in xrange(num_rows):  # main processing loop
        for _ in xrange(num_cols):
            # create a sequence of next cell values from each file
            values = tuple(next(g) for g in generators)
            mean = float(sum(values)) / num_files
            means.append(mean)
            means_diff_sq = ((value-mean)**2 for value in values)
            std_dev = sqrt(sum(means_diff_sq) / divisor)
            std_devs.append(std_dev)

print 'Average and (standard deviation) of values:'
with open('means.txt', 'wt') as averages:
    for i, mean, std_dev in multi_enumerate(means, std_devs):
        print '{:.2f} ({:.2f})'.format(mean, std_dev),
        averages.write('{:.2f}'.format(mean))  # note std dev not written
        if i % num_cols != num_cols-1:  # not last column?
            averages.write(' ')  # delimiter between values on line
        else:
            print  # newline
            averages.write('\n')
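The listing above targets Python 2 (`izip`, `xrange`, `print` statements). As a rough sketch only, the same lock-step idea might look like this in Python 3. The function name `average_dat_files`, its parameters, and the output file names are hypothetical and not part of the original answer. `contextlib.ExitStack` stands in for the hand-written context manager, and the standard deviations are written to a second file, as the note above suggests. The sketch assumes every file has the same shape and that at least two files are given.

```python
from contextlib import ExitStack
from math import sqrt


def average_dat_files(paths, mean_path, std_path):
    """Write per-cell means and sample standard deviations across files.

    Hypothetical helper: reads one line from every input file at a time,
    so memory use stays proportional to a single row per file.
    """
    paths = list(paths)
    n = len(paths)
    if n < 2:
        raise ValueError('need at least two files for a sample std dev')
    with ExitStack() as stack:
        # ExitStack closes every file entered here, even on error
        infiles = [stack.enter_context(open(p)) for p in paths]
        means_out = stack.enter_context(open(mean_path, 'w'))
        stds_out = stack.enter_context(open(std_path, 'w'))
        # zip() yields one line from each file per iteration (lock-step)
        for lines in zip(*infiles):
            rows = [[float(word) for word in line.split()] for line in lines]
            means, stds = [], []
            # zip(*rows) groups the values that share a column position
            for values in zip(*rows):
                mean = sum(values) / n
                # Bessel's correction: divide by n - 1
                variance = sum((v - mean) ** 2 for v in values) / (n - 1)
                means.append(mean)
                stds.append(sqrt(variance))
            means_out.write(' '.join('{:.2f}'.format(m) for m in means) + '\n')
            stds_out.write(' '.join('{:.2f}'.format(s) for s in stds) + '\n')
```

A call such as `average_dat_files(sorted(glob.glob('data*.dat')), 'means.txt', 'std_devs.txt')` would process the files matched by the pattern used above. Be aware that `zip` stops at the shortest input, so files with differing row counts would be silently truncated rather than rejected.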
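I'm not sure which aspect of the process is giving you trouble, but I'll answer specifically about getting the averages of all the .dat files.
Assuming a data structure like this: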
72 12 94 79 76 5 30 98 97 48
79 95 63 74 70 18 92 20 32 50
77 88 60 98 19 17 14 66 80 24
...
Getting the averages of the files:
import glob
import itertools

avgs = []
for datpath in glob.iglob("*.dat"):
    with open(datpath, 'r') as f:
        str_nums = itertools.chain.from_iterable(i.strip().split() for i in f)
        nums = map(int, str_nums)
        # float() avoids truncating integer division in Python 2
        avg = float(sum(nums)) / len(nums)
        avgs.append(avg)
print avgs
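It iterates over each .dat file, reading and joining the lines. It converts the values to int (or float, if you prefer) and appends the file's average to the list.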
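In Python 3, `map` returns an iterator, so calling `len(nums)` as in the snippet above would fail there, and `/` is always true division. A minimal Python 3 sketch of the same per-file average follows, using the three sample rows shown earlier held in an in-memory `io.StringIO` object instead of a real file. The helper name `file_average` is made up for illustration.

```python
import io
import itertools


def file_average(f):
    """Return the mean of all whitespace-separated integers in file object f."""
    str_nums = itertools.chain.from_iterable(line.split() for line in f)
    nums = [int(s) for s in str_nums]  # a list, so len() works
    return sum(nums) / len(nums)       # true division in Python 3


sample = io.StringIO(
    "72 12 94 79 76 5 30 98 97 48\n"
    "79 95 63 74 70 18 92 20 32 50\n"
    "77 88 60 98 19 17 14 66 80 24\n"
)
avg = file_average(sample)
print(round(avg, 2))  # 58.23 (the 30 values sum to 1747)
```

Passing an open file from `open(datpath)` instead of the `StringIO` object should work the same way, since both are iterated line by line.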
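If these files are enormous and you are concerned about memory use while reading them, you could loop over each line more explicitly and keep only running counters, the way your original example was doing: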
avgs = []
for datpath in glob.iglob("*.dat"):
    with open(datpath, 'r') as f:
        count = 0
        total = 0
        for line in f:
            nums = [int(i) for i in line.strip().split()]
            count += len(nums)
            total += sum(nums)
        # float() avoids truncating integer division in Python 2
        avgs.append(float(total) / count)