How to calculate the length of each string in a list of strings in Python?
Suppose I have a file with n DNA sequences, one per line. I need to turn them into a list, then calculate each sequence's length and the total length of all of them together. I am only sure how to do that before they are in a list:
# open the file and print each sequence's length
f = open('seq.txt', 'r')
for line in f:
    line = line.strip()
    print(line)
    print('this is the length of the given sequence', len(line))
# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print(lines)
How can I do math calculations on the list? For example, the total length of all sequences together, the standard deviation of their different lengths, etc.
Try this to output the individual lengths and calculate the total length:
lines = [line.strip() for line in open('seq.txt')]
total = 0
for line in lines:
    print('this is the length of the given sequence: {}'.format(len(line)))
    total += len(line)
print('this is the total length: {}'.format(total))
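The running total can also be computed with the built-in sum once the lengths are in a list. A minimal sketch, assuming a small in-memory list (the sequences are made up) in place of the contents of seq.txt:

```python
# Made-up sample data standing in for the stripped lines of seq.txt.
lines = ["ACGT", "GATTACA", "CC"]

# One length per sequence, in the same order as the input.
lengths = [len(line) for line in lines]

# sum() adds up the individual lengths.
total = sum(lengths)

for line, length in zip(lines, lengths):
    print("length of {}: {}".format(line, length))
print("total length: {}".format(total))
```

Keeping the lengths in their own list means later calculations can reuse them without touching the file again.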
Look into the statistics module. You'll find all kinds of measures of averages and spreads.
You can get the length of any sequence using len.
In your case, you'll want to map the sequences to their lengths:
from statistics import stdev
with open("seq.txt") as f:
    lengths = [len(line.strip()) for line in f]
print("Number of sequences:", len(lengths))
print("Standard deviation:", stdev(lengths))
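The same list of lengths feeds the other statistics functions too. The sketch below is one possible variation, not the answer's own code: it uses io.StringIO with made-up sequences as a stand-in for open("seq.txt"), and it skips blank lines, which is an assumption about how the file should be treated.

```python
import io
from statistics import mean, median, stdev

# io.StringIO stands in for open("seq.txt"); the sequences are made up.
fake_file = io.StringIO("ACGT\nGATTACA\nCC\n\n")

# Skip blank lines so an empty trailing line is not counted as a
# sequence of length 0 (an assumption about the desired behaviour).
lengths = [len(line.strip()) for line in fake_file if line.strip()]

print("Number of sequences:", len(lengths))
print("Total length:", sum(lengths))
print("Mean length:", mean(lengths))
print("Median length:", median(lengths))

# stdev() needs at least two values; otherwise it raises StatisticsError.
if len(lengths) >= 2:
    print("Standard deviation:", stdev(lengths))
```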
Edit: because it was asked in the comments, here's how to split the sequences into different files depending on their lengths:
from statistics import stdev, mean
with open("seq.txt") as f:
    sequences = [line.strip() for line in f]
lengths = [len(sequence) for sequence in sequences]
mean_ = mean(lengths)
stdev_ = stdev(lengths)
with open("below.txt", "w") as below, open("above.txt", "w") as above, open("normal.txt", "w") as normal:
    for sequence in sequences:
        if len(sequence) > mean_ + stdev_:
            above.write(sequence + "\n")
        elif len(sequence) > mean_ - stdev_:  # in between
            normal.write(sequence + "\n")
        else:
            below.write(sequence + "\n")
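To check the grouping logic without writing any files, the same idea can be expressed as a function that returns three lists. This is only a sketch: split_by_length is a hypothetical helper name, the sample data is made up, and treating lengths exactly on a boundary as "normal" is a choice made here, not something the answer specifies.

```python
from statistics import mean, stdev

def split_by_length(sequences):
    """Return (below, normal, above) relative to mean +/- one sample stdev.

    Hypothetical helper for illustration; needs at least two sequences
    because stdev() does.
    """
    lengths = [len(s) for s in sequences]
    low = mean(lengths) - stdev(lengths)
    high = mean(lengths) + stdev(lengths)
    below, normal, above = [], [], []
    for s in sequences:
        if len(s) < low:
            below.append(s)
        elif len(s) > high:
            above.append(s)
        else:
            # Lengths exactly on a boundary count as normal (a choice made here).
            normal.append(s)
    return below, normal, above

# Made-up sequences with lengths 2, 4, 4, 4, 5, 5, 7, 9.
sample = ["A" * n for n in (2, 4, 4, 4, 5, 5, 7, 9)]
below, normal, above = split_by_length(sample)
print(len(below), len(normal), len(above))
```

Each returned list can then be written to its own file with one loop per list.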
The map and reduce functions can be useful for working on collections.
import operator
from functools import reduce  # needed in Python 3, where reduce is no longer a built-in
f = open('seq.txt', 'r')
for line in f:
    line = line.strip()
    print(line)
    print('this is the length of the given sequence', len(line))
f.close()
# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print(lines)
print('The total length is', reduce(operator.add, map(len, lines)))
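For a plain total, reduce with operator.add gives the same result as the built-in sum. A short sketch with made-up sequences comparing the two; the initial value 0 passed to reduce is an addition here so that an empty list yields 0 instead of raising TypeError:

```python
import operator
from functools import reduce

# Made-up sample data standing in for the stripped lines of seq.txt.
lines = ["ACGT", "GATTACA", "CC"]

# reduce folds the lengths together with +; the trailing 0 is the
# starting value, which also makes an empty input return 0.
total_reduce = reduce(operator.add, map(len, lines), 0)

# sum over the mapped lengths computes the same total.
total_sum = sum(map(len, lines))

print("The total length is", total_reduce)
print("Same result with sum:", total_sum)
```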
Just a couple of remarks. Use with to handle files so you don't have to worry about closing them after you are done reading or writing, flushing, etc. Also, since you are already looping through the file once, why not build the list in the same loop? You don't need to go through it a second time.
# open the file, collect the sequences and add up their lengths
with open('seq.txt', 'r') as f:
    sequences = []
    total_len = 0
    for line in f:
        new_seq = line.strip()
        sequences.append(new_seq)
        new_seq_len = len(new_seq)
        total_len += new_seq_len
print('number of sequences: {}'.format(len(sequences)))
print('total length: {}'.format(total_len))
print('biggest sequence: {}'.format(max(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[-1])))
print('smallest sequence: {}'.format(min(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[0])))
I have included some post-processing info to give you an idea of how to go about it. If you have any questions, just ask.
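The longest and shortest sequences can also be found without sorting the list: max and min accept key=len directly, so no lambda is needed. A minimal sketch with made-up sequences:

```python
# Made-up sample data standing in for the sequences read from seq.txt.
sequences = ["ACGT", "GATTACA", "CC"]

# key=len makes max/min compare the sequences by their length.
longest = max(sequences, key=len)
shortest = min(sequences, key=len)

print("biggest sequence: {} (length {})".format(longest, len(longest)))
print("smallest sequence: {} (length {})".format(shortest, len(shortest)))
```

When several sequences share the maximum or minimum length, max and min return the first one encountered.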
You have already seen how to get the list of sequences; a list of the lengths can be built using append.
lines = [line.strip() for line in open('seq.txt')]
total = 0
sizes = []
for line in lines:
    mysize = len(line)
    total += mysize
    sizes.append(mysize)
Note that you can also use a single for loop that reads each line and appends to the two lists, rather than reading every line into a list and then looping through that list. It is a matter of which you prefer.
You can use the statistics library (available as of Python 3.4) to compute statistics on the list of lengths.
statistics — Mathematical statistics functions
mean() Arithmetic mean ("average") of data.
median() Median (middle value) of data.
median_low() Low median of data.
median_high() High median of data.
median_grouped() Median, or 50th percentile, of grouped data.
mode() Mode (most common value) of discrete data.
pstdev() Population standard deviation of data.
pvariance() Population variance of data.
stdev() Sample standard deviation of data.
variance() Sample variance of data.
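To see how these functions differ in practice, here is a small sketch that applies several of them to one list of lengths. The numbers are made up for illustration and are not taken from any real sequence file:

```python
import statistics

# Made-up sequence lengths.
sizes = [2, 4, 4, 4, 5, 5, 7, 9]

print("mean:", statistics.mean(sizes))
print("median:", statistics.median(sizes))            # average of the two middle values
print("median_low:", statistics.median_low(sizes))    # lower of the two middle values
print("median_high:", statistics.median_high(sizes))  # higher of the two middle values
print("mode:", statistics.mode(sizes))                # most common length
print("pstdev:", statistics.pstdev(sizes))            # treats the data as the whole population
print("stdev:", statistics.stdev(sizes))              # treats the data as a sample (divides by n - 1)
```

Whether pstdev or stdev is the right choice depends on whether the file holds every sequence of interest or only a sample of them.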
You can also use the answers at "Standard deviation of a list". Note that one of the answers there actually shows the code that was added to Python 3.4 for the statistics module. If you have an older version, you can use that code or get the statistics module code for your own system.
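For interpreters without the statistics module, the sample standard deviation is short enough to write by hand. The function below is a hypothetical sketch using only math.sqrt; it is not the code from the linked answer or from the statistics module itself:

```python
import math

def sample_stdev(values):
    """Sample standard deviation (divides by n - 1). Hypothetical helper."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # float() keeps the division exact on Python 2 as well.
    avg = sum(values) / float(n)
    squared_diffs = sum((v - avg) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1))

# Made-up sequence lengths.
sizes = [2, 4, 4, 4, 5, 5, 7, 9]
print("Standard deviation:", sample_stdev(sizes))
```

Dividing by n instead of n - 1 in the last step would give the population standard deviation.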
This will do what you require. To do additional calculations you may want to save the lines from the text file into a list or set so you won't need to read from the file again.
total_length = 0  # Create a variable that will save our total length of lines read
with open('filename.txt', 'r') as f:
    for line in f:
        line = line.strip()
        total_length += len(line)  # Add the length to our total
        print("Line Length: {}".format(len(line)))
print("Total Length: {}".format(total_length))