
How to calculate the length of each string in a list of strings in Python?

Suppose I have a file with n DNA sequences, one per line. I need to turn them into a list, then calculate each sequence's length and the total length of all of them together. I am not sure how to do that before they are in a list.

# open file and writing each sequences' length
f= open('seq.txt' , 'r')
for line in f:
    line= line.strip()
    print (line)
    print ('this is the length of the given sequence', len(line))

# turning into a list:  
lines = [line.strip() for line in open('seq.txt')]
print (lines)

How can I do math calculations on the list? For example, the total length of all sequences together, the standard deviation of their lengths, etc.

Try this to output the individual lengths and calculate the total length:

    lines = [line.strip() for line in open('seq.txt')]
    total = 0
    for line in lines:
        print('this is the length of the given sequence: {}'.format(len(line)))
        total += len(line)
    print('this is the total length: {}'.format(total))

Look into the statistics module. You'll find all kinds of measures of averages and spreads.

You'll get the length of any sequence using len.

In your case, you'll want to map the sequences to their lengths:

from statistics import stdev

with open("seq.txt") as f:
    lengths = [len(line.strip()) for line in f]

print("Number of sequences:", len(lengths))
print("Standard deviation:", stdev(lengths))
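The same list of lengths can feed the other measures in the statistics module. The following is a minimal sketch, not part of the original answer: the helper name `summarize_lengths` and the example data are made up for illustration, and the list stands in for the stripped lines of seq.txt.

```python
from statistics import mean, median, pstdev, stdev


def summarize_lengths(sequences):
    """Return basic statistics for the lengths of the given sequences."""
    lengths = [len(seq) for seq in sequences]
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "stdev": stdev(lengths),    # sample standard deviation, needs >= 2 values
        "pstdev": pstdev(lengths),  # population standard deviation
    }


# Made-up data standing in for the stripped lines of seq.txt
example = ["ACGT", "ACGTAC", "AC", "ACGTACGT"]
summary = summarize_lengths(example)
for name, value in summary.items():
    print(name, value)
```

Whether stdev or pstdev is the right choice depends on whether the file is treated as a sample or as the whole population of sequences.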

Edit: Because it was asked in the comments, here's how to cluster the sequences into different files depending on their lengths:

from statistics import stdev, mean
with open("seq.txt") as f:
    sequences = [line.strip() for line in f]
lengths = [len(sequence) for sequence in sequences]

mean_ = mean(lengths)
stdev_ = stdev(lengths)

with open("below.txt", "w") as below, open("above.txt", "w") as above, open("normal.txt", "w") as normal:
    for sequence in sequences:
        if len(sequence) > mean_ + stdev_:
            above.write(sequence + "\n")
        elif mean_ + stdev_ > len(sequence) > mean_ - stdev_:  # in between
            normal.write(sequence + "\n")
        else:
            below.write(sequence + "\n")
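The same split can be tried in memory before writing any files. This is only a sketch under assumptions: `split_by_length` is a hypothetical helper, the data is made up, and lengths exactly on a boundary are placed in the "normal" group here, which is a design choice rather than something the answer specifies.

```python
from statistics import mean, stdev


def split_by_length(sequences):
    """Group sequences into below / normal / above one standard deviation."""
    lengths = [len(s) for s in sequences]
    centre = mean(lengths)
    spread = stdev(lengths)  # sample standard deviation, needs >= 2 sequences
    groups = {"below": [], "normal": [], "above": []}
    for s in sequences:
        if len(s) > centre + spread:
            groups["above"].append(s)
        elif len(s) < centre - spread:
            groups["below"].append(s)
        else:  # within one standard deviation of the mean, boundaries included
            groups["normal"].append(s)
    return groups


# Made-up data with lengths 2, 5, 6, 5 and 12
example = ["AC", "ACGTA", "ACGTAC", "ACGTA", "ACGTACGTACGT"]
groups = split_by_length(example)
for name, members in groups.items():
    print(name, members)
```

Each group can then be written to its own file with a single loop over `groups.items()`.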

The map and reduce functions can be useful for working on collections.

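from functools import reduce  # reduce is no longer a builtin in Python 3
import operator

f = open('seq.txt', 'r')
for line in f:
    line = line.strip()
    print(line)
    print('this is the length of the given sequence', len(line))
f.close()

# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print(lines)

# the initial value 0 keeps reduce from failing on an empty file
print('The total length is', reduce(operator.add, map(len, lines), 0))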
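For a plain total, the built-in sum does the same job as reduce with operator.add. A small comparison on made-up data (the list stands in for the stripped lines of seq.txt):

```python
from functools import reduce
import operator

# Made-up data standing in for the stripped lines of seq.txt
lines = ["ACGT", "ACGTAC", "AC"]

# reduce folds the lengths together pairwise, starting from 0
total_reduce = reduce(operator.add, map(len, lines), 0)

# sum expresses the same addition directly
total_sum = sum(map(len, lines))

print(total_reduce, total_sum)
```

reduce remains the more general tool when the combining step is something other than addition.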

Just a couple of remarks. Use with to handle files so you don't have to worry about closing them after you are done reading or writing, flushing, etc. Also, since you are looping through the file once, why not create the list at the same time? You don't need to go through it again.

# open file and writing each sequences' length
with open('seq.txt', 'r') as f:
    sequences = []
    total_len = 0
    for line in f:
        new_seq = line.strip()
        sequences.append(new_seq)
        new_seq_len = len(new_seq)
        total_len += new_seq_len

print('number of sequences: {}'.format(len(sequences)))
print('total length: {}'.format(total_len))
longest = max(sequences, key=len)    # key=len replaces the lambda and the sort
shortest = min(sequences, key=len)
print('biggest sequence: {}'.format(longest))
print('\t with length {}'.format(len(longest)))
print('smallest sequence: {}'.format(shortest))
print('\t with length {}'.format(len(shortest)))

I have included some post-processing info to give you an idea of how to go about it. If you have any questions, just ask.

You have already seen how to get the list of sequences. A list of the lengths can be built with append while the total is accumulated:

    lines = [line.strip() for line in open('seq.txt')]
    total = 0
    sizes = []
    for line in lines:
       mysize = len(line)
       total += mysize
       sizes.append(mysize)

Note that you can also use a single for loop to read each line and append to the two lists, rather than reading every line into a list and then looping through that list. It is a matter of preference.

You can use the statistics library (available as of Python 3.4) to compute statistics on the list of lengths.
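A sketch of the single-loop variant follows. The helper name `read_sequences` is hypothetical, an io.StringIO object stands in for the open file so the example runs on its own, and skipping blank lines is an added assumption rather than something the answer asks for.

```python
import io


def read_sequences(handle):
    """Read sequences and their lengths in a single pass over the file."""
    sequences, sizes = [], []
    for line in handle:
        seq = line.strip()
        if not seq:  # assumption: blank lines are not sequences
            continue
        sequences.append(seq)
        sizes.append(len(seq))
    return sequences, sizes


# In-memory stand-in for open('seq.txt'), with made-up contents
fake_file = io.StringIO("ACGT\nACGTAC\n\nAC\n")
sequences, sizes = read_sequences(fake_file)
print(sequences, sizes, sum(sizes))
```

With a real file, the call would sit inside a `with open('seq.txt') as f:` block and receive `f` instead of `fake_file`.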

statistics: Mathematical statistics functions

mean(): Arithmetic mean ("average") of data.
median(): Median (middle value) of data.
median_low(): Low median of data.
median_high(): High median of data.
median_grouped(): Median, or 50th percentile, of grouped data.
mode(): Mode (most common value) of discrete data.
pstdev(): Population standard deviation of data.
pvariance(): Population variance of data.
stdev(): Sample standard deviation of data.
variance(): Sample variance of data.

You can also use the answers at "Standard deviation of a list".

Note that one of the answers there actually shows the code that was added to Python 3.4 for the statistics module. If you have an older version, you can use that code or get the statistics module code for your own system.

This will do what you require. To do additional calculations, you may want to save the results from the text file into a list or set so you won't need to read from the file again.
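If the statistics module is not available, the sample standard deviation can also be written by hand. The function below is a rough sketch of the textbook formula, not the code from the linked answer or from the standard library; `sample_stdev` is a made-up name.

```python
import math


def sample_stdev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    avg = sum(values) / float(n)  # float() keeps the division exact on Python 2
    squared_diffs = sum((v - avg) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1))


# Made-up lengths for illustration
lengths = [4, 6, 2, 8]
result = sample_stdev(lengths)
print(result)
```

Dividing by n instead of n - 1 would give the population standard deviation, the counterpart of pstdev().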

total_length = 0  # Create a variable that will save our total length of lines read

with open('filename.txt', 'r') as f:
    for line in f:
        line = line.strip()
        total_length += len(line)  # Add the length to our total
        print("Line Length: {}".format(len(line)))

print("Total Length: {}".format(total_length))
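Keeping the lengths in a list makes follow-up calculations possible without touching the file again. A minimal sketch, assuming a hypothetical helper `line_lengths`; it writes a small throwaway file with made-up contents only so that the example runs on its own.

```python
import os
import tempfile


def line_lengths(path):
    """Return the length of every stripped line in the file at path."""
    with open(path) as f:
        return [len(line.strip()) for line in f]


# Write a small throwaway file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ACGT\nACGTAC\nAC\n")
    path = tmp.name

lengths = line_lengths(path)  # the file is read exactly once
os.remove(path)

# Everything below works from the list, not from the file
print("Total Length: {}".format(sum(lengths)))
print("Longest: {}".format(max(lengths)))
print("Shortest: {}".format(min(lengths)))
print("Distinct lengths: {}".format(len(set(lengths))))
```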

