How to conditionally replace items in a sublist

Question

I am attempting to generate a list-within-a-list. I am cycling through a file and want to update the list when one sublist element (the UTR length) is greater than the value already stored. I have written this code:

targets = open(file)

longest_UTR = []

for line in targets:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)

    if not any(x[:3] == [locus, mir, gene] for x in longest_UTR):
        longest_UTR.append([locus, mir, gene, transcript, length_as_integer])
    elif length_as_integer > [int(x[4]) for x in longest_UTR]: ##x[4] = previous length_as_integer
        longest_UTR.append([locus, mir, gene, transcript, length_as_integer])

print(longest_UTR)

However, I get this error:

elif len_as_int > (int(x[4]) for x in longest_UTR):

TypeError: '>' not supported between instances of 'int' and 'generator'

How can I convert x[4] to an integer so that I can compare it to length_as_integer?

Thank you

Answer 1

If I understand this correctly, try replacing the elif branch with the following:

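else:
    # The conditional expression must come before the "for" clause so that every entry is either replaced or kept
    longest_UTR = [[locus, mir, gene, transcript, length_as_integer] if x[:3] == [locus, mir, gene] and length_as_integer > int(x[4]) else x for x in longest_UTR]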

This passes over the whole list, replacing each entry that matches the condition and leaving every other entry unchanged.
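
As a minimal, self-contained sketch of that replacement: the sample rows and the helper name `update_longest` below are made up for illustration and are not taken from the question's file. The conditional expression sits before the `for` clause, so each existing entry is either swapped for the new record or kept as it is.

```python
def update_longest(longest_UTR, locus, mir, gene, transcript, length_as_integer):
    """Return a new list in which a matching, shorter entry is replaced by the new record."""
    key = [locus, mir, gene]
    if not any(x[:3] == key for x in longest_UTR):
        # First time this (locus, mir, gene) combination is seen: add it
        return longest_UTR + [[locus, mir, gene, transcript, length_as_integer]]
    # Otherwise replace the matching entry only when the new length is greater
    return [
        [locus, mir, gene, transcript, length_as_integer]
        if x[:3] == key and length_as_integer > int(x[4])
        else x
        for x in longest_UTR
    ]


# Hypothetical sample records: (locus, mir, gene, transcript, UTR length)
rows = [
    ("loc1", "mirA", "geneX", "t1", 100),
    ("loc1", "mirA", "geneX", "t2", 250),
    ("loc2", "mirB", "geneY", "t3", 80),
    ("loc1", "mirA", "geneX", "t4", 120),
]

longest_UTR = []
for row in rows:
    longest_UTR = update_longest(longest_UTR, *row)

print(longest_UTR)
```

With these rows, the entry for `loc1`/`mirA`/`geneX` ends up holding transcript `t2`, because 250 is the largest length seen for that combination.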

Answer 2

So, there's been a bit of back and forth regarding your requirements, but my final understanding is this: you are looping over a data set. Each target in this data set has a locus, mir, and gene as well as a UTR_length attribute. For every unique combination of locus, mir, and gene, you are trying to find all targets that have the maximum UTR_length?

Given that you want to find the maximum value in the dataset, there are two approaches.
1) You could simply convert your input file to a pandas DataFrame, group by your locus, mir and gene values, and return all rows with max(UTR_length). For ease of implementation this is probably your best bet. However, pandas is not always the right tool and carries a lot of overhead, especially if you want to Dockerise your project.

2) If you want to use only base Python, I would recommend taking advantage of sets and dictionaries:

list_of_targets = []
with open(file) as targets:
    for line in targets:
        chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
        length_as_integer = int(UTR_length)
        # Store the integer length so that max() compares numbers, not strings
        list_of_targets.append((chromosome, locus, mir, gene, transcript, length_as_integer))

# Generate set of unique locus, mir, gene (lmg) combinations
set_of_locus_mir_gene = {(i[1], i[2], i[3]) for i in list_of_targets}

# Generate dictionary of maximum lengths for each distinct lmg combo
dict_of_max_lengths = {lmg: max(target[5] for target in list_of_targets
                                if (target[1], target[2], target[3]) == lmg)
                       for lmg in set_of_locus_mir_gene}

# Generate dictionary with lmg keys and all targets of that lmg combo that have the max length
final_output = {lmg: [target for target in list_of_targets
                      if (target[1], target[2], target[3]) == lmg and target[5] == max_length]
                for lmg, max_length in dict_of_max_lengths.items()}
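
The same grouping can probably be done in a single pass over the records rather than rescanning the list for every combination. The sketch below is an alternative, not the answer's own code: it assumes the records are already parsed into tuples shaped like `list_of_targets` with the length stored as an integer, and the function name `longest_targets_by_key` and the sample data are hypothetical.

```python
def longest_targets_by_key(list_of_targets):
    """Map each (locus, mir, gene) key to all targets sharing its maximum UTR length."""
    best = {}
    for target in list_of_targets:
        key = (target[1], target[2], target[3])
        length = target[5]
        if key not in best or length > best[key][0][5]:
            # New key, or a strictly longer UTR: start a fresh list
            best[key] = [target]
        elif length == best[key][0][5]:
            # Tie with the current maximum: keep both targets
            best[key].append(target)
    return best


# Hypothetical records: (chromosome, locus, mir, gene, transcript, UTR length)
list_of_targets = [
    ("chr1", "loc1", "mirA", "geneX", "t1", 100),
    ("chr1", "loc1", "mirA", "geneX", "t2", 250),
    ("chr1", "loc1", "mirA", "geneX", "t3", 250),
    ("chr2", "loc2", "mirB", "geneY", "t4", 80),
]

final_output = longest_targets_by_key(list_of_targets)
for lmg, targets in final_output.items():
    print(lmg, targets)
```

Transcripts `t2` and `t3` tie at 250 here, so both are kept under the same key.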

Answer 3

Since you want to replace the longest_UTR variable and keep things nicely named, you could use a dictionary instead of a list:

targets = open(file)
longest_UTR = {}

for line in targets: 
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")    
    length_as_integer = int(UTR_length)

    # Your condition works for initializing the dictionary because of the default value.
    if length_as_integer > longest_UTR.get("Length", -1):
        longest_UTR["Chromosome"] = chromosome
        longest_UTR["Locus"] = locus
        longest_UTR["Mir"] = mir
        longest_UTR["Gene"] = gene
        longest_UTR["Transcript"] = transcript
        longest_UTR["Length"] = length_as_integer

print (longest_UTR)

Edit: here is also a version of the code using a list, in case you are interested in seeing the difference. Personally, I find the dictionary approach cleaner to read.

targets = open(file)
longest_UTR = [None, None, None, None, None, -1]

for line in targets: 
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")    
    length_as_integer = int(UTR_length)

    # Your condition works for initializing the list because of the default value.
    if length_as_integer > longest_UTR[5]:
        longest_UTR[0] = chromosome
        longest_UTR[1] = locus
        longest_UTR[2] = mir
        longest_UTR[3] = gene
        longest_UTR[4] = transcript
        longest_UTR[5] = length_as_integer

print (longest_UTR)
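
Both versions above keep a single record: the longest UTR in the whole file. If one longest record per (locus, mir, gene) combination is wanted, as the question's code suggests, one possible sketch is a dictionary keyed by that combination. The sample lines and the name `longest_by_key` are assumptions for illustration only.

```python
# Hypothetical tab-separated lines standing in for the input file
lines = [
    "chr1\tloc1\tmirA\tgeneX\tt1\t100\n",
    "chr1\tloc1\tmirA\tgeneX\tt2\t250\n",
    "chr2\tloc2\tmirB\tgeneY\tt3\t80\n",
    "chr1\tloc1\tmirA\tgeneX\tt4\t120\n",
]

longest_by_key = {}

for line in lines:
    chromosome, locus, mir, gene, transcript, UTR_length = line.strip("\n").split("\t")
    length_as_integer = int(UTR_length)
    key = (locus, mir, gene)

    # A missing key falls back to a length of -1, so the first record always wins
    if length_as_integer > longest_by_key.get(key, {"Length": -1})["Length"]:
        longest_by_key[key] = {
            "Chromosome": chromosome,
            "Transcript": transcript,
            "Length": length_as_integer,
        }

print(longest_by_key)
```

Because the comparison is strict, the first record seen wins when two transcripts of the same combination have equal lengths.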

Solution 1
0 2018-11-01 14:44:54

Solution 2
0 2018-11-01 14:44:59

Solution 3
0 2018-11-01 15:17:48
