How to fill dictionary values from another file?

I have two files (the fields on each line are separated by a space):

file1.txt

OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome

file2.txt

UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007 
UniRef90_2 OTU0002 OTU0003 OTU0005 
UniRef90_3 OTU0004 OTU0006 OTU0007 

In the second file, I would like to replace each OTUXXXX with its value from the first file, and I need to keep the UniRef90_X at the beginning of each line. The first line of the second file should look like this:

UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007) 

For the moment, I have created a dictionary for the second file, with the UniRef90_X as keys and the OTUXXXX as values.

f1=open("file1.txt", "r")
f2=open("file2.txt", "r")

dict={}
for i in f2:
    i=i.split(" ")
    dict[i[0]]=i[1:]
    for j in f1:
        j=j.split(" ")
        if j[0] in dict.values():
            dico[i[0]]=j[1:]

But I don't know how to replace the OTUXXXX with the corresponding values from the first file. Any idea?

First of all, DO NOT NAME YOUR VARIABLES EXACTLY LIKE CLASSES. EVER. Calling a variable dict hides the built-in dict type for the rest of that scope. Use something like d2 instead.
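To make the naming problem concrete, here is a minimal sketch of what goes wrong. The function name `shadowing_demo` is only an illustration and is not part of the question's code.

```python
def shadowing_demo():
    """Return the error message produced when `dict` is shadowed."""
    dict = {}              # the local name `dict` now hides the built-in type
    try:
        dict()             # a dictionary instance is not callable
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)

# Outside the function the built-in is untouched, so this still works.
d2 = dict()
```

The same failure would show up later in a script that assigns `dict = {}` at the top and then tries to call `dict()` or `dict(...)` further down.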

Then, replace the [1] with [1:] so that all the ids after the first field are kept, not just the first one.

Then, after loading the first file into a dictionary just like you did with the second one (let's name it d1), you can combine the values like this:

d3 = dict()
for e in d2:
    L = list()
    for otu in d2[e]:
        # look up the taxonomy string for each OTU id
        L.append(d1[otu])
    d3[e] = " ".join(L)  # format your list here

Finally, turn it back into a string and write it to a file.
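Putting those steps together, a sketch of the whole approach could look like the following. The helper names `load_taxonomy`, `load_clusters` and `combine` are made up for this illustration, and the sample lines are a small subset of the question's data held in lists so the sketch runs without any files.

```python
def load_taxonomy(lines):
    """Build d1: OTU id -> taxonomy string (layout of file1.txt)."""
    d1 = {}
    for line in lines:
        # partition splits on the first space only, so taxonomy keeps its spaces
        otu, _, taxonomy = line.strip().partition(" ")
        if otu:
            d1[otu] = taxonomy
    return d1


def load_clusters(lines):
    """Build d2: UniRef id -> list of OTU ids (layout of file2.txt)."""
    d2 = {}
    for line in lines:
        fields = line.split()      # split() also drops the trailing space/newline
        if fields:
            d2[fields[0]] = fields[1:]
    return d2


def combine(d1, d2):
    """Build d3: UniRef id -> 'taxonomy (#OTU) taxonomy (#OTU) ...'."""
    d3 = {}
    for uniref, otus in d2.items():
        parts = [f"{d1[otu]} (#{otu})" for otu in otus if otu in d1]
        d3[uniref] = " ".join(parts)
    return d3


file1_lines = [
    "OTU0001 Archaea\n",
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n",
]
file2_lines = ["UniRef90_1 OTU0001 OTU0004 \n"]

d3 = combine(load_taxonomy(file1_lines), load_clusters(file2_lines))
for uniref, text in d3.items():
    print(uniref, text)
```

With real files, an open file object can be passed in place of the lists, since iterating over it also yields one line at a time; each `uniref` and `text` pair can then be written to an output file instead of printed.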

I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up the ids you captured from file1.

The way you have your loops set up, you read the first record from file2 and store it in the dictionary. The test j[0] in dict.values() will never match anything from file1, because the values are lists of ids and j[0] is a single string. Then you read all of file1 in the inner loop. The next time you read a line from file2, file1 is already exhausted from the first iteration, so the inner loop has nothing left to read.
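The exhaustion effect is easy to reproduce in isolation. In this sketch `io.StringIO` stands in for the open file1 (an assumption made so that no files are needed), and the outer `range(2)` loop plays the role of the loop over file2.

```python
import io

# Two lines of sample data behaving like an open text file.
f1 = io.StringIO("OTU0001 Archaea\nOTU0002 Archaea;Aenigmarchaeota\n")

lines_seen = []
for outer in range(2):       # stands in for "for i in f2"
    count = 0
    for line in f1:          # reads f1 to the end on the first pass
        count += 1
    lines_seen.append(count)

print(lines_seen)  # [2, 0]
```

Calling `f1.seek(0)` before the inner loop would rewind the file, but re-reading file1 for every line of file2 is wasteful, which is why reading it once into a dictionary is the better fit here.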

Here is an approach that reads file 1 into a dictionary and, when it finds matches in file 2, prints them out.

file1 = {} # declare a dictionary

fin = open('file1.txt', 'r')

for line in fin:
    # strip the ending newline
    line = line.rstrip()

    # only split once
    # first part into _id and second part into data
    _id, data = line.split(' ', 1)

    # data here is a single string possibly containing spaces
    # because only split once (above)
    file1[_id] = data

fin.close()

fin = open('file2.txt', 'r')

for line in fin:
    uniref, *ids = line.split() # here ids is a list (because prepended by *)

    print(uniref, end='')
    for _id in ids:
        if _id in file1:
            print(' ', file1[_id], '(#' + _id + ')', end='')
    print()

fin.close()

The printout is:

UniRef90_1  Archaea (#OTU0001)  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2  Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002)  Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
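Note that this printout has two spaces between entries, because `print` puts its own separator between the `' '` argument and the next one, whereas the format requested in the question uses single spaces. As a hedged variation on the same idea, the sketch below joins the pieces with a single space and writes them to a file. The function name `annotate` and the output name `annotated.txt` are assumptions for this example, and the demonstration uses a temporary directory with a subset of the sample data.

```python
import os
import tempfile


def annotate(taxonomy_path, clusters_path, out_path):
    """Copy the clusters file, replacing each OTU id by 'taxonomy (#id)'."""
    taxonomy = {}
    with open(taxonomy_path) as fin:
        for line in fin:
            # split on the first space only; the rest is the taxonomy string
            otu, _, data = line.rstrip("\n").partition(" ")
            if otu:
                taxonomy[otu] = data

    with open(clusters_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.split()
            if not fields:
                continue                      # skip blank lines
            parts = [fields[0]]               # keep the UniRef id first
            for _id in fields[1:]:
                if _id in taxonomy:           # unknown ids are dropped, as in the answer
                    parts.append(f"{taxonomy[_id]} (#{_id})")
            fout.write(" ".join(parts) + "\n")


# Demonstration on throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    p3 = os.path.join(tmp, "annotated.txt")
    with open(p1, "w") as f:
        f.write("OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n")
        f.write("OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured\n")
    with open(p2, "w") as f:
        f.write("UniRef90_3 OTU0004 OTU0006 \n")
    annotate(p1, p2, p3)
    with open(p3) as f:
        result = f.read()

print(result, end="")
```

Using `with` closes each file automatically, even if a line fails to parse part-way through.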
