
How to fill dictionary values from another file?

I have two files (the fields on each line are separated by a space):

file1.txt

OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome

file2.txt

UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007 
UniRef90_2 OTU0002 OTU0003 OTU0005 
UniRef90_3 OTU0004 OTU0006 OTU0007 

In the second file, I would like to replace each OTUXXXX with its value from the first file, while keeping the UniRef90_X at the beginning of each line. For the first line of the second file, the result should look like this:

UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007) 

For the moment, I have created a dictionary for the second file, with the UniRef90_X as keys and the OTUXXXX as values.

f1=open("file1.txt", "r")
f2=open("file2.txt", "r")

dict={}
for i in f2:
    i=i.split(" ")
    dict[i[0]]=i[1:]
    for j in f1:
        j=j.split(" ")
        if j[0] in dict.values():
            dico[i[0]]=j[1:]

But I don't know how to replace the OTUXXXX with the corresponding values from the first file. Any idea?

First of all, DO NOT NAME YOUR VARIABLES EXACTLY LIKE CLASSES. EVER. Use something like d2 instead.
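To illustrate why this matters, here is a small sketch of what happens once the name `dict` is bound to a dictionary instance. The function name `shadowing_demo` is only an illustration, not something from the question.

```python
def shadowing_demo():
    # Inside this function, `dict` now names an empty dictionary instance
    # and hides the built-in dictionary class.
    dict = {}
    try:
        dict()  # tries to call a dictionary instance
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)

# Outside the function the built-in is untouched, so this still works.
d3 = dict()
```

The call fails with a `TypeError` because a dictionary instance is not callable, which is exactly what a later `d3=dict()` would hit if `dict` had been reused as a variable name.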

Then, replace the [1] with [1:]

Then, after importing the first file in a dictionary just like you did with the second one - let's name it d1 - you can combine the values like this:

d3 = {}
for e in d2:
    L = []
    for otu in d2[e]:
        L.append(d1[otu])
    d3[e] = " ".join(L)  # format your list here; this assumes the values in d1 are strings

Finally, turn it back into a string and write it in a file.
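Putting these steps together, a minimal end-to-end sketch could look like the following. It assumes the two files are replaced by in-memory lists of lines (copied from the question) so that it runs on its own, and that `d1` maps each OTU id to a single taxonomy string rather than a list. The names `file1_lines`, `file2_lines` and `output_lines` are illustrative.

```python
# Sample lines standing in for file1.txt and file2.txt (taken from the question).
file1_lines = [
    "OTU0001 Archaea\n",
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n",
    "OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote\n",
    "OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;"
    "Halobacteriaceae;uncultured;marine metagenome\n",
]
file2_lines = [
    "UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007 \n",
]

# d1: OTU id -> taxonomy string. Split only once, because the taxonomy
# itself may contain spaces.
d1 = {}
for line in file1_lines:
    otu, taxonomy = line.rstrip().split(" ", 1)
    d1[otu] = taxonomy

# d2: UniRef id -> list of OTU ids. split() without arguments also drops
# the trailing space and newline.
d2 = {}
for line in file2_lines:
    fields = line.split()
    d2[fields[0]] = fields[1:]

# d3: UniRef id -> formatted string with each taxonomy followed by its id.
d3 = {}
for uniref, otus in d2.items():
    parts = [f"{d1[otu]} (#{otu})" for otu in otus]
    d3[uniref] = " ".join(parts)

# One output line per UniRef id, ready to be written to a file.
output_lines = [f"{uniref} {text}" for uniref, text in d3.items()]
print(output_lines[0])
```

With real files, the two lists would be replaced by iterating over the opened files, and `output_lines` could be written out with something like `out.write("\n".join(output_lines) + "\n")` on a file opened in write mode. Note that `d1[otu]` raises a `KeyError` for an id that is missing from the first file.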

I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up ids you captured from file1.

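The way you have your loops set up, you read the first record from file2 and store it in the dictionary, then loop over all of file1 looking for a match. The test `j[0] in dict.values()` never succeeds, because the values are lists of ids rather than individual ids. On top of that, a file object can only be iterated once: by the time you read the second record from file2, file1 has already been exhausted during the first iteration, so the inner loop does nothing.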
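The exhaustion problem can be seen in isolation with a short sketch. It uses `io.StringIO` as a stand-in for an open file (an assumption made so the example runs without any files on disk); iterating over it consumes it the same way.

```python
import io

# io.StringIO stands in for open("file1.txt"); the two lines come from the question.
f1 = io.StringIO(
    "OTU0001 Archaea\n"
    "OTU0004 Archaea;Bathyarchaeota;uncultured archaeon\n"
)

first_pass = [line for line in f1]   # reads both lines
second_pass = [line for line in f1]  # reads nothing: the position is at the end

f1.seek(0)                           # rewinding makes the lines readable again
third_pass = [line for line in f1]

print(len(first_pass), len(second_pass), len(third_pass))
```

Rewinding with `seek(0)` on every outer iteration would work, but it rereads the whole file each time, which is why loading it into a dictionary once is the better fit here.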

Here is an approach that reads file 1 into a dictionary, and when it finds matches in file 2, prints them out.

file1 = {}  # maps each OTU id to its taxonomy string

with open('file1.txt', 'r') as fin:
    for line in fin:
        # strip the ending newline
        line = line.rstrip()

        # only split once
        # first part into _id and second part into data
        _id, data = line.split(' ', 1)

        # data here is a single string possibly containing spaces
        # because only split once (above)
        file1[_id] = data

with open('file2.txt', 'r') as fin:
    for line in fin:
        uniref, *ids = line.split()  # here ids is a list (because prepended by *)

        print(uniref, end='')
        for _id in ids:
            if _id in file1:
                print(' ', file1[_id], '(#' + _id + ')', end='')
        print()

The printout is:

UniRef90_1  Archaea (#OTU0001)  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2  Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002)  Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003)  Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3  Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006)  Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
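The printout has two spaces before each taxonomy, whereas the line requested in the question has one. This comes from `print` inserting its default separator between the `' '` argument and the next one. The sketch below reproduces that on a single entry and shows one possible alternative using `str.join`; writing into an `io.StringIO` buffer is only there so the result can be inspected.

```python
import io

# Reproduce the two print calls from the loop, capturing the text in a buffer.
buffer = io.StringIO()
print('UniRef90_1', end='', file=buffer)
print(' ', 'Archaea', '(#OTU0001)', end='', file=buffer)
doubled = buffer.getvalue()

# Building the pieces first and joining them gives exactly one space between entries.
joined = ' '.join(['UniRef90_1', 'Archaea (#OTU0001)'])

print(repr(doubled))
print(repr(joined))
```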

