In a text file with two columns, I have some rows like the following:
N 20
CA 20
C 20
O 20
CB 20
CG 20
CD 20
CE 20
NZ 20
N 21
CA 21
C 21
O 21
CB 21
SG 21
I created a nested dictionary in this way:
r_list = ['20', '21']
dictionary = {}
r_dict = {}
for r in r_list:
    # a new {} for every key: assigning one shared a_dict object to each key
    # would make '20' and '21' point to the same inner dictionary
    r_dict[r] = {}
dictionary['C'] = r_dict
print(dictionary)
"""output:
{'C': {'20': {}, '21': {}}}
equal to:
dictionary = {'C': {
                  '20': {},
                  '21': {}
                  }
              }
"""
Now, how can I split the first column of the text file according to the value in the corresponding second column? I would like to add the elements of the first column to a new list for as long as the second column reads '20'; then, when it changes to '21', start adding the first-column elements associated with '21' to another new list, and so on. That way I can use these sublists like "r_list" with further nested dictionaries, obtaining a final structure such as the following:
sublist_1 = ['N', 'CA', 'C', 'O', 'CB', 'CG', 'CD', 'CE', 'NZ']
sublist_2 = ['N', 'CA', 'C', 'O', 'CB', 'SG']
dictionary = {'C' : {
                 '20': {
                     'N': {},
                     'CA': {},
                     'C': {},
                     'O': {},
                     'CB': {},
                     'CG': {},
                     'CD': {},
                     'CE': {},
                     'NZ': {}
                     },
                 '21': {
                     'N': {},
                     'CA': {},
                     'C': {},
                     'O': {},
                     'CB': {},
                     'SG': {}
                     }
                 }
              }
How can I do that?
Thanks a lot,
Riccardo
EDIT:
I applied all the solutions to an original CIF file successfully. However, in some CIF files the "label_atom_id" column (index 3, counting from zero) contains quotes around some atom names, as in the eighth row below ("O5'"), and these quotes remain in the dictionary:
ATOM 588 O O4 . DT B 2 10 ? 33.096 42.342 26.554 1.00 4.81 ? ? ? ? ? ? 29 DT E O4 1
ATOM 589 C C5 . DT B 2 10 ? 32.273 42.719 24.308 1.00 8.22 ? ? ? ? ? ? 29 DT E C5 1
ATOM 590 C C7 . DT B 2 10 ? 33.654 42.972 23.700 1.00 10.91 ? ? ? ? ? ? 29 DT E C7 1
ATOM 591 C C6 . DT B 2 10 ? 31.207 42.767 23.502 1.00 2.00 ? ? ? ? ? ? 29 DT E C6 1
ATOM 592 P P . DG B 2 11 ? 25.446 44.301 21.417 1.00 28.24 ? ? ? ? ? ? 30 DG E P 1
ATOM 593 O OP1 . DG B 2 11 ? 24.109 43.692 21.128 1.00 19.20 ? ? ? ? ? ? 30 DG E OP1 1
ATOM 594 O OP2 . DG B 2 11 ? 26.212 45.060 20.381 1.00 24.94 ? ? ? ? ? ? 30 DG E OP2 1
ATOM 595 O "O5'" . DG B 2 11 ? 25.303 45.130 22.804 1.00 27.92 ? ? ? ? ? ? 30 DG E "O5'" 1
ATOM 596 C "C5'" . DG B 2 11 ? 24.694 44.453 23.923 1.00 19.87 ? ? ? ? ? ? 30 DG E "C5'" 1
ATOM 597 C "C4'" . DG B 2 11 ? 25.160 44.958 25.273 1.00 19.56 ? ? ? ? ? ? 30 DG E "C4'" 1
ATOM 598 O "O4'" . DG B 2 11 ? 26.506 44.513 25.519 1.00 22.77 ? ? ? ? ? ? 30 DG E "O4'" 1
ATOM 599 C "C3'" . DG B 2 11 ? 25.135 46.521 25.375 1.00 19.23 ? ? ? ? ? ? 30 DG E "C3'" 1
ATOM 600 O "O3'" . DG B 2 11 ? 24.620 46.792 26.672 1.00 20.19 ? ? ? ? ? ? 30 DG E "O3'" 1
ATOM 601 C "C2'" . DG B 2 11 ? 26.605 46.795 25.327 1.00 18.78 ? ? ? ? ? ? 30 DG E "C2'" 1
ATOM 602 C "C1'" . DG B 2 11 ? 27.116 45.634 26.159 1.00 21.24 ? ? ? ? ? ? 30 DG E "C1'" 1
ATOM 603 N N9 . DG B 2 11 ? 28.583 45.580 26.153 1.00 21.14 ? ? ? ? ? ? 30 DG E N9 1
I tried to remove them from the file, so as to keep only O5, like this, but without success:
with open(filename, "r") as f:
    lines = f.readlines()
for line in lines:
    column = line.split(None)
    atom = column[3]
    #print(atom)
    no_double_quotes = atom.replace('"', "").strip()
    #print(no_double_quotes)
    atom_cleaned = no_double_quotes.replace("'", "").strip()
    atom = atom_cleaned
    print(atom)
# and write everything back
with open(filename, 'w') as f:
    f.writelines(lines)
The console output is correct, but the file that I then parse for the dictionary is not changed. Is there a more efficient method that actually works?
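A minimal sketch of why the file stays the same and one way around it, in Python 3: the loop above only rebinds the local name `atom`, so `lines` is never modified and `f.writelines(lines)` writes the original text back. Collecting rebuilt lines in a new list fixes that. The function name `strip_quotes_in_column` is hypothetical. It removes only the double quotes from field 3 and keeps the prime, since the prime turns out to be part of the atom name (see EDIT 2); it also re-joins fields with single spaces, so the original column alignment of the CIF file is not preserved.

```python
def strip_quotes_in_column(lines, index=3):
    """Return a new list of lines with double quotes removed from field `index`.

    Rebinding a loop variable (atom = atom_cleaned) never changes `lines`,
    so each modified line is rebuilt and collected in a new list instead.
    Fields are re-joined with single spaces: the original alignment is lost.
    """
    cleaned = []
    for line in lines:
        fields = line.split()
        if len(fields) > index:
            # only the double quotes go; the prime (') stays in the name
            fields[index] = fields[index].replace('"', "")
            line = " ".join(fields) + "\n"
        cleaned.append(line)  # short lines (e.g. "loop_") are kept as-is
    return cleaned


sample = [
    'ATOM 595 O "O5\'" . DG B 2 11 ? 25.303 45.130 22.804 1.00 27.92 ? ? ? ? ? ? 30 DG E "O5\'" 1\n',
    "loop_\n",
]
result = strip_quotes_in_column(sample)
print(result[0].split()[3])  # O5'

# Writing back to a real file would then be, e.g.:
# with open(filename) as f:
#     lines = f.readlines()
# with open(filename, "w") as f:
#     f.writelines(strip_quotes_in_column(lines))
```

If you really want to drop the prime too, chain `.replace("'", "")` after the double-quote replacement.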
EDIT 2 (FINAL):
I understood: the double quotation marks (shown in the console as '"O5\'"') enclose the apostrophe (') used to number the atoms of the sugar (deoxyribose in this case) in the nucleotide, so I cannot delete the apostrophe: it has a functional meaning. Having understood this, I solved it by keeping the apostrophe and removing only the double quotes, in this way:
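for x in atom_record_rows_list:
    atom = x[3]
    #print(atom)
    no_double_quotes = atom.replace('"', "").strip()
    #print(no_double_quotes)
    # chr(39) is the apostrophe "'" itself, so this replace() changes nothing:
    # the prime is kept and only the double quotes above are removed
    atom_cleaned = no_double_quotes.replace("'", chr(39)).strip()
    x[3] = atom_cleaned
    print(x[3])
# rlist: the residue numbers, built as in the dict-comprehension answer below
dictionary = {"C": {y: {x[3]: {} for x in atom_record_rows_list if x[8] == y} for y in rlist}}
print(dictionary)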
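Another option, sketched in Python 3 under a few assumptions: instead of cleaning fields after splitting, the standard library's `shlex.split` (POSIX mode, its default) can tokenize each ATOM line; it removes the enclosing double quotes of `"O5'"` and keeps the prime. The sketch also builds the nested dictionary in a single pass with `dict.setdefault`, so no `rlist` is needed. Assumptions: field 3 is the atom name and field 8 the residue number, matching the `x[3]`/`x[8]` indexing above; `build_residue_dict` and `chain_key` are hypothetical names. `shlex` follows shell quoting rules, not the CIF grammar, so treat this as a shortcut for lines like the ones shown rather than a CIF parser (Biopython's `MMCIF2Dict` is a dedicated option).

```python
import shlex


def build_residue_dict(cif_lines, chain_key="C"):
    """Build {chain_key: {residue: {atom_name: {}}}} from ATOM rows.

    Assumes field 3 is label_atom_id and field 8 the residue number,
    as in the x[3] / x[8] indexing above (not checked against a CIF spec).
    """
    residues = {}
    for line in cif_lines:
        # POSIX-mode shlex drops the surrounding "..." but keeps the prime
        fields = shlex.split(line)
        if not fields or fields[0] != "ATOM":
            continue  # skip blank lines and non-ATOM records
        atom, residue = fields[3], fields[8]
        # one pass: create the residue entry on first sight, then add the atom
        residues.setdefault(residue, {})[atom] = {}
    return {chain_key: residues}


rows = [
    'ATOM 591 C C6 . DT B 2 10 ? 31.207 42.767 23.502 1.00 2.00 ? ? ? ? ? ? 29 DT E C6 1',
    'ATOM 592 P P . DG B 2 11 ? 25.446 44.301 21.417 1.00 28.24 ? ? ? ? ? ? 30 DG E P 1',
    'ATOM 595 O "O5\'" . DG B 2 11 ? 25.303 45.130 22.804 1.00 27.92 ? ? ? ? ? ? 30 DG E "O5\'" 1',
]
print(build_residue_dict(rows))
# {'C': {'10': {'C6': {}}, '11': {'P': {}, "O5'": {}}}}
```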
It sounds like you are making this more difficult than it needs to be. You can simply iterate over the lines in the file, split each one, and add the pieces to the dictionary:
dictionary = { 'C': { r : {} for r in ['20', '21'] }}
with open('<filename>', 'r') as file:
    for line in file:
        words = line.split()
        dictionary['C'][words[1]][words[0]] = {}
You can extract the sublists if you really need them:
sublist_1 = list(dictionary['C']['20'].keys())
sublist_2 = list(dictionary['C']['21'].keys())
However, remember that in Python versions before 3.7 dictionaries are not ordered, so the keys may come out in a different order from the one in your file; from Python 3.7 on, dictionaries keep insertion order.
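If you want the sublists in file order and split exactly where the second column changes, as the question describes, `itertools.groupby` from the standard library does that. A sketch in Python 3, with the question's data inlined as a string instead of read from a file:

```python
from itertools import groupby

# The two-column rows from the question: atom name, residue number.
text = """N 20
CA 20
C 20
O 20
CB 20
CG 20
CD 20
CE 20
NZ 20
N 21
CA 21
C 21
O 21
CB 21
SG 21"""

rows = [line.split() for line in text.splitlines() if line.strip()]

sublists = []
dictionary = {"C": {}}
# groupby starts a new group every time row[1] changes, i.e. a new
# sublist when the residue number switches from '20' to '21'.
# It only groups *consecutive* rows: if a residue number reappeared
# later, its second run would overwrite the first entry in `dictionary`.
for residue, group in groupby(rows, key=lambda row: row[1]):
    atoms = [atom for atom, _ in group]
    sublists.append(atoms)
    dictionary["C"][residue] = {atom: {} for atom in atoms}

print(sublists[0])  # ['N', 'CA', 'C', 'O', 'CB', 'CG', 'CD', 'CE', 'NZ']
print(sublists[1])  # ['N', 'CA', 'C', 'O', 'CB', 'SG']
```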
You can use a dict comprehension to do this for you:
inp = """N 20
CA 20
C 20
O 20
CB 20
CG 20
CD 20
CE 20
NZ 20
N 21
CA 21
C 21
O 21
CB 21
SG 21"""
mappings = [i.split() for i in inp.split("\n")]
rlist = set(x[1] for x in mappings)
dicts = {"C": {y:{x[0]:{} for x in mappings if x[1] == y} for y in rlist}}
>>> print(dicts)
{'C':
    {'20': {'C': {},
            'CA': {},
            'CB': {},
            'CD': {},
            'CE': {},
            'CG': {},
            'N': {},
            'NZ': {},
            'O': {}},
     '21': {'C': {},
            'CA': {},
            'CB': {},
            'N': {},
            'O': {},
            'SG': {}}
    }
}
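Because `rlist` in this answer is a set, the order of the residue keys is not defined. A small variation (an assumption, not part of the answer itself) builds `rlist` with `dict.fromkeys`, which keeps each residue number once in first-seen order, since dicts preserve insertion order from Python 3.7. A shortened sample is used here:

```python
inp = """N 20
CA 20
C 20
N 21
SG 21"""

mappings = [i.split() for i in inp.split("\n")]

# A set has no defined order; dict.fromkeys keeps each residue number
# once, in the order it is first seen in the input.
rlist = list(dict.fromkeys(x[1] for x in mappings))

dicts = {"C": {y: {x[0]: {} for x in mappings if x[1] == y} for y in rlist}}
print(rlist)  # ['20', '21']
print(dicts)  # {'C': {'20': {'N': {}, 'CA': {}, 'C': {}}, '21': {'N': {}, 'SG': {}}}}
```

The comprehension still scans every row once per residue number; for large files a single pass over the rows is cheaper.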
Read the file, split its content into lines with split('\n'), then split each line with split(" "):
code:
with open("/home/infogrid/Desktop/Work/stack/input.txt", "r") as fp:
    data = fp.read()
result = {'C':{}}
for i in data.strip().split('\n'):
    val_count = [j for j in i.split(' ') if j]
    try:
        result['C'][val_count[1]][val_count[0]] = {}
    except KeyError:
        result['C'][val_count[1]] = {}
        result['C'][val_count[1]][val_count[0]] = {}
import pprint
pprint.pprint(result)
output:
{'C': {'20': {'C': {},
              'CA': {},
              'CB': {},
              'CD': {},
              'CE': {},
              'CG': {},
              'N': {},
              'NZ': {},
              'O': {}},
       '21': {'C': {}, 'CA': {}, 'CB': {}, 'N': {}, 'O': {}, 'SG': {}}}}
Use defaultdict from the collections module to remove the try-except block from the code:
>>> from collections import defaultdict
>>> result = {'C':defaultdict(dict)}
>>> result['C']['20']['CB'] = {}
>>> result['C']['20']
{'CB': {}}
>>> result['C']['21']
{}
>>>
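Putting the defaultdict suggestion together with the loop from this answer might look like the following Python 3 sketch. `read_two_columns` and `chain_key` are hypothetical names, and `io.StringIO` stands in for an open file so the example runs on its own:

```python
import io
from collections import defaultdict


def read_two_columns(fp, chain_key="C"):
    """Build {chain_key: {residue: {atom: {}}}} from 'atom residue' lines.

    defaultdict(dict) creates the inner dict for a residue the first
    time it is seen, which replaces the try/except KeyError block.
    """
    residues = defaultdict(dict)
    for line in fp:
        fields = line.split()  # split() with no argument drops empty strings
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        atom, residue = fields[0], fields[1]
        residues[residue][atom] = {}
    # convert back to a plain dict so it prints like a normal dictionary
    return {chain_key: dict(residues)}


# With a real file: with open("input.txt") as fp: result = read_two_columns(fp)
sample = io.StringIO("N 20\nCA 20\n\nN 21\nSG 21\n")
result = read_two_columns(sample)
print(result)  # {'C': {'20': {'N': {}, 'CA': {}}, '21': {'N': {}, 'SG': {}}}}
```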