
Python - How to add column elements to a list with a counter

In a text file with two columns, I have rows like the following:

N     20
CA    20
C     20
O     20
CB    20
CG    20
CD    20
CE    20
NZ    20
N     21
CA    21
C     21
O     21
CB    21
SG    21

I created a nested dictionary in this way:

r_list = ['20', '21']
dictionary = {}
r_dict = {}
a_dict = {}
for r in range(0,len(r_list)):
    r = r_list[r]
    dictionary['C'] = r_dict
    r_dict[r] = a_dict

print dictionary

"""output:

{'C': {'20': {}, '21': {}}}

equal to:

dictionary = {'C': {
                    '20': {},
                    '21': {}
                }
            }
"""

Now, how can I split the first column of the text file based on the value in the corresponding second column? I would like to add the elements of the first column to a new list for as long as the counter finds '20' in the second column. Then, when the counter finds '21', it should start adding the first-column elements that belong to '21' to another new list, and so on. In this way, I can use these new sublists of elements the same way I used "r_list", with further nested dictionaries, to obtain a final structure such as the following:

sublist_1 = ['N', 'CA', 'C', 'O', 'CB', 'CG', 'CD', 'CE', 'NZ']
sublist_2 = ['N', 'CA', 'C', 'O', 'CB', 'SG']

dictionary =    {'C' : {
                    '20': {
                        'N': {},
                        'CA': {},
                        'C': {},
                        'O': {},
                        'CB': {},
                        'CG': {},
                        'CD': {},
                        'CE': {},
                        'NZ': {}
                    },
                    '21': {
                        'N': {},
                        'CA': {},
                        'C': {},
                        'O': {},
                        'CB': {},
                        'SG': {}
                    }
                }
            }

How can I do that?

Thanks a lot,

Riccardo

EDIT:

I applied all the solutions to an original cif file with success. However, in some cif files the "label_atom_id" column (column index 3, counting from zero) is quoted for some atoms, as in the eighth row below ("O5'"), and the quotes remain in the dictionary:

ATOM   588  O  O4    . DT  B 2 10 ? 33.096 42.342 26.554 1.00 4.81  ? ? ? ? ? ? 29  DT  E O4    1 
ATOM   589  C  C5    . DT  B 2 10 ? 32.273 42.719 24.308 1.00 8.22  ? ? ? ? ? ? 29  DT  E C5    1 
ATOM   590  C  C7    . DT  B 2 10 ? 33.654 42.972 23.700 1.00 10.91 ? ? ? ? ? ? 29  DT  E C7    1 
ATOM   591  C  C6    . DT  B 2 10 ? 31.207 42.767 23.502 1.00 2.00  ? ? ? ? ? ? 29  DT  E C6    1 
ATOM   592  P  P     . DG  B 2 11 ? 25.446 44.301 21.417 1.00 28.24 ? ? ? ? ? ? 30  DG  E P     1 
ATOM   593  O  OP1   . DG  B 2 11 ? 24.109 43.692 21.128 1.00 19.20 ? ? ? ? ? ? 30  DG  E OP1   1 
ATOM   594  O  OP2   . DG  B 2 11 ? 26.212 45.060 20.381 1.00 24.94 ? ? ? ? ? ? 30  DG  E OP2   1 
ATOM   595  O  "O5'" . DG  B 2 11 ? 25.303 45.130 22.804 1.00 27.92 ? ? ? ? ? ? 30  DG  E "O5'" 1 
ATOM   596  C  "C5'" . DG  B 2 11 ? 24.694 44.453 23.923 1.00 19.87 ? ? ? ? ? ? 30  DG  E "C5'" 1 
ATOM   597  C  "C4'" . DG  B 2 11 ? 25.160 44.958 25.273 1.00 19.56 ? ? ? ? ? ? 30  DG  E "C4'" 1 
ATOM   598  O  "O4'" . DG  B 2 11 ? 26.506 44.513 25.519 1.00 22.77 ? ? ? ? ? ? 30  DG  E "O4'" 1 
ATOM   599  C  "C3'" . DG  B 2 11 ? 25.135 46.521 25.375 1.00 19.23 ? ? ? ? ? ? 30  DG  E "C3'" 1 
ATOM   600  O  "O3'" . DG  B 2 11 ? 24.620 46.792 26.672 1.00 20.19 ? ? ? ? ? ? 30  DG  E "O3'" 1 
ATOM   601  C  "C2'" . DG  B 2 11 ? 26.605 46.795 25.327 1.00 18.78 ? ? ? ? ? ? 30  DG  E "C2'" 1 
ATOM   602  C  "C1'" . DG  B 2 11 ? 27.116 45.634 26.159 1.00 21.24 ? ? ? ? ? ? 30  DG  E "C1'" 1 
ATOM   603  N  N9    . DG  B 2 11 ? 28.583 45.580 26.153 1.00 21.14 ? ? ? ? ? ? 30  DG  E N9    1

I tried to remove them from the file, to keep only O5, but without success, in this way:

with open(filename,"r") as f:
    lines = f.readlines()

for line in lines:
    column = line.split(None)
    atom = column[3]
    #print atom
    no_double_quotes = atom.replace('"', "").strip()
    #print no_double_quotes
    atom_cleaned = no_double_quotes.replace("'", "").strip()
    atom = atom_cleaned
    print atom

# and write everything back
with open(filename, 'w') as f:
    f.writelines(lines)

The console output is correct, but nothing is written to the file that is parsed for the dictionary... Is there a more efficient method that works?
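A likely reason the file stays unchanged: the loop only rebinds the local name `atom`, and Python strings are immutable, so the entries of `lines` are never modified before `f.writelines(lines)` runs. The following is a minimal sketch, written for Python 3, that builds a new list of lines instead. The function name `strip_quotes_in_column` and its parameters are hypothetical, and it assumes whitespace-separated columns whose original alignment does not need to be preserved.

```python
def strip_quotes_in_column(lines, index=3, chars="\"'"):
    """Return new lines with the given quote characters removed from one column.

    Hypothetical helper: 'index' is the zero-based column to clean and
    'chars' lists the characters to drop (double and single quotes here,
    mirroring the attempt in the question).
    """
    cleaned_lines = []
    for line in lines:
        columns = line.split()
        if len(columns) > index:
            for ch in chars:
                columns[index] = columns[index].replace(ch, "")
        # Rebuild the line from the (possibly modified) columns.
        # Columns are re-joined with single spaces, so alignment is lost.
        cleaned_lines.append(" ".join(columns) + "\n")
    return cleaned_lines


sample = ['ATOM 595 O "O5\'" . DG B 2 11\n', 'ATOM 603 N N9 . DG B 2 11\n']
cleaned = strip_quotes_in_column(sample)
print(cleaned[0].split()[3])
```

To persist the result, the returned list (not the original `lines`) would be passed to `f.writelines(...)`.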

EDIT 2 (FINAL):

I understood: the double quotation marks (shown in the console as '"O5\'"') enclose the apostrophe character (\') used to number the atoms of the sugar (deoxyribose in this case) in the nucleotide. The apostrophe has a functional meaning, so I cannot delete it. Having understood this, I solved the problem by replacing the apostrophe character with its ASCII character (chr(39)), in this way:

for x in atom_record_rows_list:
    atom = x[3]
    #print atom
    no_double_quotes = atom.replace('"', "").strip()
    #print no_double_quotes
    atom_cleaned = no_double_quotes.replace("'", chr(39)).strip()
    x[3] = atom_cleaned
    print x[3]

dict = {"C": {y:{x[3]:{} for x in atom_record_rows_list if x[8] == y} for y in rlist}}
print dict
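One observation on the snippet above: `chr(39)` is the apostrophe itself, so `replace("'", chr(39))` leaves the string unchanged, and the step that actually has an effect is removing the double quotes. Below is a minimal Python 3 sketch of that idea. The function name `build_atom_dict` and its parameters are hypothetical; it assumes each row is already split into columns, with the atom label at index 3 and the residue number at index 8, as in the question.

```python
def build_atom_dict(rows, atom_col=3, residue_col=8):
    """Group atom labels by residue number under the top-level key 'C'."""
    nested = {"C": {}}
    for row in rows:
        # Drop only the enclosing double quotes; the apostrophe is kept
        # because it is part of the atom name (e.g. O5').
        atom = row[atom_col].strip('"')
        nested["C"].setdefault(row[residue_col], {})[atom] = {}
    return nested


rows = [
    'ATOM 594 O OP2 . DG B 2 11 ?'.split(),
    'ATOM 595 O "O5\'" . DG B 2 11 ?'.split(),
]
print(chr(39) == "'")
print(build_atom_dict(rows))
```

Using `setdefault` also avoids rescanning all rows once per residue, which the nested comprehension does.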

It sounds like you are making this more difficult than it needs to be. You can just iterate over the lines in the file, split them, and add them to the dictionary:

dictionary = { 'C': { r : {} for r in ['20', '21'] }}
with open('<filename>', 'r') as file:
    for line in file:
        words = line.split()
        dictionary['C'][words[1]][words[0]] = {}

You can extract the sublists if you really need them:

sublist_1 = dictionary['C']['20'].keys()
sublist_2 = dictionary['C']['21'].keys()

However, you have to remember that dictionaries are not ordered (before Python 3.7), so the keys may come out in a different order from the one in your file.
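If the order of the sublists matters, one option is to group consecutive rows by the second column. This is a minimal Python 3 sketch using `itertools.groupby` and `collections.OrderedDict`; it assumes that all rows for a given number are adjacent in the file, as in the question's sample. The variable names are illustrative only.

```python
from collections import OrderedDict
from itertools import groupby

text = """N     20
CA    20
C     20
O     20
CB    20
CG    20
CD    20
CE    20
NZ    20
N     21
CA    21
C     21
O     21
CB    21
SG    21"""

rows = [line.split() for line in text.splitlines() if line.strip()]

dictionary = {"C": OrderedDict()}
sublists = []
# groupby only merges *consecutive* rows that share the same key,
# so this relies on the file being sorted/grouped by the second column.
for residue, group in groupby(rows, key=lambda row: row[1]):
    atoms = [row[0] for row in group]
    sublists.append(atoms)
    dictionary["C"][residue] = OrderedDict((atom, {}) for atom in atoms)

print(sublists[0])
print(sublists[1])
```

Here `sublists` keeps the atoms in file order, and the `OrderedDict` values keep that order regardless of the Python version.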

You can use a dict comprehension to do this for you:

inp = """N     20
CA    20
C     20
O     20
CB    20
CG    20
CD    20
CE    20
NZ    20
N     21
CA    21
C     21
O     21
CB    21
SG    21"""

mappings = [i.split() for i in inp.split("\n")]
rlist = set(x[1] for x in mappings)
dicts = {"C": {y:{x[0]:{} for x in mappings if  x[1] == y} for y in rlist}}

>>> print dicts
{'C': 
 {'20': {'C': {},
   'CA': {},
   'CB': {},
   'CD': {},
   'CE': {},
   'CG': {},
   'N': {},
   'NZ': {},
   'O': {}},
  '21': {'C': {}, 
   'CA': {}, 
   'CB': {}, 
   'N': {}, 
   'O': {}, 
   'SG': {}}
 }
}

  1. Read the file with the read method.
  2. Create the result dictionary.
  3. Split the file content on \n, i.e. split('\n').
  4. Iterate over every element from step 3 with a for loop.
  5. Get the two column values from each element with split(" ").
  6. Add the counter key to the result dictionary in the except block.
  7. Add the element-name dictionary to the counter dictionary.

code:

with open("/home/infogrid/Desktop/Work/stack/input.txt", "r") as fp:
    data = fp.read()

result = {'C':{}}
for i in data.strip().split('\n'):
    val_count = [j for j in i.split(' ') if j]
    try:
        result['C'][val_count[1]][val_count[0]] = {}
    except KeyError:
        result['C'][val_count[1]] = {}
        result['C'][val_count[1]][val_count[0]] = {}

import pprint
pprint.pprint(result)

output:

{'C': {'20': {'C': {},
              'CA': {},
              'CB': {},
              'CD': {},
              'CE': {},
              'CG': {},
              'N': {},
              'NZ': {},
              'O': {}},
       '21': {'C': {}, 'CA': {}, 'CB': {}, 'N': {}, 'O': {}, 'SG': {}}}}

Use defaultdict from the collections module to remove the try-except block from the code.

>>> from collections import defaultdict
>>> result = {'C':defaultdict(dict)}
>>> result['C']['20']['CB'] = {}
>>> result['C']['20']
{'CB': {}}
>>> result['C']['21']
{}
>>> 
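Putting this together with the parsing loop might look like the following Python 3 sketch. The function name `parse_columns` is hypothetical, and the input is assumed to be whitespace-separated text with the element name in the first column and the counter in the second.

```python
from collections import defaultdict


def parse_columns(text):
    """Build {'C': {counter: {element: {}}}} from two-column text."""
    result = {"C": defaultdict(dict)}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        element, counter = fields[0], fields[1]
        # A missing counter key is created automatically by defaultdict.
        result["C"][counter][element] = {}
    return result


sample = "N     20\nCA    20\nN     21\nSG    21"
result = parse_columns(sample)
print(dict(result["C"]))
```

Note that merely reading a missing key, as `result['C']['21']` does in the session above, inserts that key into a defaultdict; use `'21' in result['C']` to test for membership without creating an entry.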
