Convert elements in a list to a dictionary

I would like to convert the data into a dictionary to work with. Each entry looks like a key and value from a dictionary, but the two are combined into a single string element.

Here's a sample of the data:

['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

I have tried removing the ":" in the data using the replace method, but it didn't work.

i=0
for line in lines:
    a = lines[i]
    a.replace(":", "")
    lines[i] = a
    i+=1
d = {}
for line in lines:
    # Split only at the first colon; some values contain colons themselves.
    s = line.split(":", 1)
    d[s[0].strip(' "')] = s[1].strip(' ",\n')
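As for why the replace loop in the question appears to do nothing: Python strings are immutable, so str.replace returns a new string instead of modifying the one it is called on. A minimal sketch, using a one-element sample in the question's format (the names unchanged and changed are only for illustration):

```python
# A one-element sample in the same format as the question's list.
lines = ['"alkane": "[CX4]",\n']

# str.replace returns a new string; the original object is left untouched.
a = lines[0]
a.replace(":", "")  # the result is discarded here
unchanged = a

# Binding the return value is what keeps the change.
changed = a.replace(":", "")

print(repr(unchanged))
print(repr(changed))
```

Note that removing the colon also removes the only separator between key and value, so for building a dictionary it is probably more useful to split on it than to delete it.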

You can use eval:

ll = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

dd = eval('{' + ' '.join(ll).replace('\n', '') + '}')

This joins your list into a single string, removes the \n characters, and adds curly braces. The result is a str containing valid Python code for a dictionary, so it can be evaluated.
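If the data comes from a file you do not fully control, eval will execute whatever code the text contains. A possible alternative is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch on a three-element sample (the names sample, literal, and parsed are illustrative, not from the original answer):

```python
import ast

# A short sample in the same format as the question's list.
sample = [
    '"alkane": "[CX4]",\n',
    '"azo nitrogen": "[NX2]=N",\n',
    '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
]

# Join the fragments and wrap them in braces to form a dict literal.
# Newlines and a trailing comma are allowed inside the braces.
literal = "{" + "".join(sample) + "}"

# literal_eval parses literals only; it does not run arbitrary code.
parsed = ast.literal_eval(literal)
print(parsed["azole"])
```

This relies on every element being a well-formed "key": "value", fragment, as in the sample data.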

This is just a formatting problem or, more precisely, a data-cleaning problem. I am not sure why you are using an increment variable. The first thing I would handle is the comma and newline character at the end of each element; then split the element on ': ' and build a dictionary from the resulting pairs. You can try the code below.

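d = {}
for element in lines:
    # Drop the trailing comma and newline.
    element = element.rstrip(",\n")
    # Split at the first ': ' only, so colons inside a value are preserved.
    key, value = element.split(": ", 1)
    d[key.strip('"')] = value.strip('"')
d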

I used strip('"') to remove the surrounding quotation marks.

Each element in the list is a string ending in ',\n'. These trailing characters should be removed. The keys and values also carry unnecessary double quotes, which should be removed as well. I think this should give you what you need:

mylist = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

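mydict = dict()
for e in mylist:
    # Remove the quotes, then split at the first colon only:
    # some values (e.g. azole) contain colons themselves.
    t = e.replace('"', '').split(':', 1)
    # Drop the trailing ',\n' and surrounding whitespace from the value.
    mydict[t[0]] = t[1][:-2].strip()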

print(mydict)
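One detail worth checking with any split-based approach: some values in the sample, such as the azole entry, contain colons of their own. A small sketch of the difference between splitting on every colon and splitting only at the first one, here using str.partition (the variable names are illustrative):

```python
# The azole entry from the sample data; its value contains several colons.
entry = '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n'

# Splitting on every colon breaks the value into several pieces.
naive_parts = entry.split(":")

# partition splits only at the first colon, leaving the value intact.
raw_key, _, raw_value = entry.partition(":")
key = raw_key.strip(' "')
value = raw_value.strip(' ",\n')

print(len(naive_parts))
print(key, value)
```

Passing a maxsplit of 1 to split has the same effect as partition here; either keeps the full pattern as the dictionary value.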
