如何使用正在读取的文本文件中的值创建python字典

Question

I have a file 'peaks_ee.xpk' and I'm trying to create a dictionary in my python code using the values in that file. 我有一个文件“ peaks_ee.xpk”，我正在尝试使用该文件中的值在python代码中创建一个字典。

j = 0;
contents_atom = []
atom_lines=[]
with open ("peaks_ee.xpk","r") as atomName:
    for name in atomName.readlines():
        float_str = re.findall("\d\.H\d'?", name)
        if (len(float_str)>1):
            j = j+1
            value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
            atom_lines.insert(-1,value1)                     
tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

I'm reading in the file peaks_ee.xpk. 我正在读取文件peaks_ee.xpk。 This is what peaks_ee.xpk looks like: 这是peaks_ee.xpk的样子：

peaks_ee

This is a sample snippet from peaks_ee.xpk: 这是peaks_ee.xpk的样本片段：

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

I want to make a dictionary which takes in the atom name as the key. 我想制作一个以原子名称为键的字典。 The atom name in peaks_ee.xpk are "1.H1'","2.H8", etc.. and I would like the value to be the chemical shifts which are for example "5.82020" and "7.61004" (this is coming from the 0 line in peaks_ee.xpk) So for example, I would want the dictionary to look like: peaks_ee.xpk中的原子名称是“ 1.H1'”，“ 2.H8”等。我希望该值是化学位移，例如“ 5.82020”和“ 7.61004”（即将到来）从peaks_ee.xpk中的0行开始）因此，例如，我希望字典看起来像：

dict = { "1.H1'":"5.82020", "2.H8":"7.61004"...}

But the next line repeats by having 2.H8 and 1.H1' again, so it doesn't need to be added to the dictionary. 但是，下一行通过再次具有2.H8和1.H1'来重复，因此不需要将其添加到字典中。 The line after that (line 2) should add to the dictionary because it has a new atom called 1.H8, so it should be 该行之后的行（第2行）应添加到字典中，因为它有一个名为1.H8的新原子，因此应为

dict = {"1.H1'":"5.82020", "2.H8":"7.61004", "1.H8:8.13712", ...}

How can I do this? 我怎样才能做到这一点？

Edit: If I have another file "ee_pinkH1.xpk" and I want to read it in and see if the chemical shift values from there are in a certain range, then print out those values, would this be the code? 编辑：如果我还有另一个文件“ ee_pinkH1.xpk”，我想读入它，看看从那里的化学位移值是否在一定范围内，然后打印出这些值，这是代码吗？

This is my entire code: 这是我的整个代码：

import os
import sys
import re

i = 0;
contents_peak = []
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak ' + str(i) + ' '+  str(float_num[0])+ ' 0.05 ' + str(float_num[1])+ ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write('rbclust \n')
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

j = 0;
contents_atom = []
atom_lines=[]
result = {}
with open ("peaks_ee.xpk","r") as atomName:
    for name in atomName.readlines():
        for match in rex.finditer(line):
            name,shift = match.groups()
        if name not in result: 
            result[name] = float(shift)
            float_str = re.findall("\d\.H\d'?", name)
            if (len(float_str)>1):
                j = j+1
                if peakPPM = 'ee_pinkH1.xpk':
                    if 5<=float_num<=6.25:
                        value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
                    atom_lines.insert(-1,value1)

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

Answer 1

Just check if a key is already in the dictionary before adding it, using in . 只需使用in在添加字典之前检查字典中是否已存在某个键。

dict = {}
for line in atomName.readlines()
    atom_name = line.split()[1][1:-1]
    if (atom_name in dict):
        atom_value = float(line.split()[2])
        dict[atom_name] = atom_value

Since it looks like you have multiple key-value pairs to check for each row, you can repeat the function in every line like this: 由于看起来您要检查每一行有多个键值对，因此可以在每一行中重复执行该函数，如下所示：

dict = {}
for line in atomName.readlines()
    atom_name = line.split()[1][1:-1]
    if (atom_name in dict):
        atom_value = float(line.split()[2])
        dict[atom_name] = atom_value
    atom_name = line.split()[8][1:-1]
    if (atom_name in dict):
        atom_value = float(line.split()[9])
        dict[atom_name] = atom_value

Did you mean to edit this post, by the way? 您是要编辑这篇文章吗？ I also answered on your older duplicate post. 我也回答了您的旧重复帖子。

Answer 2

You can extend your regex pattern to include the chemical shift and get what you need in each match. 您可以扩展正则表达式模式以包括化学位移，并在每次比赛中获得所需的东西。 Put parenthesis around the parts of the pattern you want to keep so they will be captured. 将括号放在要保留的模式部分周围，以便将其捕获。

pattern = '''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

Iterate over all the matches; 遍历所有比赛； the name and shift will be in the match.groups() tuple; 名称和班次将在match.groups（）元组中； if the name hasn't been seen yet add it to the dictionary. 如果尚未看到该名称，则将其添加到词典中。

with open(filepath) as atom_name:
    data = atom_name.read()
result = {}
for match in rex.finditer(data):
    name, shift = match.groups()
    #print(name,shift)
    if name not in result:
        result[name] = float(shift)

If the file is too big to read at once, extract the info one line at a time. 如果文件太大而无法一次读取，则一次提取一行信息。

with open(filepath) as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            #print(name,shift)
            if name not in result:
                result[name] = float(shift)

如何使用正在读取的文本文件中的值创建python字典

问题描述

2 个解决方案

解决方案1
1 2017-07-25 00:17:57

解决方案2
1 已采纳 2017-07-25 04:06:28

如何使用正在读取的文本文件中的值创建python字典

问题描述

2 个解决方案

解决方案1 1 2017-07-25 00:17:57

解决方案2 1 已采纳 2017-07-25 04:06:28

解决方案1
1 2017-07-25 00:17:57

解决方案2
1 已采纳 2017-07-25 04:06:28