How can I create a Python dictionary using values from a text file that I'm reading in

I have a file 'peaks_ee.xpk' and I'm trying to create a dictionary in my Python code using the values in that file.

import re

j = 0
atom_lines = []
with open("peaks_ee.xpk", "r") as atomName:
    for name in atomName:
        # Find atom names such as 1.H1' or 2.H8 on this line
        float_str = re.findall(r"\d\.H\d'?", name)
        if len(float_str) > 1:
            j = j + 1
            value1 = 'Atom ' + str(j) + ' ' + float_str[0] + ' ' + float_str[1] + '\n'
            atom_lines.append(value1)  # append keeps the lines in file order
# Append the atom lines to the end of tclust.txt
with open("tclust.txt", "a") as tclust_atom:
    for value1 in atom_lines:
        tclust_atom.write(value1)

I'm reading in the file peaks_ee.xpk. This is a sample snippet from it:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

I want to make a dictionary that takes the atom name as the key. The atom names in peaks_ee.xpk are "1.H1'", "2.H8", etc., and I would like the values to be the chemical shifts, for example "5.82020" and "7.61004" (these come from line 0 in peaks_ee.xpk). So, for example, I would want the dictionary to look like:

dict = { "1.H1'":"5.82020", "2.H8":"7.61004"...}

But the next line repeats 2.H8 and 1.H1', so nothing from it needs to be added to the dictionary. The line after that (line 2) should add to the dictionary because it has a new atom called 1.H8, so it should be

dict = {"1.H1'":"5.82020", "2.H8":"7.61004", "1.H8":"8.13712", ...}

How can I do this?

Edit: If I have another file "ee_pinkH1.xpk" and I want to read it in, check whether the chemical shift values in it fall within a certain range, and then print out those values, would this be the code?

This is my entire code:

import os
import sys
import re

i = 0
peak_lines = []
with open("ee_pinkH1.xpk", "r") as peakPPM:
    for PPM in peakPPM:
        # Find numbers such as 5.82020 that are preceded by whitespace
        float_num = re.findall(r"[\s][1-9]{1}\.[0-9]+", PPM)
        if len(float_num) > 1:
            i = i + 1
            value = 'Peak ' + str(i) + ' ' + float_num[0] + ' 0.05 ' + float_num[1] + ' 0.05 ' + '\n'
            peak_lines.append(value)  # append keeps the lines in file order
# Start tclust.txt afresh with the header and the peak lines
with open("tclust.txt", "w+") as tclust_peak:
    tclust_peak.write('rbclust \n')
    for value in peak_lines:
        tclust_peak.write(value)

j = 0
atom_lines = []
result = {}
# Capture an atom name in braces and the chemical shift that follows it
rex = re.compile(r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s")
with open("peaks_ee.xpk", "r") as atomName:
    for line in atomName:
        pairs = rex.findall(line)  # [(name, shift), ...] for this line
        for name, shift in pairs:
            if name not in result:
                result[name] = float(shift)
        if len(pairs) > 1:
            j = j + 1
            # Assumed intent: keep the line only if its first shift is in range
            if 5 <= float(pairs[0][1]) <= 6.25:
                value1 = 'Atom ' + str(j) + ' ' + pairs[0][0] + ' ' + pairs[1][0] + '\n'
                atom_lines.append(value1)

# Append the atom lines to the end of tclust.txt
with open("tclust.txt", "a") as tclust_atom:
    for value1 in atom_lines:
        tclust_atom.write(value1)

Just check whether a key is already in the dictionary before adding it, using the in operator.

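shifts = {}
for line in atomName:
    fields = line.split()
    # Skip header lines; peak rows start with an integer index
    if len(fields) < 3 or not fields[0].isdigit():
        continue
    atom_name = fields[1][1:-1]  # strip the surrounding braces
    if atom_name not in shifts:
        shifts[atom_name] = float(fields[2])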

Since it looks like you have multiple key-value pairs to check in each row, you can repeat the check for every pair in the line, like this:

shifts = {}
for line in atomName:
    fields = line.split()
    # Skip header lines; peak rows start with an integer index
    if len(fields) < 10 or not fields[0].isdigit():
        continue
    atom_name = fields[1][1:-1]  # first atom, braces stripped
    if atom_name not in shifts:
        shifts[atom_name] = float(fields[2])
    atom_name = fields[8][1:-1]  # second atom on the same row
    if atom_name not in shifts:
        shifts[atom_name] = float(fields[9])
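
As a rough, self-contained sketch of the split-based idea, the snippet below runs on a few rows copied from the question's sample instead of the real file. The function name shifts_by_split and the SAMPLE string are made up for illustration, and the column positions (1, 2, 8, 9) are an assumption that every peak row has the same layout as the sample.

```python
# A few lines copied from the peaks_ee.xpk snippet in the question
SAMPLE = """\
label dataset sw sf
1H 1H_2
4807.69238281 4803.07373047
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
"""


def shifts_by_split(lines):
    """Map each atom name to the first chemical shift seen for it."""
    shifts = {}
    for line in lines:
        fields = line.split()
        # Skip header lines: peak rows start with an integer index
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # (name column, shift column) for the two atoms on a row
        for name_col, shift_col in ((1, 2), (8, 9)):
            atom_name = fields[name_col][1:-1]  # strip the braces
            if atom_name not in shifts:
                shifts[atom_name] = float(fields[shift_col])
    return shifts


shifts = shifts_by_split(SAMPLE.splitlines())
print(shifts)
```

Row 1 adds nothing because both of its atoms were already seen in row 0, which matches the behaviour asked for in the question.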

Did you mean to edit this post, by the way? I also answered on your older duplicate post.

You can extend your regex pattern to include the chemical shift and get what you need in each match. Put parentheses around the parts of the pattern you want to keep so they will be captured.

import re

pattern = r'''\{(\d\.H\d'?)\}\s(\d\.\d+)\s'''
rex = re.compile(pattern)
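
To see what the two capture groups return, here is a small sketch that applies the same pattern to row 0 of the sample. It is written as a raw string with the braces escaped, which should match the same text as the pattern above; the variable names line and pairs are only for illustration.

```python
import re

# Same pattern as above, as a raw string with the literal braces escaped
rex = re.compile(r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s")

# Row 0 from the peaks_ee.xpk sample in the question
line = ("0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} "
        "{2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0")

# findall returns one (name, shift) tuple per match because there are two groups
pairs = rex.findall(line)
print(pairs)  # [("1.H1'", '5.82020'), ('2.H8', '7.61004')]
```

Note that each \d matches a single digit, so as written the pattern would presumably miss residue numbers of 10 or more and shifts of 10 ppm or more; \d+ in those positions would be one way to loosen it if your data needs that.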

Iterate over all the matches; the name and shift will be in the match.groups() tuple; if the name hasn't been seen yet, add it to the dictionary.

with open(filepath) as atom_name:
    data = atom_name.read()
result = {}
for match in rex.finditer(data):
    name, shift = match.groups()
    #print(name,shift)
    if name not in result:
        result[name] = float(shift)

If the file is too big to read at once, extract the info one line at a time.

result = {}
with open(filepath) as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            #print(name,shift)
            if name not in result:
                result[name] = float(shift)
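
For the range check mentioned in the question's edit, one possible follow-up is to filter the finished dictionary. This is only a sketch: read_shifts and in_range are hypothetical helper names, the 5 to 6.25 bounds are taken from the question's code, and io.StringIO stands in for the open file so the example runs without peaks_ee.xpk on disk.

```python
import io
import re

rex = re.compile(r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s")


def read_shifts(lines):
    """Map each atom name to the first chemical shift seen for it."""
    result = {}
    for line in lines:
        for name, shift in rex.findall(line):
            if name not in result:
                result[name] = float(shift)
    return result


def in_range(shifts, low, high):
    """Keep only the atoms whose shift lies in [low, high]."""
    return {name: s for name, s in shifts.items() if low <= s <= high}


# Rows 0, 2 and 10 from the sample; a real call would pass the open file
sample = io.StringIO(
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0\n"
    "2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0\n"
    "10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0\n"
)

shifts = read_shifts(sample)
selected = in_range(shifts, 5.0, 6.25)
for name, shift in selected.items():
    print(name, shift)
```

Because read_shifts accepts any iterable of lines, the same function should work with a file object from open(), which keeps the line-at-a-time behaviour for large files.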


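Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.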
