How can I check whether every second value from a dictionary is in a specific range?

I have a dictionary that is populated from a file called peaks_ee.xpk.

Sample from peaks_ee.xpk:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

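For example, in line 0 of peaks_ee.xpk the atom name is 1.H1' and its chemical shift is 5.82020. In column 8 of the same line there is another atom name, 2.H8, whose chemical shift is 7.61004. Basically, I want to check whether the first chemical shift in a line (5.82020) is within a certain range and whether the second chemical shift (7.61004) is within another range. If both are, I want to write the atom names (1.H1' and 2.H8) to a file called tclust.txt.

Here is the code so far. I posted another question earlier, and @wwii helped me with this code.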

import re

# Captures an atom name such as 1.H1' and the chemical shift that follows it
pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j = 0
contents_atom = []
atom_lines = []

# NOTE: filename is not defined in this snippet; it has to be set before the loop runs
result = {}
with open("peaks_ee.xpk", "r") as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            if name not in result:
                result[name] = float(shift)
                print (name,shift)
                if filename == 'ee_pinkH1.xpk':
                    if result[name] <= 8.5:
                        float_str = re.findall(r"\d\.H\d'?", name)
                        if len(float_str) > 1:
                            j = j + 1
                            value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
                            atom_lines.insert(-1, value1)

tclust_atom = open("tclust.txt", "a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

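Here is a picture of the list of atom names and their chemical shifts printed by the line print (name,shift):

[image: atom names and chemical shifts]

From this picture, the first two lines are:

"1.H1'", "5.82020" and "2.H8", "7.61004", but those first two lines actually both come from the first line of peaks_ee.xpk. I want to check whether 5.82020 is between 5.1 and 6, and whether 7.61004 is between 7 and 8.25. Is there a way to do this using the values of the dictionary? I noticed that every second line holds a value I want to check against the range 5.1 to 6, while the alternating lines hold the values I want to check against the range 7 to 8.25.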

Edit: here is my full code:

import pandas as pd
import os
import sys
import re

i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = '''{\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j=0;
contents_atom=[]
atom_lines=[]
result = {}
with open("peaks_ee.xpk","r") as atomName:
    for name in atomName:
        for match in rex.finditer(line):
            name,shift = match.groups()
            print (name,shift)
            if name not in result:
                result[name]=float(shift)
                float_str = re.findall("\d\.H\d'?",name)
                if (len(float_str)>1):
                    j=j+1
                    value1 = ('Atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                    atom_lines.insert(-1,value)

df = pd.read_csv("D:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)

shift1= df["1H.P"]
shift2= df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
print result

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

This is the error I get:

Traceback (most recent call last):
  File "pandas.py", line 1, in <module>
    import pandas as pd 
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
    rex = re.compile(pattern)
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 190, in compile
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 242, in _compile
sre_constants.error: unbalanced parenthesis

Edit: latest code, 7/26:

import pandas as pd
import os
import sys
import re
import csv

i = 0
contents_peak = []
peak_lines = []
with open("ee_pinkH1.xpk", "r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall(r"[\s][1-9]{1}\.[0-9]+", PPM)
        if (len(float_num) > 1):
            i = i + 1
            value = ('Peak ' + str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1, value)
tclust_peak = open("tclust.txt", "w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j = 0
contents_atom = []
atom_lines = []
result = {}
text = 'ee'
# NOTE: filename is not defined anywhere in this script; it has to be set before this point

if text == 'ee':
    df = pd.read_csv('peaks_ee.xpk', sep=" ", skiprows=5)

    shift1 = df["1H.P"]
    shift2 = df["1H_2.P"]
    if filename == 'ee_pinkH1.xpk':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    elif filename == 'ee_pinkH2.xpk':
        mask = ((shift1>3.25) & (shift1<5)) & ((shift2>7) & (shift2<8.5))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'ef':
    df = pd.read_csv('peaks_ef.xpk', sep=" ", skiprows=5)

    shift1 = df["1H.P"]
    shift2 = df["1H_2.P"]
    if filename == 'ef_blue.xpk':
        mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
    elif filename == 'ef_green.xpk':
        mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
    elif filename == 'ef_orange':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'fe':
    df = pd.read_csv('peaks_fe.xpk', sep=" ", skiprows=5)

    shift1 = df["Atom1"]
    shift2 = df["Atom2"]
    if filename == 'fe_yellow':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
    elif filename == 'fe_green':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

# Append the filtered table to the file written above
tclust_atom = open("tclust.txt", "a")
tclust_atom.write(str(result))
tclust_atom.close()

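You can try using the pandas package.

The following code loads your file, skipping the first five lines so that the column header line and the data are read. It then applies element-wise comparisons, combined with the bitwise & operator, to the two shift columns to create a mask, and finally selects the required columns.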

import pandas as pd
df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

shift1 = df["1H.P"]
shift2 = df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]

The result looks like this:

>>> result
       1H.L     1H.P  1H_2.L   1H_2.P
0   {1.H1'}  5.82020  {2.H8}  7.61004
3   {1.H1'}  5.82020  {1.H8}  8.13712
5   {2.H1'}  5.90291  {2.H8}  7.61004
8   {1.H1'}  5.82020  {2.H8}  7.61004
11  {4.H1'}  5.74125  {3.H6}  7.53261
12  {3.H1'}  5.54935  {4.H8}  7.49932
15  {3.H1'}  5.54935  {3.H6}  7.53261
18  {2.H1'}  5.90291  {3.H6}  7.53261
21  {4.H1'}  5.74125  {4.H8}  7.49932
24  {3.H1'}  5.54935  {4.H8}  7.49932
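
The traceback in the question shows the script running under Jython 2.7, where pandas is, as far as I know, not available. In that case the same check can be sketched with the standard library only. The version below reuses the regular expression from the question and applies it line by line, keeping both matches of a line together instead of storing them in a dictionary. The names `select_pairs`, `PAIR`, `range1` and `range2` are made up for this illustration, and the default ranges are simply the ones used in the mask above.

```python
import re

# Pattern from the question: an atom name such as 1.H1' in braces, then its shift.
PAIR = re.compile(r"{(\d\.H\d'?)}\s(\d\.\d+)\s")

def select_pairs(lines, range1=(5.1, 6.0), range2=(7.0, 8.25)):
    """Return (name1, name2) for lines whose two shifts lie inside the open ranges."""
    selected = []
    for line in lines:
        matches = PAIR.findall(line)
        if len(matches) != 2:
            continue  # header lines, or rows without exactly two atom entries
        (name1, shift1), (name2, shift2) = matches
        in_first = range1[0] < float(shift1) < range1[1]
        in_second = range2[0] < float(shift2) < range2[1]
        if in_first and in_second:
            selected.append((name1, name2))
    return selected

# Rows 0, 1 and 3 of the sample file shown in the question
sample = [
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
]
pairs = select_pairs(sample)
```

With a real file, the open file object can be passed directly, for example `select_pairs(open("peaks_ee.xpk"))`. Working per line avoids the problem with the dictionary approach, where `if name not in result` drops every repeated atom name and the pairing between the two shifts of a row is lost.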

Then, if needed, you can export result to a CSV file like this:

result.to_csv("result.csv")
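
If the goal is the tclust.txt layout from the question rather than a CSV file, the selected names could be formatted as `Atom <n> <name1> <name2>` lines, which is the layout the question's own code builds. This is only a sketch: `format_atom_lines` and `append_lines` are hypothetical helper names, and the braces are stripped on the assumption that tclust.txt expects bare atom names.

```python
def format_atom_lines(pairs):
    """Build 'Atom <n> <name1> <name2>' lines, numbered from 1."""
    lines = []
    for number, (first, second) in enumerate(pairs, start=1):
        # Labels read by pandas still carry their braces, e.g. {1.H1'}
        lines.append("Atom {} {} {}\n".format(
            number, first.strip("{}"), second.strip("{}")))
    return lines

def append_lines(path, lines):
    """Append the prepared lines to an existing file."""
    with open(path, "a") as handle:
        handle.writelines(lines)

# Two label pairs as they appear in the result table above
pairs = [("{1.H1'}", "{2.H8}"), ("{1.H1'}", "{1.H8}")]
atom_lines = format_atom_lines(pairs)
```

With the DataFrame from above, the pairs can be taken from the two label columns with `zip(result["1H.L"], result["1H_2.L"])`, and `append_lines("tclust.txt", atom_lines)` then adds them after the peak lines already written to that file.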

I'm not sure this code is exactly what you need, but it may be a good start for how you could use pandas.
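
A note on the traceback in the question: the "unbalanced parenthesis" error is raised by `re.compile`, not by the filtering code. In the full code the pattern starts with `{\d\.H\d'?)}`, so the opening parenthesis of the first group is missing, while the first snippet has `{(\d\.H\d'?)}`. The traceback also shows that the script itself is named pandas.py, so `import pandas` loads the script again instead of the library; renaming the script would probably be needed as well. The small check below reproduces the regex error. The variable names `broken` and `fixed` are only for illustration.

```python
import re

# Pattern as written in the full code: '(' is missing after '{'
broken = r"{\d\.H\d'?)}\s(\d\.\d+)\s"
# Pattern as written in the first snippet: both groups are balanced
fixed = r"{(\d\.H\d'?)}\s(\d\.\d+)\s"

try:
    re.compile(broken)
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    message = str(exc)  # describes the unbalanced parenthesis

# The balanced pattern captures the atom name and its shift
match = re.search(fixed, "0 {1.H1'} 5.82020 0.05000")
name, shift = match.groups()
```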
