How can I check whether the second value in a dictionary is within a specific range?

Question

I have a dictionary that is built by reading a file named peaks_ee.xpk.

Sample from peaks_ee.xpk:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

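For example, in row 0 of peaks_ee.xpk the atom name is 1.H1' and its chemical shift is 5.82020. Column 8 of the same row holds another atom name, 2.H8, whose chemical shift is 7.61004. Basically, I want to check whether the first chemical shift in a row (5.82020) falls within one range and whether the second chemical shift (7.61004) falls within another range. If both do, the atom names (1.H1' and 2.H8) should be written to a file named tclust.txt.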

Here is the code so far. I posted another question earlier, and @wwii helped me with this code.

import re

# Capture an atom name such as 1.H1' and the chemical shift that follows it.
pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j = 0
contents_atom = []
atom_lines = []

result = {}
with open("peaks_ee.xpk", "r") as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            if name not in result:
                result[name] = float(shift)
                print(name, shift)
                # NOTE: filename is not defined in this snippet.
                if filename == 'ee_pinkH1.xpk':
                    if result[name] <= 8.5:
                        float_str = re.findall(r"\d\.H\d'?", name)
                        if len(float_str) > 1:
                            j = j + 1
                            value1 = 'Atom ' + str(j) + ' ' + float_str[0] + ' ' + float_str[1] + '\n'
                            atom_lines.append(value1)

# Append the collected atom lines to tclust.txt.
with open("tclust.txt", "a") as tclust_atom:
    for value1 in atom_lines:
        tclust_atom.write(value1)

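Here is a picture of the list of atom names and their chemical shifts printed by the line print (name,shift):

[Image: atom names and chemical shifts]

From that picture, the first two rows are:

1.H1', 5.82020 and 2.H8, 7.61004, but both of these rows actually come from just the first line of peaks_ee.xpk. I want to see whether 5.82020 is between 5.1 and 6, and whether 7.61004 is between 7 and 8.25. Is there a way to do this using the dictionary's values? I noticed that every other row holds a value I want to test against 5.1 to 6, while the alternating rows hold values I want to test against 7 to 8.25.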

Edit: here is my full code:

import pandas as pd
import os
import sys
import re

i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = '''{\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j=0;
contents_atom=[]
atom_lines=[]
result = {}
with open("peaks_ee.xpk","r") as atomName:
    for name in atomName:
        for match in rex.finditer(line):
            name,shift = match.groups()
            print (name,shift)
            if name not in result:
                result[name]=float(shift)
                float_str = re.findall("\d\.H\d'?",name)
                if (len(float_str)>1):
                    j=j+1
                    value1 = ('Atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                    atom_lines.insert(-1,value)

df = pd.read_csv("D:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)

shift1= df["1H.P"]
shift2= df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
print result

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

This is the error I get:

Traceback (most recent call last):
  File "pandas.py", line 1, in <module>
    import pandas as pd 
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
rex = re.compile(pattern)
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 190, in compile
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 242, in _compile
sre_constants.error: unbalanced parenthesis

Edit: latest code, 7/26:

import pandas as pd
import os
import sys
import re
import csv 

i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j=0;
contents_atom=[]
atom_lines=[]
result = {}
text = 'ee'

# NOTE: filename is not defined anywhere in this script.
if text == 'ee':
    df = pd.read_csv('peaks_ee.xpk', sep=" ",skiprows=5)

    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ee_pinkH1.xpk':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    elif filename == 'ee_pinkH2.xpk':
        mask = ((shift1>3.25)&(shift1<5))&((shift2>7)&(shift2<8.5))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'ef':
    df = pd.read_csv('peaks_ef.xpk', sep=" ",skiprows=5)

    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ef_blue.xpk':
        mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
    elif filename == 'ef_green.xpk':
        mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
    elif filename == 'ef_orange':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'fe':
    df = pd.read_csv('peaks_fe.xpk', sep=" ",skiprows=5)

    shift1= df["Atom1"]
    shift2= df["Atom2"]
    if filename == 'fe_yellow':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
    elif filename == 'fe_green':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

tclust_peak = open("tclust.txt","a")
tclust_peak.write(str(result))
tclust_peak.close()

Answer 1

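You could try the pandas package.

The following code loads your file, skipping the first five lines so that only the data you need is read. It then combines element-wise comparisons on the two shift columns with & to build a boolean mask, and finally selects the desired columns.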

import pandas as pd
df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

shift1 = df["1H.P"]
shift2 = df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]

The result looks like this:

>>> result
       1H.L     1H.P  1H_2.L   1H_2.P
0   {1.H1'}  5.82020  {2.H8}  7.61004
3   {1.H1'}  5.82020  {1.H8}  8.13712
5   {2.H1'}  5.90291  {2.H8}  7.61004
8   {1.H1'}  5.82020  {2.H8}  7.61004
11  {4.H1'}  5.74125  {3.H6}  7.53261
12  {3.H1'}  5.54935  {4.H8}  7.49932
15  {3.H1'}  5.54935  {3.H6}  7.53261
18  {2.H1'}  5.90291  {3.H6}  7.53261
21  {4.H1'}  5.74125  {4.H8}  7.49932
24  {3.H1'}  5.54935  {4.H8}  7.49932

Then, if needed, you can export result to a CSV file like this:

result.to_csv("result.csv")
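
The question asks for the atom names to end up in tclust.txt rather than in a CSV file. Here is a minimal sketch of that step, assuming the "Atom <n> <name1> <name2>" line layout used in the question's own code. format_atom_lines is a hypothetical helper, not part of pandas, and the two label pairs are only sample values taken from the file above.

```python
def format_atom_lines(label_pairs):
    """Build 'Atom <n> <name1> <name2>' lines from pairs of labels.

    Labels may still carry the braces used in the .xpk file, e.g. "{1.H1'}".
    """
    lines = []
    for number, (first, second) in enumerate(label_pairs, start=1):
        # Strip the surrounding braces so only the atom name remains.
        lines.append("Atom %d %s %s\n" % (number, first.strip("{}"), second.strip("{}")))
    return lines


# Example with two label pairs taken from the sample file.
pairs = [("{1.H1'}", "{2.H8}"), ("{2.H1'}", "{3.H6}")]
atom_lines = format_atom_lines(pairs)

# Append to tclust.txt, as the question's code does.
with open("tclust.txt", "a") as handle:
    handle.writelines(atom_lines)
```

With the DataFrame built above, the pairs could probably be taken from result[["1H.L", "1H_2.L"]].itertuples(index=False, name=None), which yields one plain tuple per row.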

I'm not sure this code is exactly what you need, but it may be a good starting point for how you could use pandas.
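
If pandas is not available, a similar check can be done line by line with a regular expression based on the one in the question. A dictionary keyed by atom name keeps only one shift per name and loses which two atoms share a row, so this sketch keeps the two matches from each row together instead. The names RANGES, in_range and matching_pairs are hypothetical, and the range values for ee_pinkH1.xpk and ee_pinkH2.xpk are copied from the question's latest code.

```python
import re

# Atom name in braces followed by its chemical shift, as in the question.
ATOM = re.compile(r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s")

# (first-shift range, second-shift range) per peak file, from the question.
RANGES = {
    "ee_pinkH1.xpk": ((5.1, 6.0), (7.0, 8.25)),
    "ee_pinkH2.xpk": ((3.25, 5.0), (7.0, 8.5)),
}


def in_range(value, bounds):
    """Strict check, matching the > and < comparisons in the pandas mask."""
    low, high = bounds
    return low < value < high


def matching_pairs(lines, first_bounds, second_bounds):
    """Yield (name1, name2) for rows whose two shifts fall in the ranges."""
    for line in lines:
        found = ATOM.findall(line)
        if len(found) != 2:
            continue  # header lines contain no atom/shift pairs
        (name1, shift1), (name2, shift2) = found
        if in_range(float(shift1), first_bounds) and in_range(float(shift2), second_bounds):
            yield name1, name2


# A header line plus rows 0 and 1 of the sample file.
sample = [
    "label dataset sw sf",
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
]
first_bounds, second_bounds = RANGES["ee_pinkH1.xpk"]
pairs = list(matching_pairs(sample, first_bounds, second_bounds))
print(pairs)
```

For a real file, the sample list could be replaced by the open file object, since iterating over it also yields one line at a time.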

Solution 1
1 Accepted 2017-07-25 19:33:05
