
How can I check whether every second value from a dictionary is in a specific range?

I have a dictionary that is filled by reading a file named peaks_ee.xpk.

Sample from peaks_ee.xpk:

label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0

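For example, in row 0 of peaks_ee.xpk the atom name is 1.H1' and its chemical shift is 5.82020. In column 8 of the same row there is another atom name, 2.H8, with a chemical shift of 7.61004. Basically, I want to check whether the first chemical shift in a row (5.82020) is within a certain range and whether the second chemical shift (7.61004) is within another range. If so, the atom names (1.H1' and 2.H8) should be written to a file named tclust.txt.

Here is the code so far. I posted another question earlier, and @wwii helped me with this code.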

import re

# Capture an atom label in braces and the chemical shift that follows it.
pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j = 0
contents_atom = []
atom_lines = []

result = {}
with open("peaks_ee.xpk","r") as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            if name not in result:
                result[name] = float(shift)
                print (name,shift)
                # assumes filename is defined earlier in the script
                if filename == 'ee_pinkH1.xpk':
                    if result[name]<=8.5:
                        float_str = re.findall(r"\d\.H\d'?",name)
                        if (len(float_str))>1:
                            j=j+1
                            value1 = ('Atom ' + str(j) + ' ' + str(float_str[0])+ ' ' + str(float_str[1])+ '\n')
                            atom_lines.insert(-1,value1)

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

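Here is a picture of the list of atom names and their chemical shifts printed by the line print (name,shift):

[Image: atom names and chemical shifts]

From this picture, the first two lines are:

"1.H1'", "5.82020", "2.H8", "7.61004", but these first two lines actually both come from the first row of peaks_ee.xpk. I want to check whether 5.82020 is between 5.1 and 6, and whether 7.61004 is between 7 and 8.25. Is there a way to do this using the values of the dictionary? I noticed that every second line holds a value I want to test against the range 5.1 to 6, while the alternating lines hold values I want to test against the range 7 to 8.25.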

Edit: here is my full code:

import pandas as pd
import os
import sys
import re

i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = '''{\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j=0;
contents_atom=[]
atom_lines=[]
result = {}
with open("peaks_ee.xpk","r") as atomName:
    for line in atomName:
        for match in rex.finditer(line):
            name,shift = match.groups()
            print (name,shift)
            if name not in result:
                result[name]=float(shift)
                float_str = re.findall("\d\.H\d'?",name)
                if (len(float_str)>1):
                    j=j+1
                    value1 = ('Atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                    atom_lines.insert(-1,value1)

df = pd.read_csv("D:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)

shift1= df["1H.P"]
shift2= df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
print result

tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()

Here is the error I get:

Traceback (most recent call last):
  File "pandas.py", line 1, in <module>
    import pandas as pd 
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
    rex = re.compile(pattern)
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 190, in compile
  File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 242, in _compile
sre_constants.error: unbalanced parenthesis
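
The traceback points at re.compile(pattern). In the full listing the pattern starts with `{\d` and no longer has the `(` that opened the first capture group in the earlier snippet, so the first `)` has nothing to close. The traceback also shows that the script itself is saved as pandas.py, which means `import pandas as pd` picks up the script rather than the library; renaming the file avoids that. A minimal sketch that reproduces the regex error and compares it with the earlier pattern (the variable names `broken`, `fixed`, `broken_compiles` and `pairs` are only illustrative):

```python
import re

# Pattern as it appears in the full listing: the "(" that should open the
# first capture group is missing, so the first ")" is unbalanced.
broken = r'''{\d\.H\d'?)}\s(\d\.\d+)\s'''

# Pattern from the earlier snippet, with both capture groups intact.
fixed = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''

try:
    re.compile(broken)
    broken_compiles = True
except re.error:
    # Raised as "unbalanced parenthesis", matching the traceback.
    broken_compiles = False

# Row 0 of the sample file.
line = "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0"

# Each match yields (atom name, shift); a data row gives two such pairs.
pairs = re.findall(fixed, line)

print(broken_compiles)
print(pairs)
```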

Edit: latest code, 7/26:

import pandas as pd
import os
import sys
import re
import csv 

i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()

pattern = '''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)

j=0;
contents_atom=[]
atom_lines=[]
result = {}
text = 'ee'

if text == 'ee':
    df = pd.read_csv('peaks_ee.xpk', sep=" ",skiprows=5)

    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ee_pinkH1.xpk':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    elif filename == 'ee_pinkH2.xpk':
        mask = ((shift1>3.25)&(shift1<5))&((shift2>7)&(shift2<8.5))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'ef':
    df = pd.read_csv('peaks_ef.xpk', sep=" ",skiprows=5)

    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ef_blue.xpk':
        mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
    elif filename == 'ef_green.xpk':
        mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
    elif filename == 'ef_orange':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

if text == 'fe':
    df = pd.read_csv('peaks_fe.xpk', sep=" ",skiprows=5)

    shift1= df["Atom1"]
    shift2= df["Atom2"]
    if filename == 'fe_yellow':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
    elif filename == 'fe_green':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")

tclust_peak = open("tclust.txt","a")
tclust_peak.write(str(result))
tclust_peak.close()

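You can try using the pandas package.

The following code loads your file, skipping the first five lines so that the column header line is read first. It then compares the two shift columns against their ranges, combines the comparisons element-wise with & to create a mask, and finally selects the desired columns.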

import pandas as pd
df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)

shift1 = df["1H.P"]
shift2 = df["1H_2.P"]

mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))

result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]

The result is as follows:

>>> result
       1H.L     1H.P  1H_2.L   1H_2.P
0   {1.H1'}  5.82020  {2.H8}  7.61004
3   {1.H1'}  5.82020  {1.H8}  8.13712
5   {2.H1'}  5.90291  {2.H8}  7.61004
8   {1.H1'}  5.82020  {2.H8}  7.61004
11  {4.H1'}  5.74125  {3.H6}  7.53261
12  {3.H1'}  5.54935  {4.H8}  7.49932
15  {3.H1'}  5.54935  {3.H6}  7.53261
18  {2.H1'}  5.90291  {3.H6}  7.53261
21  {4.H1'}  5.74125  {4.H8}  7.49932
24  {3.H1'}  5.54935  {4.H8}  7.49932
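
To make the mask step concrete without pandas, here is a small sketch of the same element-wise logic in plain Python. The two lists hold the 1H.P and 1H_2.P values from the first six data rows of the sample; the names `shift1`, `shift2`, `mask` and `selected` simply mirror the pandas code and are otherwise arbitrary.

```python
# Shifts from data rows 0-5 of the sample (columns 1H.P and 1H_2.P).
shift1 = [5.82020, 7.61004, 8.13712, 5.82020, 7.61004, 5.90291]
shift2 = [7.61004, 5.82020, 5.82020, 8.13712, 5.90291, 7.61004]

# Element-wise equivalent of
# ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
mask = [(5.1 < a < 6) and (7 < b < 8.25) for a, b in zip(shift1, shift2)]

# Row numbers whose mask entry is True.
selected = [i for i, keep in enumerate(mask) if keep]

print(mask)
print(selected)
```

The selected rows (0, 3 and 5) are the rows with those index labels in the pandas output above.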

Then, if needed, you can export result to a CSV file like this:

result.to_csv("result.csv")

I'm not sure this code is exactly what you need, but it may be a good starting point for how you can use pandas.
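
If pandas is not an option (the traceback in the question shows the script running under Jython 2.7, where pandas may not be available), the same check can be sketched with the standard library alone. The idea is to keep the two (name, shift) pairs of each row together instead of flattening them into one dictionary, then test the first shift against one range and the second against the other. The function name `atom_lines`, the constant `PAIR_RE` and the default ranges are assumptions for illustration; the "Atom n name1 name2" output format follows the string built in the question's code.

```python
import re

# Same pattern as in the question: atom label in braces, then its shift.
PAIR_RE = re.compile(r"{(\d\.H\d'?)}\s(\d\.\d+)\s")


def atom_lines(lines, range1=(5.1, 6.0), range2=(7.0, 8.25)):
    """Return 'Atom n name1 name2' strings for rows whose first shift lies
    strictly inside range1 and whose second shift lies strictly inside range2."""
    out = []
    for line in lines:
        found = PAIR_RE.findall(line)
        if len(found) != 2:
            continue  # header lines and anything without two labelled shifts
        (name1, s1), (name2, s2) = found
        if range1[0] < float(s1) < range1[1] and range2[0] < float(s2) < range2[1]:
            out.append("Atom %d %s %s\n" % (len(out) + 1, name1, name2))
    return out


# The column header line plus data rows 0, 1 and 3 from the sample file.
sample = [
    "1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9",
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
    "3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0",
]

for text in atom_lines(sample):
    print(text.rstrip())
```

To use this on the real file, pass the open file object as `lines` and append the returned strings to tclust.txt with `writelines`. Other files would need their own `range1` and `range2` values, as in the question's latest code.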

