How can I check whether every second value from a dictionary is in a specific range?
I have a dictionary that is filled by reading a file named peaks_ee.xpk.
A sample from peaks_ee.xpk:
label dataset sw sf
1H 1H_2
NOESY_F1eF2e.nv
4807.69238281 4803.07373047
600.402832031 600.402832031
1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W 1H_2.B 1H_2.E 1H_2.J 1H_2.U vol int stat comment flag0 flag8 flag9
0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
2 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
4 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
5 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
6 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
7 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
8 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
9 {1.H8} 8.13712 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
10 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
11 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
12 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
13 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
14 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
15 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
16 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
17 {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
18 {2.H1'} 5.90291 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
19 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
20 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
21 {4.H1'} 5.74125 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
22 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
23 {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} {3.H6} 7.53261 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
24 {3.H1'} 5.54935 0.05000 0.10000 ++ {0.0} {} {4.H8} 7.49932 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0
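For example, in row 0 of peaks_ee.xpk the atom name is 1.H1' and its chemical shift is 5.82020. In column 8 of the same row there is another atom name, 2.H8, with a chemical shift of 7.61004. Basically, I want to check whether the first chemical shift in the row (5.82020) is within one range and whether the second chemical shift (7.61004) is within another range. If both are, the atom names (1.H1' and 2.H8) should be written to a file named tclust.txt.
This is my code so far. I posted another question earlier, and @wwii helped me with this code.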
pattern = r'''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)
j = 0
contents_atom = []
atom_lines = []
result = {}
with open("peaks_ee.xpk", "r") as atom_name:
    for line in atom_name:
        for match in rex.finditer(line):
            name, shift = match.groups()
            if name not in result:
                result[name] = float(shift)
            print (name,shift)
            if filename == 'ee_pinkH1.xpk':
                if result[name] <= 8.5:
                    float_str = re.findall(r"\d\.H\d'?", name)
                    if len(float_str) > 1:
                        j = j + 1
                        value1 = ('Atom ' + str(j) + ' ' + str(float_str[0]) + ' ' + str(float_str[1]) + '\n')
                        atom_lines.insert(-1, value1)
tclust_atom = open("tclust.txt", "a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()
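Here is the list of atom names and their chemical shifts printed by the line print (name,shift) (originally posted as a picture).
From that picture, the first two lines are:
"1.H1'", "5.82020" and "2.H8", "7.61004", but those two lines actually both come from the first row of peaks_ee.xpk. I want to see whether 5.82020 is between 5.1 and 6, and whether 7.61004 is between 7 and 8.25. Is there a way to do this using the values of the dictionary? I noticed that every second printed line holds a value I want to test against 5.1 to 6, while the alternating lines hold the values I want to test against 7 to 8.25.
Edit: here is my full code: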
import pandas as pd
import os
import sys
import re
i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()
pattern = '''{\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)
j=0;
contents_atom=[]
atom_lines=[]
result = {}
with open("peaks_ee.xpk","r") as atomName:
    for name in atomName:
        for match in rex.finditer(line):
            name,shift = match.groups()
            print (name,shift)
            if name not in result:
                result[name]=float(shift)
            float_str = re.findall("\d\.H\d'?",name)
            if (len(float_str)>1):
                j=j+1
                value1 = ('Atom ' +str(j)+ ' ' + str(float_str[0])+ ' ' + str(float_str[1]) + '\n')
                atom_lines.insert(-1,value)
df = pd.read_csv("D:/tmp/peaks_ee.xpk", sep= " ", skiprows=5)
shift1= df["1H.P"]
shift2= df["1H_2.P"]
mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
print result
tclust_atom = open("tclust.txt","a")
for value1 in atom_lines:
    tclust_atom.write(value1)
tclust_atom.close()
This is the error I get:
Traceback (most recent call last):
File "pandas.py", line 1, in <module>
import pandas as pd
File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/pandas.py", line 23, in <module>
rex = re.compile(pattern)
File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 190, in compile
File "/Users/malaikaiyer/Downloads/nmrfxstructure/nmrfxstructure/target/structure-10.1.1-bin/structure-10.1.1/lib/jython-standalone-2.7.0.jar/Lib/re.py", line 242, in _compile
sre_constants.error: unbalanced parenthesis
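The traceback points at `re.compile(pattern)`: in the full code above, the pattern closes its first capture group with `)` but never opens it with `(`, which is what "unbalanced parenthesis" refers to. The traceback also appears to show that the script itself is saved as pandas.py, so `import pandas as pd` would import the script rather than the pandas library. Below is a minimal sketch of the balanced pattern, checked against the first data row of the sample; escaping the braces is an optional choice made here for readability, not something the original pattern did.

```python
import re

# Balanced version of the pattern: the first capture group gets its opening "(".
# Braces are escaped for clarity; the original pattern left them unescaped.
pattern = r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s"
rex = re.compile(pattern)

# First data row of peaks_ee.xpk, copied from the sample in the question.
line = ("0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} "
        "{2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0")

# Each match yields (atom name, chemical shift as text); convert the shift to float.
pairs = [(name, float(shift)) for name, shift in rex.findall(line)]
print(pairs)
```

Each row yields two (name, shift) pairs in order, so the first and second shifts of a row can be compared against their own ranges.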
Edit: latest code, 7/26:
import pandas as pd
import os
import sys
import re
import csv
i=0;
contents_peak=[]
peak_lines=[]
with open ("ee_pinkH1.xpk","r") as peakPPM:
    for PPM in peakPPM.readlines():
        float_num = re.findall("[\s][1-9]{1}\.[0-9]+",PPM)
        if (len(float_num)>1):
            i=i+1
            value = ('Peak '+ str(i) + ' ' + str(float_num[0]) + ' 0.05 ' + str(float_num[1]) + ' 0.05 ' + '\n')
            peak_lines.insert(-1,value)
tclust_peak = open("tclust.txt","w+")
tclust_peak.write("rbclust \n")
for value in peak_lines:
    tclust_peak.write(value)
tclust_peak.close()
pattern = '''{(\d\.H\d'?)}\s(\d\.\d+)\s'''
rex = re.compile(pattern)
j=0;
contents_atom=[]
atom_lines=[]
result = {}
text = 'ee'
if text == 'ee':
    df = pd.read_csv('peaks_ee.xpk', sep=" ",skiprows=5)
    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ee_pinkH1.xpk':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    elif filename == 'ee_pinkH2.xpk':
        mask = ((shift1>3.25)&(shift1<5))&((shift2>7)&(shift2<8.5))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")
if text == 'ef':
    df = pd.read_csv('peaks_ef.xpk', sep=" ",skiprows=5)
    shift1= df["1H.P"]
    shift2= df["1H_2.P"]
    if filename == 'ef_blue.xpk':
        mask = ((shift1>5) & (shift1<6)) & ((shift2>7.25) & (shift2<8.25))
    elif filename == 'ef_green.xpk':
        mask = ((shift1>7) & (shift1<9)) & ((shift2>5.25) & (shift2<6.2))
    elif filename == 'ef_orange':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5.2) & (shift2<6.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")
if text == 'fe':
    df = pd.read_csv('peaks_fe.xpk', sep=" ",skiprows=5)
    shift1= df["Atom1"]
    shift2= df["Atom2"]
    if filename == 'fe_yellow':
        mask = ((shift1>3) & (shift1<5)) & ((shift2>5) & (shift2<6))
    elif filename == 'fe_green':
        mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
    result = df[mask]
    result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
    result.to_csv("result.csv")
tclust_peak = open("tclust.txt","a")
tclust_peak.write(str(result))
tclust_peak.close()
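You could try the pandas package.
The following code loads your file, skipping the first five lines to reach the data you need. It then combines element-wise comparisons on the two shift columns with the bitwise & operator to build a boolean mask, and finally selects the desired columns.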
import pandas as pd
df = pd.read_csv("peaks_ee.xpk", sep=" ", skiprows=5)
shift1 = df["1H.P"]
shift2 = df["1H_2.P"]
mask = ((shift1>5.1) & (shift1<6)) & ((shift2>7) & (shift2<8.25))
result = df[mask]
result = result[["1H.L","1H.P","1H_2.L","1H_2.P"]]
The result looks like this:
>>> result
       1H.L     1H.P   1H_2.L   1H_2.P
0   {1.H1'}  5.82020   {2.H8}  7.61004
3   {1.H1'}  5.82020   {1.H8}  8.13712
5   {2.H1'}  5.90291   {2.H8}  7.61004
8   {1.H1'}  5.82020   {2.H8}  7.61004
11  {4.H1'}  5.74125   {3.H6}  7.53261
12  {3.H1'}  5.54935   {4.H8}  7.49932
15  {3.H1'}  5.54935   {3.H6}  7.53261
18  {2.H1'}  5.90291   {3.H6}  7.53261
21  {4.H1'}  5.74125   {4.H8}  7.49932
24  {3.H1'}  5.54935   {4.H8}  7.49932
Then, if needed, you can export result to a CSV file like this:
result.to_csv("result.csv")
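Since the question asks for the atom names in tclust.txt rather than in a CSV file, the selected label pairs could also be turned into lines directly. The sketch below assumes the "Atom <n> <name1> <name2>" layout used in the question's code; `atom_lines` is a hypothetical helper name, and the two rows are copied from the result shown above.

```python
def atom_lines(rows):
    """Format (label1, label2) pairs as 'Atom <n> <name1> <name2>' lines."""
    lines = []
    for n, (label1, label2) in enumerate(rows, start=1):
        # Labels in the .xpk file are wrapped in braces, e.g. "{1.H1'}"; strip them.
        lines.append("Atom %d %s %s\n" % (n, label1.strip("{}"), label2.strip("{}")))
    return lines


# Two label pairs taken from the result table (rows 0 and 3).
rows = [("{1.H1'}", "{2.H8}"), ("{1.H1'}", "{1.H8}")]
out = atom_lines(rows)
print("".join(out))

# To append to the file as in the question (left commented out here):
# with open("tclust.txt", "a") as fh:
#     fh.writelines(out)
```

With the pandas result, the pairs could be obtained with `zip(result["1H.L"], result["1H_2.L"])` and passed to the same helper.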
I'm not sure the pandas code above is exactly what you need, but it may be a good starting point for how you could use pandas.
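If pandas is not available (the traceback in the question suggests the script runs under Jython 2.7, where pandas may not be installable), the same check can be sketched with the `re` module alone. Storing shifts in a dictionary keyed by atom name keeps each name only once, so the pairing of the two atoms in a row is lost; handling the two matches of each row together avoids that. `select_pairs` and its parameters are hypothetical names for illustration, and the ranges are open intervals as in the mask above.

```python
import re

# Same idea as the pattern in the question, with the braces escaped for clarity.
REX = re.compile(r"\{(\d\.H\d'?)\}\s(\d\.\d+)\s")


def select_pairs(lines, first_range, second_range):
    """Return (name1, name2) for rows whose two shifts fall inside the given open ranges."""
    selected = []
    for line in lines:
        matches = REX.findall(line)
        if len(matches) != 2:
            continue  # header lines, or anything without exactly two atoms
        (name1, shift1), (name2, shift2) = matches
        if (first_range[0] < float(shift1) < first_range[1]
                and second_range[0] < float(shift2) < second_range[1]):
            selected.append((name1, name2))
    return selected


# Header line plus rows 0, 1 and 3 of the sample file in the question.
tail = " 0.05000 0.10000 ++ {0.0} {} 0.0 100.0000 0 {} 0 0 0"
sample = [
    "1H.L 1H.P 1H.W 1H.B 1H.E 1H.J 1H.U 1H_2.L 1H_2.P 1H_2.W",
    "0 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {2.H8} 7.61004" + tail,
    "1 {2.H8} 7.61004 0.05000 0.10000 ++ {0.0} {} {1.H1'} 5.82020" + tail,
    "3 {1.H1'} 5.82020 0.05000 0.10000 ++ {0.0} {} {1.H8} 8.13712" + tail,
]

pairs = select_pairs(sample, (5.1, 6), (7, 8.25))
print(pairs)
```

Rows 0 and 3 pass both range checks, which agrees with the pandas result above; row 1 is rejected because its first shift (7.61004) is outside 5.1 to 6.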