
How can I replace values from one text file with other values from another text file only if certain values are equal?

I have a file called finalscores.txt and I want to create a Python script that opens it and reads values from two separate columns.

This is my finalscores.txt file:

     Atom nVa  predppm   avgppm    stdev    delta    QPred   QMulti   qTotal
  7.H2   2    7.674    7.853    0.000    0.000    0.968    1.000    0.993
  9.H2   2    7.434    7.458    0.000    0.001    0.996    1.000    0.999
 20.H2   1    7.602    7.898    0.000    0.000    0.945    1.000    0.982
 21.H2   1    7.959    8.113    0.000    0.000    0.972    1.000    0.991
 8.H1'   2    5.363    5.238    0.002    0.003    0.978    0.997    0.993
22.H1'   2    5.593    5.523    0.002    0.003    0.988    0.997    0.995
10.H1'   1    5.378    5.426    0.000    0.000    0.992    1.000    0.997
19.H1'   1    5.691    5.681    0.000    0.000    0.998    1.000    0.999
score: 0.9941270604681679

The values I want to read are in the first column, "Atom", and the fourth column, "avgppm". I don't want to read the first line:

Atom nVa predppm avgppm stdev delta QPred QMulti qTotal

or the last line: score: 0.9941270604681679

I have another file called pinkH1_ppm.txt that I want to open and modify. This is what my pinkH1_ppm.txt looks like:

2.H8 7.61004 0.3
1.H8 8.13712 0.3
3.H6 7.53261 0.3
4.H8 7.49932 0.3
5.H6 7.72158 0.3
7.H8 8.16859 0.3
6.H6 7.70272 0.3
9.H8 8.1053 0.3
8.H6 7.65014 0.3
10.H6 7.5231 0.3
11.H6 7.58213 0.3
12.H6 7.72805 0.3
13.H6 8.02977 0.3
14.H6 7.69624 0.3
15.H8 7.82994 0.3
17.H8 7.24899 0.3
18.H6 7.6439 0.3
20.H8 7.78512 0.3
19.H8 7.65501 0.3
22.H8 7.47677 0.3
23.H6 7.7306 0.3
24.H6 7.80104 0.3
25.H8 7.67295 0.3
26.H6 7.67463 0.3
27.H6 7.64807 0.3
1.H1' 5.8202 0.3
2.H1' 5.90291 0.3
4.H1' 5.74125 0.3
3.H1' 5.54935 0.3
6.H1' 5.54297 0.3
8.H1' 5.36287 0.3
11.H1' 5.50093 0.3
10.H1' 5.37814 0.3
14.H1' 5.96177 0.3
15.H1' 5.959 0.3
17.H1' 5.75214 0.3
19.H1' 5.69108 0.3
22.H1' 5.59257 0.3
24.H1' 5.55313 0.3
25.H1' 5.70819 0.3
27.H1' 5.74236 0.3
26.H1' 5.48061 0.3

I want to check whether any values from the "Atom" column in finalscores.txt match any values in the first column of pinkH1_ppm.txt. If they do, I want to replace the value in the second column of pinkH1_ppm.txt with the avgppm value for that Atom from finalscores.txt.

For example, the Atom 19.H1' in finalscores.txt can also be found in pinkH1_ppm.txt, so I would like to replace the second-column value on the 19.H1' line of pinkH1_ppm.txt, which is 5.69108, with 5.681.

This is my code so far:

import pandas as pd
import os
import sys
import re 

filename = 'finalscore.txt'
ppmColor = 'pinkH1_ppm.txt'

df = pd.read_cv(filename,sep = " ", skiprows = 1)

col1 = df["Atom"]
col2 = df["avgppm"]

df2 = pd.read_cv(ppmColor,sep = " ", skiprows=0)
name = df2[0]
ppm = df2[1]
with open(ppmColor, "a") as ppmAppend:
    for line in ppmAppend
        if col1 == name: 

I'm trying to use pandas. I'm very unsure about my second dataframe, df2: there is no header in the ppmColor file, so I want to start reading it from the first line. I thought pandas would be the best tool for this, but I'm not sure exactly how to approach it.

Error:

replaceppm.py:10: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.
  df=pd.read_csv('finalscore.txt',sep=r'\\s+',skipfooter=1)
Traceback (most recent call last):
  File "replaceppm.py", line 18, in <module>
    pink.set_index("Atom",inplace=True)
NameError: name 'pink' is not defined

If you read the two files into pandas DataFrame objects, the task reduces to updating the second one with values from the first. The update method aligns rows on index labels, so both DataFrames need the atom name as their index.

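import pandas as pd

# Read the scores table; skipfooter drops the trailing "score:" line (requires the python engine).
df = pd.read_csv('finalscores.txt', sep=r'\s+', engine='python', skipfooter=1)
# Keep only the two columns of interest (.ix was removed in pandas 1.0).
df = df[['Atom', 'avgppm']]
# pinkH1_ppm.txt has no header row, so supply the column names explicitly.
pink = pd.read_csv('pinkH1_ppm.txt', sep=r'\s+', header=None, names=['Atom', 'avgppm', 'x'])

# update() aligns on the index, so index both frames by atom name.
df.set_index('Atom', inplace=True)
pink.set_index('Atom', inplace=True)
pink.update(df)  # overwrites pink's avgppm in place wherever the atom names match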
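
To check the approach without touching the real files, here is a minimal sketch that runs the same steps on a few rows copied from the question, held in memory with io.StringIO. The function name replace_ppm and the column name 'x' for the third column are placeholders chosen for illustration, not anything the files define.

```python
import io
import pandas as pd

# In-memory stand-ins for the two files; rows are copied from the question.
FINALSCORES = """\
     Atom nVa  predppm   avgppm    stdev    delta    QPred   QMulti   qTotal
  7.H2   2    7.674    7.853    0.000    0.000    0.968    1.000    0.993
19.H1'   1    5.691    5.681    0.000    0.000    0.998    1.000    0.999
score: 0.9941270604681679
"""
PINK = """\
7.H8 8.16859 0.3
19.H1' 5.69108 0.3
22.H1' 5.59257 0.3
"""


def replace_ppm(scores_src, pink_src):
    """Return the pink table with avgppm replaced for atoms found in scores.

    Hypothetical helper: both arguments may be file paths or file-like objects.
    """
    # skipfooter=1 drops the trailing "score:" line and needs the python engine.
    scores = pd.read_csv(scores_src, sep=r"\s+", engine="python", skipfooter=1)
    # No header row in the pink file, so name the columns by hand.
    pink = pd.read_csv(pink_src, sep=r"\s+", header=None,
                       names=["Atom", "avgppm", "x"])
    scores = scores.set_index("Atom")[["avgppm"]]
    pink = pink.set_index("Atom")
    # update() matches rows by index label; atoms missing from scores are left alone.
    pink.update(scores)
    return pink.reset_index()


result = replace_ppm(io.StringIO(FINALSCORES), io.StringIO(PINK))
print(result)
# To write the result back in the original space-separated layout, one option is:
# result.to_csv("pinkH1_ppm.txt", sep=" ", header=False, index=False)
```

Only 19.H1' appears in both samples, so only that row changes. 7.H2 has no counterpart in the pink data and is ignored, and the row order of the pink table is preserved.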
