
How can I subtract the rows of a file two by two in Python?

I would like to subtract consecutive rows of a file. For example:

I have a file with 4,000,000 lines, with data like this:

    2345  345.67
    2344  245.34
    45678  331.45
    45679  339.32
    7654   109.42
    7655   250.78

So I would like to subtract each pair of consecutive rows (column 2) and print the absolute difference only if it is greater than or equal to 60. The subtraction should go two lines at a time (row 1 minus row 2, row 3 minus row 4, and so on), and the result should be printed next to the first column-1 value of the pair. That is, I would like a result like this:

    2345   100.33
    7654   141.36

I tried to do it in bash, but it is very slow. I would like to do this in Python, but I have no idea how, as I'm new to Python. How can I read my file directly, and how can I use the Python modules? I've read that a dataframe and abs could help me, but how? Can you guide me, please?

Thanks a lot.

    x=1
    while [ $x -ge 2 ]
    do
        a= sed -n '1,2p' file.dat | awk 'NR>1{print $1-p} {p=$1}'
        echo $a >> results.dat
        grep -v "$a" file.dat > file.o
        mv file.o file.dat
    done


You can write the result directly to a file from within Python. For instance:

    # import Python's regular expression module
    import re

    # pattern for the numbers on a line, e.g. 2345 or 345.67
    # (raw string, so \d is not treated as a string escape)
    number = re.compile(r'\d+\.?\d*')

    # open the files (replace data.txt with the input file name and out.txt with the output file name)
    with open('data.txt', 'r') as f, open('out.txt', 'w') as o:
        # read the first line manually
        currentLine = number.findall(f.readline())
        # i counts the remaining lines starting at 0, so that
        #   i=0: prevLine = row 1, currentLine = row 2  (a pair)
        #   i=1: prevLine = row 2, currentLine = row 3  (skipped)
        #   i=2: prevLine = row 3, currentLine = row 4  (a pair)
        # therefore we only look at every second iteration
        for i, line in enumerate(f):  # iterate lazily instead of loading all lines with readlines()
            # set the previous line to the current line
            prevLine = currentLine
            # extract numbers
            currentLine = number.findall(line)
            if i % 2 == 0:  # look only at every second iteration (row 1 - row 2; row 3 - row 4; etc.)
                # absolute difference between column 2 of the two rows in the pair
                sub = abs(float(prevLine[1]) - float(currentLine[1]))
                # if this absolute difference is >= 60, print the result
                if sub >= 60:
                    outputLine = "%s %s" % (prevLine[0], sub)
                    print(outputLine)
                    o.write(outputLine + "\n")  # write the line to the file 'out.txt'

For your data, the output would be:

    2345 100.33000000000001
    7654 141.36
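
The trailing digits in 100.33000000000001 come from binary floating-point rounding, not from the data. As a possible variant, here is a minimal sketch that reads the rows two at a time with `zip` over a single iterator and formats the difference to two decimals. The function name `pairwise_differences` and its `threshold` parameter are made up for this illustration. The sketch assumes whitespace-separated columns and no blank lines; if the input has an odd number of lines, the last one is silently ignored.

```python
import io


def pairwise_differences(lines, threshold=60.0):
    """Yield (key, diff) for rows taken two at a time.

    key  : column 1 of the first row of the pair (kept as a string)
    diff : absolute difference of column 2 between the two rows
    Only pairs with diff >= threshold are yielded.
    """
    it = iter(lines)
    # zip(it, it) pulls two items from the same iterator on each step,
    # so rows are grouped as (1, 2), (3, 4), ... without loading everything.
    for first, second in zip(it, it):
        key, a = first.split()[:2]
        _, b = second.split()[:2]
        diff = abs(float(a) - float(b))
        if diff >= threshold:
            yield key, diff


# Sample data from the question, used here in place of a real file.
sample = io.StringIO(
    "2345  345.67\n"
    "2344  245.34\n"
    "45678  331.45\n"
    "45679  339.32\n"
    "7654   109.42\n"
    "7655   250.78\n"
)

for key, diff in pairwise_differences(sample):
    # .2f rounds only the printed text, not the computed value
    print(f"{key} {diff:.2f}")
```

For the real file, the same function can be fed an open file object, for example `with open('data.txt') as f:` followed by a loop over `pairwise_differences(f)` that writes each formatted line to `out.txt`. Because a file object yields one line at a time, the 4,000,000 lines are never held in memory at once.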
