I would like to subtract two consecutive rows of a file.
For example, I have a file with 4,000,000 lines, with data like this:
2345 345.67
2344 245.34
45678 331.45
45679 339.32
7654 109.42
7655 250.78
So I would like to subtract each pair of consecutive rows (column 2) and print the absolute result only if it is greater than or equal to 60. The subtraction should go two lines at a time (row 1 and row 2, then row 3 and row 4, and so on), and each result should be printed next to the column 1 value of the first row of the pair. I mean, I would like to have a result like this:
2345 100.33
7654 141.36
I tried to do it in bash, but it is very slow. I would like to do this in Python, but I have no idea how to do it, as I'm new to Python. How can I read my file directly, and how can I use the Python modules? I've read that a DataFrame and abs could help me, but how? Can you guide me, please?
Thanks a lot.
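# slow attempt: handles one pair per pass and rewrites file.dat every time
while [ -s file.dat ]
do
    # difference in column 2 between the first two lines
    a=$(sed -n '1,2p' file.dat | awk 'NR>1{print $2-p} {p=$2}')
    echo "$a" >> results.dat
    # drop the two lines that were just processed
    sed '1,2d' file.dat > file.o
    mv file.o file.dat
done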
You can actually write the result directly to a file from within Python. For instance like this:
# import Python's regular expression module
import re

# open the input file (data.txt) and the output file (out.txt); adjust the names as needed
with open('data.txt', 'r') as f, open('out.txt', 'w') as o:
    # read the first line manually; it becomes the first row of the first pair
    currentLine = re.findall(r'\d+\.?\d*', f.readline())
    # i starts at 0 on the second line of the file, so the pairs
    # (row 1, row 2), (row 3, row 4), ... are complete when i is even:
    #   prevLine = row 1, currentLine = row 2   [i=0]  -> compare
    #   prevLine = row 2, currentLine = row 3   [i=1]  -> skip
    #   prevLine = row 3, currentLine = row 4   [i=2]  -> compare
    # iterating over f directly avoids loading all lines into memory at once
    for i, line in enumerate(f):
        # set the previous line to the current line
        prevLine = currentLine
        # extract the numbers from the new line
        currentLine = re.findall(r'\d+\.?\d*', line)
        if i % 2 == 0:  # look only at every second iteration (row 1 - row 2; row 3 - row 4; etc.)
            # absolute difference of column 2 between the two rows of the pair
            sub = abs(float(prevLine[1]) - float(currentLine[1]))
            # if this absolute difference is >= 60, print the result
            if sub >= 60:
                outputLine = "%s %s" % (prevLine[0], sub)
                print(outputLine)
                o.write(outputLine + "\n")  # write the line to the file 'out.txt'
The output for your data would thus be (the extra trailing digits in the first value are ordinary floating-point rounding):
2345 100.33000000000001
7654 141.36
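If the long float in the first line bothers you, here is a variation you could try. It is only a sketch under a few assumptions: the columns are separated by whitespace, a trailing unpaired row can be ignored, and two decimals are enough for the output. The helper name pair_differences and the in-memory sample are made up for illustration and are not part of the code above. It uses split() instead of a regular expression and zip() over a single iterator to walk the rows two at a time.

```python
import io

def pair_differences(lines, threshold=60.0):
    """Yield (key, diff) for each pair of consecutive rows whose
    absolute column-2 difference is at least `threshold`."""
    # split every non-blank line into its whitespace-separated columns
    rows = (line.split() for line in lines if line.strip())
    # zip(rows, rows) takes two items from the same iterator per step,
    # giving (row 1, row 2), (row 3, row 4), ...; a trailing unpaired row is dropped
    for first, second in zip(rows, rows):
        diff = abs(float(first[1]) - float(second[1]))
        if diff >= threshold:
            yield first[0], diff

# small in-memory stand-in for the real file, using the sample data from the question
sample = io.StringIO(
    "2345 345.67\n2344 245.34\n45678 331.45\n45679 339.32\n7654 109.42\n7655 250.78\n"
)
# format each difference with two decimals to hide floating-point noise
results = [f"{key} {diff:.2f}" for key, diff in pair_differences(sample)]
print("\n".join(results))
```

For the sample rows this prints 2345 100.33 and 7654 141.36. For the real file you would pass the open file object instead of the sample, e.g. open('data.txt') inside a with block, and write each formatted line to the output file. Since the function is a generator and the file is read line by line, memory use should stay small even with 4,000,000 lines.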