
Is there an easy way to automate reading a text file and plotting its contents with a small bash script using awk or other Unix commands?

I'm new to awk and to manipulating text files. I have model output files consisting of >600,000 lines, and I want to use a bash script to automate plotting x,y points from such a file. I use "t" to denote time and "T" to denote temperature, which are x and y respectively. Each row (line) has a different number of tT points.

Each line consists of the following separated by spaces shown as commas here:

-loglikelihood, -posterior, #Npairs-1, t1, T1, t2, T2...

Is there a way to use a script to start reading the file at "CHAIN" and reading until "CHAIN END" is reached in the file? Each t,T pair from each row needs to be plotted as x,y pairs individually, so all t1, t2, t3, t4, t5, etc. are all the X-values and T1, T2, T3, T4, T5, etc, are all the Y-values.

So, for example, if a single row looked like this: -400, -430, 3, 500, 50, 350, 100, 200, 20, 0, 5

---- then 500, 350, 200, and 0 are the X (time) points and 50, 100, 20, and 5 are the Y (temperature) points. Therefore 500, 50 is the first x,y pair, 350, 100 is the second, and so on...

Bonus: I would like to keep the log likelihood value of -400 for each row and associated set of points to then normalize the entire group of 600,000 tT 'paths' from 0-1 for plotting using a color ramp.

Actual data file: https://drive.google.com/file/d/1DLabBKWbhaX-w4Kp5jxdiuL5afDFZmuX/view?usp=sharing

update: I had originally thought transposing to columns would work, but that may be difficult and inefficient, since not only would rows be transposed to columns but the pairs of time-temperature points would need to be split up into two columns per every row read in and all placed side-by-side to be read in correctly

partial answer, perhaps it will provide some hints for the final solution...

You can create the columnar format with a simple change:

$ awk 'NR>=45&& NR<=600044 {for(i=5;i<=19;i+=2) print $i,$(i+1)}' input.txt > output.txt

This drops the likelihood value, since the loop only prints fields from 5 onward. The next challenge is splitting the single pair of columns into a multi-column format. If I understood correctly, there will be 600,000 x 2 = 1.2M columns in the final output.
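As a possible alternative to the wide format, the pairs could be kept in a long format (one x,y point per line, tagged with its row and likelihood), which avoids the 1.2M columns. The sketch below is untested against the actual data file and rests on assumptions the post does not confirm: the marker lines begin with "CHAIN" and "CHAIN END", fields are whitespace-separated, field 1 is the log-likelihood, field 3 is Npairs-1, and the t,T pairs start at field 4. The function names extract_paths and normalize_loglik are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: field positions and marker format are assumptions (see above).

# Print "row_id loglik t T" for every t,T pair between CHAIN and CHAIN END.
extract_paths() {
    awk '
        /^CHAIN END/ { inside = 0 }              # closing marker: stop reading
        inside && NF >= 5 {
            npairs = $3 + 0 + 1                  # field 3 is assumed to hold Npairs-1
            for (k = 0; k < npairs; k++) {
                i = 4 + 2 * k                    # index of t for pair k
                if (i + 1 > NF) break            # guard against short rows
                print NR, $1, $i, $(i + 1)       # NR serves as the row (path) id
            }
        }
        /^CHAIN/ && !/^CHAIN END/ { inside = 1 } # opening marker: start on next line
    ' "$1"
}

# Rescale column 2 (loglik) to the range 0-1 using min-max scaling.
# The file is read twice: pass 1 finds min and max, pass 2 prints.
normalize_loglik() {
    awk '
        NR == FNR {
            v = $2 + 0
            if (NR == 1 || v < min) min = v
            if (NR == 1 || v > max) max = v
            next
        }
        {
            range = max - min
            norm = (range > 0) ? ($2 - min) / range : 0
            print $1, norm, $3, $4
        }
    ' "$1" "$1"
}
```

Usage would be along the lines of `extract_paths input.txt > pairs.txt` followed by `normalize_loglik pairs.txt > pairs_norm.txt`. If gnuplot is the plotting tool (an assumption, since the post does not name one), something like `plot 'pairs_norm.txt' using 3:4:2 with points palette` should color each point by the rescaled likelihood.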
