
Splitting a file with the awk command

I was trying to split a file into a training data set and a test data set, and I got this error:

awk: can't open file -v source line number 1 .

The command line was as follows:

awk -v lines=$(wc -l < data/yelp/yelp_review.v8.csv) -v fact=0.80  'NR <= lines * fact {print > "train.txt"; next} {print > "val.txt"}'  data/yelp/yelp_review.v8.csv

Can anybody enlighten me as to why this is a problem on a MacBook?

Well, miken32 has already identified what went wrong with your first attempt, and I can't improve on his explanation of the problem.

My suggestion would be that rather than having wc provide your line count, you do that job with awk itself by passing the file twice: the first pass counts the lines, and the second pass does the split. Something like this:

awk -v fact=0.8 'NR==FNR{lines++;next} FNR<=lines*fact{print>"train.txt";next} {print>"val.txt"}' "$file" "$file"

Though I'd probably write it more like this:

awk -v fact=0.8 'NR==FNR{lines++;next} {out="val.txt"} FNR<=lines*fact{out="train.txt"} {print > out}' "$file" "$file"

You can decide whether greater elegance is gained by brevity or by avoiding a second next. :-)
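To try the two-pass idea without touching real data, here is a minimal sketch run against a throwaway 10-line file. The file name sample.txt and the scratch directory are made up for illustration; only the awk program comes from the answer above.

```shell
#!/bin/sh
# Work in a scratch directory so train.txt / val.txt don't clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: ten lines containing 1..10.
file=sample.txt
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > "$file"

# The file is named twice on purpose.
# Pass 1 (NR==FNR): count the lines.
# Pass 2: route each line to train.txt or val.txt.
awk -v fact=0.8 '
  NR == FNR           { lines++; next }
                      { out = "val.txt" }
  FNR <= lines * fact { out = "train.txt" }
                      { print > out }
' "$file" "$file"

# tr strips the padding that some wc implementations add.
train_count=$(wc -l < train.txt | tr -d ' ')
val_count=$(wc -l < val.txt | tr -d ' ')
echo "train: $train_count lines, val: $val_count lines"
```

With fact=0.8 and ten input lines, the first eight lines should land in train.txt and the last two in val.txt. Note that this splits by position, so it assumes the input is already in an order you are happy to split on.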

What does the output from wc -l < data/yelp/yelp_review.v8.csv look like? Something like this perhaps?

      74

So what's going to happen when you drop that into your command?

awk -v lines=     74 -v fact=0.80 ...

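As you can see, this isn't going to parse well. The shell splits the unquoted substitution on whitespace, so awk receives lines= as the variable assignment, takes 74 as the program text, and then treats the next argument, -v, as an input file. That is exactly the "can't open file -v" error you saw. The wc on macOS pads its output with leading spaces, which is why this bites on a MacBook. Always quote any variable data you use: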

awk -v lines="$(wc -l < data/yelp/yelp_review.v8.csv)" -v fact=0.80 ...

Awk ignores the leading spaces when it converts the value to a number, so the padded count still works in the comparison.
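The effect of the quotes can be checked without a Mac. The sketch below fakes the padded wc output with a hard-coded string (the value 74 and the helper count_args are assumptions for illustration) and counts how many arguments the shell actually passes in each case.

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Stand-in for the padded output of `wc -l < file` on macOS (assumed value).
padded='      74'

# Unquoted: the shell splits on the spaces, giving "lines=" and "74".
unquoted=$(count_args lines=$padded)

# Quoted: the whole thing stays together as one argument.
quoted=$(count_args lines="$padded")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

# awk still treats the padded string as the number 74 in arithmetic.
result=$(awk -v lines="$padded" 'BEGIN { print lines * 0.8 }')
echo "lines * 0.8 = $result"
```

The unquoted form yields two arguments and the quoted form one, which is the whole difference between the failing and the working command.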

