Matching first column of file with awk, difficulty with quotes

My input file looks like this:

Chr1 1
Chr1 2
Chr2 3

And I want to split the input file into multiple files according to Chr in the first column.

There should be two output files. Output file 1 (named tmpChr1):

Chr1 1
Chr1 2

Output file 2 (named tmpChr2):

Chr2 3

Here's the code so far:

#!/bin/bash

for((chrom=1;chrom<30;chrom++)); do
echo Chr${chrom}
chr=Chr${chrom}
awk "\$1==$chr{print \$1}" input.txt > tmp$chr
done

The line awk "\$1==$chr{print \$1}" is the problem: awk seems to require quotation marks around $chr to correctly match $1.

awk '$1=="Chr1"{print $1}' works and tmpChr1 is made

awk '$1=="$chr"{print $1}' doesn't work either

and neither does awk "$1=='$chr'{print $1}"

I'm really struggling with the quoting. Could anyone shed some light on what I should do?

Never use double quotes around an awk script and never allow shell variables to expand as part of the body of an awk script. See http://cfajohnson.com/shell/cus-faq-2.html#Q24
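To make that concrete, here is a minimal sketch of why the double-quoted attempt matches nothing and how the value can be passed in safely. It assumes the question's input.txt (recreated below as sample data in a scratch directory); the awk variable name c is just an illustrative choice. After shell expansion awk receives Chr1 without quotes, which it treats as an uninitialized variable rather than a string, so the comparison never succeeds.

```shell
#!/bin/bash
# Scratch directory and sample data (assumed, mirroring the question's input).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Chr1 1\nChr1 2\nChr2 3\n' > input.txt

chr=Chr1

# What awk actually receives from the double-quoted version:
# Chr1 appears unquoted, so awk reads it as an (empty) variable name.
expanded=$(echo "\$1==$chr{print \$1}")
echo "$expanded"

# Safer: keep the script in single quotes and pass the value with -v.
# "c" is an awk variable holding the shell value of $chr.
awk -v c="$chr" '$1 == c { print }' input.txt > "tmp$chr"

cat "tmp$chr"
```

With -v the awk script is a fixed string, so no shell expansion can change its meaning, and the comparison is against the string value held in c.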

You are WAY off the mark with your general approach though. All you need is this awk script:

awk '{print > ("tmp"$1)}' file

Look:

$ ls
file
$ cat file
Chr1 1
Chr1 2
Chr2 3
$ awk '{print > ("tmp"$1)}' file
$ ls
file  tmpChr1  tmpChr2
$ cat tmpChr1
Chr1 1
Chr1 2
$ cat tmpChr2
Chr2 3
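One caveat worth hedging: some awk implementations limit how many files can be open at once, which could matter if the first column takes many distinct values. A possible variant, sketched here under that assumption with sample data in a scratch directory, appends to each file and closes it immediately. The variable name out is only illustrative.

```shell
#!/bin/bash
# Scratch directory and sample data (assumed; rows deliberately unsorted).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Chr1 1\nChr2 3\nChr1 2\n' > file

# Append to the per-key file and close it straight away, so at most one
# output file is open at any moment. ">>" is needed because a file may be
# reopened later; remove stale tmp* files before rerunning.
awk '{ out = "tmp" $1; print >> out; close(out) }' file

cat tmpChr1
cat tmpChr2
```

For a handful of keys, as in the question, the simpler one-liner above is enough; opening and closing a file per line is slower and only pays off when the number of distinct output files is large.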

Any time you write a loop in shell just to manipulate text you have the wrong approach. UNIX shell is an environment from which to call tools with a language to sequence those calls. The UNIX tool to manipulate text is awk. So if you need to manipulate text in UNIX, write an awk script and call it from shell, that's all.
