
How do I loop over multiple files to extract specific columns and save as separate files?

I have numerous *.txt files. I want to extract columns 3 and 5 from each of these files and save them as new files, keeping the original names with a new_ prefix added. I tried the bash loop below, but it doesn't do what I want. Can someone please help me with this?

for i in *.txt; do
cut -f 3,5 $i  > /media/owner/new_$i_assembly.txt 
done

Simple approach:

for f in *.txt; do
    cut -d$'\t' -f3,5 "$f" > "/media/owner/new_${f}_assembly.txt" 
done

If the fields may be separated by whitespace other than tabs, you can use awk instead, which by default splits on any run of spaces or tabs:

for f in *.txt; do
    awk '{ print $3,$5 }' OFS='\t' "$f" > "/media/owner/new_${f}_assembly.txt" 
done
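
To see why the choice matters, here is a small sketch using a made-up input line (the sample text and variable names are assumptions, not from the question). cut passes a line through unchanged when it contains no delimiter at all, while awk splits on runs of blanks:

```shell
# Hypothetical input: fields separated by spaces only, with one double space.
line='a b  c d e'

# cut looks for TAB; the line has none, so it is printed unchanged.
cut_out=$(printf '%s\n' "$line" | cut -f3,5)

# awk splits on any run of blanks; "-" makes it read standard input.
awk_out=$(printf '%s\n' "$line" | awk '{ print $3,$5 }' OFS='\t' -)

printf 'cut: %s\n' "$cut_out"   # cut: a b  c d e
printf 'awk: %s\n' "$awk_out"   # awk: c<TAB>e
```

If your files really are tab-separated and fields may contain spaces, cut is the safer of the two, because awk would split inside those fields.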

You have to tell Bash explicitly where the variable name ends by writing ${i}. Otherwise it treats the characters that follow as part of the name and expands the (unset) variable $i_assembly instead of $i:

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i}_assembly.txt" 
done
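
The effect is easy to check in isolation. This is only a sketch with a hypothetical file name, assuming no variable called i_assembly is set and the shell is not running with set -u:

```shell
# Hypothetical file name, used only for illustration.
i="sample.txt"
unset i_assembly   # make sure no such variable exists in this shell

# Without braces the variable name is read as "i_assembly", which is unset,
# so it expands to an empty string.
wrong="new_$i_assembly.txt"

# With braces the name ends at "}" and "_assembly.txt" is literal text.
right="new_${i}_assembly.txt"

printf 'without braces: %s\n' "$wrong"   # without braces: new_.txt
printf 'with braces:    %s\n' "$right"   # with braces:    new_sample.txt_assembly.txt
```

That is why the loop in the question keeps writing to the same /media/owner/new_.txt file and overwriting it on every iteration.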

If you don't want the extension included in your new name, use the parameter expansion ${i%.*}, which removes the shortest suffix matching .* (that is, the last . and everything after it):

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i%.*}_assembly.txt" 
done
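
For file names with more than one dot, the single % and double %% forms behave differently. A short sketch, with a hypothetical name chosen to show both:

```shell
# Hypothetical file name containing two dots.
i="reads.v2.txt"

# % removes the shortest matching suffix: only the last ".txt" goes.
short="${i%.*}"

# %% removes the longest matching suffix: everything from the first dot goes.
long="${i%%.*}"

printf '%s\n' "$short"   # reads.v2
printf '%s\n' "$long"    # reads
```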

If you switch to a glob that yields paths rather than bare filenames (for example **/*.txt, which in Bash only recurses into subdirectories when the globstar option is enabled), you can use parameter expansion once again to get only the name of your file:

shopt -s globstar
for i in **/*.txt; do
   base=${i##*/} 
   base=${base%.*}
   cut -f 3,5 "$i"  > "/media/owner/new_${base}_assembly.txt" 
done
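
Here is what the two expansions do to a single path, as a sketch with a hypothetical path (nothing is read or written):

```shell
# Hypothetical path, as a recursive glob might produce it.
i="data/run1/sample.txt"

# ## removes the longest prefix matching "*/", leaving the file name.
base=${i##*/}      # sample.txt

# % then removes the shortest suffix matching ".*", dropping the extension.
base=${base%.*}    # sample

out="/media/owner/new_${base}_assembly.txt"
printf '%s\n' "$out"   # /media/owner/new_sample_assembly.txt
```

Keep in mind that two files with the same name in different directories map to the same output path, so the later one overwrites the earlier one.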

Also note that TAB is the default delimiter for cut, so you don't need to specify it with the -d option:

-d, --delimiter=DELIM
      use DELIM instead of TAB for field delimiter
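
You can confirm this with one tab-separated line. The sketch below uses made-up sample data and builds the tab with printf, so it does not depend on the Bash-only $'\t' quoting:

```shell
# Hypothetical tab-separated line with five fields.
tab=$(printf '\t')
line="a${tab}b${tab}c${tab}d${tab}e"

# Default delimiter (TAB) versus the same delimiter given explicitly.
implicit=$(printf '%s\n' "$line" | cut -f3,5)
explicit=$(printf '%s\n' "$line" | cut -d "$tab" -f3,5)

# Both select fields 3 and 5, joined by a tab: c<TAB>e
[ "$implicit" = "$explicit" ] && printf 'same output: %s\n' "$implicit"
```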

