How do I loop over multiple files to extract specific columns and save as separate files?

I have numerous *.txt files. I want to extract columns 3 and 5 from each of these files and save them as new files, keeping their original names with a new_ prefix. I wrote the bash loop below to try to do this, but it doesn't do what I want. Can someone please help me with this?

for i in *.txt; do
cut -f 3,5 $i  > /media/owner/new_$i_assembly.txt 
done

Simple approach:

for f in *.txt; do
    cut -d$'\t' -f3,5 "$f" > "/media/owner/new_${f}_assembly.txt" 
done

If the fields may be separated by whitespace other than tabs, you can use the following awk approach:

for f in *.txt; do
    awk '{ print $3,$5 }' OFS='\t' "$f" > "/media/owner/new_${f}_assembly.txt" 
done
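The difference between the two approaches shows up on a row that uses spaces instead of tabs. The sketch below uses a made-up sample row and the illustrative variable names cut_out and awk_out; it sets OFS with -v, which for this purpose behaves the same as the trailing OFS='\t' assignment above.

```shell
#!/usr/bin/env bash
# Hypothetical row: five fields separated by runs of spaces, no tabs.
line='a  b c   d e'

# cut finds no tab delimiter, so it prints the whole line unchanged.
cut_out=$(printf '%s\n' "$line" | cut -f3,5)

# awk splits on any run of blanks and joins fields 3 and 5 with a tab.
awk_out=$(printf '%s\n' "$line" | awk -v OFS='\t' '{ print $3,$5 }')

printf 'cut: %s\n' "$cut_out"
printf 'awk: %s\n' "$awk_out"
```

So cut is enough for strictly tab-separated input, while awk is the safer choice when the separator is not guaranteed.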

You have to tell Bash explicitly where the variable name ends by writing ${i}. Otherwise Bash treats the characters that follow as part of the name and expands the (unset) variable $i_assembly instead of $i:

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i}_assembly.txt" 
done
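To see the effect in isolation, here is a minimal sketch with a hypothetical file name, sample.txt; the variable names without_braces and with_braces are for illustration only.

```shell
#!/usr/bin/env bash
i="sample.txt"      # hypothetical file name, for illustration
unset i_assembly    # make sure no such variable is inherited from the environment

# Without braces, Bash looks up a variable named i_assembly, which is unset,
# so the expansion is empty and only the literal text remains.
without_braces="new_$i_assembly.txt"

# With braces, the variable name ends at the closing brace.
with_braces="new_${i}_assembly.txt"

printf '%s\n' "$without_braces"   # new_.txt
printf '%s\n' "$with_braces"      # new_sample.txt_assembly.txt
```

This is why the loop in the question writes every result to the same file, new_.txt, overwriting it on each iteration.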

If you don't want the extension included in the new name, use the parameter expansion ${i%.*}, which removes the shortest suffix matching .* (the last . and everything after it):

for i in *.txt; do
   cut -f 3,5 "$i"  > "/media/owner/new_${i%.*}_assembly.txt" 
done
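For names that contain more than one dot, it matters whether a single % or a double %% is used. A small sketch, assuming a hypothetical file name reads.v2.txt and the illustrative variable names shortest and longest:

```shell
#!/usr/bin/env bash
i="reads.v2.txt"   # hypothetical file name with two dots

shortest=${i%.*}    # drop the shortest suffix matching .*  (only ".txt")
longest=${i%%.*}    # drop the longest suffix matching .*   (".v2.txt")

printf '%s\n' "new_${shortest}_assembly.txt"   # new_reads.v2_assembly.txt
printf '%s\n' "new_${longest}_assembly.txt"    # new_reads_assembly.txt
```

The single % form is usually what you want when only the final extension should go.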

If you switch to a glob that yields paths rather than bare file names (for example **/*.txt, which recurses in Bash only after shopt -s globstar), you can use parameter expansion again to keep only the file name:

for i in **/*.txt; do
   base=${i##*/} 
   base=${base%.*}
   cut -f 3,5 "$i"  > "/media/owner/new_${base}_assembly.txt" 
done
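The recursive variant can be tried end to end on scratch data. This is only a sketch: the directories src and out are temporary stand-ins created with mktemp (out plays the role of /media/owner), and the sample files top.txt and run1/nested.txt are invented for the demonstration.

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob   # ** recurses only with globstar (Bash 4+)

src=$(mktemp -d)   # scratch input tree, stands in for your data directory
out=$(mktemp -d)   # scratch output directory, stands in for /media/owner
mkdir -p "$src/run1"
printf 'a\tb\tc\td\te\n' > "$src/top.txt"
printf '1\t2\t3\t4\t5\n' > "$src/run1/nested.txt"

for i in "$src"/**/*.txt; do
    base=${i##*/}      # strip the directory part
    base=${base%.*}    # strip the extension
    cut -f 3,5 "$i" > "$out/new_${base}_assembly.txt"
done

ls "$out"
```

Note that two input files with the same base name in different directories would map to the same output name, so the later one would overwrite the earlier one.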

Also note that TAB is the default delimiter for cut, so you don't need to specify it with the -d option:

-d, --delimiter=DELIM
      use DELIM instead of TAB for field delimiter
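A quick check of this, using a made-up tab-separated row and the illustrative variable names default_out and explicit_out:

```shell
#!/usr/bin/env bash
row=$(printf 'a\tb\tc\td\te')   # hypothetical tab-separated row

# No -d option: cut falls back to TAB as the field delimiter.
default_out=$(printf '%s\n' "$row" | cut -f3,5)

# Explicit TAB delimiter, as in the "simple approach" above.
explicit_out=$(printf '%s\n' "$row" | cut -d$'\t' -f3,5)

[ "$default_out" = "$explicit_out" ] && echo "same output"
```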
