Sorting a file up to a comment in Linux
Say I have a sort_me.txt file:
a
d
b
c
f
g
// dont mix the two sections
a
c
d
b
At the moment, I do the obvious
sort sort_me.txt
and I get:
a
a
b
b
c
c
d
d
// dont mix the two sections
f
g
Which of course is not what I want. I want to sort the section before the comment and the section after the comment separately.
With the desired result as:
a
b
c
d
f
g
// dont mix the two sections
a
b
c
d
Perl to the rescue:
perl -0777 -nE '
    @sections = map [ split /\n/ ], split m{^(?=//)}m;
    say join "\n", sort @$_ for @sections;
' -- file
-0777 reads the whole file at once instead of processing it line by line (this only works if the file isn't huge).
@sections is an array of arrays: the outer array corresponds to sections, the inner arrays to the individual lines of each section.
Each comment line starts its own section and stays on top after sorting, because / sorts before letters and digits in Perl's default string comparison.
If the file is too large to fit into memory, you need to process it line by line, storing only the current section:
perl -ne '
sub out { print sort @lines; @lines = $_ }
if (m{^//}) { out() }
else { push @lines, $_ }
END { out() }
' -- file
Without Perl, you can do it with a script like this:
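#!/bin/bash
# Sort the lines before and after the first separator line independently.
FILE_NAME=$1
SEPARATOR='//'
# Number of the first line that starts with the separator
LINE_NUMBER=$(grep -n -m1 "^$SEPARATOR" "$FILE_NAME" | cut -f1 -d:)
head -n "$((LINE_NUMBER - 1))" "$FILE_NAME" | sort
sed -n "${LINE_NUMBER}p" "$FILE_NAME"
# tail -n +N prints from line N onward, so no file length is needed
tail -n "+$((LINE_NUMBER + 1))" "$FILE_NAME" | sort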
It searches for the separator line and sorts the two sections one by one. Of course, if you have more than two sections, it won't work.
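For more than two sections, one possible approach is to let awk feed each section to its own sort process. The following is only a sketch under assumptions: the function name sort_sections is made up for illustration, separators are taken to be lines starting with //, and awk is assumed to support fflush().

```shell
#!/bin/bash
# sort_sections FILE (hypothetical helper, not from the thread):
# sort each run of lines between "//" separator lines independently.
sort_sections() {
    awk '
        /^\/\// {            # separator line: the current section ends here
            close("sort")    # wait until sort has written its sorted output
            print            # then emit the separator line unchanged
            fflush()         # flush it before the next sort process writes
            next
        }
        { print | "sort" }   # every other line goes to the current sort
        END { close("sort") }
    ' "$1"
}
```

Called as sort_sections sort_me.txt, this should print the desired result from the question, and it keeps only one section at a time inside sort.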
I was thinking about using csplit to split the sections into separate files, but of course there should be easier ways to accomplish this:
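#!/bin/bash
# Split "$1" into xx00, xx01, ... at every line starting with //,
# then print each piece sorted. csplit prints one byte count per piece,
# so looping over its output visits every piece once.
sizes=$(csplit -z "$1" '/^\/\//' '{*}')
count=0
for size in $sizes
do
    file=$(printf "xx%.2d" "$count")
    # LC_ALL=C sorts by byte value, so the // line stays ahead of letters
    LC_ALL=C sort "$file"
    ((count++))
done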
Notice that csplit will create a temporary file for each section, so you might update the above script to unlink each of these, i.e. unlink $file.
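One way to handle that cleanup is to have csplit write its pieces into a scratch directory and remove the whole directory afterwards. This is a sketch, not part of the original answer: the function name sort_sections_csplit is hypothetical, GNU csplit and mktemp are assumed, and sections are assumed to be separated by lines starting with //.

```shell
#!/bin/bash
# sort_sections_csplit FILE (hypothetical helper): split FILE at every line
# starting with // inside a scratch directory, print each piece sorted,
# then remove the directory together with all pieces.
sort_sections_csplit() {
    local dir piece
    dir=$(mktemp -d) || return 1
    # -s: do not print piece sizes; -z: skip empty pieces
    # -f: name prefix for the pieces; -n 6: six-digit suffixes, so the
    # glob below lists the pieces in their original order
    csplit -s -z -n 6 -f "$dir/xx" "$1" '/^\/\//' '{*}'
    for piece in "$dir"/xx*; do
        [ -e "$piece" ] || continue   # no pieces at all (empty input)
        # LC_ALL=C: byte-order sort, so the // line stays ahead of letters
        LC_ALL=C sort "$piece"
    done
    rm -rf "$dir"
}
```

Because every piece lives under the scratch directory, a single rm -rf replaces the per-file unlink calls and leaves nothing behind in the current directory.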