Compress ranges of ranges of numbers in bash
I have a csv file named "ranges.csv", which contains:
start_range,stop_range
9702220000,9702220999
9702222000,9702222999
9702223000,9702223999
9750000000,9750000999
9750001000,9750001999
9750002000,9750002999
I am trying to combine consecutive ranges where the next start_range equals the previous stop_range + 1, and write the result to another csv file named "ranges2.csv". So the output will be:
9702220000,9702220999
9702222000,9702223999
9750000000,9750002999
Moreover, I need to know how many original ranges each compressed range contains (for example, for the new range 9750000000,9750002999 I need to know that there were 3 ranges before the compression). This information will help me create a new csv file named "ranges3.csv", which should contain only the range with the most ranges inside it (the most comprehensive area):
9750000000,9750002999
I was thinking about something like this:
if (stop_range = start_range-1)
new_stop_range = start_range-1
But I am not very smart and I am new to bash scripting.
I know how to write the results to another file, but the logic for what I need gives me headaches.
Assuming your ranges are sorted, this code gives you the merged ranges only:
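awk 'BEGIN{FS=OFS=","}
FNR==1{next}                                  # skip the header row
(FNR>2) && ($1!=e+1){print b,e; b=e="" }      # gap: print the finished range
($1==e+1){ e=$2; next }                       # contiguous: extend the range
{ b=$1; e=$2 }                                # start a new range
END { print b,e }' file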
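The merge above relies on the rows being sorted by start_range. If that is not guaranteed, a minimal sketch along these lines could sort the data rows first. The function name merge_ranges is hypothetical (it is not part of the answer), and the header row is assumed to be the first line of the file, as in the question.

```shell
#!/bin/bash
# merge_ranges is a hypothetical helper name, not part of the original answer.
# It drops the header row, sorts the data rows numerically by start_range,
# and merges rows whose start equals the previous stop + 1.
merge_ranges() {
    tail -n +2 "$1" | sort -t, -k1,1n | awk 'BEGIN { FS = OFS = "," }
        NR > 1 && $1 != e + 1 { print b, e }    # gap: emit the finished range
        NR > 1 && $1 == e + 1 { e = $2; next }  # contiguous: extend the range
        { b = $1; e = $2 }                      # start a new range
        END { if (NR) print b, e }'
}

# Usage, assuming ranges.csv from the question is in the current directory:
# merge_ranges ranges.csv > ranges2.csv
```

Sorting is done outside awk so that the merge logic itself stays a single pass over the input.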
The version below does the same but also prints the range count:
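awk 'BEGIN{FS=OFS=","}
FNR==1{next}                                      # skip the header row
(FNR>2) && ($1!=e+1){print b,e,c; b=e=c="" }      # gap: print range and count
($1==e+1){ e=$2; c++; next }                      # contiguous: extend and count
{ b=$1; e=$2; c=1 }                               # start a new range
END { print b,e,c }' file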
If you want the largest one, you can sort on the third column. I don't want to hard-code a rule that returns the range with the highest count, as there might be several.
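As a rough sketch of the "sort on the third column" idea, the following keeps every line that ties for the highest count. The function name top_ranges is hypothetical, and the input is assumed to be start,stop,count lines like those printed by the counting command.

```shell
#!/bin/bash
# top_ranges is a hypothetical helper name, not part of the original answer.
# It reads "start,stop,count" lines on stdin, sorts them by count in
# descending order, and keeps every line that ties for the highest count.
top_ranges() {
    sort -t, -k3,3nr | awk -F, 'NR == 1 { max = $3 } $3 == max'
}

# Example with the counted output for the sample data from the question:
printf '%s\n' \
    '9702220000,9702220999,1' \
    '9702222000,9702223999,2' \
    '9750000000,9750002999,3' | top_ranges
```

For the sample data this prints 9750000000,9750002999,3. Piping the result through cut -d, -f1,2 would drop the count column before writing "ranges3.csv".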
If you really want only the ranges with the maximum merge count:
awk 'BEGIN{FS=OFS=","}
FNR==1{next}                              # skip the header row
(FNR>2) && ($1!=e+1){
  a[c] = a[c] (a[c]?ORS:"") b OFS e       # collect finished ranges by count
  m=(c>m?c:m)                             # track the highest count seen
  b=e=c=""
}
($1==e+1){ e=$2; c++; next }
{ b=$1; e=$2; c=1 }
END { a[c] = a[c] (a[c]?ORS:"") b OFS e
  m=(c>m?c:m)
  print a[m]
}' file
I think this does the trick:
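#!/bin/bash
awk '
BEGIN { FS = OFS = "," }
NR == 2 {                        # first data row opens the first merged range
    start = $1; stop = $2; i = 1
}
NR > 2 {
    if ($1 == stop + 1) {        # contiguous: extend the current range
        i++
    } else {                     # gap: close the current range, open a new one
        if (i > max) {
            maxr = start OFS stop
            max = i
        }
        start = $1
        i = 1                    # every range starts with a count of 1
    }
    stop = $2
}
END {
    if (i > max) {               # the last range is still open here
        maxr = start OFS stop
    }
    if (maxr != "") print maxr
}
' ranges.csv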