
Is it possible using a Linux command or shell script to remove unwanted rows in a file based on certain conditions?

I have a file that has ~12,300,000 rows of the form <timestamp, reading>:

1674587549.228 29214
1674587549.226 29384
1674587549.226 27813
1674587549.226 28403
1674587549.228 28445
...
1674587948.998 121
1674587948.998 126
1674587948.999 119
1674587949.000 126
1674587948.996 156
1674587948.997 152
1674587948.998 156
1674588149.225 316
1674588149.226 310
1674588149.223 150
1674588149.224 152
1674588149.225 150
1674588149.225 144
...
1674588149.225 227
1674588149.226 233
1674588149.226 275

The last - first timestamp equals 600. I want to create a new file that starts at the row at the (last - n)th timestamp and runs to the end.

For example, if n=200, the new file should start at 1674588149.226 - 200, i.e., run from 1674587949.000 126 to 1674588149.226 275.

Can this be done using a Linux command / shell script? If so, how can it be done? Thanks.
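For reference, the trailing time window itself can be carved out with a two-pass awk. This is a minimal sketch, assuming the file's last line carries the latest timestamp (as in the sample data); the names readings.txt and trimmed.txt are illustrative:

```shell
#!/bin/bash
# Sketch: keep only the rows whose timestamp falls within the last n
# seconds of the data. Assumes the last line of the file holds the
# latest timestamp (the data is roughly time-ordered).
n=200
input="readings.txt"

# Pass 1 remembers the final timestamp; pass 2 prints rows >= (last - n).
awk -v n="$n" 'NR==FNR { last=$1; next } $1 >= last - n' "$input" "$input" > trimmed.txt
```

Reading the file twice avoids buffering all ~12 million rows in memory, at the cost of a second scan.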

If I understood correctly, you are trying to create files which each contain a constant, equal number of lines, starting from the last one.

If so, this script will perform the task.

If you only want one file, then you can remove the logic associated with the looping and index-value iteration.

Note: The name of each file corresponds to the first field of the last line in that file (i.e., the final entry of the record).

This example splits into groupings of 5 lines. You can replace the 5 with 100 or 200, as you see fit.

#!/bin/bash

input="testdata.txt"
cat >"${input}" <<"EnDoFiNpUt"
1674587948.998 121
1674587948.998 126
1674587948.999 119
1674587948.996 156
1674587948.997 152
1674587948.998 156
1674587949.000 126
1674588149.225 316
1674588149.226 310
1674588149.223 150
1674588149.224 152
1674588149.225 150
1674588149.225 144
1674588149.225 227
1674588149.226 233
1674588149.226 275
EnDoFiNpUt

awk -v slice="5" 'BEGIN{
    split("", data) ;       # initialize data[] as an empty array
    dataIDX=0 ;
}
{
    dataIDX++ ;
    data[dataIDX]=$0 ;      # buffer the entire file in memory
}
END{
    # Work backwards from the end of the file, one slice at a time.
    slLAST=dataIDX ;
    slFIRST=slLAST-slice+1 ;
    if( slFIRST <= 0 ){
        slFIRST=1 ;
    } ;

    k=0 ;
    while( slLAST > 0 ){
        k++;

        # Name each output file after the timestamp of its last line.
        split(data[slLAST], datline, " " ) ;
        fname=sprintf("%s__%03d.txt", datline[1], k ) ;
        printf("\t New file: %s\n", fname ) | "cat >&2" ;

        for( i=slFIRST ; i<=slLAST ; i++){
            print data[i] >fname ;
        } ;

        if( slFIRST == 1 ){
            exit ;
        } ;

        # Step back to the previous slice.
        slLAST=slFIRST-1 ;
        slFIRST=slLAST-slice+1 ;
        if( slFIRST <= 0 ){
            slFIRST=1 ;
        } ;
    } ;
}' "${input}"
    

If you only want the last 200 line entries of a log, then the absolute simplest approach is to use tail. Namely:

tail -200 log.txt >${newLogName}

If you want to create multiple files of 200 lines each, you could use the sequence

tac log.txt | tail -n +201 | tac >log.remain
mv log.remain log.txt

in a loop that includes assigning a unique name ${newLogName} to each slice.
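Such a loop might look like the following sketch (the slice_NNN names and the working copy are illustrative; tac is from GNU coreutils):

```shell
#!/bin/bash
# Sketch: repeatedly peel the last 200 lines off a log into numbered
# slice files until nothing remains. Works on a copy so the original
# log.txt survives.
cp log.txt work.txt
k=0
while [ -s work.txt ]; do
    k=$((k + 1))
    newLogName=$(printf 'slice_%03d.txt' "$k")

    # Take the current tail as the next slice ...
    tail -n 200 work.txt > "$newLogName"

    # ... then drop those 200 lines from the working copy.
    tac work.txt | tail -n +201 | tac > work.remain
    mv work.remain work.txt
done
```

When fewer than 200 lines remain, tail simply emits them all and the working copy becomes empty, which terminates the loop.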

OR, you could create a reversed log at the outset and build the sublists working down the reversed list, remembering to reverse each individual sublist before saving it in its final form.
