
Split Linux files based on condition

I have a file in Linux. The contents of the file are below.

Test_12
Test_abc
start_1
start_abcd
end_123
end_abcde_12

Now I want to split the file into multiple small files, grouping the lines by the string that comes before the first underscore.

Output:

  • Test.txt:

     Test_12
     Test_abc

  • start.txt:

     start_1
     start_abcd

  • end.txt:

     end_123
     end_abcde_12

I tried the following:

while read -r line ; do
    echo "$line" >> "${line}.txt"  
done < split.txt

But I got a separate file for each line.

What am I doing wrong here, and how can I get my desired output?

Better to use awk for this:

awk -F_ 'p && $1 != p{close(fn)} {p=$1; fn=p ".txt"; print>>fn} END{close(fn)}' split.txt

There is a little extra handling to close the output file whenever the value in the first column changes, so that we don't end up with too many open files if your input file is huge.
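As a quick check, the command above can be run against the sample from the question in a scratch directory. This is only a sketch: the `mktemp -d` directory and the variable name `workdir` are assumptions added for illustration and are not part of the original answer.

```shell
#!/bin/sh
# Scratch directory so the run does not touch real files (hypothetical name).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input from the question.
printf '%s\n' Test_12 Test_abc start_1 start_abcd end_123 end_abcde_12 > split.txt

# Append each line to <prefix>.txt, closing the previous file when the prefix changes.
awk -F_ 'p && $1 != p{close(fn)} {p=$1; fn=p ".txt"; print>>fn} END{close(fn)}' split.txt

# Show each output file under its name.
for f in Test.txt start.txt end.txt; do
    echo "== $f"
    cat "$f"
done
```

Because `>>` appends, running the command twice in the same directory duplicates the lines in each output file, so remove old output files before a rerun.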

You need to trim the underscore and trailing text from each line. %%_* does that:

while read -r line ; do
    echo "$line" >> "${line%%_*}.txt"  
done < split.txt

Explanation:

  • % : trim trailing text
  • %% : trim the longest possible match
  • _* : an underscore and everything after it
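The difference between % and %% only shows up when a line contains more than one underscore. A small sketch, using the last sample line from the question (the variable names `short`, `long`, and `plain` are made up for illustration):

```shell
#!/bin/sh
line="end_abcde_12"

short=${line%_*}    # shortest match removed: drops only the final "_12"
long=${line%%_*}    # longest match removed: drops everything from the first "_"

echo "$short"   # prints: end_abcde
echo "$long"    # prints: end

# A line with no underscore does not match the pattern and is left unchanged.
plain="README"
echo "${plain%%_*}"   # prints: README
```

With a single %, end_abcde_12 would have gone to end_abcde.txt rather than end.txt, which is why the answer uses %%.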

Read why-is-using-a-shell-loop-to-process-text-considered-bad-practice and then just use awk.

With GNU awk all you need is:

awk -F'_' '{print > ($1".txt")}' file

Otherwise, with other awks, if your input file is grouped by the first field as shown in your question, then all you need is:

awk -F'_' '{f=$1".txt"; print > f} f!=p{close(p); p=f}' file

and if it isn't, then it's just slightly less efficient, as you may need to re-open a file that was previously closed (hence the >> instead of >):

awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' file
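To see why >> matters for ungrouped input, here is a small sketch that runs both variants on a three-line file in which the a prefix appears, is interrupted, and comes back. The input data, the scratch directory, and the variable names are assumptions made up for this illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Ungrouped input: prefix "a" appears, then "b", then "a" again.
printf '%s\n' a_1 b_1 a_2 > file

# With ">", reopening a.txt after close() truncates it, so a_1 is lost.
awk -F'_' '{f=$1".txt"; print > f} f!=p{close(p); p=f}' file
truncated=$(cat a.txt)

# Start from a clean slate before the second run.
rm -f a.txt b.txt

# With ">>", reopening a.txt appends, so both lines survive.
awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' file
appended=$(cat a.txt)

echo "with >:  $truncated"
echo "with >>: $appended"
```

The trade-off is that >> also appends to output files left over from an earlier run, so those should be deleted first.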

Can you try this:

while IFS= read -r line; do
    # Prefix before the first underscore
    content=$(printf '%s\n' "$line" | awk -F_ '{print $1}')
    # >> creates the file if it does not exist yet, so there is no need
    # to loop over existing files looking for a matching name
    printf '%s\n' "$line" >> "$content.txt"
done < file.txt

Output:

bash-4.4$ ls -lrt
total 12
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726  49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
bash-4.4$ cat file.txt
Test_12
Test_abc
Start_1
Start_abc
end_1
end_abc
bash-4.4$ ./script.sh
bash-4.4$ ls -lrt
total 24
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726  49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
-rw-r--r-- 1 21726 21726  17 Sep 22 05:06 Test.txt
-rw-r--r-- 1 21726 21726  18 Sep 22 05:06 Start.txt
-rw-r--r-- 1 21726 21726  14 Sep 22 05:06 end.txt
bash-4.4$ cat Start.txt
Start_1
Start_abc
bash-4.4$ cat Test.txt
Test_12
Test_abc
bash-4.4$ cat end.txt
end_1
end_abc
