
Split linux files based on condition

I have a file on Linux. Its contents are below.

Test_12
Test_abc
start_1
start_abcd
end_123
end_abcde_12

Now I want to split the file into multiple smaller files, grouping the lines by the string that comes before the first underscore.

Output:

  • Test.txt:

     Test_12
     Test_abc
  • start.txt:

     start_1
     start_abcd
  • end.txt:

     end_123
     end_abcde_12

I tried the following:

while read -r line ; do
    echo "$line" >> "${line}.txt"  
done < split.txt

But this created a separate file for each line.

What am I doing wrong here and how can I get my desired output?

It is better to use awk for this:

awk -F_ 'p && $1 != p{close(fn)} {p=$1; fn=p ".txt"; print>>fn} END{close(fn)}' split.txt

There is a little extra handling to close each output file when the value in the first column changes, so that awk does not keep too many files open if your input file is huge.
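As a sketch, the same command can be spread over several lines with comments. The scratch directory and the recreated split.txt are assumptions made only to keep the example self-contained; the awk program itself is unchanged.

```shell
# Work in a scratch directory so the example leaves the current one alone.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample input from the question.
printf '%s\n' Test_12 Test_abc start_1 start_abcd end_123 end_abcde_12 > split.txt

awk -F_ '
    # The prefix changed, so the previous output file is finished: close it.
    p && $1 != p { close(fn) }
    # Remember the prefix, build the file name, and append the current line.
    { p = $1; fn = p ".txt"; print >> fn }
    # Close the last output file at end of input.
    END { close(fn) }
' split.txt

cat end.txt
```

Because the program appends with >>, running it a second time in the same directory adds the lines again, so remove old output files before a rerun.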

You need to trim the first underscore and everything after it from each line when building the file name. The %%_* parameter expansion does that:

while read -r line ; do
    echo "$line" >> "${line%%_*}.txt"  
done < split.txt

Explanation:

  • % : remove a matching suffix from the end of the value
  • %% : the same, but remove the longest possible match instead of the shortest
  • _* : the pattern to remove, an underscore and everything after it
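The difference between % and %% only shows up on a line with more than one underscore. A small illustration, using end_abcde_12 from the question (the variable names shortest and longest are made up for this example):

```shell
line='end_abcde_12'

# % removes the shortest suffix matching _* (from the last underscore on).
shortest=${line%_*}
# %% removes the longest suffix matching _* (from the first underscore on).
longest=${line%%_*}

echo "$shortest"   # end_abcde
echo "$longest"    # end
```

With a single %, this line would have gone to end_abcde.txt rather than end.txt.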

Read "Why is using a shell loop to process text considered bad practice?" and then just use awk.

With GNU awk all you need is:

awk -F'_' '{print > ($1".txt")}' file

With other awks, if your input file is grouped by the first field as shown in your question, then all you need is:

awk -F'_' '{f=$1".txt"; print > f} f!=p{close(p); p=f}' file

If it isn't grouped, the command is just slightly less efficient, as it may need to reopen a file that was previously closed (hence the >> instead of >):

awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' file
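To see why the ungrouped case needs >>, here is a small sketch that runs both commands on a hypothetical input where the Test_ lines are not adjacent. The file name ungrouped.txt and the variable names are made up for the example.

```shell
# Work in a scratch directory so the example leaves the current one alone.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical ungrouped input: start_1 sits between the two Test_ lines.
printf '%s\n' Test_12 start_1 Test_abc > ungrouped.txt

# With ">", Test.txt is closed after line 1 and truncated when reopened on line 3.
awk -F'_' '{f=$1".txt"; print > f} f!=p{close(p); p=f}' ungrouped.txt
truncated=$(cat Test.txt)

# Start from a clean slate before the second run.
rm -f Test.txt start.txt

# With ">>", the reopened file is appended to instead.
awk -F'_' '{f=$1".txt"; print >> f} f!=p{close(p); p=f}' ungrouped.txt
appended=$(cat Test.txt)

printf 'with >:\n%s\nwith >>:\n%s\n' "$truncated" "$appended"
```

The first run leaves only Test_abc in Test.txt, while the second keeps both Test_12 and Test_abc. The price of >> is that output files left over from an earlier run are appended to as well, so delete them first.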

Can you try this:

while IFS= read -r line; do
    # Prefix before the first underscore, e.g. "Test" for "Test_12"
    content=$(printf '%s\n' "$line" | awk -F_ '{print $1}')
    # >> creates the file on first use and appends afterwards, so there is
    # no need to search the existing files for a matching name
    printf '%s\n' "$line" >> "$content.txt"
done < file.txt

Output:

bash-4.4$ ls -lrt
total 12
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726  49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
bash-4.4$ cat file.txt
Test_12
Test_abc
Start_1
Start_abc
end_1
end_abc
bash-4.4$ ./script.sh
bash-4.4$ ls -lrt
total 24
-rw-r--r-- 1 21726 21726 978 Sep 22 04:54 README.txt
-rw-r--r-- 1 21726 21726  49 Sep 22 04:56 file.txt
-rwxr-xr-x 1 21726 21726 252 Sep 22 05:06 script.sh
-rw-r--r-- 1 21726 21726  17 Sep 22 05:06 Test.txt
-rw-r--r-- 1 21726 21726  18 Sep 22 05:06 Start.txt
-rw-r--r-- 1 21726 21726  14 Sep 22 05:06 end.txt
bash-4.4$ cat Start.txt
Start_1
Start_abc
bash-4.4$ cat Test.txt
Test_12
Test_abc
bash-4.4$ cat end.txt
end_1
end_abc


 