
How to match files with same names and merge them in shell script?

I'm trying to merge several files that share a naming pattern, regardless of which directory they are in, and put the merged files in a single folder.

The file structure is as follows:

20170219-A20-L1-AB1234_S1_R1_001.txt
20170211-B21-L3-AB1234-2_S1_R1_001.txt
20170210-C20-L1-AB1234-3_S1_R1_001.txt  
20170211-B21-L3-AB1234-2_S2_R1_001.txt
20170210-C20-L1-AB1234-3_S2_R1_001.txt

My criterion is to find the files whose names contain _S1 and _S2, then merge all the _S1 files into one new file and all the _S2 files into another.

The expected output could be 20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt and 20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt. I don't have any specific requirement for the merged file names, but I want the merged files to be in the same folder.

I've been trying to use the grep and cut commands, but my for loop isn't working. I'm finding it tough to understand the regular expressions in shell.

Please help me out in constructing the logic.

The other two solutions work if the files you're searching for are in your working directory, but they won't merge files from other directories. To recreate your problem I did the following, then attempted to solve it per your initial request:

Created files per your specifications:

$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_{1,3}_S{1,2}_R1_001.txt
$ touch $(date +%Y%m%d)_{A,B}{20,21}_L{1,3}_AB1234_S{1,2}_R1_001.txt
$ ls | wc -l
48

Created a variable myText holding 48 lines of randomly generated Lorem Ipsum text:

$ echo "${myText}" | wc -l
    48

Gave each file one line from myText:

$ ls -t1 | awk '{print NR" "$0}' | while read i j; do echo "${myText}" | awk -v var=${i} 'NR==var {print}' >> ${j}; done
$ for i in `ls -t1`; do echo -n " ${i}: "; cat ${i}; done
 20170219_B21_L3_AB1234_3_S1_R1_001.txt: This is additional line two
 20170219_B21_L3_AB1234_3_S2_R1_001.txt: line three
...
 20170219_A20_L3_AB1234_S1_R1_001.txt: Phasellus ut quam eu lacus aliquet vehicula.
 20170219_A20_L1_AB1234_S1_R1_001.txt: Proin nec orci accumsan, pharetra sapien sed, gravida arcu.
 20170219_B21_L3_AB1234_S2_R1_001.txt: Lorem ipsum dolor sit amet, consectetur adipiscing elit
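
The setup above can also be reproduced without parsing ls output. The following is a minimal sketch, not the method used in this answer: the helper name make_fixture and its two arguments are hypothetical, and it assigns lines in loop order rather than in the modification-time order that ls -t1 gives. It sticks to plain sh constructs, so it should also run under bash.

```shell
#!/bin/sh
# make_fixture DIR TEXTFILE
# Hypothetical helper: creates the 48 sample files in DIR and writes
# line N of TEXTFILE into the Nth file created.
# The body runs in a subshell ( ... ) so its variables stay local.
make_fixture() (
    dir=$1
    text=$2
    n=0
    stamp=$(date +%Y%m%d)
    mkdir -p "$dir" || exit 1
    for a in A20 A21 B20 B21; do
        for l in L1 L3; do
            for id in AB1234 AB1234_1 AB1234_3; do
                for s in S1 S2; do
                    n=$((n + 1))
                    # sed -n "Np" prints only line N of the text file
                    sed -n "${n}p" "$text" \
                        > "$dir/${stamp}_${a}_${l}_${id}_${s}_R1_001.txt"
                done
            done
        done
    done
)
```

Calling make_fixture ./demo lorem.txt (both arguments are placeholders) would create the 48 files under ./demo, each holding one line of lorem.txt.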

Then I merged all the ...S1... files and all the ...S2... files. The commands below find every file matching the pattern anywhere under my home directory. The > redirection overwrites the merged file on each run; use >> instead to append, depending on whether the merged files are cleaned up before the script is re-run:

$ find ~ -type f -iname "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[AB]*S1*" -exec cat {} + > AB1234_S1_R1_001_merged.txt
$ find ~ -type f -iname "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[AB]*S2*" -exec cat {} + > AB1234_S2_R1_001_merged.txt

result:

$ for i in `ls | grep merged`; do echo; echo "--- ${i} ---"; cat ${i}; done

--- AB1234_S1_R1_001_merged.txt ---
Donec et ante tempor, hendrerit est ut, egestas massa.
Donec laoreet erat a sapien finibus venenatis.
Etiam eget urna eu ipsum dapibus aliquet.
Phasellus ut quam eu lacus aliquet vehicula.
Phasellus sed lorem ac odio rutrum vehicula.
Aliquam ac eros ut risus fringilla fringilla.
Curabitur a purus ultricies sem venenatis auctor.
Praesent dignissim justo non diam ultrices, nec fermentum lectus dictum.
Donec imperdiet mi sit amet quam iaculis rhoncus.
Nam vitae neque vehicula, consectetur dui porttitor, placerat libero.
Nulla eget diam iaculis augue interdum posuere.
Fusce a diam ac neque accumsan sagittis.
Sed feugiat mi eget augue euismod, et laoreet urna dictum.
This is additional line two
Vestibulum egestas tellus non justo fringilla viverra eget eu neque.
Aliquam porttitor nisi nec laoreet vestibulum.
Donec congue diam ut leo commodo mattis.
Quisque egestas odio sit amet diam efficitur, non accumsan magna blandit.
Donec convallis metus at iaculis pellentesque.
Nam a ligula venenatis, consectetur lectus et, dictum erat.
Proin nec orci accumsan, pharetra sapien sed, gravida arcu.
Curabitur volutpat nibh nec leo tempus, at sagittis lacus euismod.
Mauris blandit sem ac lectus varius lobortis.
In eu ipsum et felis lobortis dictum.

--- AB1234_S2_R1_001_merged.txt ---
Aenean id orci sit amet lacus tincidunt molestie.
Duis pretium tellus dapibus lorem rhoncus, at tincidunt mauris pellentesque.
Integer hendrerit mauris sit amet nunc aliquam, id congue justo pulvinar.
Praesent dapibus augue ac enim consequat, vitae feugiat enim scelerisque.
This is additional line one
Sed sit amet dolor accumsan, commodo magna at, aliquet neque.
Quisque porttitor sapien sed orci vulputate, ac porta ante sollicitudin.
In malesuada leo sit amet purus accumsan porttitor commodo eu eros.
Integer ut odio elementum, viverra velit at, molestie nulla.
Suspendisse suscipit lorem id suscipit consectetur.
Donec vulputate nibh eget imperdiet volutpat.
Curabitur sit amet libero eget nulla viverra iaculis sit amet eget eros.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Maecenas imperdiet nisl quis arcu blandit, sed pretium mi auctor.
Sed sit amet nunc faucibus, ultricies elit quis, sodales magna.
Nulla pharetra mauris eu quam sollicitudin ornare in et metus.
Ut convallis nibh in tempus fringilla.
In ornare erat quis sodales hendrerit.
Phasellus molestie erat commodo est venenatis, ullamcorper tempus elit hendrerit.
Nam mollis ante in purus suscipit, quis facilisis risus efficitur.
Integer pellentesque sem eget diam ultrices, eget vulputate ante pharetra.
Mauris ac nisl vitae sapien lacinia ornare nec nec felis.
line three
Sed dapibus ipsum eu purus interdum, at varius libero ornare.
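
The find commands above can be wrapped in a small function so that the tag and the output name are not repeated. This is only a sketch under a few assumptions: the name merge_tree and its arguments are hypothetical, file names are assumed to contain no newlines, and the pattern *_TAG_*.txt is stricter than the one used above. It sorts the paths so the merge order is stable, and it skips any file with the same name as the output so that a re-run does not read its own previous result.

```shell
#!/bin/sh
# merge_tree ROOT TAG OUT
# Hypothetical helper: concatenates every *_TAG_*.txt file found at any
# depth under ROOT into OUT, in sorted path order.
# Assumes file names do not contain newline characters.
merge_tree() (
    root=$1
    tag=$2
    out=$3
    # Exclude files named like OUT, since OUT may itself match the pattern.
    find "$root" -type f -name "*_${tag}_*.txt" \
            ! -name "$(basename "$out")" -print |
        LC_ALL=C sort |
        while IFS= read -r f; do
            cat "$f"
        done > "$out"
)
```

For example, merge_tree ~ S1 AB1234_S1_R1_001_merged.txt would collect the S1 files from the home directory down, and a second call with S2 would do the same for the other group. Because the output is opened with >, each run replaces the previous merged file.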

Did this answer the question?

Something like this:

#!/bin/bash

for i in 'S1' 'S2'
do
    cat *_"$i"_R[0-9]*_[0-9]*.txt > "$i".txt
done

Using the list given in the for statement (S1 and S2 in this case), the loop concatenates the files that match the glob pattern and sends the output to a single file for each element in the list. The merged output files will be S1.txt and S2.txt. Note that this is a shell glob, not a regular expression; you can make the pattern stricter if required.
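
If the set of tags is not known in advance, the loop above could be extended to read the tag out of each file name instead of hard-coding S1 and S2. The sketch below is one possible way to do that, not part of the original answer: the function name merge_by_tag and the output names such as S1_merged.txt are hypothetical. It uses a glob to pick candidate files and a sed regular expression to capture the S<digits> part that sits between an underscore and _R<digit>.

```shell
#!/bin/sh
# merge_by_tag DIR
# Hypothetical helper: looks at the *.txt files in DIR, extracts the
# sample tag (S1, S2, ...) from each name, and appends the file to
# <tag>_merged.txt in the same directory.
merge_by_tag() (
    cd "$1" || exit 1
    # Start clean so that re-running does not append to old results.
    rm -f S[0-9]*_merged.txt
    for f in *_S[0-9]*_R[0-9]*.txt; do
        [ -e "$f" ] || continue      # the glob matched nothing
        # \( ... \) captures the tag; \1 prints just that capture.
        tag=$(printf '%s\n' "$f" |
            sed -n 's/.*_\(S[0-9][0-9]*\)_R[0-9].*/\1/p')
        [ -n "$tag" ] || continue    # name matched the glob but has no tag
        cat "$f" >> "${tag}_merged.txt"
    done
)
```

Running merge_by_tag . in the folder from the question would produce S1_merged.txt and S2_merged.txt. The merged names deliberately do not match the input glob, so they are never picked up as input on a later run.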

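The following will help. The pattern uses an uppercase S because globs are case-sensitive, and it ends in _R1_001.txt so that the merged files are not matched as input if the commands are run again:

cat *_S1_R1_001.txt > 20170219-B21-L3-AB1234-2_S1_R1_001_merge.txt
cat *_S2_R1_001.txt > 20170219-B21-L3-AB1234-2_S2_R1_001_merge.txt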
