
How to pipe a file back into a loop in bash?

I'm trying to figure out a way to remove pairs of lines from a file. The first line in each pair contains a unique ID, and the second line a string. I was thinking of something along the lines of

for i in $(cat idlist.txt); do grep -v -A1 "$i" file1; done

However, I'm not sure how to pipe the output of the loop back into it on each iteration. Any tips?

The file I'm altering is basically in the format of

uniqueID.1
OJNEFONEOIWENWEJNEWEJ
uniqueID.2
HHTHANJAHTNTHAJNTEOEJ

There are some ids + strings I want gone.

Thanks

awk may be a good choice of tool in this case. Here's a quick version of the basic idea, wrapped in a bash script:

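#!/bin/bash

awk '
# Count input files: FNR resets to 1 at the start of each one.
FNR == 1 { filenum++ }
# First file (idlist.txt): remember every ID that should be dropped.
filenum == 1 { ids[$0] = 1 }
# Second file (file.txt): odd lines hold an ID, even lines its string.
filenum == 2 {
    if ((FNR % 2) == 1) { id = $0 }
    else if (!(id in ids)) { print id; print }
}
' idlist.txt file.txt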

The idea is to process the idlist file first, adding the IDs to ignore to an associative array ids, and then to process the second file in pairs of lines: note the ID on the first line of each pair, then print it and the following line only if that ID isn't in ids.

If you need to modify the file "in place", the usual approach works here too: write the output to a temporary file and then mv it over the original.
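As a rough sketch of that temporary-file approach: the sample data, the scratch directory, and the variable names workdir and tmp are illustrative assumptions rather than part of the original script, and it relies on the mktemp utility being available. The awk program expresses the same pair-skipping idea, written here with the common NR == FNR idiom.

```shell
#!/bin/bash
# Sketch: filter file.txt "in place" via a temporary file and mv.
# The sample files below are made up so the example runs on its own.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' id.2 > idlist.txt
printf '%s\n' id.1 'stuff 1' id.2 'stuff 2' id.3 'stuff 3' > file.txt

# Create the temporary file next to the target so mv is a plain rename.
tmp=$(mktemp file.txt.XXXXXX)

# Replace the original only if awk finished successfully.
if awk '
    NR == FNR    { ids[$0]; next }     # first file: remember IDs to drop
    FNR % 2 == 1 { id = $0; next }     # second file, odd line: hold the ID
    !(id in ids) { print id; print }   # even line: print the pair if kept
' idlist.txt file.txt > "$tmp"; then
    mv -- "$tmp" file.txt
else
    rm -f -- "$tmp"
fi

cat file.txt
```

Writing to a separate file first means that a failed or interrupted run leaves the original file.txt untouched.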

Transcript:

$ cat idlist.txt 
id.2
id.4
id.6
$ cat file.txt 
id.1
stuff 1
id.2
stuff 2
id.3
stuff 3
id.4
stuff 4
id.5
stuff 5
id.6
stuff 6
id.7
stuff 7
$ ./skipper.sh 
id.1
stuff 1
id.3
stuff 3
id.5
stuff 5
id.7
stuff 7

It seems very inefficient to read and write the file for each pattern in the list. It would be better to read and process the file just once, removing all the ids in one go.

How to do this depends on what kind of IDs you've got in that file idlist.txt . From the way you pass the patterns to grep , it looks as though they must be words or maybe simple regular expressions, so you could try the following approach.

First, transform the IDs into a sed program:

PROGRAM=$(while IFS= read -r ID; do printf '/%s/{N;d;}\n' "$ID"; done < idlist.txt)

Then use sed to run the program and update the file in-place (this is the BSD/macOS form of -i; with GNU sed, drop the empty '' argument):

sed -i '' -e "$PROGRAM" -- file1

The way the program works is that /$ID/ matches a line containing the id, and then the N command reads the next line from the file, and the d command deletes both lines. Other lines are printed normally. (Obviously this depends on $ID being a valid basic regular expression that contains no / characters.)
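If the IDs are meant as literal strings rather than patterns, that caveat matters: in an ID such as id.1 the dot matches any character, and an unanchored pattern also matches id.10. One possible way around it, sketched below under the assumption that each ID occupies a whole line of its own, is to escape the regex metacharacters and anchor each pattern. The sample files and the result variable are made up for the demonstration, and the final sed call prints to standard output instead of using -i so that it behaves the same with GNU and BSD sed.

```shell
#!/bin/bash
# Sketch: build the sed program from literal IDs.
# Sample data (hypothetical) chosen to show the two pitfalls.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'id.1' > idlist.txt
printf '%s\n' id.1 'stuff 1' idx1 'stuff x' id.10 'stuff 10' > file1

# First expression: backslash-escape characters that are special in a
# basic regular expression, plus the / used as the address delimiter.
# Second expression: wrap each escaped ID as /^ID$/{N;d;}.
PROGRAM=$(sed -e 's|[][\.*^$/]|\\&|g' -e 's|.*|/^&$/{N;d;}|' idlist.txt)

printf '%s\n' "$PROGRAM"

# Only the exact line "id.1" and the line after it are removed;
# "idx1" and "id.10" survive.
result=$(sed -e "$PROGRAM" file1)
printf '%s\n' "$result"
```

For the single ID above, the generated program is /^id\.1$/{N;d;}.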

If you have a version of sed that accepts "extended regular expressions" (the -E option in BSD sed and in GNU sed, which also accepts the older -r spelling), then you could compile all your IDs into a single regular expression:

PROGRAM=$(printf '/('; tr '\n' '|' < idlist.txt; printf '.^)/{N;d;}')
sed -E -i '' -e "$PROGRAM" -- file1

(Here .^ is a regular expression that can't possibly match; it follows the final | in the regular expression to ensure that there are no matches from the final clause in the alternation.)
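To make the generated expression concrete, here is a small sketch on made-up sample data (the scratch directory and the result variable are assumptions for the demonstration). It prints the program and then applies it without -i, so the file itself is left unchanged and the command does not depend on which sed flavour's in-place syntax is in use.

```shell
#!/bin/bash
# Sketch: inspect and apply the single-alternation sed program.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' id.2 id.4 > idlist.txt
printf '%s\n' id.1 'stuff 1' id.2 'stuff 2' id.3 'stuff 3' id.4 'stuff 4' > file1

# Each newline in idlist.txt becomes a |, so the list must end with a
# newline for the last ID to stay separate from the trailing .^ clause.
PROGRAM=$(printf '/('; tr '\n' '|' < idlist.txt; printf '.^)/{N;d;}')

printf '%s\n' "$PROGRAM"     # /(id.2|id.4|.^)/{N;d;}

# Print the filtered lines instead of editing file1 in place.
result=$(sed -E -e "$PROGRAM" file1)
printf '%s\n' "$result"
```

As with the one-address-per-ID version, the IDs are still treated as unanchored regular expressions here.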

