I'm trying to figure out a way to remove pairs of lines from a file: the first line of each pair contains a unique ID, and the second line a string. I was thinking of something along the lines of
for i in $(cat idlist.txt); do grep -v -A1 $i file1; done
However, I'm not sure how to feed the output of each iteration back into the loop for the next one. Any tips?
The file I'm altering is in this format:
uniqueID.1
OJNEFONEOIWENWEJNEWEJ
uniqueID.2
HHTHANJAHTNTHAJNTEOEJ
There are some ID/string pairs I want gone.
Thanks
awk may be a good choice of tool in this case. Here's a quick version of the basic idea, wrapped in a bash script:
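#!/bin/bash
awk '
  # FNR resets to 1 at the start of each input file, so count the files.
  FNR == 1 { filenum++ }
  # First file: remember every ID that should be dropped.
  filenum == 1 { ids[$0] = 1 }
  # Second file: odd lines are IDs, even lines are their strings.
  filenum == 2 {
    if ((FNR % 2) == 1) { id = $0 }
    else if (!(id in ids)) { print id; print }
  }
' idlist.txt file.txt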
The idea is to process the idlist file first, adding each ID to ignore to an associative array, ids, and then to process the second file in pairs of lines: note the ID on the first line of each pair, then print it and the following line only if that ID isn't in ids.
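A common variant of the same idea uses the NR == FNR idiom to recognise the first file and getline to pull in the second line of each pair. This is only a sketch: skip_pairs is a hypothetical function name, it assumes every ID line is followed by exactly one string line, and it misbehaves if idlist.txt is empty, because NR == FNR then stays true throughout the data file as well.

```shell
# skip_pairs IDLIST DATAFILE  (hypothetical helper name)
# Prints DATAFILE minus every two-line record whose first line appears in IDLIST.
skip_pairs() {
  awk '
    # While reading the first file, NR == FNR: just record the IDs.
    NR == FNR { ids[$0]; next }
    # In the data file each record starts with an ID line;
    # getline reads its partner string line into "line".
    {
      id = $0
      if ((getline line) > 0 && !(id in ids)) { print id; print line }
    }
  ' "$1" "$2"
}
```

Usage would be skip_pairs idlist.txt file.txt, with the result on standard output.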
The same mv trick that has already been suggested (write the output to a temporary file, then mv it over the original) works here too if you need to modify the file "in place".
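A minimal sketch of that temp-file-and-mv pattern around the awk program above; filter_in_place is a hypothetical name, and the temporary file is created next to the original so that mv is a plain rename on the same filesystem.

```shell
# filter_in_place IDLIST FILE  (hypothetical helper name)
filter_in_place() {
  local idlist=$1 file=$2 tmp
  # Temp file in the same directory as FILE (mktemp creates it relative to the cwd/path given).
  tmp=$(mktemp "$file.XXXXXX") || return 1
  if awk '
    FNR == 1 { filenum++ }
    filenum == 1 { ids[$0] = 1 }
    filenum == 2 {
      if ((FNR % 2) == 1) { id = $0 }
      else if (!(id in ids)) { print id; print }
    }
  ' "$idlist" "$file" > "$tmp"; then
    mv -- "$tmp" "$file"    # replace the original only if awk succeeded
  else
    rm -f -- "$tmp"         # leave the original untouched on failure
    return 1
  fi
}
```

One thing to be aware of: mktemp creates the file with mode 600, so the rewritten file may end up with different permissions from the original.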
Transcript:
$ cat idlist.txt
id.2
id.4
id.6
$ cat file.txt
id.1
stuff 1
id.2
stuff 2
id.3
stuff 3
id.4
stuff 4
id.5
stuff 5
id.6
stuff 6
id.7
stuff 7
$ ./skipper.sh
id.1
stuff 1
id.3
stuff 3
id.5
stuff 5
id.7
stuff 7
It seems very inefficient to read and write the file for each pattern in the list. It would be better to read and process the file just once, removing all the ids in one go.
How to do this depends on what kind of IDs you've got in the file idlist.txt. From the way you pass the patterns to grep, it looks as though they must be words or maybe simple regular expressions, so you could try the following approach.
First, transform the IDs into a sed program:
PROGRAM=$(while IFS= read -r ID; do echo "/$ID/{N;d;}"; done < idlist.txt)
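To see what the generated program looks like, here is the same loop fed with sample IDs (the id.2/id.4/id.6 values from the awk transcript above), piped in with printf instead of being read from idlist.txt:

```shell
# Sample IDs piped in place of idlist.txt; each becomes one "/ID/{N;d;}" command.
PROGRAM=$(printf 'id.2\nid.4\nid.6\n' | while IFS= read -r ID; do echo "/$ID/{N;d;}"; done)
printf '%s\n' "$PROGRAM"
# Prints:
# /id.2/{N;d;}
# /id.4/{N;d;}
# /id.6/{N;d;}
```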
Then use sed to run the program and update the file in place (-i '' is the BSD/macOS spelling; GNU sed wants a bare -i):
sed -i '' -e "$PROGRAM" -- file1
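If the same script has to run on both GNU and BSD sed, one way to paper over the -i difference is to probe for GNU sed first. This is a sketch under the assumption that only these two implementations matter: GNU sed accepts --version, while BSD sed rejects it as an unknown option. sed_in_place is a hypothetical name.

```shell
# sed_in_place PROGRAM FILE  (hypothetical helper name)
sed_in_place() {
  if sed --version >/dev/null 2>&1; then
    # GNU sed: the backup suffix must be attached to -i, so use a bare -i.
    sed -i -e "$1" -- "$2"
  else
    # BSD/macOS sed: -i takes a separate suffix argument; '' means no backup.
    sed -i '' -e "$1" -- "$2"
  fi
}
```

Usage: sed_in_place "$PROGRAM" file1.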
The way the program works is that /$ID/ matches a line containing the ID, the N command then appends the next line of the file, and the d command deletes both lines. Other lines are printed normally. (Obviously this depends on $ID being a valid basic regular expression that contains no / characters. The match is also unanchored, and . matches any character, so /id.1/ would remove id.10 and idX1 as well.)
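If your IDs may contain regex metacharacters or /, or may be prefixes of one another, one option is to escape each ID and anchor it to the whole line before building the program. This is a sketch: make_sed_program is a hypothetical name, and it assumes each ID sits alone on its line in idlist.txt with no surrounding whitespace.

```shell
# make_sed_program IDLIST  (hypothetical helper name)
make_sed_program() {
  local ID esc
  while IFS= read -r ID; do
    # Backslash-escape BRE metacharacters and the / delimiter ("|" is used
    # as the s delimiter here so the / inside the bracket is harmless).
    esc=$(printf '%s\n' "$ID" | sed 's|[][\.*^$/]|\\&|g')
    # Anchor to the whole line so id.1 no longer matches id.10.
    printf '/^%s$/{N;d;}\n' "$esc"
  done < "$1"
}
```

It would be used as PROGRAM=$(make_sed_program idlist.txt), followed by the same sed command as before.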
If you have a version of sed that accepts "extended regular expressions" (the -r option to GNU sed, or the -E option to BSD sed), then you could compile all your IDs into a single regular expression:
PROGRAM=$(printf '/('; tr '\n' '|' < idlist.txt; printf '.^)/{N;d;}')
sed -E -i '' -e "$PROGRAM" -- file1   # BSD/macOS sed
sed -r -i -e "$PROGRAM" -- file1      # GNU sed
(Here .^ is a regular expression that can't possibly match; it follows the final | so that the last alternative is not empty, since an empty alternative would match every line.)
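Here is the single-regex version run on sample data, as a sketch: the sample IDs are piped in with printf in place of idlist.txt, and -E is used because BSD sed and recent GNU sed both accept it (older GNU sed may need -r). Nothing is edited in place; the result goes to standard output.

```shell
# Build "/(id.2|id.4|id.6|.^)/{N;d;}" from sample IDs.
PROGRAM=$(printf '/('; printf 'id.2\nid.4\nid.6\n' | tr '\n' '|'; printf '.^)/{N;d;}')
printf '%s\n' "$PROGRAM"
# Apply it to a few sample records; the id.2 pair is dropped.
printf 'id.1\nstuff 1\nid.2\nstuff 2\nid.3\nstuff 3\n' | sed -E -e "$PROGRAM"
```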