EDITED FOR CLARIFICATION
Thanks to @KamilCuk, @Jetchisel and @chepner for explaining it in a way that made sense.
First, apologies for any noobness; I am not a coder. I am using OSX 10.14.6 and the standard OSX Terminal.
Short issue:
sed '5d' *.txt
is NOT deleting the 5th line of each text file in a directory.
Background
I have thousands of plain text news articles that I will use for a corpus analysis, so I want to strip irrelevant text from the files.
The articles are all in the following format (line numbers added for clarity):
1. <blank line>
2. <article heading>
3. <date>
4. <blank line>
5. Body
The word "Body" always occurs at line 5, is always capitalised, and is always by itself.
I want to strip either only line 5, or only lines that consist of the word "Body" by itself (the word "Body" will almost certainly also appear in the article text, so I can't just delete every occurrence).
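Since the plan depends on "Body" always being line 5, it may be worth checking that assumption before deleting anything. A minimal sketch, assuming the articles are passed as arguments; the function name check_body_line is hypothetical. It runs sed once per file, so "line 5" always means line 5 of that particular file.

```shell
# Hypothetical helper: print the name of every file whose 5th line is not exactly "Body".
# Assumption: called with the article files as arguments, e.g. check_body_line *.txt
check_body_line() {
  for f in "$@"; do
    line5=$(sed -n '5p' "$f")     # print only line 5 of this one file
    if [ "$line5" != "Body" ]; then
      printf '%s\n' "$f"          # report files that break the assumed format
    fi
  done
}
```

Files shorter than five lines, or whose line 5 has trailing spaces or a Windows carriage return, are reported too, which is usually what you want to know before bulk-editing.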
From reading a lot of pages, the following should work:
sed '5d' file
So, in my case:
sed '5d' *.txt
However, this is not working for me, and neither is any other variation I have tried (using either * or *.txt):
sed -i '5d' *.txt
sed -i '' '5d' *.txt
sed -e '5d' *.txt
Invariably it deletes the 5th line of the FIRST file, but none of the rest of the files in the directory, so SOMETHING is working.
Alternatively, is there a way to specify deleting the string "Body" when it is the only word on a line?
Clearly I have the wrong end of the stick here, so any direction would be appreciated.
From the POSIX specification:
An address is either a decimal number that counts input lines cumulatively across files, a '$' character that addresses the last line of input, or a context address (which consists of a BRE, as described in Regular Expressions in sed, preceded and followed by a delimiter, usually a <slash>).
So the command 5d only deletes the 5th line of the files taken as a whole, but you want the 5th line of each file.
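To see the cumulative counting in action, the sketch below builds two hypothetical three-line files (a.txt and b.txt in a scratch directory) and runs 5d over both at once. Line 5 of the combined input is the second line of b.txt, so that is the line that disappears, while a.txt's own lines are untouched.

```shell
# Demonstrate that sed line addresses count cumulatively across input files.
dir=$(mktemp -d)                       # scratch directory for the demo
printf 'a1\na2\na3\n' > "$dir/a.txt"   # lines 1-3 of the combined input
printf 'b1\nb2\nb3\n' > "$dir/b.txt"   # lines 4-6 of the combined input

# 5d removes the 5th line overall, which is b2 (the 2nd line of b.txt).
combined=$(sed '5d' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$combined"              # prints a1, a2, a3, b1, b3 on separate lines
rm -rf "$dir"
```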
I don't see any way to "reset" the address, so you'll have to specify a context address.
sed -i '' '/^Body$/d' *.txt
This will delete each line that consists only of the word Body; the ^ matches the beginning of a line, the $ the end.
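Note that sed -i '' is the BSD/macOS form of in-place editing; GNU sed (common on Linux) takes -i with no separate suffix argument, so the same command line is not portable between the two. A minimal sketch that sidesteps -i by writing each result to a temporary file and moving it back; the function name strip_body_lines and the .tmp suffix are hypothetical choices. Because it is a context address, every line that is exactly Body is removed, not only the one at line 5.

```shell
# Hypothetical helper: delete every line that is exactly "Body" in each file given.
# Avoids sed -i, whose syntax differs between BSD/macOS sed and GNU sed.
strip_body_lines() {
  for f in "$@"; do
    # ^ anchors at the start of the line and $ at the end, so "The Body of work" is kept.
    sed '/^Body$/d' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Usage, run inside the directory of articles:
#   strip_body_lines *.txt
```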
Alternatively, just run sed separately for each file:
for f in *.txt; do sed -i '' '5d' "$f"; done
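If you want both conditions at once (delete line 5 only when it is exactly Body), a line address and a context address can be nested with a { } block, and the per-file loop keeps the numbering per file. This sketch again uses a temporary file instead of -i so it should behave the same with BSD and GNU sed; the function name strip_line5_body is hypothetical.

```shell
# Hypothetical helper: in each file, delete line 5 only if that line is exactly "Body".
strip_line5_body() {
  for f in "$@"; do
    # One sed run per file, so line numbering restarts at 1 for every file.
    # 5{...} limits the inner command to line 5; /^Body$/d deletes it only on a match.
    sed '5{/^Body$/d;}' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Usage, run inside the directory of articles:
#   strip_line5_body *.txt
```

A standalone "Body" that happens to appear later in an article survives, and a file whose line 5 is something else is left unchanged.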