Deleting nth line of multiple text files with SED - not working for me on OSX

EDITED FOR CLARIFICATION

Thanks to @KamilCuk, @Jetchisel and @chepner for explaining it in a way that made sense

First, apologies for any noobness. I am not a coder. I am currently using OSX 10.14.6 and the OSX standard terminal.

Short issue:

sed '5d' *.txt

is NOT deleting the 5th line of each text file in a directory.

Background

I have thousands of plain text news articles that I will be using to conduct a corpus analysis. As such, I want to strip irrelevant text information from the files.

The articles are all in the following format (line numbers added for clarity):

1. <blank line>
2. <article heading>
3. <date> 
4. <blank line>
5. Body

The word "Body" always occurs at line 5, is always capitalised, and is always by itself.

I want to strip either only line 5, or only lines that have the word "Body" by itself (as the articles will almost certainly include the word "Body" in them).

From reading a lot of pages, the following should work:

sed '5d' file

So, in my case:

sed '5d' *.txt

However, this is not working for me, nor any other variation I have tried (using either * or *.txt).

sed -i '5d' *.txt

sed -i '' '5d' *.txt

sed -e '5d' *.txt

Invariably it deletes the 5th line of the FIRST file, but none of the rest of the files in the directory, so SOMETHING is working.

Alternatively, is there a way to specify deleting the string "Body" when it is the only word on a line?

Clearly I have the wrong end of the stick here, so any direction would be appreciated.

From the POSIX specification:

An address is either a decimal number that counts input lines cumulatively across files, a '$' character that addresses the last line of input, or a context address (which consists of a BRE, as described in Regular Expressions in sed, preceded and followed by a delimiter, usually a <slash>).

So the command 5d only deletes the 5th line of the files taken as a whole, but you want the 5th line of each file.
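To see the cumulative counting for yourself without touching any real files, here is a small sketch. The scratch directory and the file names a.txt and b.txt are made up for the demonstration, and it uses -n with 5p (print line 5) instead of 5d so the effect is easy to read.

```shell
# Hypothetical scratch files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'a1\na2\na3\na4\na5\n' > "$tmpdir/a.txt"
printf 'b1\nb2\nb3\nb4\nb5\n' > "$tmpdir/b.txt"

# One sed process, two files: lines are counted across the combined input,
# so address 5 matches only a5 (b5 is input line 10).
combined=$(sed -n '5p' "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$combined"

# One sed process per file: the count restarts for each file,
# so the fifth line of both files is printed.
separate=$(for f in "$tmpdir"/*.txt; do sed -n '5p' "$f"; done)
echo "$separate"

rm -rf "$tmpdir"
```

The first echo shows a single line and the second shows two, which matches what you observed: only the first file's fifth line is ever "line 5".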

I don't see any way to "reset" the address, so you'll have to specify a context address.

sed -i '' '/^Body$/d' *.txt

This will delete each line that consists only of the word Body; the ^ matches the beginning of a line, the $ the end.
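As a quick check of what the anchored pattern does and does not match, the sketch below pipes a few made-up sample lines through sed without -i, so nothing on disk is changed. The second pattern, which uses [[:space:]]*, is my own addition: it assumes some of your files might have trailing spaces or Windows line endings after Body, which the strict pattern would not match.

```shell
# Strict pattern: only a line that is exactly "Body" is deleted.
# The last sample line is "Body " with a trailing space, so it survives.
strict=$(printf '%s\n' 'Heading' 'Body' 'Body of evidence grows.' 'The Body was found.' 'Body ' |
  sed '/^Body$/d')
echo "$strict"

# Tolerant pattern (an assumption about messy input): also deletes "Body"
# followed only by whitespace, such as spaces or a carriage return.
tolerant=$(printf '%s\n' 'Heading' 'Body' 'Body of evidence grows.' 'The Body was found.' 'Body ' |
  sed '/^Body[[:space:]]*$/d')
echo "$tolerant"
```

In both cases the sentences that merely contain the word Body are left alone, because ^ and $ require the word to fill the whole line.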


Alternatively, just run sed separately for each file.

for f in *.txt; do sed -i '' '5d' "$f"; done
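If you would rather be cautious with thousands of files, a variant of the loop can combine both conditions from the question and keep a backup. This is only a sketch under a few assumptions: the scratch directory and sample articles are invented, the .bak suffix is an arbitrary choice, and the suffix is written attached to -i (as -i.bak), a form that both the macOS and GNU versions of sed accept.

```shell
# Hypothetical scratch directory with two sample articles.
workdir=$(mktemp -d)
printf '\nHeading A\n1 Jan 2020\n\nBody\nText A\n' > "$workdir/a.txt"
printf '\nHeading B\n2 Jan 2020\n\nBody\nText B\nBody\n' > "$workdir/b.txt"

for f in "$workdir"/*.txt; do
  # -i.bak edits in place and keeps the original as "$f.bak".
  # The script deletes line 5 only when that line is exactly "Body";
  # a later line that happens to be "Body" is left untouched.
  sed -i.bak -e '5{' -e '/^Body$/d' -e '}' "$f"
done

cat "$workdir/b.txt"
```

Once you have checked a few results, the .bak files can be deleted. On your real data you would replace "$workdir"/*.txt with *.txt in the article directory.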
