I'm trying to clean up a text file with over 120,000 lines via a bash script. I need to perform several find and replaces. The order of each find and replace is important and the file needs to 'remember' the previous find and replaces.
Example: replace all '.' (period) with '.\n' (period and newline), then
replace all '?' (question mark) with '?\n' (question mark and newline), then
replace all '!' (exclamation mark) with '!\n' (exclamation mark and newline), and so on.
I'm doing this, but it's not working:
#!/usr/bin/env bash
sed 's/./.\n/g'
sed 's/?/?\n/g'
sed 's/!/!\n/g'
input.txt
What am I doing wrong?
Is sed or awk better for what I'm trying to achieve?
You can always pipe sed commands, but in this case it makes sense to combine all the conditions into one command:
sed 's/[.!?]/&\n/g' file > newfile
The [.!?] bracket expression matches ., ! or ?, and & in the replacement pattern puts the matched text back into the string (the newline is added right after it).
See the online demo:
s="This is a text. Want more? Yes! End"
sed 's/[.!?]/&\n/g' <<< "$s"
Output:
This is a text.
Want more?
Yes!
End
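If you would rather keep the substitutions as separate, ordered steps, as in the question, here is a minimal sketch. It assumes GNU sed (which understands \n in the replacement), and split_steps is a made-up function name. Two things differ from the script in the question: the period is escaped, because an unescaped . in a sed pattern matches any character, and sed is actually given its input, either as a file argument or on standard input.

```bash
# Hypothetical helper: applies the three substitutions in order within one sed process.
# Each -e expression operates on the result of the previous one.
split_steps() {
  sed -e 's/\./.\n/g' \
      -e 's/?/?\n/g' \
      -e 's/!/!\n/g' "$@"
}

# Reads standard input when no file is given.
printf '%s\n' 'This is a text. Want more? Yes! End' | split_steps
```

For a file you would call it as split_steps input.txt > output.txt. The spaces that followed the punctuation are kept, so they end up at the start of the next line.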
If you need to get rid of the spaces after ?, ! and ., use
sed 's/\([.!?]\)[[:space:]]*/\1\n/g' file > newfile
See another sed demo. Here:
\([.!?]\) - Capturing group 1: matches ., ! or ?
[[:space:]]* - 0 or more whitespace characters
The \1 in the replacement pattern refers to the value captured into Group 1.
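To check the whitespace-trimming variant quickly, here is a small sketch that wraps it in a function and runs it on the sample string from the first demo. The name split_sentences is made up for illustration, and GNU sed is assumed; other sed implementations may not interpret \n in the replacement as a newline.

```bash
# Hypothetical helper: puts a newline after each ., ! or ? and drops the whitespace that followed it.
split_sentences() {
  sed 's/\([.!?]\)[[:space:]]*/\1\n/g' "$@"
}

# Reads standard input when no file is given.
printf '%s\n' 'This is a text. Want more? Yes! End' | split_sentences
```

With GNU sed this should print each sentence on its own line with no leading spaces. For the real file, split_sentences file > newfile does the same as the command above.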