
How to split a text file into blocks with 10+ characters without dividing words using sed in Linux?

I want to come up with a sed command that, after every 10 characters, looks for the next space and substitutes it with "|".

I tried sed -E -e 's/ /|/\( *?[0-9a-zA-Z]*\)\{10,\}' new.file, but it produces errors.

Example input:

Hello there! How are you? I am trying to figure this out.

Expected Output:

Hello there!|How are you?|I am trying|to figure this|out.

This works for the given sample:

$ sed -E 's/(.{10}[^ ]*) /\1|/g' ip.txt
Hello there!|How are you?|I am trying|to figure this|out.
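  • (.{10}[^ ]*) matches any 10 characters, followed by zero or more non-space characters (the rest of the current word), and captures them
  • then a single space is matched
  • \1| puts back the captured portion, followed by a | character in place of the space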
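
If the block width needs to vary, the same pattern can be wrapped in a small shell function. This is only a sketch: split_blocks is a hypothetical helper name, not something from the answer above, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# split_blocks is a hypothetical helper, not part of the original answer.
# $1 is the minimum block width; the text is read from standard input.
split_blocks() {
  # Double quotes let the shell expand $1; \1 still reaches sed unchanged.
  sed -E "s/(.{$1}[^ ]*) /\1|/g"
}

printf '%s\n' 'Hello there! How are you? I am trying to figure this out.' | split_blocks 10
# → Hello there!|How are you?|I am trying|to figure this|out.

printf '%s\n' 'Hello there! How are you?' | split_blocks 5
# → Hello|there!|How are|you?
```

The trailing "you?" is left alone in the second call because fewer than 5 characters remain, so .{5} can no longer match.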

Building upon Sundeep's solution, you may:

  • Add support for any whitespace by replacing spaces with [[:space:]] and non-space with [^[:space:]]
  • Replace any run of one or more whitespace characters with a single pipe by adding + (POSIX ERE) or \{1,\} (POSIX BRE) after the whitespace class.

You can use

sed 's/\(.\{10\}[^[:space:]]*\)[[:space:]]\{1,\}/\1|/g' ip.txt
sed -E 's/(.{10}[^[:space:]]*)[[:space:]]+/\1|/g' ip.txt

See the online demo:

#!/bin/bash
s='Hello there! How are you? I am trying to figure this out.'
sed 's/\(.\{10\}[^[:space:]]*\)[[:space:]]\{1,\}/\1|/g' <<< "$s"
sed -E 's/(.{10}[^[:space:]]*)[[:space:]]+/\1|/g' <<< "$s"

Output:

Hello there!|How are you?|I am trying|to figure this|out.
Hello there!|How are you?|I am trying|to figure this|out.
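
The sample sentence contains only single spaces, so it does not show where the [[:space:]] variants differ from the space-only pattern. The following comparison is a rough sketch of that difference; the input with a tab and a double space is made up for illustration and is not from the question.

```shell
#!/bin/sh
# Illustrative input (an assumption, not from the question):
# a tab after "there!" and two spaces after "you?".
input=$(printf 'Hello there!\tHow are you?  I am trying to figure this out.')

# Space-only pattern: the tab counts as a non-space character, and only
# one space is replaced per match.
space_only=$(printf '%s\n' "$input" | sed -E 's/(.{10}[^ ]*) /\1|/g')

# Whitespace-class pattern: tabs end a block too, and a run of
# whitespace collapses into a single pipe.
any_space=$(printf '%s\n' "$input" | sed -E 's/(.{10}[^[:space:]]*)[[:space:]]+/\1|/g')

printf '%s\n' "$space_only"
# → Hello there!<TAB>How|are you?  I|am trying to|figure this|out.
printf '%s\n' "$any_space"
# → Hello there!|How are you?|I am trying|to figure this|out.
```

In the first result, <TAB> stands for a literal tab character that the space-only pattern leaves in place.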


 