
How to split a string or file that may be delimited by a combination of comments and spaces, tabs, newlines, commas, or other characters

Suppose the file list.txt contains really ugly data like this:

aaaa 
#bbbb
cccc, dddd; eeee
 ffff;
    #gggg hhhh
iiii

jjjj,kkkk ;llll;mmmm
nnnn

How do we parse/split that file with a bash script, excluding the commented lines and delimiting it by all commas, semicolons, and white-space (including tabs, spaces, newlines, and carriage-return characters)?

It can be done with the following code:

#!/bin/bash
### read file:
file="list.txt"

array=()
### skip lines that begin with a "#" or "<whitespace>#"
match_pattern='^[[:space:]]*#'

### IFS= and -r keep each line intact; the "|| [[ -n ... ]]" part also
### handles a last line that has no trailing newline.
while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ "$line" =~ $match_pattern ]]; then
        continue
    fi

    ### replace semicolons, commas and carriage returns with a space everywhere...
    temp_line=${line//[;,]/ }
    temp_line=${temp_line//$'\r'/ }

    ### split the line at whitespace (default IFS); read -a does not
    ### glob-expand words such as "*".
    read -r -a split_line_arr <<< "$temp_line"

    ### push each word in the split_line_arr onto the final array
    array+=("${split_line_arr[@]}")
done < "$file"

echo "Array items:"
for item in "${array[@]}"; do
    printf "   %s\n" "$item"
done

This was not posed as a question, but rather as a better solution to what others have touched upon when answering related questions. What is unique here is that those other questions and solutions did not really address how to split a string that is delimited by a combination of spaces, other characters, and comments; this is one solution that addresses all three simultaneously.
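The same idea carries over from a file to a single string. The following is only a sketch, not the script from the post: split_words is a hypothetical helper name, and it assumes bash (for read -a and here-strings). Because it reads lines from standard input, it can be fed a string with <<< or a file with <.

```shell
#!/bin/bash
# split_words (hypothetical helper): read lines from stdin, skip comment
# lines, and append every word to the global array "words".
words=()
split_words() {
    local line parts
    local comment_re='^[[:space:]]*#'
    while IFS= read -r line || [[ -n "$line" ]]; do
        # skip lines whose first non-blank character is "#"
        [[ "$line" =~ $comment_re ]] && continue
        # turn commas, semicolons and carriage returns into spaces
        line=${line//[;,]/ }
        line=${line//$'\r'/ }
        # default IFS splits on spaces and tabs; read -a does no globbing
        read -r -a parts <<< "$line"
        words+=("${parts[@]}")
    done
}

# sample input taken from the first lines of list.txt
input=$'aaaa \n#bbbb\ncccc, dddd; eeee\n ffff;\n    #gggg hhhh'
split_words <<< "$input"
printf '%s\n' "${words[@]}"
```

Calling split_words < list.txt instead would fill the same array from the file.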

Related questions:

How to split one string into multiple strings separated by at least one space in bash shell?

How do I split a string on a delimiter in Bash?

Additional notes:

Why do this with bash when other scripting languages are better suited for splitting? A bash script is more likely to have everything it needs when it runs from a basic upstart or cron (sh) environment than, for example, a perl program. An argument list is often needed in these situations, and we should expect the worst from the people who maintain those lists...

Hopefully this post will save bash newbies a lot of time in the future (including me)... Good luck!

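Using shell commands (drop the comment lines, turn every delimiter into a newline, then discard empty lines):

grep -v '^[[:space:]]*#' file | tr -s ';,[:space:]' '[\n*]' | awk 'NF'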
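To use the result of such a pipeline inside a script, its output can be collected into an array. A minimal sketch, assuming bash 4 or later (for mapfile); list_items is a hypothetical function name, and the sample file is written to a temporary path only so that the example runs on its own:

```shell
#!/bin/bash
# list_items (hypothetical name): print one item per line from the given file.
list_items() {
    grep -v '^[[:space:]]*#' "$1" |
        tr -s ';,[:space:]' '[\n*]' |
        awk 'NF'
    # grep drops comment lines, tr turns every delimiter into a newline
    # (squeezing repeats), and awk discards any remaining empty line.
}

sample=$(mktemp)
printf '%s\n' 'aaaa ' '#bbbb' 'cccc, dddd; eeee' ' ffff;' > "$sample"

# mapfile -t stores one array element per line of input (bash 4+)
mapfile -t items < <(list_items "$sample")
rm -f "$sample"

printf 'item: %s\n' "${items[@]}"
```

Process substitution is used instead of a plain pipe so that mapfile runs in the current shell and the array is still set afterwards.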

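sed 's/[#[:blank:],]/REPLACEMENT/g' input.txt

  • The command above replaces each comment character ('#'), space (' '), tab, and comma (',') with an arbitrary string ('REPLACEMENT'). The [:blank:] class matches both spaces and tabs.

  • To replace newlines as well, pipe the result through tr. Note that tr maps single characters, so the newline replacement must be one character (a space here):

sed 's/[#[:blank:],]/REPLACEMENT/g' input.txt | tr '\n' ' '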

If you have Ruby on your system:

File.open("file").each_line do |line|
  next if line[/^\s*#/]
  puts line.split(/\s+|[;,]/).reject{|c|c.empty?}  
end

Output:

# ruby test.rb 
aaaa
cccc
dddd
eeee
ffff
iiii
jjjj
kkkk
llll
mmmm
nnnn
