Split a string with unescaped delimiter only

I want to split a string using , as the delimiter. My problem is that in some cases the input may contain commas. Changing the delimiter is not an option. I want users to be able to escape a comma with \ so that the string is split only on , and not on \,. This is what I have so far:

str="1,10,100,1\,000,10\,000,100\,000"
while [[ ${#str} -gt 0 ]]; do
    #Get byte offset of the first unescaped delimiter (grep -b prints "offset:match")
    index=$(echo "$str" | grep -boP '(?<!\\),' | head -n 1 | cut -d: -f1)

    #If index is empty, there is nothing to do
    if [[ -z "$index" ]]; then
        echo "$str"
        break
    fi

    #Get the next string we're looking for
    echo "$str" | cut -c1-$index
    #Cut the original string
    str=$(echo "$str" | cut -c$(($index+2))-${#str})
done

This is currently printing:

1
10
100
1\,000
10\,000
100\,000

But I want it to print:

1
10
100
1,000
10,000
100,000

I could now use sed to replace \, with , but the whole solution seems quite bulky for a relatively simple problem. Is there a better way to do this?

Try this (the \n in the replacement requires GNU sed):

$ str="1,10,100,1\,000,10\,000,100\,000"
$ sed 's/\([^\]\),/\1\n/g' <<< "$str"
1
10
100
1\,000
10\,000
100\,000

As a bash one-liner that processes each field:

$ sed 's/\([^\]\),/\1\n/g' <<< "$str" | while IFS= read -r line; do echo "-> $line"; done
-> 1
-> 10
-> 100
-> 1\,000
-> 10\,000
-> 100\,000

As per the comment by @fedorqui, you can use process substitution so that the loop does not run in a subshell:

while IFS= read -r line; do echo "-> $line"; done < <(sed 's/\([^\]\),/\1\n/g' <<< "$str")
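
If you would rather avoid external tools altogether, the same split can be sketched in pure bash by walking the string one character at a time. This is only a sketch: split_unescaped and the fields array are names made up for this example, and it assumes \, is the only escape sequence (any other backslash is kept as-is). Unlike the sed commands above, it also removes the backslash from each escaped comma.

```shell
#!/usr/bin/env bash
# split_unescaped (hypothetical helper): split $1 on commas that are not
# preceded by a backslash and store the unescaped fields in the global
# array "fields".
split_unescaped() {
    local input=$1 field="" ch i
    fields=()
    for ((i = 0; i < ${#input}; i++)); do
        ch=${input:i:1}
        if [[ $ch == "\\" && ${input:i+1:1} == "," ]]; then
            field+=","           # escaped comma: keep the comma, drop the backslash
            i=$((i + 1))         # skip the comma that was just consumed
        elif [[ $ch == "," ]]; then
            fields+=("$field")   # unescaped comma ends the current field
            field=""
        else
            field+=$ch           # ordinary character (including a lone backslash)
        fi
    done
    fields+=("$field")           # the last field has no trailing delimiter
}

str='1,10,100,1\,000,10\,000,100\,000'
split_unescaped "$str"
printf '%s\n' "${fields[@]}"
```

For the sample string this prints 1, 10, 100, 1,000, 10,000 and 100,000 on separate lines. Empty fields (two commas in a row) are kept as empty array elements.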

Here is another way, assuming the fields are numeric (GNU sed):

str="1,10,100,1\,000,10\,000,100\,000"
echo "$str" |sed -n 's/\([0-9]\+\(\\,[0-9]*\)*\),\+/\1\n/gp'
1
10
100
1\,000
10\,000
100\,000

With tr you can just remove those backslashes:

str="1,10,100,1\,000,10\,000,100\,000"
echo "$str" |sed -n 's/\([0-9]\+\(\\,[0-9]*\)*\),\+/\1\n/gp' |tr -d '\\'
1
10
100
1,000
10,000
100,000
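
One caveat: tr -d '\\' deletes every backslash, not only the ones in front of a comma. If the fields may contain other backslashes, a narrower post-processing step could look like the sketch below. unescape_commas is a name made up for this example, and the two sample values are illustrative, not taken from the question.

```shell
#!/usr/bin/env bash
# unescape_commas (hypothetical helper): read lines on stdin and replace
# each "\," with ",", leaving every other backslash untouched.
unescape_commas() {
    sed 's/\\,/,/g'
}

# tr removes all backslashes, so the second value is damaged:
printf '%s\n' '1\,000' 'C:\temp' | tr -d '\\'
# 1,000
# C:temp

# sed only touches the backslash that escapes a comma:
printf '%s\n' '1\,000' 'C:\temp' | unescape_commas
# 1,000
# C:\temp
```

It can replace the tr step at the end of the pipeline above.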

With GNU awk you can set FPAT to a regex that describes what a field looks like, rather than what the separator looks like, so that an escaped comma is treated as part of the field:

str="1,10,100,1\,000,10\,000,100\,000"

awk -v FPAT='[^,\\\\]*(\\\\.[^,\\\\]*)*|[^,]*' '{
     for (i=1; i<=NF; i++) printf "%d: <%s>\n", i, $i}' <<< "$str"

1: <1>
2: <10>
3: <100>
4: <1\,000>
5: <10\,000>
6: <100\,000>
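
The fields above still contain the backslashes. A possible extension, sketched here under the assumption that gawk is installed (FPAT is a gawk extension and is not available in mawk or BSD awk), is to run gsub on a copy of each field before printing it. split_fpat is a name made up for this example.

```shell
#!/usr/bin/env bash
# split_fpat (hypothetical helper): print each field of $1 on its own line,
# splitting on unescaped commas via FPAT and turning "\," into ",".
split_fpat() {
    gawk -v FPAT='[^,\\\\]*(\\\\.[^,\\\\]*)*|[^,]*' '{
        for (i = 1; i <= NF; i++) {
            field = $i
            gsub(/\\,/, ",", field)   # drop the backslash from each "\,"
            print field
        }
    }' <<< "$1"
}

str='1,10,100,1\,000,10\,000,100\,000'
if command -v gawk >/dev/null 2>&1; then
    split_fpat "$str"
else
    echo "gawk not found; FPAT is a gawk extension" >&2
fi
```

Working on a copy of the field keeps $0 and the other fields untouched, which matters if the record is used again later in the same awk program.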
