
sed extracting group of digits

I have tried to extract a number as given below but nothing is printed on screen:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9]*\) apples/\1/p'

However, I do get 65 if I match the two digits explicitly, as given below:

echo "This is an example: 65 apples" | sed -n  's/.*\([0-9][0-9]\) apples/\1/p'
65

How can I match a number when I don't know how many digits it has, e.g. it could be 2344 instead of 65?

$ echo "This is an example: 65 apples" | sed -r  's/^[^0-9]*([0-9]+).*/\1/'
65
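
The pattern skips every leading non-digit and then captures the first run of digits, so the number of digits does not matter. A quick check, as a sketch only: first_number is a hypothetical helper name, and -E is used as the spelling of -r that both GNU and BSD sed accept.

```shell
# first_number is a hypothetical helper wrapping the command above.
first_number() {
    printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

first_number "This is an example: 65 apples"      # 65
first_number "This is an example: 2344 apples"    # 2344
```

Because there is no -n/p here, a line that contains no digits at all is printed unchanged.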

It's because your first .* is greedy, and your [0-9]* allows zero or more digits. Hence the .* gobbles up as much as it can (including the digits) and the [0-9]* matches nothing.
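
One way to see this split is to capture the greedy prefix as a group of its own. This is only a sketch for illustration: the extra group and the bracketed output format are not part of the original command.

```shell
# Group 1 is what .* consumed, group 2 is what was left for [0-9]*.
line="This is an example: 65 apples"
greedy=$(printf '%s\n' "$line" | sed -n 's/\(.*\)\([0-9]*\) apples/[\1][\2]/p')
printf '%s\n' "$greedy"   # [This is an example: 65][]
```

The second pair of brackets is empty: the digits ended up inside the .* group.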

You can do:

echo "This is an example: 65 apples" | sed -n  's/.*\b\([0-9]\+\) apples/\1/p'

where I forced the [0-9] to match at least one digit, and also added a word boundary before the digits so the whole number is matched. Note that \+ and \b are GNU sed extensions.

However, it's easier to use grep, where you match just the number:

echo "This is an example: 65 apples" | grep -P -o '[0-9]+(?= +apples)'

The -P means "Perl regex", a GNU grep option (so I don't have to worry about escaping the '+').

The -o means "only print the matches".

The (?= +apples) is a lookahead: the digits must be followed by one or more spaces and the word apples, but those characters are not included in the match.
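
Where grep has no -P, a two-stage pipeline gives the same result. This is a sketch under the assumption that -E and -o are available (they are in GNU and BSD grep); the variable name count is made up for the example.

```shell
# Stage 1 isolates "<digits> apples"; stage 2 keeps only the leading digits.
count=$(printf '%s\n' "This is an example: 2344 apples" |
    grep -E -o '[0-9]+ +apples' |
    grep -E -o '^[0-9]+')
printf '%s\n' "$count"   # 2344
```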

A simple way to extract all numbers from a string:

echo "1213 test 456 test 789" | grep -P -o "\d+"

And the result:

1213
456
789

What you are seeing is the greedy behavior of regex. In your first example, .* gobbles up all the digits. Something like this does it:

echo "This is an example: 65144 apples" | sed -n  's/[^0-9]*\([0-9]\+\) apples/\1/p'
65144

This way, the leading part of the pattern cannot match any digits. Some regex dialects have a way to ask for non-greedy matching, but I don't believe sed has one.
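
One caveat worth checking: the pattern is not anchored, so if the line contains an earlier number, the text before the match is left in place. The sketch below uses an invented input line and assumes GNU sed for \+; the second command is one possible variant, not the only fix.

```shell
line="Order 12: 65 apples"

# Unanchored: the match starts at ": ", so "Order 12" survives the substitution.
loose=$(printf '%s\n' "$line" | sed -n 's/[^0-9]*\([0-9]\+\) apples/\1/p')
printf '%s\n' "$loose"    # Order 1265

# Consume the whole line and require one non-digit right before the number.
strict=$(printf '%s\n' "$line" | sed -n 's/^.*[^0-9]\([0-9][0-9]*\) apples.*$/\1/p')
printf '%s\n' "$strict"   # 65
```

The second form needs at least one non-digit character before the number, so it will not match a line that starts with the number itself.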

echo "This is an example: 65 apples" | ssed -nR -e 's/.*?\b([0-9]*) apples/\1/p'

You will, however, need super-sed (ssed) for this to work. The -R option enables Perl-style regular expressions, including the non-greedy .*? used here.


 