
Extract a substring with sed

I have the following strings, for example:

     input1 = abc-def-ghi-jkl

     input2 = mno-pqr-stu-vwy

I want to extract the first word between the "-" characters.

For the first string I want to get: def

If the input is the second string, I want to get: pqr

I want to use sed for this. Could you help me, please?

Use

sed 's,^[^-]*-\([^-]*\).*,\1,' file

The pattern skips everything up to and including the first -, captures the text up to the second - in a group, and matches the rest of the line. The whole line is then replaced with the captured group.
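
As a quick check of the command above, here is a minimal sketch that feeds the two sample strings from the question through it on standard input instead of a file. The function name first_between_dashes is made up for this illustration and is not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name invented for this example): runs the sed command
# from the answer on a single string instead of a file.
first_between_dashes() {
  printf '%s\n' "$1" | sed 's,^[^-]*-\([^-]*\).*,\1,'
}

first_between_dashes 'abc-def-ghi-jkl'   # prints: def
first_between_dashes 'mno-pqr-stu-vwy'   # prints: pqr
```

Note that a line containing no dash does not match the pattern, so sed prints it unchanged rather than printing nothing.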

With bash:

var='input1 = abc-def-ghi-jkl'
var=${var#*-}      # remove shortest prefix `*-`, this removes `input1 = abc-`
echo "${var%%-*}"  # remove longest suffix `-*`, this removes `-ghi-jkl`

Or with awk:

awk -F'-' '{print $2}' <<<'input1 = abc-def-ghi-jkl'

Use - as input field separator and print the second field.


Or with cut:

cut -d'-' -f2 <<<'input1 = abc-def-ghi-jkl'
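
The awk and cut one-liners agree on the sample input but differ on lines that contain no dash at all. The sketch below shows the difference; the helper names are invented for this illustration, and the -s option of cut (suppress lines without a delimiter) is the only thing added to the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, named only for this example.
field2_cut()   { printf '%s\n' "$1" | cut -d'-' -f2; }            # as in the answer
field2_cut_s() { printf '%s\n' "$1" | cut -s -d'-' -f2; }         # -s: skip lines without '-'
field2_awk()   { printf '%s\n' "$1" | awk -F'-' '{print $2}'; }   # as in the answer

field2_cut   'input1 = abc-def-ghi-jkl'   # prints: def
field2_cut   'no dashes here'             # prints the whole line unchanged
field2_cut_s 'no dashes here'             # prints nothing
field2_awk   'no dashes here'             # prints an empty line
```

Which behaviour is preferable depends on whether input without a dash should be passed through, dropped, or reported as empty.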

If you want to use sed, you can choose between solutions like

# Double processing
echo "$input1" | sed 's/[^-]*-//;s/-.*//'
# Normal approach: capture the second field, drop the rest
echo "$input1" | sed -r 's/^[^-]*-([^-]*)(-.*)/\1/g'
# Funny alternative
echo "$input1" | sed -r 's/(^[^-]*-|-.*)//g'

The obvious "external" tool would be cut. You can also look at a Bash builtin solution like

[[ ${input1} =~ ([^-]*)-([^-]*) ]] && printf %s "${BASH_REMATCH[2]}"
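
To make the role of BASH_REMATCH clearer, here is a small sketch that wraps the same regular expression in a function. The function name between_dashes and the re variable are assumptions made for this example. Storing the pattern in a variable is a common way to avoid quoting problems with =~.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the text after the first dash, up to the next
# dash or the end of the string. Returns non-zero if there is no dash.
between_dashes() {
  local re='([^-]*)-([^-]*)'
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[0] is the whole match; [1] and [2] are the two groups.
  printf '%s\n' "${BASH_REMATCH[2]}"
}

between_dashes 'abc-def-ghi-jkl'   # prints: def
between_dashes 'mno-pqr-stu-vwy'   # prints: pqr
```

Unlike the sed versions, no external process is started, which can matter inside a loop over many strings.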

A grep solution (in my opinion this is the most natural approach: you are only trying to find matches for a regular expression and are not editing anything, so there should be no need for the more advanced sed command)

grep -oP '^[^-]*-\K[^-]*(?=-)' << EOF
abc-qrs-bobo-the-clown
123-45-6789
blah-blah-blah
no dashes here
mahi-mahi
EOF

Output

qrs
45
blah

Explanation

Look at the inputs first, included here as a heredoc for completeness (more likely you would pass your file name as the last argument to grep). The solution requires at least two dashes in the string; in particular, it finds no match for mahi-mahi. If you want the second mahi to match, remove the lookahead assertion at the end of the regular expression, which leaves ^[^-]*-\K[^-]* as the pattern.

Here is how the command works. First note the options: -o returns only the matched substring instead of the entire line, and -P enables Perl-compatible regular expressions. Then the regular expression: start at the beginning of the line (^); match zero or more non-dash characters followed by a dash, and then (\K) discard everything matched so far from the reported match. Next, match zero or more non-dash characters again; this is the part the command returns. Finally, require a dash after it without including that dash in the match. This is done with a lookahead, written (?=...).
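
The variant without the lookahead, mentioned in the explanation, can be sketched as follows. It assumes GNU grep built with PCRE support (needed for -P). The function name second_field_grep is invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: same pattern as above, minus the trailing (?=-),
# so a second dash is no longer required. Reads lines from standard input.
second_field_grep() {
  grep -oP '^[^-]*-\K[^-]*'
}

second_field_grep << 'EOF'
abc-qrs-bobo-the-clown
123-45-6789
blah-blah-blah
no dashes here
mahi-mahi
EOF
```

With this input the command prints qrs, 45, blah and mahi, one per line. The line without any dash still produces no output.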
