Parsing simple string with awk or sed in linux

Question

original string:
A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/

The depth of the directories will vary, but the /trunk part will always be present. The single character in front of /trunk is the indicator for that line.

desired output:

A /trunk/apple
B /trunk/apple
Z /trunk/orange
Q /trunk/melon/juice/venti/straw

*** edit
I'm sorry, I made a mistake by adding a slash at the end of each path in the original string, which made the output confusing. The original string didn't have the slash in front of the capital letter, but I'll leave it as is.

Answer 1

To deal with more complex input, where any number of / separated values may follow trunk in a single line, please try the following.

awk '
{
  gsub(/[^/]*\/trunk/,OFS"&")
  sub(/^ /,"")
  sub(/\//,OFS"&")
  gsub(/ +[^/]*\/trunk\/[^[:space:]]+/,"\n&")
  sub(/\n/,OFS)
  gsub(/\n /,ORS)
  gsub(/\/trunk/,OFS"&")
  sub(/[[:space:]]+/,OFS)
}
1
'  Input_file

With your shown samples, please try the following awk code.

awk '{gsub(/\/trunk/,OFS "&");gsub(/trunk\/[^/]*\//,"&\n")} 1' Input_file

Answer 2

With GNU awk for multi-char RS and RT:

$ awk -v RS='([^/]+/){2}[^/\n]+' 'RT{sub("/",OFS,RT); print RT}' file
A trunk/apple
B trunk/apple
Z trunk/orange

I'm setting RS to a regexp describing each string you want to match, i.e. 2 repetitions of a run of non-/ characters followed by /, and then a final run of non-/ characters (and non-newline, for the last string on the input line). RT is automatically set to each of the matching strings, so then I just change the first / to a blank and print the result.

If each path isn't always 3 levels deep but does always start with something/trunk/, e.g.:

$ cat file
A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange

then:

$ awk -v RS='[^/]+/trunk/' 'RT{if (NR>1) print pfx $0; pfx=gensub("/"," ",1,RT)} END{printf "%s%s", pfx, $0}' file
A trunk/apple/banana/
B trunk/apple/
Z trunk/orange
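
For awks without a multi-character RS or RT, the same split-on-marker idea can be sketched with match() and substr(). This is only a minimal sketch, not part of the answer above: the function name split_trunk is made up for illustration, and it assumes every record starts with something/trunk/. Unlike the gawk version, it keeps the slash in front of trunk and drops the slash that separates one record from the next.

```shell
# split_trunk is a hypothetical helper name; it uses only POSIX awk features.
split_trunk() {
  awk '{
    rest = $0
    pfx = ""
    # Each "X/trunk/" marker starts a new record; the text in front of it
    # is the tail of the previous record.
    while (match(rest, "[^/]+/trunk/")) {
      body = substr(rest, 1, RSTART - 1)
      sub("/$", "", body)                     # drop the slash between two records
      if (pfx != "") print pfx body
      pfx = substr(rest, RSTART, RLENGTH)     # e.g. "A/trunk/"
      sub("/", " /", pfx)                     # first slash only: "A /trunk/"
      rest = substr(rest, RSTART + RLENGTH)
    }
    sub("/$", "", rest)                       # trailing slash of the last record, if any
    if (pfx != "") print pfx rest
  }'
}

printf '%s\n' 'A/trunk/apple/banana/B/trunk/apple/Z/trunk/orange' | split_trunk
```

For the sample file above this prints A /trunk/apple/banana, B /trunk/apple and Z /trunk/orange, one per line.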

Answer 3

In awk you can try this solution. It deals with the special requirement of removing forward slashes when the next character is upper case. Will not win a design award but works.

$ echo "A/trunk/apple/B/trunk/apple/Z/trunk/orange" | 
    awk -F '' '{ x=""; for(i=1;i<=NF;i++){ 
      if($(i+1)~/[A-Z]/&&$i=="/"){$i=""}; 
      if($i~/[A-Z]/){ printf "%s%s ", x, $i }
      else{ x="\n"; printf "%s", $i } }; print "" }'
A /trunk/apple
B /trunk/apple
Z /trunk/orange

It also works for paths of any depth. In fact, it works with any input that follows the given pattern.

$ echo "A/fruits/apple/mango/B/anything/apple/pear/banana/Z/ball/orange/anything" | 
    awk -F '' '{ x=""; for(i=1;i<=NF;i++){
      if($(i+1)~/[A-Z]/&&$i=="/"){$i=""};
      if($i~/[A-Z]/){ printf "%s%s ", x, $i }
      else{ x="\n"; printf "%s", $i } }; print "" }'
A /fruits/apple/mango
B /anything/apple/pear/banana
Z /ball/orange/anything

Answer 4

This might work for you (GNU sed):

sed 's/[^/]*/& /;s/\//\n/3;P;D' file

Separate the first word from the first / by a space.

Replace the third / by a newline.

Print/delete the first line and repeat.
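
To watch the loop described in the three steps above, one option is to swap P for l, which prints the pattern space with the embedded newline shown as \n and the end of the pattern space marked by $. This is a sketch that assumes GNU sed, as the answer does; trace_cycles is a hypothetical name.

```shell
# trace_cycles is a hypothetical helper name. With -n, only `l` produces output.
# D removes everything up to the first newline and restarts the script on what
# is left, without reading a new input line.
trace_cycles() {
  sed -n 's/[^/]*/& /;s/\//\n/3;l;D'
}

printf '%s\n' 'A/trunk/apple/B/trunk/apple/Z/trunk/orange' | trace_cycles
```

Each output line is one cycle: the first shows A /trunk/apple\nB/trunk/apple/Z/trunk/orange$, the second B /trunk/apple\nZ/trunk/orange$, and the last Z /trunk/orange$, where no third / is left to replace.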

If the first word has the property that it is only one character long:

sed 's/./& /;s#/\(./\)#\n\1#;P;D' file

Or if the first word has the property that it begins with an upper case character:

sed 's/[[:upper:]][^/]*/& /;s#/\([[:upper:]][^/]*/\)#\n\1#;P;D' file

Or if the first word has the property that it is followed by /trunk/ :

sed -E 's#([^/]*)(/trunk/)#\n\1 \2#g;s/.//' file

Answer 5

Using GNU awk, you could use FPAT to define the contents of each field with a pattern.

When looping over the fields, replace the first / with a space followed by /:

str1="A/trunk/apple/B/trunk/apple/Z/trunk/orange"

echo "$str1" | awk -v FPAT='[^/]+/trunk/[^/]+' '{    
for(i=1;i<=NF;i++) {
    sub("/", " /", $i)
    print $i
    }
}'

The pattern matches

[^/]+ Match one or more characters other than /
/trunk/[^/]+ Match /trunk/ followed by one or more characters other than /

Output

A /trunk/apple
B /trunk/apple
Z /trunk/orange
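
Where FPAT is not available, the field-by-pattern idea can be approximated in POSIX awk by calling match() repeatedly and consuming the line. This is a rough sketch under that assumption, with the same field pattern as above; trunk_fields is a made-up name.

```shell
# trunk_fields is a hypothetical helper name; it emulates FPAT with match().
trunk_fields() {
  awk '{
    rest = $0
    while (match(rest, "[^/]+/trunk/[^/]+")) {
      field = substr(rest, RSTART, RLENGTH)   # e.g. "A/trunk/apple"
      sub("/", " /", field)                   # first slash only: "A /trunk/apple"
      print field
      rest = substr(rest, RSTART + RLENGTH)   # continue after this field
    }
  }'
}

printf '%s\n' 'A/trunk/apple/B/trunk/apple/Z/trunk/orange' | trunk_fields
```

Because the pattern allows only one component after /trunk/, a deeper path such as Z/trunk/orange/citrus comes out as Z /trunk/orange and the rest is skipped.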

Answer 6

With GNU sed:

$ str="A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/"
$ sed -E 's|/?(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"

A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw

Note the first empty output line. If it is undesirable we can separate the processing of the first output line:

$ sed -E 's|(.)|\1 |;s|/(.)(/trunk/)|\n\1 \2|g;s|/$||' <<< "$str"
A /trunk/apple
B /trunk/apple
Z /trunk/orange/citrus
Q /trunk/melon/juice/venti/straw

Answer 7

With awk using gsub() and sub() functions:

awk '
{
gsub(/[[:upper:]]{1}/,"& ")
sub(/[[:upper:]]{1}$/,"\n&",$2)
sub(/[[:upper:]]{1}$/,"\n&",$3)
$1=$1
gsub(/[/]\n/,"\n")
} 1' file
A /trunk/apple
B /trunk/apple
Z /trunk/orange

The first gsub() is applied to $0 by default.
Then we use the same regexp in sub() for the $2 and $3 fields.
$1=$1 rebuilds the record.
Finally, we remove the / at the end of each line.
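
The rebuild step can be seen in isolation: assigning to any field makes awk reassemble $0 from its fields, joined by OFS. A small sketch, where rebuild_record is a hypothetical name and the separator is passed in only to make the effect visible:

```shell
# rebuild_record is a hypothetical helper name.
# The shell's $1 (the separator) becomes OFS; inside the awk program,
# $1 = $1 changes no field but forces $0 to be rebuilt with OFS.
rebuild_record() {
  awk -v OFS="$1" '{ $1 = $1 } 1'
}

printf '%s\n' 'a   b    c' | rebuild_record '-'
```

This prints a-b-c, whereas a plain awk 1 would print the line unchanged.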

Answer 8

Assuming your data will always be in the format provided as a single string, you can try this sed command.

$ sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g' input_file

$ echo "A/trunk/apple/pine/skunk/B/trunk/runk/bunk/apple/Z/trunk/orange/T/fruits/apple/mango/P/anything/apple/pear/banana/L/ball/orange/anything/S/fruits/apple/mango/B/rupert/cream/travel/scout/H/tall/mountains/pottery/barnes" | sed 's/$/\//;s|\([A-Z]\)\([a-z/]*\)/\([a-z]*\?\)|\1 \2\3\n|g'
A /trunk/apple/pine/skunk
B /trunk/runk/bunk/apple
Z /trunk/orange
T /fruits/apple/mango
P /anything/apple/pear/banana
L /ball/orange/anything
S /fruits/apple/mango
B /rupert/cream/travel/scout
H /tall/mountains/pottery/barnes

Answer 9

Some fun with perl, where you can use a non-consuming regex to autosplit into the @F array, then just print however you want.

perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'

Step 1: Split

perl -lanF'/(?=.{1,2}trunk)/'
This takes the input stream and splits each line wherever the pattern .{1,2}trunk is encountered.
Because we want to retain trunk and the preceding 1 or 2 chars, we wrap the split pattern in (?=) to make it a non-consuming lookahead.
This splits things up this way:

$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print join " ", @F'
A /trunk/apple/ B /trunk/apple/ Z /trunk/orange/citrus/ Q /trunk/melon/juice/venti/straw/

Step 2: Format output:

The @F array contains pairs that we want to print in order, so we'll iterate over half of the array indices and print 2 elements at a time:
print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2 doubles the iterator and prints the pairs.
Using perl -l means each print has an implicit \n at the end.
The results:

$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e 'print "$F[2*$_] $F[2*$_+1]" for 0..$#F/2'
A /trunk/apple/
B /trunk/apple/
Z /trunk/orange/citrus/
Q /trunk/melon/juice/venti/straw/

Endnote: Perl obfuscation that didn't work.

Any array in Perl can be assigned to a hash, in which case it is read as (key, val, key, val, ...).
So %F=@F; print "$_ $F{$_}" for keys %F seems like it would be really slick.
But you lose order:

$ echo A/trunk/apple/B/trunk/apple/Z/trunk/orange/citrus/Q/trunk/melon/juice/venti/straw/ | perl -lanF'/(?=.{1,2}trunk)/' -e '%F=@F; print "$_ $F{$_}" for keys %F'
Z /trunk/orange/citrus/
A /trunk/apple/
Q /trunk/melon/juice/venti/straw/
B /trunk/apple/


9 answers

solution1
2 2021-11-17 10:21:21

solution2
2 2021-11-17 13:54:15

solution3
1 2021-11-17 11:13:39

solution4
1 2021-11-17 11:14:22

solution5
1 2021-11-17 11:25:00

solution6
1 2021-11-18 05:41:33

solution7
0 2021-11-17 14:34:53

solution8
0 2021-11-17 15:58:39

solution9
0 2021-11-18 04:52:19
