parsing of files using AWK with field-separators - does not parse correctly

Question

I have a file that contains data which is separated by D**> sub-string. It looks like this:

some text here...

text: nnD**>24%
text: nnD**>25%
text: nnD**>22%
text: nnD**>3%

some text here...

nn stands for a floating-point number (0.25 or 9.769, the value does not matter). I need to write just the sequence of % values (24, 25, 22, 3, ...) to a separate file, so I did the following:

`read B1 <<<$(cat FILE_NAME | awk 'BEGIN {FS="D**>"} {print $2}')`
`echo -e "$B1"`

I expect to get a list like this: 24%, 25%, 22%...

but it does not parse correctly: it simply dumps a lot of other strings from the file. If I do it like this:

read B1 <<<$(cat FILE_NAME | awk 'BEGIN {FS="*>"} {print $2}')

it works correctly. Could someone explain to me what the problem is?

Answer 1

The field separator FS value is a regular expression so special characters like * need to be escaped. Try something like this:

read B1 <<< $(awk 'BEGIN {FS="D[*][*]>"} {print $2}' FILE_NAME)
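As a quick check of the escaped separator, here is a minimal sketch. The sample file is made up to mimic the layout in the question (the temporary path and its contents are assumptions), and the NF > 1 guard is an addition that skips lines which do not contain the separator at all.

```shell
# Hypothetical sample file reproducing the layout from the question.
sample=$(mktemp)
printf '%s\n' 'some text here...' '' 'text: 0.25D**>24%' \
    'text: 9.769D**>25%' 'some text here...' > "$sample"

# [*] is a bracket expression, so each * is matched literally.
# NF > 1 is true only for lines that were actually split by the separator.
values=$(awk -F'D[*][*]>' 'NF > 1 {print $2}' "$sample")

printf '%s\n' "$values"
rm -f "$sample"
```

Assigning with B1=$(...) instead of read keeps every output line, not just the first one.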

Answer 2

I think that you're focussing on the wrong part of your input. The numbers and asterisks before the ">" are irrelevant. You should use something like this:

awk -F'[>%]' '{print $2}' oldfile > newfile

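This sets the input field separator to either a ">" or a "%" and prints the second field (the numbers that you are interested in). The output is redirected to newfile.

For the four data lines shown in the question, newfile will then contain 24, 25, 22 and 3, one value per line. Lines that contain neither separator have no second field, so they produce an empty line.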
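A small sketch of how the [>%] separator splits a line, assuming sample lines shaped like those in the question. The index() guard is an addition, not part of the command above: it does a literal (non-regex) search for D**> so that unrelated lines, including ones that happen to contain > or %, are skipped.

```shell
# Hypothetical input: two data lines mixed with unrelated text.
sample=$(mktemp)
printf '%s\n' 'some text here...' 'text: 0.25D**>24%' \
    'a > b' 'text: 9.769D**>3%' > "$sample"

# With FS set to [>%], 'text: 0.25D**>24%' splits into
# "text: 0.25D**", "24" and an empty trailing field.
# index() returns a non-zero position only when the literal string is present.
numbers=$(awk -F'[>%]' 'index($0, "D**>") {print $2}' "$sample")

printf '%s\n' "$numbers"
rm -f "$sample"
```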

Answer 3

FS="D**>" says: set the FS to the character D repeated zero or more times, repeated zero or more times again, followed by the character >, since * is the RE metacharacter that represents optional repetition.

That makes no sense, so if you instead want to set the FS to the character D followed by the character * followed by the character * followed by the character >, then the way to write that would be FS="D\\*\\*>" or FS="D[*][*]>" to make each * be treated literally instead of as an RE metacharacter.
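To compare the two literal spellings side by side, here is a minimal sketch on a single made-up line (the line content is an assumption modelled on the question). In the backslash form, awk first processes the string literal, so the doubled backslashes reach the regex engine as \* and \*.

```shell
# Hypothetical data line in the format from the question.
line='text: 0.25D**>24%'

# Backslash form: the awk string "D\\*\\*>" becomes the regex D\*\*>
with_backslashes=$(printf '%s\n' "$line" |
    awk 'BEGIN {FS = "D\\*\\*>"} {print $2}')

# Bracket form: [*] matches a literal * without any backslashes.
with_brackets=$(printf '%s\n' "$line" |
    awk 'BEGIN {FS = "D[*][*]>"} {print $2}')

printf '%s\n%s\n' "$with_backslashes" "$with_brackets"
```

The bracket form is often easier to read because it avoids counting backslashes across shell and awk quoting.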

I really don't understand what it is you're trying to do with the rest of the script but I suspect you'd be better off just doing it all in one awk command. If you're just trying to get all of the percent values on one line:

$ awk -F'D[*][*]>' '{printf "%s%s", (NR>1?OFS:""), $2} END{print ""}' file
24% 25% 22% 3%

and if you want to strip off the % signs:

$ awk -F'D[*][*]>' '{printf "%s%s", (NR>1?OFS:""), $2+0} END{print ""}' file
24 25 22 3

and if you want to separate them with , instead of just a space:

$ awk -F'D[*][*]>' -v OFS=', ' '{printf "%s%s", (NR>1?OFS:""), $2+0} END{print ""}' file
24, 25, 22, 3
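If the file also contains unrelated lines, as in the question, the one-liners above would emit empty fields for them. The following sketch is one possible variation, not the answer's exact command: it counts matching lines in a hypothetical variable n instead of relying on NR, and writes the result to a separate file. The file names are temporary paths made up for the example.

```shell
# Hypothetical input and output files.
sample=$(mktemp)
out=$(mktemp)
printf '%s\n' 'some text here...' 'text: 0.25D**>24%' 'text: 9.769D**>25%' \
    'text: 1.5D**>22%' 'text: 3.0D**>3%' 'some text here...' > "$sample"

# NF > 1 keeps only lines that contain the separator.
# n++ is 0 (false) for the first match, so no leading separator is printed.
# $2 + 0 forces a numeric conversion, which drops the trailing %.
awk -F'D[*][*]>' -v OFS=', ' '
    NF > 1 { printf "%s%s", (n++ ? OFS : ""), $2 + 0 }
    END    { print "" }
' "$sample" > "$out"

joined=$(cat "$out")
printf '%s\n' "$joined"
rm -f "$sample" "$out"
```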

Answer 4

In addition to awk, this problem can also be solved with sed:

$ B1=$(sed -n 's/.*D\*\*>\(.*%\)/\1/p' input_file)
$ echo $B1
24% 25% 22% 3%
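A possible variation, sketched on made-up input: capturing only the digits drops the % sign, and the difference between quoted and unquoted expansion explains why the echo above prints everything on one line. The input lines are assumptions modelled on the question.

```shell
# Capture only the digits between D**> and %; -n with p prints matching lines only.
B1=$(printf '%s\n' 'text: 0.25D**>24%' 'some text here...' 'text: 9.769D**>3%' |
    sed -n 's/.*D\*\*>\([0-9]*\)%.*/\1/p')

# Unquoted expansion undergoes word splitting, so newlines become single spaces.
one_line=$(echo $B1)

# Quoted expansion preserves the newlines between the values.
printf '%s\n' "$B1"
printf '%s\n' "$one_line"
```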

Answer 5

The read builtin command doesn't read input with multiple lines the way you expect.

read B1 < <(awk 'BEGIN{FS="D**>"}{print $2}' FILE_NAME)

would only assign 24% to the variable B1 because read is only taking input from the first line.

In order to capture multiple-line output from your Awk command and assign it to a Bash variable, I'd use command substitution.

B1=$(awk 'BEGIN{FS="D[*][*]>"}{print $2}' FILE_NAME)
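The difference between the two approaches can be sketched as follows. The input lines are made up, the separator is written in its literal form D[*][*]>, and here-documents are used instead of process substitution only so that the sketch also runs in a plain POSIX shell.

```shell
# Command substitution keeps every output line (hypothetical stand-in for FILE_NAME).
output=$(printf '%s\n' 'text: 0.25D**>24%' 'text: 9.769D**>25%' 'text: 1.5D**>22%' |
    awk -F'D[*][*]>' 'NF > 1 {print $2}')

# A single read consumes only the first line of its input.
read -r first <<EOF
$output
EOF

# To process every value, call read in a loop, once per line.
count=0
while IFS= read -r value; do
    count=$((count + 1))
done <<EOF
$output
EOF

printf 'first=%s count=%s\n' "$first" "$count"
```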

5 answers

solution1
2 2014-08-11 02:36:21

solution2
2 2014-08-11 07:16:35

solution3
1 2014-08-11 16:10:02

solution4
0 2014-08-11 02:49:05

solution5
0 2014-08-11 07:27:25
