
Using sed/awk to extract multiple strings from each line

I have a file that contains 30 million lines (so it's a big file).

On each line I have this kind of data:

"title": "some title" (SOME RANDOM DATA) "rank": "1,292,064"

I need to extract both the title value and the rank value, like so:

some title:1,292,064

A little help? :) I have tried my little heart out and got nothing; I can only extract one piece of data from each line.

Unless there could be escaped quotes between the quotes, or other tricky cases like that, I would try this sed command to filter your big file:

sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Basically, you capture two subgroups, \1 and \2, containing the fields you want, and print them separated by a :.
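
As a quick sanity check (just an illustration, not part of the original question), you can feed the sample line into the command:

printf '%s\n' '"title": "some title" (SOME RANDOM DATA) "rank": "1,292,064"' | sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

which prints some title:1,292,064 as expected.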

If the string title appears literally, the regex passed as an argument to sed is less ugly:

sed 's/^"title": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Even safer, to avoid side effects from the random data:

sed 's/^"title": "\([^"]*\)".*"rank": "\(.*\)"$/\1:\2/'
