
Using sed/awk to extract multiple strings from each line

I have a file that contains 30 million lines (so it's a big file).

On each line I have this kind of data:

"title": "some title" (SOME RANDOM DATA) "rank": "1,292,064"

I need to extract both the title value and the rank value, like so:

some title:1,292,064

A little help? :) I have tried my little heart out and got nothing; I can only extract one piece of data from each line.

Unless there could be escaped quotes between the quotes, or other tricky cases like that, I would try this sed command to filter your big file:

sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Basically, you capture two subgroups, \1 and \2, containing the fields you want, and print them separated by a :.
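
As a quick sanity check (just an illustration, not part of the original question), you can feed the sample line into the command:

printf '%s\n' '"title": "some title" (SOME RANDOM DATA) "rank": "1,292,064"' | sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

which prints some title:1,292,064 as expected.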

If the string title appears literally, the regex passed as an argument to sed is less ugly:

sed 's/^"title": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Even safer, to avoid side effects from the random data:

sed 's/^"title": "\([^"]*\)".*"rank": "\(.*\)"$/\1:\2/'
