How to use Linux commands to extract text matching a specific field in a text file

Question

Hi, below is my text file:

{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}

From the above data I want to extract all titles that contain the word "java" (in any letter case). I should get the following output:

java cook book.pdf
Java book.pdf

Please suggest a solution.

Thanks

Answer 1

GNU sed:

sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file

java cook book.pdf
Java book.pdf
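The sed answer comes without an explanation, so here is a sketch of the same command split into two -e expressions, one comment per step. It writes the question's sample to a scratch file named `file` (the name the answer uses) and assumes GNU sed, because both -r and the I address modifier are GNU extensions.

```shell
# Write the question's sample data to a scratch file named "file".
cat > file <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# Same script as the one-liner, one -e expression per step (GNU sed assumed).
#  1) /title.*java/I!d   delete every line that lacks "title" followed later by
#                        "java"; the I modifier makes that match case-insensitive.
#  2) s/.*:.(.*).}$/\1/  on the remaining lines, skip everything up to the last
#                        ':' plus one character (the opening quote) and keep what
#                        lies before the last two characters (the closing quote and '}').
sed -r -e '/title.*java/I!d' -e 's/.*:.(.*).}$/\1/' file
```

One limitation worth knowing: the substitution only fires when the title line ends in `}`. A title that is the first member of its block, such as `{"title":"java x.pdf"`, survives the delete step but is printed unchanged, quotes and all.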

Answer 2

You can try something like this with awk:

awk -F: '$1~/title/&&tolower($2)~/java/{gsub(/["}]/,"",$2);print $2}' file

Explanation:

-F: sets the field separator to :
$1~/title/ checks whether the first field contains title
tolower($2)~/java/ checks, case-insensitively, whether the second field contains java
gsub(/["}]/,"",$2) removes the " and } characters from the second field
print $2 prints the second field
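To see why the cleanup step matters, this sketch prints the two fields awk produces for one title line from the question when splitting with -F:, then runs the one-liner on that line with a gsub() that strips both the quotes and the closing brace.

```shell
line='  "title":"java cook book.pdf"}'   # one title line from the question

# Show the two fields that -F: produces for this line.
printf '%s\n' "$line" | awk -F: '{ printf "$1=[%s]\n$2=[%s]\n", $1, $2 }'
# $1=[  "title"]
# $2=["java cook book.pdf"}]

# $2 still carries the quotes and the brace, so gsub() has to remove both.
printf '%s\n' "$line" |
  awk -F: '$1 ~ /title/ && tolower($2) ~ /java/ { gsub(/["}]/, "", $2); print $2 }'
# java cook book.pdf
```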

Answer 3

I will avoid any complex solution and rely on good old grep + awk + tr instead:

$ grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
java cook book.pdf
Java book.pdf

which works as follows:

extract all lines which contain "title":
from these lines, keep those which contain either Java or java
split these lines on : and print the second field
remove the " and } characters
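The four steps can be watched one at a time. This sketch writes the question's sample to test.txt (the file name used in the answer) and re-runs the pipeline one stage longer each time, labelling each intermediate result. It passes the tr set quoted as '"}', which keeps the shell from treating an unquoted [...] as a filename pattern.

```shell
# Write the question's sample data to test.txt.
cat > test.txt <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# Re-run the pipeline one stage longer each time to show every intermediate result.
echo '== 1: lines containing "title":'
grep '"title":' test.txt
echo '== 2: of those, lines containing Java or java'
grep '"title":' test.txt | grep '[Jj]ava'
echo '== 3: second field when splitting on ":"'
grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}'
echo '== 4: with the " and } characters deleted'
grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
```

Stage 3 prints "java cook book.pdf"} and "Java book.pdf"} with their quotes and braces still attached; stage 4 leaves only the two titles.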

Answer 4

You should definitely use a JSON parser to get reliable results. I like the one that comes with PHP. Note that the blocks as shown have no commas between their members, so strictly they are not valid JSON, and json_decode() returns null for them until the commas are added. For a file that is a series of valid JSON blocks separated by blank lines:

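<?php
// Split the file into blocks on blank lines and decode each block on its own.
foreach (explode("\n\n", file_get_contents('/your/file.json_blocks')) as $js_block):
    // true => decode into an associative array, so $json['title'] works
    $json = json_decode(trim($js_block), true);
    // stripos() returns 0 for a match at the start, so compare with !== false
    if (isset($json['title']) && stripos($json['title'], 'java') !== false):
        echo trim($json['title']), PHP_EOL;
    endif;
endforeach;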

This is far more reliable than any combination of sed/awk/grep et al., simply because JSON follows a specific format and should be read with a parser. For example, a line break inside the 'title' member (say, between "title": and its value) has no meaning in JSON but breaks the solution provided by Jaypal. For a similar problem, see why you shouldn't parse XHTML with regular expressions: RegEx match open tags except XHTML self-contained tags
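The line-break point can be checked directly. The input below is hypothetical: the question's third block with a line break after "title":, which is whitespace that JSON ignores. Line-based commands in the style of the sed, awk and grep answers (GNU sed assumed for the first) all print nothing for it, because "title" and "Java" no longer share a line.

```shell
# Hypothetical input: the question's third block, with a line break after "title":
cat > split.txt <<'EOF'
{"Author":"Smith"
"title":
"Java book.pdf"}
EOF

# Each line-based command prints nothing here: the key and its value
# are now on different lines, so no single line matches both conditions.
sed -r -e '/title.*java/I!d' -e 's/.*:.(.*).}$/\1/' split.txt
awk -F: '$1 ~ /title/ && tolower($2) ~ /java/ { gsub(/["}]/, "", $2); print $2 }' split.txt
grep '"title":' split.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
```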

Votes and dates:
Answer 1: 3 votes, accepted, 2013-06-13 12:39:53
Answer 2: 2 votes, 2013-06-13 12:09:46
Answer 3: 1 vote, 2013-11-11 14:13:12
Answer 4: 0 votes, 2013-06-13 12:23:37
