
How to extract text which matches particular fields in a text file using Linux commands

Hi, below is my text file:

{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}

From the above data I want to extract all titles which contain the word "java" (in any case). I should get the following output:

java cook book.pdf
Java book.pdf

Please suggest a solution.

Thanks

With GNU sed:

sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file
java cook book.pdf
Java book.pdf

You can try something like this with awk:

awk -F: '$1~/title/&&tolower($2)~/java/{gsub(/["}]/,"",$2);print $2}' file

Explanation:

  • -F: sets the field separator to :
  • $1~/title/ checks whether the first field contains title
  • tolower($2)~/java/ checks the second field for java, case-insensitively
  • gsub(/["}]/,"",$2) removes the " and } characters from the second field
  • print $2 prints the second field
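Splitting on : cuts off any title that itself contains a colon. A minimal sketch of a variant that splits on the double quote instead, assuming (as in the sample) that each key/value pair sits on its own line; the variable names sample and titles are only illustrative:

```shell
#!/bin/sh
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# With " as the separator, $2 is the key and $4 is the value,
# so no quotes or braces need to be stripped afterwards.
titles=$(awk -F'"' '$2 == "title" && tolower($4) ~ /java/ { print $4 }' "$sample")
printf '%s\n' "$titles"

rm -f "$sample"
```

On the sample data this prints the same two titles as the original command.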

I will avoid any complex solution and rely on good old grep + awk + tr instead:

$ grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
java cook book.pdf
Java book.pdf

which works as follows:

  1. extract all lines which contain "title":
  2. from these lines, keep those which contain either Java or java
  3. split these lines by : and show the second field
  4. remove the " and } characters
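The four steps above can be wrapped in a small function so the keyword is a parameter. This is only a sketch: extract_titles is a hypothetical helper name, and grep -i is swapped in for the [Jj]ava pattern so that any capitalisation matches.

```shell
#!/bin/sh
# extract_titles KEYWORD FILE (hypothetical helper, not from the answer)
extract_titles() {
    grep '"title":' "$2" |       # 1. keep only the title lines
        grep -i -- "$1" |        # 2. keep lines containing KEYWORD, any case
        awk -F: '{print $2}' |   # 3. take the text after the first colon
        tr -d '"}'               # 4. delete quote and closing-brace characters
}

# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

java_titles=$(extract_titles java "$sample")
php_titles=$(extract_titles php "$sample")
printf '%s\n' "$java_titles" "$php_titles"

rm -f "$sample"
```

Like the pipeline it wraps, this still assumes one key/value pair per line and no colon inside the title.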

You should definitely use a JSON parser to get reliable results. I like the one provided with PHP. If your file is, as shown, a bunch of JSON blocks separated by blank lines:

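foreach (explode("\n\n", file_get_contents('/your/file.json_blocks')) as $js_block):
    // Decode to an associative array; json_decode() returns null for invalid JSON.
    $json = json_decode(trim($js_block), true);
    // stripos() returns 0 for a match at the very start, so compare against false.
    if (is_array($json) && isset($json['title']) && stripos($json['title'], 'java') !== false):
        echo trim($json['title']), PHP_EOL;
    endif;
endforeach;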

This will be a lot more reliable than doing the same with any combination of sed, awk, grep and the like, simply because JSON follows a specific format and should be read with a parser. As an example, a simple newline in the 'title' entry has no real meaning to JSON, but it will break the solution provided by Jaypal. For a similar problem, parsing XHTML with regex and why you shouldn't do it, see: RegEx match open tags except XHTML self-contained tags
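To illustrate the newline point, here is a small sketch that runs a line-oriented awk filter, written in the style of the awk answer, against a block whose title value sits on the line after its key. The block contents and the variable names block and found are made up for the demonstration.

```shell
#!/bin/sh
# A valid JSON object: whitespace (including a newline) is allowed
# between the colon and the value.
block=$(mktemp)
cat > "$block" <<'EOF'
{"Author":"Smith",
"title":
"Java book.pdf"}
EOF

# The filter needs the key and the value on the same line, so nothing matches.
found=$(awk -F: '$1 ~ /title/ && tolower($2) ~ /java/ { gsub(/["}]/, "", $2); print $2 }' "$block")
printf 'matches: [%s]\n' "$found"

rm -f "$block"
```

The filter finds no title here, whereas a JSON parser would read this block without trouble.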

