Extract lines matching result from text file
I need to extract filenames from a text file, but only for the entries whose output lists no fonts.
In the output file below, the first two results list no fonts and only the last result does, so I need to print the first two filenames.
Does this make sense? Would grep, sed or awk be the answer?
So I need output from the text file below showing which PDFs have no fonts listed between the START and END markers.
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
/home/user1/Documents/temp1.pdf
******************END***********************
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
/home/user1/Documents/temp2.pdf
******************END***********************
******************START***********************
name type emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
BAAAAA+TimesNewRomanPS-BoldMT TrueType yes yes yes 14 0
CAAAAA+TimesNewRomanPSMT TrueType yes yes yes 9 0
/home/user3/Documents/temp file.pdf
******************END***********************
This prints any line containing ".pdf" if the previous line starts with "-".
[me@home]$ awk '{if (st && match($0,"\\.pdf")){print $0}; st=match($0,"^-")}' in.txt
/home/user1/Documents/temp1.pdf
/home/user1/Documents/temp2.pdf
It is not a generic solution, but it works with the input data you've given. I can imagine several edge cases where it might fail, but that comes down to the exact format of your input file.
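If the one-liner proves too fragile, here is a slightly more defensive sketch that tracks each START/END block explicitly instead of looking only at the previous line. It assumes the layout shown in the question (a header line, a dashed rule, zero or more font rows, then the filename as the last line before END). The function name list_fontless is made up for this example.

```shell
# Hypothetical helper: print the filename of every START/END block
# that has no font rows between the dashed rule and the filename.
list_fontless() {
  awk '
    /^\*+START\*+$/ { in_block = 1; past_rule = 0; rows = 0; next }
    /^\*+END\*+$/   {
      # rows == 1 means only the filename followed the dashed rule
      if (in_block && rows == 1) print last
      in_block = 0
      next
    }
    in_block && !past_rule && /^-/ { past_rule = 1; next }   # dashed rule under the header
    in_block && past_rule          { rows++; last = $0 }     # font rows, then the filename
  ' "$@"
}
```

Call it as list_fontless in.txt. Because it counts rows rather than matching ".pdf", it does not care about the file extension, but it still depends on the filename being the last line before END.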
(Based on the script you posted in the comments below.) If all you are trying to do is identify PDF files that have no embedded fonts, this might work:
MAGNUM="/mnt/network/User 1 PDF 06.12.11/"   # no backslashes needed inside double quotes
has_no_fonts() {
COUNT=$(pdffonts "$1" 2> /dev/null | wc -l)
exit $(( $COUNT - 2 ))
}
export -f has_no_fonts
find "$MAGNUM" -type f -name "*.pdf" -exec bash -c 'has_no_fonts "{}"' \; -print
Here's a breakdown of the script: 这是脚本的细分:
Detecting the embedded font count. This would have been simple if pdffonts returned a specific exit status when no fonts are embedded, but it does not. We therefore count the number of output lines and subtract 2 (the header lines) to get the number of fonts listed.
COUNT=$(pdffonts "$1" 2> /dev/null | wc -l)  # number of output lines
                                             # exactly 2 if no fonts
                                             # exactly 0 if there are errors
exit $(( $COUNT - 2 ))  # exit 0 (success) if and only if PDF has no fonts
The bash function is exported so that it can be used in the bash subshell started by find.
export -f has_no_fonts
Locate PDF files and print the name only if the PDF is valid and has no fonts:
find ..... -exec bash -c 'has_no_fonts "{}"' \; -print
           ---------------------------------    ------
                          |                        |
           -exec cannot run bash functions,   only prints the filename
           so run it in a bash subshell       if the previous command
                                              exits with 0
If you prefer a one-liner, the whole script can be written as:
find "$MAGNUM" -name "*.pdf" \
-exec bash -c 'exit $(($(pdffonts "{}" 2> /dev/null |wc -l) - 2))' \; -print
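Two caveats with the scripts above may be worth guarding against. First, find pastes the filename into the bash -c string in place of "{}" before bash parses it, so a name containing quotes or $(...) would be interpreted as shell code. Second, exit statuses wrap modulo 256, so a PDF producing exactly 258 lines of output would also exit with 0. The sketch below passes the filename as a positional argument and compares the line count directly instead. The wrapper name list_fontless_pdfs is made up for this example, and it still assumes pdffonts prints exactly two header lines.

```shell
#!/usr/bin/env bash

# Succeeds (status 0) only when pdffonts prints its two header lines and nothing else.
has_no_fonts() {
  local count
  count=$(pdffonts "$1" 2> /dev/null | wc -l)
  (( count == 2 ))   # 0 lines: pdffonts failed; more than 2: fonts are listed
}
export -f has_no_fonts   # make the function visible to the bash started by find

# Hypothetical wrapper: list font-free PDFs under the directory given as $1.
list_fontless_pdfs() {
  find "$1" -type f -name '*.pdf' \
    -exec bash -c 'has_no_fonts "$1"' _ {} \; -print
}
```

Run it as list_fontless_pdfs "$MAGNUM". The underscore fills $0 of the inner shell, so the filename arrives as $1 and is never parsed as code.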
This might work for you:
sed -n '/^\*/,//{H;/\*END\*/{x;s/\n/&/6;t;s|[^/]*\([^\n]*\).*|\1|p}}' in.txt
/home/user1/Documents/temp1.pdf
/home/user1/Documents/temp2.pdf
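Explanation:
Operate on the lines between lines beginning with *, appending each one to the hold space (H). At the END line, swap the collected block into the pattern space (x). If the block contains a sixth newline, it has font rows, so t skips it; otherwise strip everything except the line starting at the first / and print it.
Or at a pinch: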
sed -n '/^\*/,//{H;/\*END\*/{x;s|[^/]*-\n\(/[^\n]*\).*|\1|p}}' in.txt