
egrep: how to search for text that includes double quotes (win7 cmd window)

I'm trying to build a TOC for my HTML file by searching for all HTML tags that have one of three classes: article, section, or subsection.

I'm using GNU grep 2.4.2 in a Windows 7 cmd window. I've read at least 12 pages from my Google search and tried 20+ permutations of my grep command to find those classes. Luckily, my HTML file has only one HTML tag per line, which simplifies things.

I made a cmd batch file, ran it, and got various errors. I've tried escaping the double quotes and not escaping them. I've tried escaping the parens and not escaping them. I've tried different switches, with and without -E, etc. This is the regex I need to search for on every line, printing the lines that match:

/class="\(article\|section\|subsection\)"/

This is one of my later grep attempts:

grep -i -E 'class="\(article\|section\|subsection\)"' ch18IP.htm

With this command I get no lines returned and no error message. What am I doing wrong here?

Thank you!

You have three problems:

1) Double quote " literals must be escaped as \" when using grep on Windows.

2) The meta-characters ( , ) , and | should only be escaped as \( , \) , and \| when using basic mode. The -E extended regex option uses the more traditional unescaped form. This is documented at http://www.gnu.org/software/grep/manual/html_node/Basic-vs-Extended.html

3) If a parameter requires quoting on Windows, then double quotes are used, not single quotes. But in this case, enclosing quotes are not required, and would actually get in the way. I'll explain this later in the answer.

I also suggest that you add a word boundary assertion \b before class so that you don't mistakenly match something like subclass.

So either of the following should work:

grep -i -E \bclass=\"(article|section|subsection)\" ch18IP.htm
grep -i \bclass=\"\(article\|section\|subsection\)\" ch18IP.htm
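To see what the pattern itself matches, independent of any cmd quoting, here is a minimal sketch in Python. It assumes Python's `re` syntax is close enough to grep -E for this particular pattern (unescaped `( ) |` as operators, `\b` as a word boundary); it is not GNU grep, and the sample lines are hypothetical.

```python
import re

# Same pattern as the -E form above, with no shell escaping at all.
# re.IGNORECASE plays the role of grep's -i switch.
PATTERN = re.compile(r'\bclass="(article|section|subsection)"', re.IGNORECASE)

# Hypothetical sample input: one tag per line, as described in the question.
SAMPLE_LINES = [
    '<h1 class="article">Title</h1>',
    '<h2 CLASS="section">Part</h2>',
    '<h3 class="subsection">Detail</h3>',
    '<p class="note">skip</p>',
    '<p subclass="section">skip</p>',
]


def matching_lines(lines):
    """Return the lines that contain a match for PATTERN."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    for line in matching_lines(SAMPLE_LINES):
        print(line)
```

Only the first three sample lines are printed. The `subclass="section"` line is rejected because there is no word boundary between the `b` and the `c`; without `\b` it would match.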

It gets tricky if you want to enclose your search argument in quotes, because the search term also includes quote literals as well as poison characters like | that have special meaning to the cmd "shell". So you may end up having to escape some characters for both grep and cmd.exe. See https://stackoverflow.com/a/19816688/1012053 for more info.

In your case, here are two options for how you could quote your search term for Windows:

grep -i -E ^"\bclass=\"(article|section|subsection)\"^" ch18IP.htm
grep -i -E "\bclass=\"(article^|section^|subsection)\"" ch18IP.htm

That last form looks mighty weird if you decide to use the basic regex:

grep -i "\bclass=\"\(article\^|section\^|subsection\)\"" ch18IP.htm
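One way to cross-check the \" escaping is to let Python build the command line. The sketch below uses `subprocess.list2cmdline`, which applies the Microsoft C runtime argument-quoting rules to a list of arguments. It assumes your grep build parses its command line with those rules, and it covers only that layer: it knows nothing about cmd.exe's own special characters such as | and ^.

```python
import subprocess

# The pattern exactly as grep should receive it: no escaping for any shell.
pattern = r'\bclass="(article|section|subsection)"'

# list2cmdline turns an argument list into one command-line string using the
# Microsoft C runtime quoting rules (each literal " becomes \").
command_line = subprocess.list2cmdline(
    ["grep", "-i", "-E", pattern, "ch18IP.htm"]
)

print(command_line)
# → grep -i -E \bclass=\"(article|section|subsection)\" ch18IP.htm
```

The result is the same as the unquoted -E command shown earlier. If Python happens to be available on the machine, calling `subprocess.run` with the argument list (and without `shell=True`) starts grep without going through cmd.exe, so the ^ escapes should not be needed in that case.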

Getting double quotes as input on the Windows cmd.exe command line is notoriously problematic. See if this works for you: https://www.gnu.org/software/gawk/manual/html_node/DOS-Quoting.html

