
Notepad++ Regex: Find all 1 and 2 letter words

I'm working with a text file with 200,000+ lines in Notepad++. Each line has only one word. I need to remove all words that contain only one letter (e.g. I) and all words that contain only two letters (e.g. as).

I thought I could just pass in a regular regex like [a-zA-Z]{1,2}, but it does not recognize anything (I'm trying to Mark them).

I've done a manual search and I know that words of that length do exist, so it can only be my regex that's wrong. Does anyone know how to do this in Notepad++?

Cheers,
- Mestika

If you want to remove only the words but leave the lines empty, this works:

^[a-zA-Z]{1,2}$

Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
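To check what this replacement does outside Notepad++, here is a minimal sketch using Python's re module. It assumes Unix (\n) line endings, because Python's $ does not treat \r as a line end. It also uses re.MULTILINE to approximate how ^ and $ behave in Notepad++. The sample words are made up.

```python
import re

# Hypothetical file contents: one word per line, Unix (\n) line endings.
text = "I\nas\ncat\nword\nto\n"

# re.MULTILINE makes ^ and $ match at the start/end of every line,
# approximating how ^ and $ behave in Notepad++'s Find/Replace dialog.
short_word_line = re.compile(r"^[a-zA-Z]{1,2}$", re.MULTILINE)

# Replace each whole-line short word with an empty string.
# The line breaks are not part of the match, so empty lines remain.
result = short_word_line.sub("", text)
print(repr(result))  # '\n\ncat\nword\n\n'
```

The three short words are gone but their lines survive as blank lines, which is exactly the "leave the lines empty" behaviour.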

If you want to remove the lines completely, search for this:

^[a-zA-Z]{1,2}\r\n

Replace it with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up to date.

Note that you will have to replace \r\n with the line endings your file actually uses (\n for Unix-style files, \r for classic Mac files)!
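Here is a rough Python sketch of the line-removing variant. It assumes Windows (\r\n) line endings and uses a made-up sample in which the last line has no trailing line break.

```python
import re

# Hypothetical file contents with Windows (\r\n) line endings;
# note the last line ("ok") has no trailing line break.
text = "I\r\nas\r\ncat\r\nword\r\nto\r\nok"

# The line break is part of the match, so the whole line disappears.
short_word_line_crlf = re.compile(r"^[a-zA-Z]{1,2}\r\n", re.MULTILINE)

result = short_word_line_crlf.sub("", text)
print(repr(result))  # 'cat\r\nword\r\nok'
```

The final "ok" survives because nothing follows it, so the required \r\n is missing. The same pattern behaves this way in any regex engine when a file does not end with a newline.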

As Tim Pietzker suggested, a platform-independent solution that also removes empty lines (those directly following a matched line) would be:

^[a-zA-Z]{1,2}[\r\n]+
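To see which empty lines [\r\n]+ actually removes, here is a small Python sketch. It runs the pattern on a made-up input that mixes CRLF and LF endings and includes blank lines.

```python
import re

# Hypothetical mixed input: CRLF and LF endings, plus some blank lines.
text = "I\r\n\r\ncat\n\nas\n\nword\n"

# [\r\n]+ eats the line break after a short word, together with any
# blank lines that immediately follow it.
pattern = re.compile(r"^[a-zA-Z]{1,2}[\r\n]+", re.MULTILINE)

result = pattern.sub("", text)
print(repr(result))  # 'cat\n\nword\n'
```

The blank line between "cat" and "as" is kept, because it comes before a short word rather than after one. Only blank lines directly after a matched line are swallowed.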

A platform-independent solution that removes only the lines with one or two letters, and leaves empty lines in place, would be:

^[a-zA-Z]{1,2}(\r\n?|\n)
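Here is a Python sketch of this variant on a made-up input with mixed line endings and blank lines. One Python-specific caveat: in multi-line mode, Python's ^ only treats \n as the start of a new line. In this sketch, a short word following a lone \r would therefore not be caught, so the sample avoids that case.

```python
import re

# Hypothetical input with mixed line endings and blank lines.
text = "I\r\n\r\ncat\nas\n\nword\n"

# (\r\n?|\n) consumes exactly one line break: \r\n (Windows),
# a lone \r (classic Mac) or \n (Unix). Blank lines are left alone.
pattern = re.compile(r"^[a-zA-Z]{1,2}(\r\n?|\n)", re.MULTILINE)

result = pattern.sub("", text)
print(repr(result))  # '\r\ncat\n\nword\n'
```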

I don't use Notepad++, but my guess is that you have too many matches. Try including word boundaries (your expression will match every group of two letters):

\b[a-zA-Z]{1,2}\b
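Here is a small Python illustration of what the word boundaries change, using a made-up sentence rather than one word per line.

```python
import re

# Hypothetical sentence (not one word per line) to show \b at work.
sentence = "I saw a cat in the box"

# \b marks a boundary between a word character and a non-word character,
# so only complete one- or two-letter words match.
with_boundaries = re.findall(r"\b[a-zA-Z]{1,2}\b", sentence)

# Without the boundaries, longer words are chopped into 1-2 letter pieces.
without_boundaries = re.findall(r"[a-zA-Z]{1,2}", sentence)

print(with_boundaries)     # ['I', 'a', 'in']
print(without_boundaries)  # ['I', 'sa', 'w', 'a', 'ca', 't', 'in', 'th', 'e', 'bo', 'x']
```

Apostrophes and hyphens also count as boundaries. For example, the t in don't would match as well.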

The regex you specified should find runs of 1 or 2 letters (even in Notepad++'s Find dialog), but not in the way you'd think: it also matches pieces of longer words. You want the regex to start at the beginning of the line and stop at the end of the line, using ^ and $ respectively:

^[a-zA-Z]{1,2}$
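The difference between the unanchored and anchored patterns can be sketched in Python on made-up one-word-per-line text. Here re.MULTILINE approximates Notepad++'s per-line ^ and $.

```python
import re

# Hypothetical one-word-per-line text.
text = "I\nas\ncat\nword\nto\n"

# Unanchored: any run of one or two letters matches, even inside longer words.
unanchored = re.findall(r"[a-zA-Z]{1,2}", text)

# Anchored with ^ and $ (re.MULTILINE = per line): only whole short lines match.
anchored = re.findall(r"^[a-zA-Z]{1,2}$", text, re.MULTILINE)

print(unanchored)  # ['I', 'as', 'ca', 't', 'wo', 'rd', 'to']
print(anchored)    # ['I', 'as', 'to']
```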

Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version, try updating to the most recent one.

You seem to be using a version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all ({ and } are treated as literals, not as special symbols).

The solution is to use a somewhat lengthier equivalent:

\w\w?

... but that's only part of the story, as this regex will match any one or two word characters anywhere, including inside longer words, not just short words. To match whole lines only, you need something like this:

^\w\w?$
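\w is not the same as [a-zA-Z]. In most regex flavours, including Python's re used in this sketch, it also matches digits and the underscore. Whether Notepad++'s engine treats \w identically is an assumption here. The sample lines are made up.

```python
import re

# Hypothetical lines, including a number and an identifier-like token.
text = "I\nas\n42\n_x\ncat\n"

# \w matches letters, digits and underscore, so ^\w\w?$ is broader
# than ^[a-zA-Z]{1,2}$.
word_chars = re.findall(r"^\w\w?$", text, re.MULTILINE)
letters_only = re.findall(r"^[a-zA-Z]{1,2}$", text, re.MULTILINE)

print(word_chars)    # ['I', 'as', '42', '_x']
print(letters_only)  # ['I', 'as']
```

If the file contains only letters, both patterns select the same lines. If numbers or underscores can appear, the letter class is the stricter choice.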
