
Regex to extract multiple different words from a string

I am looking for a regex that gives me output in the following format:

"Core i7 Extreme Edition" or "Core i3" or "Atom" or "Pentium", given the following inputs:

"Intel® Core™ i7-6950X Processor Extreme Edition", "Intel® Core™ i3-6300T Processor", "Intel® Atom™ Processor D2550 " or "Intel® Pentium® Processor G4400" or "Intel® Core™2 Duo Processor E6400" or "Intel® Core™2 Extreme Processor QX6800" or "Intel® Core™2 Quad Processor Q9400S".

I want to read the special identifying features from the product name.

I realise that something along the lines of this:

Core|i3|i5|i7|Atom|Pentium|\\s4\\s|Celeron|Extreme Edition

would give me what I want in a perfect world, where nothing is added.

Is it possible to create such a regex? If it matters, I am using C#, but in a very generic environment where I only have the string and the regex.

You can try this regex (see it on regex101):

Intel® | Processor|®|™|[ -][A-Z]*\d{4}[A-Z]*

Then replace every match with the empty string "". The pattern matches all the unwanted parts, so the replacement removes them.

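using System.Text.RegularExpressions;

// Match every unwanted part of the product name and replace it with nothing.
string pattern = @"Intel® | Processor|®|™|[ -][A-Z]*\d{4}[A-Z]*";
string substitution = "";
string input = @"Intel® Core™ i7-6950X Processor Extreme Edition";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution); // "Core i7 Extreme Edition"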
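
To see how the pattern behaves on the other product names from the question, here is a minimal sketch that runs the same replacement over several of them. The class name `ProcessorNames`, the method name `Shorten`, and the final `Trim()` call are assumptions added for illustration and are not part of the answer above; the trim is there only because the Atom input ends with a space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProcessorNames
{
    // Pattern from the answer above: it matches the parts to remove.
    private static readonly Regex Unwanted =
        new Regex(@"Intel® | Processor|®|™|[ -][A-Z]*\d{4}[A-Z]*");

    // Hypothetical helper: removes the unwanted parts, then trims
    // leftover whitespace (the trim is an addition, not in the answer).
    public static string Shorten(string productName)
    {
        return Unwanted.Replace(productName, "").Trim();
    }

    public static void Main()
    {
        string[] inputs =
        {
            "Intel® Core™ i7-6950X Processor Extreme Edition", // Core i7 Extreme Edition
            "Intel® Core™ i3-6300T Processor",                 // Core i3
            "Intel® Atom™ Processor D2550 ",                   // Atom
            "Intel® Pentium® Processor G4400",                 // Pentium
            "Intel® Core™2 Duo Processor E6400",               // Core2 Duo
        };

        foreach (string input in inputs)
        {
            Console.WriteLine(Shorten(input));
        }
    }
}
```

Two things to keep in mind: removing "™" leaves "Core2 Duo" with no space between "Core" and "2", and the `\d{4}` part only strips model numbers with at least four consecutive digits, so names with shorter numbers would need the pattern adjusted.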
