In lines starting with a specific word followed by words separated by semicolons, replace the semicolons with commas and wrap the words in double quotes

I'm trying to change certain lines in my file using Notepad++. I have very little knowledge of regular expressions, so I'm seeking help. Any kind of help is appreciated.

Find all the lines that look like See ABC'D EFG;IJKL;FOO;BAR;XXXX and so on, that is:

  1. Lines that start with the word "See"
  2. After that, there are words all in capital letters, separated by semicolons
  3. Words can contain special characters:

    a) space

    b) ' (apostrophe)

    c) , (comma)

    d) - (hyphen)

  4. The line ends with a full stop (.)

And replace those lines with:

See:["ABC'D EFG","IJKL","FOO","BAR",....]
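For readers who want to check the requested behaviour outside Notepad++, here is a minimal Python sketch of the transformation described in the question. It is not one of the posted answers: the function name `convert` and the decision to require the whole line to match are assumptions made for illustration, and the splitting is done in a callback rather than in the regex itself.

```python
import re

# One "word" is a run of capitals, spaces, commas, apostrophes and hyphens
# (the characters listed in the question). Words are separated by ';' and
# the line ends with a full stop.
WORD = r"[A-Z ,'\-]+"
LINE = re.compile(rf"^See ({WORD}(?:;{WORD})*)\.$", re.MULTILINE)


def convert(text: str) -> str:
    """Rewrite every matching 'See A;B;C.' line as See:["A","B","C"]."""

    def repl(match: re.Match) -> str:
        words = match.group(1).split(";")
        return "See:[" + ",".join(f'"{w}"' for w in words) + "]"

    return LINE.sub(repl, text)


if __name__ == "__main__":
    sample = "See ABC'D EFG;IJKL;FOO;BAR.\nOther line; left alone."
    print(convert(sample))
```

Lines that do not start with See, or that contain characters outside the listed set, are left untouched.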

find what: See ([A-Z'\-, ]+)\;([A-Z'\-, ]+)\.
replace with: See:["\1", "\2"]
See https://regex101.com/r/bfJkN6/3
I also tested it in Notepad++ and got See:["ABC'D EFG", "IJKL"]
I updated the regex to match more than two items: https://regex101.com/r/bfJkN6/5
See ((([A-Z'\-, ]+)\;)+)([A-Z'\-, ]+)\.
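One caveat with the updated pattern is worth seeing in practice: it matches a line with any number of items, but a capture group inside a repetition keeps only its last iteration. The sketch below uses Python's `re` module to print what each group holds. This is Python's behaviour, and it is an assumption that the replace dialog in other tools exposes the groups the same way.

```python
import re

# The updated pattern from the answer, written with single backslashes.
pattern = re.compile(r"See ((([A-Z'\-, ]+)\;)+)([A-Z'\-, ]+)\.")

m = pattern.fullmatch("See ABC'D EFG;IJKL;FOO;BAR.")

# Group 1 holds everything up to the last semicolon, groups 2 and 3 hold
# only the final repetition, and group 4 holds the last item.
for number, value in enumerate(m.groups(), start=1):
    print(number, repr(value))
```

Because the intermediate items are not available as separate groups, a fixed replacement such as See:["\1", "\2"] cannot list every item when the count varies.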

Use \W, which matches any non-word character.

Example: https://regex101.com/r/lFANF0/4

Find: See\s([A-Z' ]+)\W(\w+)\. and replace with: See:["$1","$2"]


1st group (\w+\'\w+\s+): \w+ matches any word character (equal to [a-zA-Z0-9_])
+ matches between one and unlimited times
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
2nd group (\w+\W*\w+): \W* matches any non-word character (equal to [^a-zA-Z0-9_])
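To see how the \W-based find and replace behaves, here is a small Python sketch. It assumes the pattern reads [A-Z' ] in the first group and uses Python's \1 and \2 in place of $1 and $2. The `re.ASCII` flag is added so that \w means [a-zA-Z0-9_] as stated above.

```python
import re

# re.ASCII makes \w equal to [a-zA-Z0-9_], as described above;
# without it, Python's \w also matches non-ASCII letters and digits.
PATTERN = re.compile(r"See\s([A-Z' ]+)\W(\w+)\.", re.ASCII)
REPLACEMENT = r'See:["\1","\2"]'

# Exactly two items: the single \W consumes the one semicolon.
two_items = PATTERN.sub(REPLACEMENT, "See ABC'D EFG;IJKL.")

# Three items: the pattern has no way to consume a second semicolon,
# so nothing matches and the line comes back unchanged.
three_items = PATTERN.sub(REPLACEMENT, "See ABC'D EFG;IJKL;FOO.")

print(two_items)
print(three_items)
```

In this form the pattern handles lines with exactly two items, so a variable number of semicolons needs a different approach.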

Let's say the number of semicolons is variable. You need to proceed in two passes.
Use Replace All for both passes:

find: ^See \K([A-Z ,;'-]+)\.
replace: ["$1"]

and then:

find: (?:\G(?!^)|^See \["(?=[^"]*"]))[^";]*\K;
replace: ", "

The first pass is easy to understand: it finds the corresponding lines, removes the final dot, and encloses the part made of uppercase letters, commas, spaces, semicolons, apostrophes and hyphens in double quotes and square brackets.

The second pass needs to replace only the semicolons inside quotes and square brackets, on lines that start with See. To do that, I used the second branch ^See \["(?=[^"]*"]) to reach the interesting lines, and the \G anchor in the first branch to ensure that the next matches are contiguous to the first. Since [^";]* excludes the double quote, once the last semicolon is reached, the first branch can no longer succeed and the contiguity is broken.
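The same two-pass idea can be sketched in Python, with one important difference: the standard `re` module supports neither \K nor \G, so this sketch swaps in capture groups for \K and a replacement callback for the \G chaining. The function name `two_pass` is hypothetical, and this is an approximation of the answer's approach, not a translation of its regexes.

```python
import re


def two_pass(text: str) -> str:
    # Pass 1: wrap the item list in ["..."] and drop the final dot.
    # "See " is captured and re-emitted, since Python's re has no \K.
    wrapped = re.sub(
        r"^(See )([A-Z ,;'-]+)\.",
        r'\1["\2"]',
        text,
        flags=re.MULTILINE,
    )

    # Pass 2: only on lines starting with See [" replace each semicolon
    # inside the quotes with '", '. A callback stands in for the \G anchor.
    def split_items(match: re.Match) -> str:
        items = match.group(2).replace(";", '", "')
        return match.group(1) + items + match.group(3)

    return re.sub(
        r'^(See \[")([^"]*)("\])',
        split_items,
        wrapped,
        flags=re.MULTILINE,
    )


if __name__ == "__main__":
    print(two_pass("See ABC'D EFG;IJKL;FOO;BAR.\nKeep; this; line."))
```

As with the patterns in the answer, the result starts with See [ rather than See:[, so the colon would have to be added in the first replacement to get the exact format requested in the question.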
