
Seeking regex in Notepad++ to search and replace CRLF between two quotation marks ["] only

I've got a CSV file with some 600 records where I need to replace some [CRLF] with a [space], but only when the [CRLF] is positioned between two ["] (quotation marks). Once the second ["] is encountered, the rest of the line should be skipped and processing should continue on the next line of the text.

I don't really have a starting point. Hope someone comes up with a suggestion.

Example:

John und Carol,,Smith,,,J.S.,,,,,,,,,,,,,+11 22 333 4444,,,,,"streetx 21[CRLF]
New York City[CRLF]
USA",streetx 21,,,,New York City,,,USA,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Normal,,My Contacts,[CRLF]

In this case the two [CRLF] after the first ["] need to be replaced with a space [ ]. When the second ["] is encountered, skip to the end of the line and go to the next line.

Then again, now on the next line, after the first ["] is encountered, replace all [CRLF] until the second ["] is encountered. The number of [CRLF]s varies. In the CSV file the number of commas [,] before (23) and after (65) the two quotation marks ["] is constant.

So maybe a comma counter could be used. I don't know.

Thanks for feedback.

This will work using one regex only (tested in Notepad++):

Enter this regex in the Find what field:

((?:^|\r\n)[^"]*+"[^\r\n"]*+)\r\n([^"]*+")

Enter this string in the Replace with field:

$1 $2

Make sure the Wrap around check box and the Regular expression radio button are selected.

Do a Replace All as many times as required (until the "0 occurrences were replaced" dialog pops up).

Explanation:

(
  (?:^|\r\n)     Begin at start of file or before the CRLF before the start of a record
  [^"]*+         Consume all chars up to the opening "
  "              Consume the opening "
  [^\r\n"]*+     Consume all chars up to either the first CRLF or the closing "
)                Save as capturing group 1 (= everything in record before the target CRLF)
\r\n             Consume the target CRLF without capturing it
(
  [^"]*+         Consume all chars up to the closing "
  "              Consume the closing "
)                Save as capturing group 2 (= the rest of the string after the target CRLF)

Note: *+ is a possessive quantifier. It never gives back what it has matched, so using it where backtracking cannot help speeds up execution.
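For readers who would rather script the repeated Replace All, the same idea can be sketched in Python. This is only a sketch, not part of the tested Notepad++ answer: the function name join_quoted_linebreaks is made up for illustration, the possessive *+ quantifiers are written as plain * (Python's re module only accepts possessive quantifiers from version 3.11 on), and it assumes, as in the question, at most one quoted field per record.

```python
import re

# Same pattern as above, with plain * instead of the possessive *+.
# Without re.MULTILINE, ^ only matches at the very start of the text.
PATTERN = re.compile(r'((?:^|\r\n)[^"]*"[^\r\n"]*)\r\n([^"]*")')


def join_quoted_linebreaks(text: str) -> str:
    """Replace CRLFs inside a quoted field with spaces (hypothetical helper)."""
    while True:
        # One pass corresponds to one "Replace All": it removes at most
        # one CRLF per record, so the loop repeats until nothing changes.
        text, count = PATTERN.subn(r"\1 \2", text)
        if count == 0:  # the "0 occurrences were replaced" condition
            return text


sample = 'a,"x\r\ny\r\nz",b\r\nc,"p\r\nq",d\r\n'
print(repr(join_quoted_linebreaks(sample)))
```

The loop mirrors the manual procedure: the quoted field in the first sample record contains two CRLFs, so two passes change the text and a third pass confirms there is nothing left to replace.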

Update:

This more general version of the regex will work with any line break sequence (\r\n, \r or \n):

((?:^|[\r\n]+)[^"]*+"[^\r\n"]*+)[\r\n]+([^"]*+")

In this case the source data is generated by the contacts export function in GMail. After the modification outlined below (without RegEx) the result can be used to tidy up your contacts database and re-import it into GMail or MS Outlook. Yes, I am standing on the shoulders of @alan and @robinCTS. Thank you both.

Instructions in 5 steps:

Use Notepad++ / Find and Replace / Extended search mode / Wrap around = on.

-1- Replace all [CRLF] with a unique set of characters or a string (I used [ ~~ ]).

find: \r\n and replace with: ~~ The file contents are now on one line only.

-2- Now we need to separate the header line. To do this, move to where the first record starts, which is just before the 88th comma (the header includes the word after the 87th comma [,]), and enter the [CRLF] manually by hitting the return key. There are two lines now: header and records.

-3- Now find all [ ,~~ ] and replace with [ ,\r\n ]. The result is one record per line.

-4- Remove the remaining [~~]: find: ~~ and replace with: [ ] (a space). The file is now clean of unwanted [CRLF]s.

-5- Save the file and use it as intended.
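The steps above can also be scripted. The following Python sketch is an illustration rather than the procedure as performed in Notepad++: the function name clean_with_placeholder is hypothetical, and the header is split off at the first CRLF before anything else happens, instead of being separated by hand in step 2. It assumes that the header line contains no embedded line breaks, that every record ends with a comma immediately before its CRLF (as in the example in the question), and that no line inside a quoted field ends with a comma.

```python
def clean_with_placeholder(text: str, placeholder: str = "~~") -> str:
    """Scripted version of steps 1 to 4 (hypothetical helper)."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the data; pick another")
    # Step 2, done first here: split off the header at the first CRLF.
    # Assumption: the header line itself contains no embedded line breaks.
    header, sep, body = text.partition("\r\n")
    # Step 1: collapse all records onto a single line.
    body = body.replace("\r\n", placeholder)
    # Step 3: a comma followed by the placeholder marks the end of a record.
    body = body.replace("," + placeholder, ",\r\n")
    # Step 4: any placeholder left over was a line break inside a quoted field.
    body = body.replace(placeholder, " ")
    return header + sep + body


sample = 'h1,h2,h3,h4\r\na,"x\r\ny",b,\r\nc,"p",d,\r\n'
print(repr(clean_with_placeholder(sample)))
```

Checking for the placeholder up front matters because step 4 would otherwise turn a ~~ that was already part of the data into a space.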

Maybe do it in three steps (assuming you have 88 fields in the CSV, because you said there are 23 commas before and 65 after each second ["]):

Step 1: replace all CR/LF with some character that does not occur anywhere in the file, like ~

Search: \r\n Replace: ~

Step 2: replace the ~ after every 88th 'comma group' (or however many fields the CSV has) with \r\n to reinsert the required CSV line breaks:

Search: ((?:[^,]*?,){88})~ Replace: $1\r\n

Step 3: replace all remaining ~ with a space

Search: ~ Replace: <space>
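As a rough illustration, the three steps can be written in Python as follows. The function name rejoin_by_comma_count and its commas_per_record parameter are made up for this sketch. It assumes that every line, including any header line, contains exactly that many commas with the last one immediately before the CRLF, and that no commas occur inside the quoted field; data that breaks either assumption would be split in the wrong places.

```python
import re


def rejoin_by_comma_count(text: str, commas_per_record: int = 88) -> str:
    """Three-step marker approach (hypothetical helper)."""
    marker = "~"
    if marker in text:
        raise ValueError("marker already occurs in the data; pick another")
    # Step 1: replace every CRLF with the marker.
    text = text.replace("\r\n", marker)
    # Step 2: after each group of N commas, turn the marker back into a CRLF.
    pattern = re.compile(
        r"((?:[^,]*?,){%d})%s" % (commas_per_record, re.escape(marker))
    )
    text = pattern.sub(lambda m: m.group(1) + "\r\n", text)
    # Step 3: the remaining markers were line breaks inside a field.
    return text.replace(marker, " ")


# Small demonstration with 3 commas per record instead of 88.
sample = 'a,"x\r\ny",b,\r\nc,"p",d,\r\n'
print(repr(rejoin_by_comma_count(sample, commas_per_record=3)))
```

Passing the comma count as a parameter keeps the sketch usable for exports with a different number of fields.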
