How to regex match and return strings with a known start format and ending with a double line break?

I'm trying to parse a text file with JavaScript. I have no control over the contents of the text file.

The text file consists of multiple records. Each record begins with a HH:MM timestamp. Records are separated by a double line break (\n\n). A record may be a single line, or multiple lines separated by a single line break (\n).

Example:

09:00\tRecordA
\tSome extra detail about Record A

10:00\tRecordB
\tSome extra detail about Record B
\tEven more detail about Record B

11:00\tRecordC

I hope to generate an array of records like this:

[ 
    "09:00\tRecordA\n\tSome extra detail about Record A",
    "10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B",
    "11:00\tRecordC"
]

So far I can get the first line of each record without problem.

textFile.match(/^\d\d:\d\d.*\n?/gm);

[ 
    "09:00\tRecordA",
    "10:00\tRecordB",
    "11:00\tRecordC"
]

After a lot of searching and trial and error, I'm still having trouble getting the extra details. Below are what appeared to be the most promising avenues, but I'm probably far from the mark.

Adding an extra \n, but as the wildcard doesn't match line breaks this obviously did not work.

textFile.match(/^\d\d:\d\d.*\n\n?/gm);

Using the s (dotAll) flag, but this did not split records into separate array items.

textFile.match(/^\d\d:\d\d.*\n?/sgm); 

[ 
    "09:00\tRecordA\n\tSome extra detail about Record A\n\n10:00\tRecordB\n\tSome extra detail about Record B\n\tEven more detail about Record B"
]

Defining a group and repeating it twice, but this returned null.

textFile.match(/^\d\d:\d\d.*(\n){2}?/gm);

My regex skills are quite limited and I'm trying to learn. I would appreciate any pointers and advice on this problem.

The m (multiline) flag on its own will never work here: it only lets ^ and $ match at line breaks, while . still stops at the end of each line, so the pattern effectively handles one line at a time.

Without m, ^ will only match at the beginning of the text.
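To make the effect of the m flag concrete, here is a small sketch. The sample string and variable names are made up for illustration, and it assumes plain \n line endings as in the question.

```javascript
// Hypothetical sample input with LF line endings.
const sample = "09:00\tRecordA\n\tDetail A\n\n10:00\tRecordB";

// Without m, ^ anchors only at the very start of the string.
const withoutM = sample.match(/^\d\d:\d\d/g);

// With m, ^ also anchors right after every line break.
const withM = sample.match(/^\d\d:\d\d/gm);

// Even with m, . does not cross a line break, so only first lines are returned.
const firstLines = sample.match(/^\d\d:\d\d.*/gm);

console.log(withoutM);   // [ '09:00' ]
console.log(withM);      // [ '09:00', '10:00' ]
console.log(firstLines); // [ '09:00\tRecordA', '10:00\tRecordB' ]
```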

The usual wildcard that also matches line breaks is [^] ("not nothing"), but a greedy [^]* will run through to the end of the text instead of stopping at the first blank line.
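A sketch of the difference between the greedy and a lazy version of that wildcard. The sample text is invented, and the lookahead is my own addition rather than something from the question; it assumes \n line endings and no trailing line break at the end of the text.

```javascript
// Hypothetical sample input with LF line endings and no trailing newline.
const text =
  "09:00\tRecordA\n\tDetail A\n\n" +
  "10:00\tRecordB\n\tDetail B\n\n" +
  "11:00\tRecordC";

// Greedy: [^]* swallows everything after the first timestamp.
const greedy = text.match(/^\d\d:\d\d[^]*/gm);

// Lazy: [^]*? stops at the first position that is followed by a blank line
// (\n\n) or by the end of the text; (?![^]) means "no character follows".
const lazy = text.match(/^\d\d:\d\d[^]*?(?=\n\n|(?![^]))/gm);

console.log(greedy.length); // 1
console.log(lazy.length);   // 3
```

If the file may end with a line break, trimming the text first keeps that line break out of the last record.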

There might be a way to do it with regex.
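One possible regex, offered as a sketch rather than the definitive answer: match the timestamped first line, then any number of follow-up lines that are not empty. It assumes \n line endings (with \r\n the pattern would need adjusting) and reuses the sample data from the question.

```javascript
// Sample data from the question, with a trailing line break added.
const textFile =
  "09:00\tRecordA\n\tSome extra detail about Record A\n\n" +
  "10:00\tRecordB\n\tSome extra detail about Record B\n" +
  "\tEven more detail about Record B\n\n" +
  "11:00\tRecordC\n";

// ^\d\d:\d\d.*  the first line of a record (m makes ^ match at each line start)
// (?:\n.+)*     zero or more continuation lines; .+ needs at least one
//               character, so the match ends before an empty line
const records = textFile.match(/^\d\d:\d\d.*(?:\n.+)*/gm) || [];

console.log(records.length); // 3
console.log(records);
```

Because .+ cannot match an empty line, the blank line between records ends each match, and a trailing line break at the end of the file is left out.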

But you could consider .split("\n\n") instead.
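A minimal sketch of the split approach, using the sample data from the question. The filter step is an optional extra of mine, not part of the suggestion above; it drops any chunk that does not start with a timestamp.

```javascript
// Sample data from the question (LF line endings, no trailing newline).
const textFile =
  "09:00\tRecordA\n\tSome extra detail about Record A\n\n" +
  "10:00\tRecordB\n\tSome extra detail about Record B\n" +
  "\tEven more detail about Record B\n\n" +
  "11:00\tRecordC";

// Split on the blank line that separates records.
const records = textFile.split("\n\n");

// Optional: keep only chunks that begin with an HH:MM timestamp.
const timestamped = records.filter((record) => /^\d\d:\d\d/.test(record));

console.log(records.length);     // 3
console.log(timestamped.length); // 3
```

If the file can end with extra line breaks, calling textFile.trim() before splitting avoids a stray empty or newline-terminated last item.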
