Ignoring carriage returns in regular expressions
I'm currently trying to parse a conversation file in JavaScript. Here is an example of such a conversation:
09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing? 😊 what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!
09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june! Hope you can make it down :) sorry it's a bit annoying i couldn't make it there til a sunday!
09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss! I've just requested 5 weeks off in November/December to visit Aus so I'll see if I can negotiate some other days! When does your uni term end in November? I'm thinking of visiting perth first then going to the east coast!
09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus! Totally understand if it's too hard for you to request more days off in june. I finish uni early November! So should definitely be done by then if you came here
09/05/2016, 23:20 - Joe Bloggs: I could maybe get a couple of days 😊 when do you fly into London on the Sunday? Perfect! I need to speak to everyone else to make sure they're about. I can't wait to visit but it's so far away!
09/05/2016, 23:30 - Jane Doe: I fly in at like 7.30am so I'll have that whole day! I'm sure the year will fly since it's may already haha
09/05/2016, 23:34 - Joe Bloggs: Aw nice one! Even if I can get just Monday off I can get an early train on Sunday 😊
My current regex looks like this:
/(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s(.*?)(?=\s*\d{2}\/|$)/gm
It almost works, and gives me the four groups I expect for each message:
{
"group": 1,
"value": "09/05/2016"
},
{
"group": 2,
"value": "13:11"
},
{
"group": 3,
"value": "Joe Bloggs"
},
{
"group": 4,
"value": "Hey Jane how're you doing? 😊 what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!"
}
The problem occurs when the message (group 4) contains a carriage return (see the message on line 3 of the sample snippet).
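I did some research, but using [\s\S] didn't solve my problem: the pattern just stops and moves on to the next occurrence.
For the third message, the text is cut off at the carriage return.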
Any help would be appreciated!
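A minimal sketch of why swapping in [\s\S] may not be enough on its own (this explanation is an editorial addition, not part of the original thread). With the m flag, $ also matches just before every line break, so a lazy ([\s\S]*?) followed by (?=...|$) still stops at the first one. The sample string is a hypothetical, shortened version of the conversation above. Dropping m is shown only for illustration; it has its own caveat, because any whitespace followed by "dd/" inside a message would still end that message early.

```javascript
// Hypothetical sample: the second message spans two lines.
const chat =
  "09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing?\n" +
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!\n" +
  "When does your uni term end in November?\n" +
  "09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus!";

// Question's pattern with ([\s\S]*?) in group 4, still using the m flag:
// `$` matches before every "\n", so the lazy group stops at the first line break.
const withM =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s([\s\S]*?)(?=\s*\d{2}\/|$)/gm;

// Same pattern without m: `$` now matches only at the end of the whole input,
// so the lazy group runs on until whitespace followed by the next "dd/".
const withoutM =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s([\s\S]*?)(?=\s*\d{2}\/|$)/g;

// Collect group 4 (the message body) for every match.
const messages = (re) => [...chat.matchAll(re)].map((m) => m[4]);

console.log(messages(withM)[1]); // Right I'll speak to my boss!
console.log(messages(withoutM)[1]); // both lines of the second message
```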
Try:
(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)
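(https://regex101.com/r/sA3sB8/2) This scans to the end of the line, then uses a repeated group that first checks whether the next line starts with \d\d/ (that is, a date at the start of the next line) and, if it doesn't, captures that whole line as well.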
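A minimal sketch of the answer's pattern used from JavaScript. The helper name parseChat and the field names are hypothetical. The line-ending normalisation is an added assumption for exports saved with Windows "\r\n" endings: in JavaScript `.` matches neither "\n" nor "\r", so an unnormalised "\r" stops `.*` early and the `\n+` continuation check never fires.

```javascript
// The answer's pattern. `.*` runs to the end of the current line; the repeated
// group (?:\n+(?!\n|\d{2}\/).*)* then appends each following line that does
// not start with two digits and a slash.
const MESSAGE_RE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Hypothetical helper: returns one { date, time, name, message } object per message.
function parseChat(text) {
  // Assumption: convert "\r\n" (and lone "\r") to "\n" so `.` and `\n+` behave as intended.
  const normalised = text.replace(/\r\n?/g, "\n");
  return [...normalised.matchAll(MESSAGE_RE)].map(([, date, time, name, message]) => ({
    date,
    time,
    name,
    message,
  }));
}

// Hypothetical sample with Windows line endings; the second message spans two lines.
const chat =
  "09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june!\r\n" +
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!\r\n" +
  "When does your uni term end in November?\r\n" +
  "09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus!";

for (const m of parseChat(chat)) console.log(m);
```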
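If you're worried about edge cases where two digits followed by a forward slash appear at the start of a line, you can make the negative lookahead a bit more specific. It increases the number of steps, but makes the pattern a bit safer.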
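One way to make the negative lookahead more specific, sketched under the assumption that message dates are always dd/mm/yyyy: require a full date instead of just \d{2}\/. The continuation line "12/06 or 13/06 would be best" is invented to trigger the edge case; with the answer's lookahead that line is treated as the start of a new message and silently dropped.

```javascript
// Answer's lookahead: any line starting with "dd/" ends the current message.
const LOOSE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Assumed stricter variant: only a full "dd/mm/yyyy" date ends the message.
const STRICT =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/\d{2}\/\d{4}).*)*)/g;

const chat =
  "09/05/2016, 13:47 - Jane Doe: Which day suits you?\n" +
  "12/06 or 13/06 would be best\n" + // hypothetical continuation line
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!";

// Collect group 4 (the message body) for every match.
const bodies = (re) => [...chat.matchAll(re)].map((m) => m[4]);

console.log(bodies(LOOSE)); // continuation line is lost
console.log(bodies(STRICT)); // continuation line is kept in the first message
```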
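This could still be a problem if a user actually types a line break followed by a date in that format, because matching would stop at that point. I doubt they would also include a comma and a 24-hour time, so requiring those in the lookahead could be one way to handle that case.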
Example:
09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:
10/01/2016 @ 6am - Arrive at the station
10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)
10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.
09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.
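This is just an example (with a corresponding fix that adds more detail to the lookahead to handle it) to show how a user could type text that breaks a particular revision of the regex.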
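The linked fix itself isn't available here, so the following is only a sketch of that kind of fix, assuming it means requiring the full message header (date, comma, 24-hour time and " - ") in the negative lookahead. The input reproduces the itinerary example above; the answer's original lookahead cuts Jane's message after its first line, while the header-based lookahead keeps all four lines.

```javascript
// Answer's lookahead: stops at any line beginning with "dd/".
const LOOSE =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Assumed stricter lookahead: a new message must begin with "dd/mm/yyyy, h:mm - ".
const HEADER =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/\d{2}\/\d{4},\s\d{1,2}:\d{2}\s-\s).*)*)/g;

// The example conversation from above, one line per array entry.
const chat = [
  "09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:",
  "10/01/2016 @ 6am - Arrive at the station",
  "10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)",
  "10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.",
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.",
].join("\n");

// Collect group 4 (the message body) for every match.
const bodies = (re) => [...chat.matchAll(re)].map((m) => m[4]);

console.log(bodies(LOOSE)); // Jane's itinerary lines are dropped
console.log(bodies(HEADER)); // Jane's message keeps all four lines
```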