Ignoring carriage returns in regular expressions
I'm currently trying to parse a conversation file with JavaScript. Here is an example of such a conversation.
09/05/2016, 13:11 - Joe Bloggs: Hey Jane how're you doing? 😊 what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!
09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june! Hope you can make it down :) sorry it's a bit annoying i couldn't make it there til a sunday!
09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss! I've just requested 5 weeks off in November/December to visit Aus so I'll see if I can negotiate some other days! When does your uni term end in November? I'm thinking of visiting perth first then going to the east coast!
09/05/2016, 22:32 - Jane Doe: Oh that'll be awesome if you come to aus! Totally understand if it's too hard for you to request more days off in june. I finish uni early November! So should definitely be done by then if you came here
09/05/2016, 23:20 - Joe Bloggs: I could maybe get a couple of days 😊 when do you fly into London on the Sunday? Perfect! I need to speak to everyone else to make sure they're about. I can't wait to visit but it's so far away!
09/05/2016, 23:30 - Jane Doe: I fly in at like 7.30am so I'll have that whole day! I'm sure the year will fly since it's may already haha
09/05/2016, 23:34 - Joe Bloggs: Aw nice one! Even if I can get just Monday off I can get an early train on Sunday 😊
My current regular expression looks like this:
/(\d{2}\/\d{2}\/\d{4}),\s(\d(?:\d)?:\d{2})\s-\s([^:]*):\s(.*?)(?=\s*\d{2}\/|$)/gm
My approach is almost there and gives me the four groups as expected:
{
"group": 1,
"value": "09/05/2016"
},
{
"group": 2,
"value": "13:11"
},
{
"group": 3,
"value": "Joe Bloggs"
},
{
"group": 4,
"value": "Hey Jane how're you doing? 😊 what dates are you in London again? I realise that June isn't actually that far away so might book my trains down sooner than later!"
}
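The problem arises when the message (group 4) contains a carriage return. (See the third message in the sample above.)
I did some research, but using [\s\S] did not solve my problem. The pattern simply stops and moves on to the next occurrence.
For the third message, the text is cut off at the carriage return.
Any help would be appreciated!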
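Try
(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)
( https://regex101.com/r/sA3sB8/2 ). It scans to the end of the line, then uses a repeated group that first checks whether the next line starts with \d\d/ (that is, whether a date begins on the next line); if it does not, that whole line is captured as well.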
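As a rough sketch of how this pattern might be used from JavaScript: the helper name parseConversation, the shortened sample text, and the position of the line break inside the second message are all assumptions made for illustration. The line-ending normalisation is also an addition, since `.` does not match \r in JavaScript and the pattern only looks for \n.

```javascript
// Pattern from the answer, with the g flag so that every message is found.
const messagePattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Hypothetical helper: returns one object per message in the log.
function parseConversation(text) {
  // Assumption: convert CRLF and lone CR to LF before matching.
  const normalised = text.replace(/\r\n?/g, "\n");
  return Array.from(normalised.matchAll(messagePattern), (m) => ({
    date: m[1],
    time: m[2],
    sender: m[3],
    message: m[4],
  }));
}

// Shortened sample; the second message spans two lines (assumed position).
const sample = [
  "09/05/2016, 13:47 - Jane Doe: Hey! I'm in london from the 12th-16th of june!",
  "09/05/2016, 14:03 - Joe Bloggs: Right I'll speak to my boss!",
  "When does your uni term end in November?",
  "09/05/2016, 22:32 - Jane Doe: I finish uni early November!",
].join("\r\n");

const messages = parseConversation(sample);
console.log(messages.length);
console.log(messages[1].message);
```

With this input the continuation line stays attached to the second message instead of being dropped.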
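If you're worried about edge cases where a line happens to start with two digits followed by a forward slash, you can make the negative lookahead more specific. It increases the number of steps, but makes the pattern somewhat safer.
The pattern can break if a user actually types a newline followed by a date in that syntax, because matching stops at that point. I doubt they would also include a comma and a 24-hour time, so checking for those might be one way to handle this case.
Example: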
09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:
10/01/2016 @ 6am - Arrive at the station
10/01/2016 @ 7am - Get run over by a drunk horse carriage (the driver and the horse were both sober; the carriage stayed up a bit late to drink)
10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.
09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.
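This is just an example (with a corresponding fix that adds more detail to the lookahead to handle it), meant only to show how a user could add text that breaks a particular revision of the regex.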
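One possible way to tighten the lookahead, sketched here as an assumption rather than the exact fix the answer refers to, is to require a full header (date, comma, time, and " - ") before treating a line as the start of a new message. The names loosePattern, strictPattern, and log are hypothetical.

```javascript
// Lookahead from the answer: any line starting with two digits and a slash
// ends the current message.
const loosePattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/).*)*)/g;

// Assumed stricter variant: only a complete "DD/MM/YYYY, H:MM - " header
// ends the current message.
const strictPattern =
  /(\d{2}\/\d{2}\/\d{4}),\s(\d{1,2}:\d{2})\s-\s([^:]*):\s+(.*(?:\n+(?!\n|\d{2}\/\d{2}\/\d{4},\s\d{1,2}:\d{2}\s-\s).*)*)/g;

// Part of the example conversation: the message body contains date-like lines.
const log = [
  "09/05/2016, 23:36 - Jane Doe: Great! Let me give you my travel details:",
  "10/01/2016 @ 6am - Arrive at the station",
  "10/01/2016 @ 7:15am - Pull myself out from under the carriage and kick at its wheels vehemently.",
  "09/05/2016, 23:40 - Joe Bloggs: Haha, sounds great.",
].join("\n");

// Collect group 4 (the message text) for each match.
const looseMessages = [...log.matchAll(loosePattern)].map((m) => m[4]);
const strictMessages = [...log.matchAll(strictPattern)].map((m) => m[4]);

console.log(looseMessages[0].split("\n").length);  // lines kept by the loose pattern
console.log(strictMessages[0].split("\n").length); // lines kept by the strict pattern
```

The loose pattern cuts Jane's message off after its first line, while the stricter lookahead keeps the travel details attached to it. A user could still defeat this by typing a full header inside a message, so it narrows the problem rather than eliminating it.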