Extract conversations using regex

I have text like this:

[agent]:Welcome to ABC bank My name is Asif. How may I help you [cust]:I got additional charge in my credit card, I will not be paying this, please remove it [agent]:Okay can I place the call on hold [cust]:This is very unresponsive behaviour on banks side

The conversations are not separated by line breaks. I need to extract only what the customer said and ignore what the agent said, so that I can analyze customer sentiment. Please help with this regex.

Either:

\\[cust\\]:((?:(?!\\[\\w+\\]:).)*)

or

(?s)\\[cust\\]:(.*?)(?=\\[\\w+\\]:|$)

https://regex101.com/r/RT2O4y/1
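To make the two patterns concrete, here is a minimal sketch in Python, which is an assumption since the question does not name a language. The patterns are written with single backslashes inside raw strings. The first is a tempered match: it consumes one character at a time as long as no speaker tag starts at that position. The second is a lazy match that stops at the next speaker tag or at the end of the input. The names `text`, `TEMPERED`, `LAZY`, and `customer_turns` are made up for illustration.

```python
import re

# Sample transcript from the question, with no line breaks between turns.
text = (
    "[agent]:Welcome to ABC bank My name is Asif. How may I help you "
    "[cust]:I got additional charge in my credit card, I will not be paying this, "
    "please remove it "
    "[agent]:Okay can I place the call on hold "
    "[cust]:This is very unresponsive behaviour on banks side"
)

# Pattern 1: after "[cust]:", take any character that is not the start
# of another "[word]:" tag, repeated as often as possible.
TEMPERED = re.compile(r"\[cust\]:((?:(?!\[\w+\]:).)*)")

# Pattern 2: after "[cust]:", take as little as possible until the next
# "[word]:" tag or the end of the input. (?s) lets "." match newlines too.
LAZY = re.compile(r"(?s)\[cust\]:(.*?)(?=\[\w+\]:|$)")


def customer_turns(pattern, transcript):
    """Return the customer's utterances, with surrounding whitespace trimmed."""
    return [turn.strip() for turn in pattern.findall(transcript)]


if __name__ == "__main__":
    for turn in customer_turns(TEMPERED, text):
        print(turn)
```

On this sample both patterns return the same two customer utterances. Note that the first pattern has no `(?s)` flag, so it would stop at a newline inside a customer turn, while the second would not.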

Benchmarks:

Regex1:   \[cust\]:((?:(?!\[\w+\]:).)*)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   2
Elapsed Time:    1.37 s,   1372.69 ms,   1372693 µs
Matches per sec:   72,849


Regex2:   (?s)\[cust\]:(.*?)(?=\[\w+\]:|$)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   2
Elapsed Time:    0.92 s,   918.17 ms,   918175 µs
Matches per sec:   108,911
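
A rough way to repeat this comparison is sketched below with Python's `timeit`. Python is an assumption here; the figures above were produced by a different tool and regex engine, so absolute numbers will differ and only the relative ordering on your own machine is meaningful. The iteration count mirrors the 50 x 1000 runs above, and the names `patterns` and `time_pattern` are made up for illustration.

```python
import re
import timeit

# Sample transcript from the question.
text = (
    "[agent]:Welcome to ABC bank My name is Asif. How may I help you "
    "[cust]:I got additional charge in my credit card, I will not be paying this, "
    "please remove it "
    "[agent]:Okay can I place the call on hold "
    "[cust]:This is very unresponsive behaviour on banks side"
)

patterns = {
    "Regex1 (tempered)": re.compile(r"\[cust\]:((?:(?!\[\w+\]:).)*)"),
    "Regex2 (lazy)": re.compile(r"(?s)\[cust\]:(.*?)(?=\[\w+\]:|$)"),
}


def time_pattern(pattern, transcript, iterations=50_000):
    """Return the elapsed seconds for `iterations` full scans of `transcript`."""
    return timeit.timeit(lambda: pattern.findall(transcript), number=iterations)


if __name__ == "__main__":
    for name, pattern in patterns.items():
        elapsed = time_pattern(pattern, text)
        print(f"{name}: {elapsed:.3f} s for 50,000 iterations")
```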
