How to count two words as 1 in same line

In my text file, each sentence is labeled with a specific type, such as contrast.

A contrasting sentence can carry any of the tags "CONTRAST", "CONTR" or "WEAKCONTR". For instance:

IMPSENT_CONTRAST_VIS(Studying networks in this way can help to
identify the people from whom an individual learns , where
conflicts_MD:+ in understanding_MD:+ may originate , and which
contextual factors influence learning .)

So I count these with the following expression: /(\_(WEAK))|(\_CONTRAST)|(\_CONTR(\_|\())/g, which works perfectly fine.

Now the problem is that some sentences carry more than one contrast tag, such as CONTR and WEAKCONTR together. For instance:

IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(Studying_MD:+ networks in this way can help to identify_MD:+ the people from whom an individual learns , where conflicts_MD:+ in understanding_MD:+ may originate , and which contextual factors influence learning .)

At this point I have to count these as 1, not 2. Is this possible with a regular expression?

You can use lookaheads to assert it, and then count the matches:

(?=\w*_(?:WEAK|CONTRAST|CONTR[_(]))\b\w+\b

Demo here: http://regex101.com/r/xP2yI7/3
Notice the match count.

This matches the whole IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS label, but only if the lookahead finds one of the keywords you're looking for. Because each label is matched once, a label with several contrast tags still counts once. It also works if you have multiple such sentences on the same line.

Also, I've simplified your regex a bit, retaining the same meaning. Note that you don't have to escape the _.
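As a rough illustration of the counting step, here is a minimal JavaScript sketch. The function name countContrastSentences and the shortened sample sentences are made up for this example, and the character class is written as [_(] on the assumption that it should mirror the question's CONTR(\_|\().

```javascript
// Matches a whole label (e.g. IMPSENT_CONTRAST_VIS) once, provided the
// lookahead finds at least one contrast tag somewhere inside it.
// Assumption: [_(] mirrors the question's CONTR(\_|\() alternative.
const CONTRAST_LABEL = /(?=\w*_(?:WEAK|CONTRAST|CONTR[_(]))\b\w+\b/g;

// Hypothetical helper: number of labels that carry a contrast tag.
function countContrastSentences(text) {
  const matches = text.match(CONTRAST_LABEL);
  return matches === null ? 0 : matches.length;
}

// Shortened sample data based on the sentences in the question.
const sample = [
  "IMPSENT_CONTRAST_VIS(Studying networks in this way can help .)",
  "IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(Studying_MD:+ networks in this way .)",
  "IMPSENT_VIS(No contrast tag here .)",
].join("\n");

console.log(countContrastSentences(sample)); // prints 2
```

The second label holds both CONTRAST and WEAKCONTR, yet it contributes a single match because the regex consumes the whole label at once.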

You really just care if the tag shows up in the line at all, so just grab the whole line, provided it has a tag, like so:

/^([A-Z_]+(WEAK|CONTRAST|CONTR)+[A-Z_]*)/gm

From the start of the line (^), look for a block of characters in A-Z or _ followed by the tag, optionally followed by more uppercase letters or underscores.
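A minimal JavaScript sketch of this line-based count might look like the following. The function name countContrastLines and the shortened sample sentences are assumptions made for the example.

```javascript
// With the m flag, ^ anchors at the start of every line, so each line
// can contribute at most one match.
const TAGGED_LINE = /^([A-Z_]+(WEAK|CONTRAST|CONTR)+[A-Z_]*)/gm;

// Hypothetical helper: number of lines whose leading label has a contrast tag.
function countContrastLines(text) {
  return [...text.matchAll(TAGGED_LINE)].length;
}

// Shortened sample data based on the sentences in the question.
const sample = [
  "IMPSENT_CONTRAST_VIS(Studying networks in this way can help .)",
  "IMPSENT_CONTRAST_EMPH_WEAKCONTR_VIS(Studying_MD:+ networks in this way .)",
  "IMPSENT_VIS(No contrast tag here .)",
].join("\n");

console.log(countContrastLines(sample)); // prints 2
```

Since the pattern is anchored to the start of the line, a second tagged sentence later on the same line would not be counted here.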

Can you try adding \w+?

/(\_(WEAK\w+))|(\_CONTRAST\w+)|(\_CONTR(\_\w+|\())/g

Something like this?

(^(\_(WEAK))|(\_CONTRAST)|(\_CONTR(\_|\()))
