
Finding acronyms in a big text using JavaScript regex

I have a big text that contains some acronyms. All the acronyms are in parentheses and in capital letters. Each acronym is preceded by as many words as it has letters, and each of those words starts with the corresponding letter of the acronym. However, the words do not necessarily start with capital letters.

Example:

bla bla radar cross section (RCS) bla bla...

bla bla Radar Cross Section (RCS) bla bla...

I need to list all the acronyms. How should I start?

Here's one possibility. It returns an object whose keys are the acronyms and whose values are the matching preceding words (without any attempt to normalize their capitalization).

    const findAcronyms = (str) => {
      const words = str.split(/\s+/)
      return words.reduce((all, word, i) => {
        // Only whitespace-delimited tokens such as "(RCS)" are candidates
        const isCandidate = word.match(/\([A-Z]+\)/)
        if (!isCandidate) {return all}
        // Drop the surrounding parentheses
        const letters = word.split('').slice(1, -1)
        const acro = letters.join('')
        // Not enough preceding words to spell out the acronym
        if (i - letters.length < 0) {return all}
        // Initials of the preceding words must match, ignoring case
        if (words.slice(i - letters.length, i)
              .map(s => s[0]).join('')
              .toLowerCase() !== acro.toLowerCase()) {
          return all
        }
        return {
          ...all,
          [acro]: words.slice(i - letters.length, i).join(' ')
        }
      }, {})
    }

    const str = 'bla bla radar cross section (RCS) but this one (IN) is not And This One (ATO) is'

    console.log(findAcronyms(str)) //~>
    // {
    //   RCS: "radar cross section",
    //   ATO: "And This One"
    // }

Note that "IN" is not included in the result, because it doesn't match the preceding text.

If you just want the acronyms themselves, without what they stand for, you could modify the function to return an array, or simply run Object.keys over this result.
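As a rough sketch of the array-returning variant, the snippet below uses a regex with a capture group instead of splitting on whitespace, so an acronym followed by punctuation, such as "(RCS),", is still recognized. The name listAcronyms and the sample text are made up for illustration; they are not part of the answer above.

```javascript
// Hypothetical helper (not from the answer above): returns only the acronyms
// whose letters match the initials of the words that precede them.
const listAcronyms = (text) => {
  const found = new Set()
  // Capture the capitals inside parentheses; match.index points at the "(".
  for (const match of text.matchAll(/\(([A-Z]+)\)/g)) {
    const acro = match[1]
    // Words that come before the parenthesis, split on whitespace.
    const before = text.slice(0, match.index).trim().split(/\s+/)
    // Initials of the last N words, where N is the acronym length.
    const initials = before.slice(-acro.length).map(w => w[0]).join('')
    if (initials.toLowerCase() === acro.toLowerCase()) {
      found.add(acro)
    }
  }
  return [...found]
}

const sample = 'bla bla radar cross section (RCS), but this one (IN) is not. And This One (ATO) is'
console.log(listAcronyms(sample)) // [ 'RCS', 'ATO' ]
```

This sketch re-slices the text for every match, which is simple but can be slow on very large inputs. It also assumes that the preceding words start with a letter and not with a quote or other punctuation.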

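    const str = "bla bla radar cross section (RCS) bla bla...(aaaaaa) stack overflow (SO)";
    // Match parenthesized runs of capital letters, then strip the parentheses.
    // match() returns null when nothing matches, hence the fallback to [].
    const acronyms = (str.match(/\([A-Z]+\)/g) || []).map(val => val.slice(1, -1));
    console.log(acronyms); // [ 'RCS', 'SO' ]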

Here is what you can do:

[\(][A-Z]+[\)]
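A pattern of this shape can be applied as in the minimal sketch below. It assumes a capture group is added around the capital letters so the parentheses do not have to be stripped afterwards; the variable names are illustrative only.

```javascript
// Capture group around [A-Z]+ so m[1] holds the acronym without parentheses.
const pattern = /\(([A-Z]+)\)/g
const text = 'bla bla radar cross section (RCS) bla bla... (aaaaaa) stack overflow (SO)'
// matchAll needs the g flag; Array.from maps each match to its first group.
const acronyms = Array.from(text.matchAll(pattern), m => m[1])
console.log(acronyms) // [ 'RCS', 'SO' ]
```

Unlike the first answer, this only lists parenthesized capitals; it does not check that the preceding words actually spell out the acronym.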


 