
How to match two strings by word sequence in JavaScript?

I need to find the string in a list that best matches a given string.

var list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
var str = 'Adam Smith Jr.';

// expected output: 'Mr. Adam Smith Junior'

I tried tokenizing the words, but every item in the list matches two tokens (Adam/Smith/Jr.), so they all tie. My expected output is list[0]. So I need a way to find the string that shares the longest matching word sequence with the query.
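A quick way to see the tie: the sketch below counts how many query words each list item contains, ignoring order. The helper name `countSharedWords` is made up for illustration, and it assumes words are separated by whitespace and compared exactly. Every item scores 2, so a plain word count cannot single out `list[0]`.

```javascript
// Count how many words of `query` also appear somewhere in `candidate`,
// ignoring word order (a plain "bag of words" comparison).
function countSharedWords(candidate, query) {
    const candidateWords = new Set(candidate.split(/\s+/));
    return query.split(/\s+/).filter((w) => candidateWords.has(w)).length;
}

const list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
const str = 'Adam Smith Jr.';

const scores = list.map((item) => countSharedWords(item, str));
console.log(scores); // [ 2, 2, 2, 2 ]: every item ties
```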

// Your code, updated.
var list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
var str = 'Adam Smith Jr.';

// Split the query string into its words.
var words = str.split(' ');
var matched = [];

// Outer loop: go through each word of the query (`words`).
for (var i = 0; i < words.length; i++) {
    // Escape regex metacharacters (e.g. the "." in "Jr.") so the word is matched literally.
    // No 'g' flag: with it, RegExp.prototype.test() would carry lastIndex over between calls.
    var pattern = new RegExp(words[i].replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));

    // Inner loop: test that word against each string in `list`.
    for (var j = 0; j < list.length; j++) {
        if (pattern.test(list[j])) {
            // Find exact match - your code goes here.
            if (!matched.includes(list[j])) {
                matched.push(list[j]);
            }
        }
    }
}

// All values from `list` that contain at least one word of `str`.
console.log(matched);
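The regular-expression test above matches substrings, so a query word like `Adam` would also match inside a longer name such as `Adamson`. One possible way to fill in the "find exact match" step, sketched here assuming words are separated by whitespace, is to compare whole words instead. `exactWordMatches` and the `'Adamson Lee'` entry are hypothetical, added only to show the difference.

```javascript
// Keep the items of `list` that contain at least one word of `query` as a
// whole word (exact, case-sensitive comparison), rather than as a substring.
function exactWordMatches(list, query) {
    const queryWords = query.split(/\s+/);
    return list.filter((item) => {
        const itemWords = item.split(/\s+/);
        return queryWords.some((w) => itemWords.includes(w));
    });
}

// 'Adamson Lee' is a made-up entry: a substring regex for 'Adam' would match it.
const names = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adamson Lee'];
console.log(exactWordMatches(names, 'Adam Smith Jr.'));
// [ 'Mr. Adam Smith Junior', 'Anna Smith Jr.' ]
```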

EDIT: It seems you're looking for the item with the largest number of consecutive matching words. Here's a version that does exactly that:

    var list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
    var str = 'Adam Smith Jr.';

    /**
     * Representing a class to search and match a string in a list of strings
     * @param {string[]} data An array consists of input strings to search in
     * @param {string} query A string to search for
     */
    var TextMatcher = function(data, query) {

        this.wordsToSearch = extractWords(query);

        /** @type {any[]} */
        this.results = [];
        var $this = this;

        data.forEach(function(ele, idx) {

            var words = extractWords(ele);

            var start = findSearchStartPoint(words, $this.wordsToSearch);

            var numberOfMatches = findNumberOfMatches(words, $this.wordsToSearch, start);

            $this.results.push({
                words: words,
                wordsToSearch: $this.wordsToSearch,
                firstListMatchedIndex: start.firstListMatchedIndex,
                numberOfMatches: numberOfMatches,
                index: idx
            });
        });

        /**
         * Breaks a string down into its words, splitting on runs of whitespace
         * so that repeated or surrounding spaces do not produce empty words
         * @param {string} s The input string
         */
        function extractWords(s) {
            return (s || '').trim().split(/\s+/);
        }

        /**
         * @typedef {Object} SearchPoint Starting point of search
         * @property {number} startIndex - Index of `wordsToSearch` element
         * @property {number} firstListMatchedIndex - Index of `words` element
         */

        /**
         * Finds the starting point of the search: the first word of `wordsToSearch`
         * that occurs anywhere in `words` (compared case-insensitively), together
         * with its position in each array
         * @param {string[]} words The array in which search will be done
         * @param {string[]} wordsToSearch The array of words that needs to be searched for
         * @returns {SearchPoint} Returns indices of each input array from which search should start
         */
        function findSearchStartPoint(words, wordsToSearch) {
            var startIndex = wordsToSearch.length;
            var firstListMatchedIndex = -1;
            for (var i = 0; i < wordsToSearch.length; i++) {
                firstListMatchedIndex = words.findIndex(function(w, x) {
                    return x > firstListMatchedIndex
                            && wordsToSearch[i].toLowerCase() === w.toLowerCase();
                });

                if (firstListMatchedIndex > -1) {
                    startIndex = i;
                    break;
                }
            }

            return {
                startIndex: startIndex,
                firstListMatchedIndex: firstListMatchedIndex
            };
        }

        /**
         * Returns number of consecutive `wordsToSearch` elements in `words` starting
         * from `start`
         * @param {string[]} words The array in which search will be done
         * @param {string[]} wordsToSearch The array of words that needs to be searched for
         * @param {SearchPoint} start
         */
        function findNumberOfMatches(words, wordsToSearch, start) {
            var numberOfMatched = 0;
            if (start.firstListMatchedIndex > -1) {
                numberOfMatched = 1;
                for (var i = start.startIndex + 1; i < wordsToSearch.length; i++) {
                    // Step through `words` in lockstep, relative to where the first match was found
                    var candidate = words[start.firstListMatchedIndex + (i - start.startIndex)];
                    if (wordsToSearch[i].toLowerCase() === (candidate || '').toLowerCase()) {
                        numberOfMatched++;
                    } else {
                        break;
                    }
                }
            }

            return numberOfMatched;
        }
    };

    /**
     * Prints a summary of how the search performed to `console`
     */
    TextMatcher.prototype.showResults = function() {
        console.info('Words to be searched:');
        console.info(this.wordsToSearch);
        console.info('\n');

        this.results.forEach(function(r) {
            console.info(r.words);
            console.info('List item ' + r.index + ' ---- No. of words matched: ' + r.numberOfMatches
                            + (r.numberOfMatches > 0 ? ', First word matched: ' + r.words[r.firstListMatchedIndex] : ''));
        });
        console.info('\n');
    };

    /**
     * Returns the `data` item with the most consecutive matched words
     * (if several items tie, the earliest one is returned)
     */
    TextMatcher.prototype.mostMatched = function() {
        var max = Math.max(...this.results.map(function(el) {
            return el.numberOfMatches;
        }));
        return this.results.find(function(el) {
            return el.numberOfMatches === max;
        });
    };

    // Creates an instance of TextMatcher
    var search = new TextMatcher(list, str);
    // Shows results in console
    search.showResults();
    // Gets the most matched item in the list
    var res = search.mostMatched();
    // Shows the most matched item in console
    console.info('The phrase with the most consecutive words matched:');
    console.info('Phrase: "' + list[res.index] + '", No. of words matched: ' + res.numberOfMatches + ', Index: ' + res.index);

The output looks like this:

//Words to be searched:
//[ 'Adam', 'Smith', 'Jr.' ]


//[ 'Mr.', 'Adam', 'Smith', 'Junior' ]
//List item 0 ---- No. of words matched: 2, First word matched: Adam
//[ 'Anna', 'Smith', 'Jr.' ]
//List item 1 ---- No. of words matched: 2, First word matched: Smith
//[ 'Adam', 'Jhon', 'Smith' ]
//List item 2 ---- No. of words matched: 1, First word matched: Adam
//[ 'Smith', 'Adam' ]
//List item 3 ---- No. of words matched: 1, First word matched: Adam


//The phrase with the most consecutive words matched:
//Phrase: "Mr. Adam Smith Junior", No. of words matched: 2, Index: 0
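`TextMatcher` anchors each item on the first query word it finds there, so it can miss a longer run that begins at a later query word: in a hypothetical item such as `'Adam Lee Smith Jr.'` it stops after `Adam` and never counts `Smith Jr.`. If that matters, a different technique (swapped in here, not the approach above) is to treat each string as an array of words and compute their longest common substring with dynamic programming, which checks every starting position. A minimal sketch, with hypothetical function names and case-insensitive comparison:

```javascript
// Length of the longest run of consecutive words shared by `a` and `b`
// (longest common substring over word arrays, compared case-insensitively).
function longestCommonWordRun(a, b) {
    const x = a.toLowerCase().split(/\s+/);
    const y = b.toLowerCase().split(/\s+/);
    // prev[j] = length of the common run ending at x[i - 2] and y[j - 1].
    let prev = new Array(y.length + 1).fill(0);
    let best = 0;
    for (let i = 1; i <= x.length; i++) {
        const curr = new Array(y.length + 1).fill(0);
        for (let j = 1; j <= y.length; j++) {
            if (x[i - 1] === y[j - 1]) {
                // Extend the run that ended one word earlier in both arrays.
                curr[j] = prev[j - 1] + 1;
                best = Math.max(best, curr[j]);
            }
        }
        prev = curr;
    }
    return best;
}

// Return the first item with the longest shared run (ties go to the earliest item).
function bestMatch(list, query) {
    let bestItem;
    let bestScore = -1;
    for (const item of list) {
        const score = longestCommonWordRun(item, query);
        if (score > bestScore) {
            bestScore = score;
            bestItem = item;
        }
    }
    return bestItem;
}

const list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
console.log(list.map((item) => longestCommonWordRun(item, 'Adam Smith Jr.'))); // [ 2, 2, 1, 1 ]
console.log(bestMatch(list, 'Adam Smith Jr.')); // Mr. Adam Smith Junior
```

Ties are resolved in favor of the earliest item, which is also how `mostMatched` behaves, since it calls `find` with the maximum count.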

Please let me know if this still isn't what you wanted.

================================== ====================================

INITIAL POST:

I was wondering if you're looking for something like this:

var list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
var str = 'Adam Smith Jr.';

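// This function finds the best-matching string in the array by trying
// ever-shorter prefixes of `s` (down to two characters), prints it in the
// console and returns it (or `undefined` if nothing matches)
function findBestMatch(arr, s) {
    for (var i = s.length; i > 1; i--) {
        var prefix = s.slice(0, i);
        var found = arr.find(function(ele) {
            return ele.indexOf(prefix) > -1;
        });
        if (found !== undefined) {
            console.info(found);
            return found;
        }
    }
    return undefined;
}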

findBestMatch(list, str);
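The loop above shortens the query one character at a time, so it can stop on a partial word: for this input it settles on the prefix `'Adam Smith J'`. A word-level variant of the same idea, sketched below with hypothetical names (`findBestWordPrefixMatch`, `containsWordSequence`) and assuming whitespace-separated words, drops whole words from the end of the query instead and only accepts a match when those leading words appear consecutively in an item.

```javascript
// Returns true if the word array `needle` occurs as a consecutive run inside `haystack`.
function containsWordSequence(haystack, needle) {
    for (let start = 0; start + needle.length <= haystack.length; start++) {
        if (needle.every((w, k) => haystack[start + k] === w)) {
            return true;
        }
    }
    return false;
}

// Try the whole query first, then drop one word at a time from the end,
// returning the first item that contains the remaining leading words.
function findBestWordPrefixMatch(arr, s) {
    const queryWords = s.split(/\s+/);
    for (let n = queryWords.length; n > 0; n--) {
        const prefix = queryWords.slice(0, n);
        const found = arr.find((ele) => containsWordSequence(ele.split(/\s+/), prefix));
        if (found !== undefined) {
            return found;
        }
    }
    return undefined;
}

const list = ['Mr. Adam Smith Junior', 'Anna Smith Jr.', 'Adam Jhon Smith', 'Smith Adam'];
console.log(findBestWordPrefixMatch(list, 'Adam Smith Jr.')); // Mr. Adam Smith Junior
```

Like the character version, this still favors matches on the leading words of the query; a run such as `Smith Jr.` is only found once `Adam` itself matches nothing.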

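Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM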