
How can I split a long continuous string into an array of the words it contains?

I have a long continuous string that looks something like this:

let myString = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteen";

It does not have any separators to easily target.
So how can I iterate over it and split the words so that it ends up like this:

splitString = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen"];

Preferably with JavaScript.

The problem here is the lack of separators, as you mentioned: without them, the software cannot know where the words begin and end.

Given that you know which words will show up, my technique would be as follows:

NOTE: This does not account for overlapping words, and it assumes that none of the words is a substring of another word.

  1. Iterate over the known words
  2. Search the string (indexOf) for each known word and note its positions in the string
  3. Sort the matches by their index values
  4. Generate an array of the words in the order in which they were found

/**
 * This assumes that:
 *  - Input words are not subsets of other input words
 */

// Find all indices of the input word in the input String
function findAll(inputString, inputWord) {
    const indices = [];
    let index = 0;
    while (index < inputString.length) {
        index = inputString.indexOf(inputWord, index);
        if (index === -1) break; // -1 means not found, so we stop searching
        indices.push({ index, word: inputWord });
        index += inputWord.length;
    }
    return indices;
}

// Split the words into an array of Objects holding their positions and values
function splitWords(inputString, inputWords) {
    // For holding the results
    let results = [];
    // Loop the input words
    for (const inputWord of inputWords) {
        // Find the indices and concat to the results array
        results = results.concat(findAll(inputString, inputWord));
    }
    return results;
}

// Sort the words and return just an array of Strings
const orderWords = (inputArr) => inputArr.sort((a, b) => a.index - b.index).map(input => input.word);

/**
 * Usage like so:
 */
const myString = 'onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteen';
const inputWords = ["one", "two", "three","four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen"];

const result = splitWords(myString, inputWords);
const ordered = orderWords(result);

console.dir(ordered);

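/**
 * Result (note the stray 'four' before 'fourteen': "four" is a prefix of
 * "fourteen", which breaks the assumption stated at the top):
    [
    'one',      'two',
    'three',    'four',
    'five',     'six',
    'seven',    'eight',
    'nine',     'ten',
    'eleven',   'twelve',
    'thirteen', 'four',
    'fourteen'
    ]
 */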
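
The duplicate 'four' in the result above can be avoided by preferring longer words at each position. Below is a minimal sketch of one way to do that, assuming the vocabulary is known in advance. The helper name `splitByKnownWords` and the variables `sample` and `vocabulary` are made up for this illustration; it builds a regular expression alternation with the longest words first, so "fourteen" is tried before "four".

```javascript
// Hypothetical helper (not part of the answer above): split a string using
// a known vocabulary via a regex alternation ordered longest-first.
function splitByKnownWords(inputString, knownWords) {
    // An empty alternation would match the empty string everywhere.
    if (knownWords.length === 0) return [];
    // Escape regex metacharacters so each word is matched literally.
    const escapeRegExp = (word) => word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Longest words first: alternatives are tried left to right.
    const alternatives = [...knownWords]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegExp);
    const pattern = new RegExp(alternatives.join("|"), "g");
    // match() returns null when nothing matches; normalise to an empty array.
    return inputString.match(pattern) || [];
}

const sample = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteen";
const vocabulary = ["one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen"];

const words = splitByKnownWords(sample, vocabulary);
console.log(words);
```

Keep in mind that this sketch silently skips any characters that do not belong to a known word, so it is worth comparing `words.join("")` with the input if unknown text is possible.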

If, as you said in the comments, you know the expected words, then create an array of these words and loop through your string to find them.

Note that the code below takes the length of the matched words into account, so that it can find words such as one hundred eighty five; otherwise the loop would stop as soon as it found one.

You can read the comments in the code to better understand it.

// Your string
var myString = "onetwothreefourfivesixseveneightnineteneleventwelvethirteenfourteentwentyfiveonehundredeightyfiveeightyfive";

// The list of expected words
var possibleWords = [
    "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "twenty five", "one hundred eighty five", "eighty five",
];

function separateString(mergedString, possibleWords) {
    // The resulting array that holds all the split words
    var result = [];
    // Buffer that temporarily stores part of the string to compare with the expected words
    var buffer = "";
    // The expected word that matched the buffer
    var matchedWord = "";
    // Index of the last letter of the matched word
    var matchedWordLastIndex = -1;
    // Convert the string into an array so we can access it letter by letter
    var splitedString = mergedString.split("");

    // For every letter in the string
    for (var stringIndex = 0; stringIndex < splitedString.length; stringIndex++) {
        // Reset the variables
        matchedWord = "";
        buffer = "";
        matchedWordLastIndex = -1;

        // Look ahead from the current index to the end of the string
        // and find every expected word that starts here
        for (var lookAhead = stringIndex; lookAhead < splitedString.length; lookAhead++) {
            // Append one letter per iteration so the buffer grows into candidate words
            buffer += splitedString[lookAhead];

            // Loop through the expected words to find a match with the buffer
            for (var i = 0; i < possibleWords.length; i++) {
                // .replace(/ /g, '') removes spaces from expected words,
                // e.g. "twenty five" becomes "twentyfive"
                if (buffer === possibleWords[i].replace(/ /g, '')) {
                    // Keep the match only if it is longer than the previous one, so that
                    // "one hundred eighty five" wins over "one". Lengths are compared
                    // without spaces so they are measured the same way as the buffer.
                    if (matchedWord.replace(/ /g, '').length < buffer.length) {
                        matchedWord = possibleWords[i];
                        matchedWordLastIndex = lookAhead;
                    }
                }
            }
        }

        // If a word has been found
        if (matchedWord.length > 0) {
            // Continue after the matched word, since it ended at the look-ahead index
            stringIndex = matchedWordLastIndex;
            // Put the found word into the result array
            result.push(matchedWord);
        }
    }
    return result;
}

console.log(separateString(myString, possibleWords));
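
Always taking the longest match can still get stuck on some vocabularies: with the words "abc", "ab" and "cd" and the input "abcd", the longest match "abc" strands the final "d", even though "ab" + "cd" covers the whole string. As a rough sketch only, the following swaps in memoized backtracking to find one segmentation that covers the entire input. The function name `segment` is hypothetical, and the words are matched literally (no space handling).

```javascript
// Hypothetical helper: returns one full segmentation of `text` into words
// from `knownWords`, or null when the words cannot cover the whole string.
function segment(text, knownWords) {
    // memo maps a start index to the result for the suffix beginning there.
    const memo = new Map();

    function solve(start) {
        if (start === text.length) return []; // reached the end: success
        if (memo.has(start)) return memo.get(start);

        let answer = null;
        for (const word of knownWords) {
            // Skip empty words so every step makes progress.
            if (word.length > 0 && text.startsWith(word, start)) {
                const rest = solve(start + word.length);
                if (rest !== null) {
                    answer = [word, ...rest];
                    break; // the first complete segmentation is enough
                }
            }
        }
        memo.set(start, answer);
        return answer;
    }

    return solve(0);
}

console.log(segment("abcd", ["abc", "ab", "cd"])); // [ 'ab', 'cd' ]
console.log(segment("abcd", ["abc"]));             // null
```

When several segmentations exist, this sketch returns whichever it reaches first in vocabulary order, so it does not resolve genuine ambiguity.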
