Having issues finding multiple substrings within a string

I am trying to write a program that compares two strings (a string and a substring) and increments a counter each time the substring is found within the string. However, using the standard:

if(str.find(substr) != string::npos)
{
count++;
}

I run into the problem that if the substring appears multiple times in the string, the counter only increments once. So if the string is "test test test test" and the substring is "test", count ends up being 1 instead of 4.

What would be the best way to fix this?

Notes for context:

1) At one point I was checking the string character by character to see if they matched, but I had to scrap that when I ran into issues with words that contain smaller words.

Example: 'is' would get picked up inside the word 'this', etc.

2) The larger program that this is for accepts two vectors. Each element of the first vector is a sentence the user typed in (acting as the main string in the example above). The second vector holds each word from all the sentences entered into the first vector (acting as the substring in the example above). Not sure if that bit matters or not, but I figured I would throw it in there.

Example:

vector<string> str {"this is line one", "this is line two", "this is line three"};
vector<string> substr {"is", "line", "one", "this", "three", "two"};

3) I'm thinking that some way of doing the opposite of != string::npos might work, but I'm not sure whether that even exists.

You need a loop to find all of the occurrences of a substring in a given string.
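As a minimal sketch of such a loop (the helper name countOccurrences is hypothetical, and counting non-overlapping matches is an assumption), each search can resume just past the previous match by passing a start position to std::string::find():

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts non-overlapping occurrences of sub in str.
std::size_t countOccurrences(const std::string &str, const std::string &sub)
{
    if (sub.empty())
        return 0; // an empty pattern would otherwise match at every position

    std::size_t count = 0;
    std::string::size_type pos = 0;

    // Resume each search just past the previous match.
    while ((pos = str.find(sub, pos)) != std::string::npos)
    {
        ++count;
        pos += sub.size();
    }

    return count;
}
```

With "test test test test" and "test" this counts 4. Note that it matches anywhere in the string, so searching "this is" for "is" counts 2, because the "is" inside "this" is also found.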

However, since you want to differentiate substrings that are whole words from substrings in larger words, you need to parse the string to determine the whole words before you compare them.

You can use std::string::find_first_of() and std::string::find_first_not_of() to find the beginning and ending indexes of each whole word between desired delimiters (whitespace, punctuation, etc). You can use std::string::compare() to compare a substring between those two indexes to your desired substring. For example:

#include <string>

const std::string delims = ",. ";

size_t countWord(const std::string &str, const std::string &word)
{
    std::string::size_type start = 0, end;
    size_t count = 0;

    while ((start = str.find_first_not_of(delims, start)) != std::string::npos)
    {
        end = str.find_first_of(delims, start+1);
        if (end == std::string::npos)
        {
            if (str.compare(start, str.size()-start, word) == 0)
                ++count;

            break;
        }

        if (str.compare(start, end-start, word) == 0)
            ++count;

        start = end + 1;
    }

    return count;
}

Alternatively, you can extract the whole words into a std::vector and then use std::count() to count how many elements match the substring. For example:

#include <string>
#include <vector>
#include <algorithm>

const std::string delims = ",. ";

size_t countWord(const std::string &str, const std::string &word)
{
    std::vector<std::string> vec;
    std::string::size_type start = 0, end;

    while ((start = str.find_first_not_of(delims, start)) != std::string::npos)
    {
        end = str.find_first_of(delims, start+1);
        if (end == std::string::npos)
        {
            vec.push_back(str.substr(start));
            break;
        }

        vec.push_back(str.substr(start, end-start));

        start = end + 1;
    }

    return std::count(vec.begin(), vec.end(), word);
}
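
To tie this back to the two vectors in the question, here is a rough sketch of how a whole-word counter could be applied to every sentence and every word. The helper name countWords is hypothetical, and the block carries its own compact variant of countWord() (same delimiters, same whole-word comparison) only so that it compiles on its own:

```cpp
#include <cstddef>
#include <string>
#include <vector>

const std::string delims = ",. ";

// Compact variant of countWord(), repeated here so the sketch is self-contained.
std::size_t countWord(const std::string &str, const std::string &word)
{
    std::size_t count = 0;
    std::string::size_type start = 0;

    // Find the start of the next word, skipping any delimiters.
    while ((start = str.find_first_not_of(delims, start)) != std::string::npos)
    {
        // The word ends at the next delimiter, or at the end of the string.
        std::string::size_type end = str.find_first_of(delims, start);
        if (end == std::string::npos)
            end = str.size();

        if (str.compare(start, end - start, word) == 0)
            ++count;

        start = end;
    }

    return count;
}

// Hypothetical helper: for each word, the total number of whole-word
// matches across all sentences.
std::vector<std::size_t> countWords(const std::vector<std::string> &sentences,
                                    const std::vector<std::string> &words)
{
    std::vector<std::size_t> totals(words.size(), 0);

    for (std::size_t i = 0; i < words.size(); ++i)
        for (const std::string &sentence : sentences)
            totals[i] += countWord(sentence, words[i]);

    return totals;
}
```

For the example vectors in the question, this gives 3 for "is", "line", and "this", and 1 for "one", "two", and "three"; the "is" inside "this" is not counted.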
