How to extract a substring from a string in C++?

I've looked through many questions and answers on this topic, but I still haven't found a way to do what I'm about to describe.

I have a text file from which I have to extract several pieces of information, all of them in the following format:

"string1":"string2"

After each such pair there is more information. The text file looks something like this:

LINE 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3":"string4"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...('\n')

LINE 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5":"string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7":"string8"XXXXXXXXXXXXXXXXXXXXXXXXXXXX...

XXX represents irrelevant information I don't need, and theEntireString (the string used in the code example) stores the contents of a single line, not of the whole text file.

I first have to find string1 and then store the content of string2 in another string, without the quotes. The problem is that I have to stop when I reach the closing quote, and I don't know exactly how to do this. I suppose I have to use the functions find() and substr(), but despite repeated attempts I have not succeeded.

What I have done is something like this:

string extractInformation(string theEntireString)
{
  string s = "\"string1\":\"";    
  string result = theEntireString.find(s);
  return result;
}

But this way I suppose I store the closing quote and the rest of the string in the result as well.

The find function only gives you the position of the matched string; to get the resulting string you need to use the substr function. Try this:

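// Assumes theEntireString is exactly "string1":"string2", with no surrounding text.
auto colon = theEntireString.find(":");
string start = theEntireString.substr(1, colon - 2);  // text inside the first pair of quotes
string end = theEntireString.substr(colon + 2, theEntireString.size() - colon - 3);  // text inside the second pair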

That should solve your problem.

Best of luck...
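
One detail that is easy to get wrong here: substr takes a start position and a character count, not a start and an end index. A minimal sketch of a conversion helper, assuming you already have both indices from find (the name slice is made up for illustration, it is not part of std::string):

```cpp
#include <string>

// Hypothetical helper: returns the characters in [first, last) of s.
// std::string::substr takes (position, count), so the count is last - first.
std::string slice(const std::string& s, std::size_t first, std::size_t last) {
    if (first > s.size() || last < first) {
        return "";  // out-of-range or inverted request
    }
    return s.substr(first, last - first);  // substr clamps the count to the end of s
}
```

With theEntireString equal to "a":"b" (quotes included), slice(theEntireString, 5, 6) would give b without the closing quote.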

Two steps:

First we have to find the position of the : and split the string into two parts:

string first = theEntireString.substr(0, theEntireString.find(":"));
string second = theEntireString.substr(theEntireString.find(":") + 1);

Now we have to remove the surrounding quotes (""):

string final_first(first.begin() + 1, first.end() - 1);
string final_second(second.begin() + 1, second.end() - 1);
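
The two steps above assume the input is exactly "key":"value" and that the colon is really there. A rough sketch that does the same split but checks the shape first might look like this (splitPair is a hypothetical name, and returning two empty strings on failure is just one possible convention):

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits exactly "key":"value" into {key, value}.
// Returns two empty strings when the input does not have that shape.
std::pair<std::string, std::string> splitPair(const std::string& pair) {
    const std::size_t sep = pair.find("\":\"");  // the quote-colon-quote separator
    if (sep == std::string::npos || sep == 0 || pair.size() < sep + 4 ||
        pair.front() != '"' || pair.back() != '"') {
        return {"", ""};
    }
    std::string key = pair.substr(1, sep - 1);                      // skip the opening quote
    std::string value = pair.substr(sep + 3, pair.size() - sep - 4);  // drop the closing quote
    return {key, value};
}
```

Searching for the whole separator instead of a bare : also avoids splitting at a colon that happens to sit inside the key.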

Assuming neither the key nor the value contains a quotation mark, the following will output the value after the ":". You can also use it in a loop to repeatedly extract the value field if the input string holds multiple key-value pairs, provided that you keep track of the position of the last found instance.

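#include <iostream>
#include <string>
using namespace std;

// Returns the value stored under `key`, searching from position p; "" if not found.
string extractInformation(size_t p, const string& key, const string& theEntireString)
{
  string s = "\"" + key + "\":\"";
  auto p1 = theEntireString.find(s, p);
  if (string::npos == p1)
    return "";
  p1 += s.size();                           // first character of the value
  auto p2 = theEntireString.find('"', p1);  // closing quote of the value
  if (string::npos != p2)
    return theEntireString.substr(p1, p2 - p1);
  return "";
}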

int main() {
  string data = "\"key\":\"val\" \"key1\":\"val1\"";
  string res = extractInformation(0,"key",data);
  string res1 = extractInformation(0,"key1",data);
  cout << res << "," << res1 << endl;
}

Outputs:

val,val1
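
The loop mentioned above could be sketched roughly as follows, assuming the same key may appear several times on one line. The function name extractAll and the choice to return a vector are my own, not part of the answer:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects every value stored under `key` in `line`.
std::vector<std::string> extractAll(const std::string& key, const std::string& line) {
    const std::string marker = "\"" + key + "\":\"";
    std::vector<std::string> values;
    std::size_t pos = 0;  // where the next search starts
    while ((pos = line.find(marker, pos)) != std::string::npos) {
        const std::size_t start = pos + marker.size();
        const std::size_t end = line.find('"', start);
        if (end == std::string::npos) {
            break;  // unterminated value: stop rather than guess
        }
        values.push_back(line.substr(start, end - start));
        pos = end + 1;  // resume after the closing quote
    }
    return values;
}
```

The only state carried between iterations is pos, which is the "position of the last found instance" the answer refers to.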

#include <regex>
#include <iostream>
#include <string>

using namespace std;

const string text = R"(
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string1":"string2"XXXXXXXXXXXXXXXXXXXXXXXXXX"string3"  :"string4" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
XXXXXXXXXXXXXXXXXXXXXXXXXXXX"string5":  "string6"XXXXXXXXXXXXXXXXXXXXXXXXXX"string7"  :  "string8" XXXXXXXXXXXXXXXXXXXXXXXXXXXX...
)";

int main() {
    const regex pattern{R"~("([^"]*)"\s*:\s*"([^"]*)")~"};
    for (auto it = sregex_iterator(begin(text), end(text), pattern); it != sregex_iterator(); ++it) {
        cout << it->format("First: $1, Second: $2") << endl;
    }
}

Output:

First: string1, Second: string2
First: string3, Second: string4
First: string5, Second: string6
First: string7, Second: string8
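
If you want to look a value up by its key afterwards (for example string2 from string1) instead of printing every pair, the same pattern can fill a map. This is only a sketch: parsePairs is a made-up name, and it assumes the XXX parts contain no quotation marks, otherwise the pattern can pair up the wrong strings.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: maps each quoted key in `text` to its quoted value.
std::map<std::string, std::string> parsePairs(const std::string& text) {
    static const std::regex pattern{R"~("([^"]*)"\s*:\s*"([^"]*)")~"};
    std::map<std::string, std::string> pairs;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
         it != std::sregex_iterator(); ++it) {
        // Capture group 1 is the key, group 2 the value;
        // a later duplicate key overwrites an earlier one.
        pairs[(*it)[1].str()] = (*it)[2].str();
    }
    return pairs;
}
```

After that, parsePairs(line).at("string1") would give string2, and at() throws std::out_of_range when the key is missing.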

You don't need any string operations. Assuming the XXXXX parts don't contain any '"' characters, you can read both strings directly from the file:

ifstream file("input.txt");
for( string s1,s2; getline( getline( file.ignore( numeric_limits< streamsize >::max(), '"' ), s1, '"' ) >> Char<':'> >> Char<'"'>, s2, '"' ); )
    cout << "S1=" << s1 << " S2=" << s2 << endl;

The small helper function Char is:

template< char C >
std::istream& Char( std::istream& in )
{
    char c;
    if( in >> c && c != C )
        in.setstate( std::ios_base::failbit );
    return in;
}
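
To try this approach without a file, the same loop can be wrapped in a function that takes any std::istream, so it also works on a std::istringstream. A minimal sketch, with Char repeated so the block compiles on its own; readPairs is a hypothetical name, and the no-quotes-in-XXXXX assumption still applies:

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads one non-whitespace character and fails the stream unless it equals C.
template <char C>
std::istream& Char(std::istream& in) {
    char c;
    if (in >> c && c != C)
        in.setstate(std::ios_base::failbit);
    return in;
}

// Hypothetical wrapper: reads every "key":"value" pair from an input stream.
std::vector<std::pair<std::string, std::string>> readPairs(std::istream& in) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::string s1, s2;
         std::getline(std::getline(in.ignore(std::numeric_limits<std::streamsize>::max(), '"'),
                                   s1, '"') >> Char<':'> >> Char<'"'>,
                      s2, '"');) {
        pairs.emplace_back(s1, s2);  // s1 is the key, s2 the value
    }
    return pairs;
}
```

Passing a std::ifstream opened on input.txt should behave like the loop above, and passing a std::istringstream makes it easy to test on a single line.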
