Read every word in a string C++

I am trying to read every word in a string. I want a string to go in and the first word to come out, then I'll process it, then the second, and so on. But the internet isn't helping me. I know it's probably right under my nose, but I can't figure it out!

string lex(string filecontent) {

string t = filecontent;
getline(cin, t);

istringstream iss(t);
string word;

while (iss >> word) {
    return word;
}

}
int main() {
    string data = load_file(); // Returns a string of words
    cout << data;
    cout << lex(data);
    getchar();
}

Right now this works, sort of: it prints out a lot of random gibberish and crazy characters. The file I'm reading is fine; I check its contents at cout << data and they are what I expect. Any ideas?

Here is the solution I think you are looking for:

int main() {
   string data = load_file(); // Returns a string of words

   istringstream iss(data);

   string tok;
   while (iss >> tok) // the extraction is the loop test, so no empty token is printed at end of input
   {
      cout << "token: " << tok << endl;
      // you can do whatever you want with the token here
   }
}
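If you want to keep a lex-style function, one option is to return every word at once instead of returning from inside the loop. In the question's version, getline(cin, t) overwrites the file contents with console input, and when no word is read the function runs off its end without returning a value, which is undefined behavior and a plausible source of the gibberish. The following is a minimal sketch, assuming the words are separated by whitespace; the name split_words is made up for illustration and does not come from the question.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every whitespace-separated word of `text`.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream iss(text);
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace and fails at end of input,
    // so the loop ends without producing an empty trailing token.
    while (iss >> word) {
        words.push_back(word);
    }
    return words;
}
```

The caller can then loop over split_words(data) and process one word per iteration.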

Have a look at this, it should help you.

main.cpp

#include "stdafx.h"
#include "Utility.h"

int main() {
    using namespace util;

    std::string fileName( "sample.txt" );

    if ( fileName.empty() ) {
        std::cout << "Missing or invalid filename." << std::endl;
        return RETURN_ERROR;
    }

    std::string line;
    std::vector<std::string> results;
    std::fstream fin;

    // Try To Open File For Reading
    fin.open( fileName.c_str(), std::ios_base::in );
    if ( !fin.is_open() ) {
        std::cout << "Can not open file(" << fileName << ") for reading." << std::endl;
        return RETURN_ERROR;
    }

    // Read Line By Line To Get Data Contents Store Into String To Be Parsed
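    // Testing getline Itself Stops The Loop As Soon As A Read Fails,
    // So No Stale Or Empty Line Is Processed After The Last One
    while ( std::getline( fin, line ) ) {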
        // Parse Each Line Using Space Character As Delimiter
        results = Utility::splitString( line, " " );

        // Print The Results On Each Iteration Of This While Loop 
        // This Is Where You Would Parse The Data Or Store Results Into
        // Class Objects, Variables Or Structures.
        for ( unsigned u = 0; u < results.size(); u++ ) {
            std::cout << results[u] << " ";
        }
        std::cout << std::endl;
    }

    // Close File Pointer
    fin.close();

    // Now Print The Full Vector Of Results - This Is To Show You That Each
    // New Line Will Be Overwritten And That Only The Last Line Of The File Will
    // Be Stored After The While Loop.
    std::cout << "\n-------------------------------------\n";
    for ( unsigned u = 0; u < results.size(); u++ ) {
        std::cout << results[u] << " ";
    }

    Utility::pressAnyKeyToQuit();
    return RETURN_OK;
} // main

sample.txt

Please help me parse this text file
It spans multiple lines of text
I would like to get each individual word

stdafx.h - Some of these include files may not be needed; they are here because I have a larger solution that requires them.

#ifndef STDAFX_H
#define STDAFX_H

#include <Windows.h>

#include <stdio.h>
#include <tchar.h>
#include <conio.h>

#include <string>
#include <sstream>
#include <fstream>
#include <iostream>
#include <iomanip>
#include <vector>
#include <array>
#include <memory>

#include <queue>
#include <functional>

#include <algorithm>

// User Application Specific
// #include "ExceptionHandler.h" - One Of My Class Objects Not Used Here

namespace util {

enum ReturnCode {
    RETURN_OK = 0,
    RETURN_ERROR = 1,
}; // ReturnCode

extern const unsigned INVALID_UNSIGNED;
extern const unsigned INVALID_UNSIGNED_SHORT;

} // namespace util

#endif // STDAFX_H

stdafx.cpp

#include "stdafx.h"

namespace util {

const unsigned INVALID_UNSIGNED = static_cast<const unsigned>( -1 );
const unsigned INVALID_UNSIGNED_SHORT = static_cast<const unsigned short>( -1 );

} // namespace util

Utility.h

#ifndef UTILITY_H
#define UTILITY_H

namespace util {

class Utility {
public:

    static void pressAnyKeyToQuit();

    static std::string  toUpper(const std::string& str);
    static std::string  toLower(const std::string& str);
    static std::string  trim(const std::string& str, const std::string elementsToTrim = " \t\n\r");

    static unsigned     convertToUnsigned(const std::string& str);
    static int          convertToInt(const std::string& str);
    static float        convertToFloat(const std::string& str);

    static std::vector<std::string> splitString(const std::string& strStringToSplit, const std::string& strDelimiter, const bool keepEmpty = true);

private:
    Utility(); // Private - Not A Class Object
    Utility(const Utility& c); // Not Implemented
    Utility& operator=(const Utility& c); // Not Implemented

    template<typename T>
    static bool stringToValue(const std::string& str, T* pValue, unsigned uNumValues);

    template<typename T>
    static T getValue(const std::string& str, std::size_t& remainder);

}; // Utility

#include "Utility.inl"

} // namespace util

#endif // UTILITY_H

Utility.inl

// ----------------------------------------------------------------------------
// stringToValue()
template<typename T>
bool Utility::stringToValue(const std::string& str, T* pValue, unsigned uNumValues) { // 'static' belongs only on the in-class declaration
    int numCommas = std::count(str.begin(), str.end(), ',');
    if (numCommas != uNumValues - 1) {
        return false;
    }

    std::size_t remainder;
    pValue[0] = getValue<T>(str, remainder);

    if (uNumValues == 1) {
        if (str.size() != remainder) {
            return false;
        }
    }
    else {
        std::size_t offset = remainder;
        if (str.at(offset) != ',') {
            return false;
        }

        unsigned uLastIdx = uNumValues - 1;
        for (unsigned u = 1; u < uNumValues; ++u) {
            pValue[u] = getValue<T>(str.substr(++offset), remainder);
            offset += remainder;
            if ((u < uLastIdx && str.at(offset) != ',') ||
                (u == uLastIdx && offset != str.size()))
            {
                return false;
            }
        }
    }
    return true;
} // stringToValue

Utility.cpp

#include "stdafx.h"
#include "Utility.h"

namespace util {

// ----------------------------------------------------------------------------
// pressAnyKeyToQuit()
void Utility::pressAnyKeyToQuit() {
    std::cout << "\nPress any key to quit" << std::endl;
    _getch();
} // pressAnyKeyToQuit

// ----------------------------------------------------------------------------
// toUpper()
std::string Utility::toUpper( const std::string& str ) {
    std::string result = str;
    std::transform( str.begin(), str.end(), result.begin(), ::toupper );
    return result;
} // toUpper

// ----------------------------------------------------------------------------
// toLower()
std::string Utility::toLower( const std::string& str ) {
    std::string result = str;
    std::transform( str.begin(), str.end(), result.begin(), ::tolower );
    return result;
} // toLower

// ----------------------------------------------------------------------------
// trim()
// Removes Elements To Trim From Left And Right Side Of The str
std::string Utility::trim( const std::string& str, const std::string elementsToTrim ) {
    std::basic_string<char>::size_type firstIndex = str.find_first_not_of( elementsToTrim );
    if ( firstIndex == std::string::npos ) {
        return std::string(); // Nothing Left
    }

    std::basic_string<char>::size_type lastIndex = str.find_last_not_of( elementsToTrim );
    return str.substr( firstIndex, lastIndex - firstIndex + 1 );
} // trim

// ----------------------------------------------------------------------------
// getValue()
template<>
float Utility::getValue( const std::string& str, std::size_t& remainder ) {
    return std::stof( str, &remainder );
} // getValue <float>

// ----------------------------------------------------------------------------
// getValue()
template<>
int Utility::getValue( const std::string& str, std::size_t& remainder ) {
    return std::stoi( str, &remainder );
} // getValue <int>

// ----------------------------------------------------------------------------
// getValue()
template<>
unsigned Utility::getValue( const std::string& str, std::size_t& remainder ) {
    return std::stoul( str, &remainder );
} // getValue <unsigned>

// ----------------------------------------------------------------------------
// convertToUnsigned()
unsigned Utility::convertToUnsigned( const std::string& str ) {
    unsigned u = 0;
    if ( !stringToValue( str, &u, 1 ) ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to unsigned";
        throw strStream.str();
    }
    return u;
} // convertToUnsigned

// ----------------------------------------------------------------------------
// convertToInt()
int Utility::convertToInt( const std::string& str ) {
    int i = 0;
    if ( !stringToValue( str, &i, 1 ) ) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to int";
        throw strStream.str();
    }
    return i;
} // convertToInt

// ----------------------------------------------------------------------------
// convertToFloat()
float Utility::convertToFloat(const std::string& str) {
    float f = 0;
    if (!stringToValue(str, &f, 1)) {
        std::ostringstream strStream;
        strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to float";
        throw strStream.str();
    }
    return f;
} // convertToFloat

// ----------------------------------------------------------------------------
// splitString()
std::vector<std::string> Utility::splitString( const std::string& strStringToSplit, const std::string& strDelimiter, const bool keepEmpty ) {
    std::vector<std::string> vResult;
    if ( strDelimiter.empty() ) {
        vResult.push_back( strStringToSplit );
        return vResult;
    }

    std::string::const_iterator itSubStrStart = strStringToSplit.begin(), itSubStrEnd;
    while ( true ) {
        itSubStrEnd = std::search( itSubStrStart, strStringToSplit.end(), strDelimiter.begin(), strDelimiter.end() );
        std::string strTemp( itSubStrStart, itSubStrEnd );
        if ( keepEmpty || !strTemp.empty() ) {
            vResult.push_back( strTemp );
        }

        if ( itSubStrEnd == strStringToSplit.end() ) {
            break;
        }

        itSubStrStart = itSubStrEnd + strDelimiter.size();
    }

    return vResult;

} // splitString

} // namespace util
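To make the behavior of splitString concrete without pulling in the whole library, here is a simplified standalone stand-in. It is a sketch that assumes the same semantics as the listing above, but it is written with std::string::find rather than std::search, and the free function name split is hypothetical.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for Utility::splitString (assumption: same semantics,
// implemented with std::string::find instead of std::search).
std::vector<std::string> split(const std::string& text,
                               const std::string& delimiter,
                               bool keepEmpty = true) {
    std::vector<std::string> parts;
    if (delimiter.empty()) {
        parts.push_back(text); // nothing to split on
        return parts;
    }
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find(delimiter, start);
        // When no further delimiter exists, take everything up to the end.
        const std::string piece = (end == std::string::npos)
            ? text.substr(start)
            : text.substr(start, end - start);
        if (keepEmpty || !piece.empty()) {
            parts.push_back(piece);
        }
        if (end == std::string::npos) {
            break;
        }
        start = end + delimiter.size();
    }
    return parts;
}
```

With keepEmpty left at true, split(" a b ", " ") yields four pieces, because the leading and trailing spaces each produce an empty string. Passing false as the third argument drops those empty pieces.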

In my small utility library I have a function that splits a string on any delimiter the user defines. It searches for the first occurrence of the delimiter, saves everything before it into a string, and pushes that string into a vector of strings. It repeats this for every occurrence of the delimiter until it has parsed the full string passed to it, then returns the vector of strings to the caller. This is very helpful when parsing text files, or even just long strings that need to be broken down.

Now suppose you are parsing a text file and need to keep more than one word together as a single string. This can be done, but it requires more work on your part. For example, a text file might have a personal record on a single line.

LastName, FirstName MiddleInitial Age Phone# Address
Cook, John S 33 1-888-323-4545 324 Complex Avenue

Here you would want 324 Complex Avenue to end up in a single string, and you don't want the comma stored after the last name. Your structure in code to store this info might look like this:

struct PersonalRecord {
    std::string firstName;
    std::string lastName;
    char middleInitial;
    unsigned age;
    std::string phoneNumber;
    std::string address;
};

After you read this line in from your file, you have to parse it several times within that same iteration of the while loop.

You would start by using a temporary string and a temporary vector of strings, and call the utility function splitString with the comma as the delimiter. This saves 2 strings in the temp vector: the first is Cook and the second is the rest of the line after the comma, including the leading space. The reason for the temp string and temp vector is that you will need to pop values off when needed. But first, how do we handle the case of multiple words that belong in one string? We can change the lines in the text file so that such values are enclosed in double quotes, as such:

textfile

Cook, John S 33 1-888-323-4545 "324 Complex Avenue"
Evens, Sue A 24 1-888-323-6996 "128 Mission Rd"
Adams, Chris B 49 1-777-293-8234 "2304 Helms Drive"

Then parse it with this logic flow or algorithm.

main.cpp

#include "stdafx.h"
#include "Utility.h"

int main() {
    using namespace util;

    std::string strFilename( "personalRecord.txt" );
    std::ifstream file; 

    std::string strLine;
    std::vector<std::string> vTemp;
    std::vector<std::string> vResult;

    std::vector<PersonalRecord> vData;

    // Open File For Reading
    file.open( strFilename.c_str() );
    // Check For Error Of Opening File
    if ( !file.is_open() ) {
        std::cout << "Error opening file (" << strFilename << ")" << std::endl;
        return RETURN_ERROR;
    }

    // Read One Full Line Per Pass; Stop As Soon As No More Lines Can Be Read
    while ( std::getline( file, strLine ) ) {

        // Check For Comma
        vTemp = Utility::splitString( strLine, ",");
        if ( vTemp.size() < 2 ) {
            continue; // Skip Blank Or Malformed Lines That Have No Comma
        }
        // Save First String For Later
        std::string lastName = vTemp[0];

        // Split String Using A Double Quote Delimiter
        vTemp = Utility::splitString( vTemp[1], "\"" );

        // Check To See If vTemp Has More Than One String
        if ( vTemp.size() > 1 ) {
            // We Need To Use Pop Back To Account For Last Double Quote
            vTemp.pop_back(); // Remove Last Double Quote
            std::string temp = vTemp.back(); 
            vTemp.pop_back(); // Remove Wanted String From vTemp.

            // At This Point We Need To Parse vTemp Again Using Space Delimiter
            vResult = Utility::splitString( vTemp[0], " " );

            // Need To Account For Leading Space In Vector
            vResult[0].erase();
            // Need To Account For Last Space In Vector
            vResult.pop_back();

            // Now We Can Push Our Last String Back Into vResult
            vResult.push_back( temp );

            // Replace The First String " " With Our LastName
            vResult[0] = lastName;

        } else if ( vTemp.size() == 1 ) {
            // Just Parse vTemp Using Space Delimiter
            vResult = Utility::splitString( vTemp[0], " " );
        }


        // Print Out Results For Validity
        for ( unsigned u = 0; u < vResult.size(); u++) {
            std::cout << vResult.at(u) << " ";
        }
        std::cout << std::endl;

        // Here Is Where You Would Populate Your Variables, Structures Or Classes On Each Pass Of The While Loop.
        // With This Structure There Should Only Be 6 Entries In Our vResult
        PersonalRecord temp;
        temp.lastName      = vResult[0];
        temp.firstName     = vResult[1];
        temp.middleInitial = vResult[2][0];
        temp.age           = Utility::convertToUnsigned( vResult[3] );
        temp.phoneNumber   = vResult[4];
        temp.address       = vResult[5];

        vData.push_back( temp );
    } // while

    // Close File
    file.close();

    std::cout << std::endl << std::endl;

    // Print Using Structure For Validity
    std::cout << "---------------------------------------\n";
    for ( unsigned u = 0; u < vData.size(); u++ ) {
        std::cout << vData[u].lastName << " " 
                  << vData[u].firstName << " "
                  << vData[u].middleInitial << " "
                  << vData[u].age << " "
                  << vData[u].phoneNumber << " "
                  << vData[u].address << std::endl;
    }

    Utility::pressAnyKeyToQuit();
    return RETURN_OK;
} // main
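As an alternative to the repeated splitting above, the same record line can be read with stream extraction alone. This is only a sketch, assuming C++14 is available for std::quoted and that every line follows the quoted-address format shown in the text file; the helper name parseRecord is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Same fields as the PersonalRecord struct above, repeated so the sketch compiles alone.
struct PersonalRecord {
    std::string firstName;
    std::string lastName;
    char middleInitial = '\0';
    unsigned age = 0;
    std::string phoneNumber;
    std::string address;
};

// Hypothetical helper: parse one line such as
//   Cook, John S 33 1-888-323-4545 "324 Complex Avenue"
// Returns false if any field is missing or malformed.
bool parseRecord(const std::string& line, PersonalRecord& out) {
    std::istringstream iss(line);
    // Everything up to the comma is the last name; the comma itself is discarded.
    if (!std::getline(iss, out.lastName, ',')) {
        return false;
    }
    // operator>> skips the whitespace between fields; std::quoted (C++14)
    // reads the double-quoted address as one string and strips the quotes.
    iss >> out.firstName >> out.middleInitial >> out.age
        >> out.phoneNumber >> std::quoted(out.address);
    return !iss.fail();
}
```

Inside the reading loop you would call parseRecord(strLine, temp) and push temp into vData only when it returns true.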

So both consideration and care have to be taken when parsing text or strings. You have to account for every single character, including carriage returns, spaces, etc., so the format that the text file is written in has to be considered. Yes, splitString() will also split on tabs; you would just use "\t" as the delimiter. Just remember that it makes a split at every occurrence. So if you have a sentence with a colon ":" in it, and you decide to use the colon as your delimiter between values, it will split that sentence as well. You could also have different rules for each line of text in the file; if you know what line you are on, you can parse each line accordingly. This is why many people prefer to write their code to read and parse binary: it is often easier to program than a text parser.
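The "split at every occurrence" pitfall is easy to reproduce with the standard library alone. Here is a small sketch that uses std::getline with a single-character delimiter instead of splitString; the helper name splitOn is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on one delimiter character using std::getline.
std::vector<std::string> splitOn(const std::string& text, char delimiter) {
    std::vector<std::string> fields;
    std::istringstream iss(text);
    std::string field;
    // Each call reads up to (and discards) the next delimiter.
    while (std::getline(iss, field, delimiter)) {
        fields.push_back(field);
    }
    return fields;
}
```

Calling splitOn("name\tage\tcity", '\t') gives the three expected fields. Calling splitOn("note:Meet at 10:30", ':') also gives three pieces ("note", "Meet at 10" and "30") even though only two values were intended. Unlike splitString with keepEmpty set, this approach drops an empty field after a trailing delimiter.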

I chose to use the PersonalRecord structure to show how you can extract strings from a line of text and convert them to basic types such as unsigned, int or float by using some of the other functions in my Utility class. All methods in this class are declared static and the constructor is private, so the class name acts as a wrapper or a namespace, so to speak. You cannot create an instance of it: Utility util; is an invalid object declaration. Just include the header file and use the class name with the scope resolution operator :: to access any of the functions, and make sure you are using the namespace util.
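For reference, the static-only class pattern can be reduced to a few lines. This is a sketch rather than the library's actual code: it assumes C++11, so = delete stands in for the private, unimplemented constructor used in Utility.h, and it carries only one function.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

namespace util {

// Sketch of a class used purely as a named wrapper around static functions.
class Utility {
public:
    Utility() = delete; // no instances: "Utility u;" does not compile

    static std::string toUpper(const std::string& str) {
        std::string result = str;
        // Go through unsigned char so std::toupper never sees a negative value.
        std::transform(result.begin(), result.end(), result.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        return result;
    }
};

} // namespace util
```

A call then looks like util::Utility::toUpper("abc"), or Utility::toUpper("abc") after using namespace util.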
