
Unable to ignore the escape characters from a text file stream & store in a wchar_t [ ] in C++

I am trying to read data from a text file using C++ and store the string on each line in a wchar_t[] or LPCWSTR. (These two data types are constraints imposed by the application I am working on, which is why I have to store the data in them.)

The format of data in the .txt file is, for example:

abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30

I am trying to read the data line by line and save each line as a key-value pair in a map, where the key is of type LPCWSTR or wchar_t[] and the value is an int. Extracting the int works fine; the problem is reading the strings.

Here is my code:

#include<iostream>
#include<fstream>
#include<string>
#include<cstring>
#include<windows.h>
#include<cstdlib>

using namespace std;

int main()
{
    wchar_t test1[260];
    const char* s = "Hello\\ABC\\DEF";
    mbstowcs(test1, s, strlen(s));
    wcout<<test1<<endl;


    wchar_t gr[260];
    string gr_temp;
    int percentage;

    ifstream ifs;
    ifs.open("data.txt", ifstream::in);
    if (ifs.is_open()) {
        while (ifs >> gr_temp >> percentage){

            const char* source = gr_temp.c_str();
            mbstowcs(gr, source, strlen(source));

            wcout<<gr<<L" ";
            cout<<percentage<<endl;

        }
        ifs.close();
    }

    return 0;
}

However, it is giving the following output:

Hello\ABC\DEFa
abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30
  1. I did not understand why that tiny 'a' appeared out of nowhere in the first line of output

  2. I want the code to process those double backslashes automatically, i.e. I want the output to be:

     Hello\ABC\DEF
     abc\def\ghi 10
     jkl\mnopq\rstq 20
     aqq\sdsds\qc 30
  3. It would be even better if I could write the entries in the .txt file without double backslashes and have them read as-is, without any escape-sequence processing. However, given the issue in point 1 above, I am not sure that is even possible.

  4. Even if I add cout<<gr_temp<<endl; as the first line in the while loop, that too outputs the string with double backslashes.

What am I missing or doing wrong?

Update:

Also, when I add these key-value pairs to a std::map<LPCWSTR,int> m1 using the statement m1[gr] = percentage; at the end of each iteration of the while loop, the print loop shows only a single element in the map.

My updated code is:

#include<iostream>
#include<fstream>
#include<string>
#include<cstring>
#include<windows.h>
#include<cstdlib>
#include<map>
#include<unordered_map>

using namespace std;

std::unordered_map<LPCWSTR, int>        m1;

int main()
{
    wchar_t test1[260];
    const char* s = "Hello\\ABC\\DEF";
    mbstowcs(test1, s, strlen(s));
    wcout<<test1<<endl;


    wchar_t gr[260];
    string gr_temp;
    int percentage;

    ifstream ifs;
    ifs.open("data.txt", ifstream::in);
    if (ifs.is_open()) {
        while (ifs >> gr_temp >> percentage){

            const char* source = gr_temp.c_str();
            mbstowcs(gr, source, strlen(source));
            
            m1[gr] = percentage;

        }
        ifs.close();
    }

    for (auto i = m1.begin(); i != m1.end(); i++) {
        wcout<< i->first << L" ";
        cout<< i->second << endl;
    }

    return 0;
}

This code adds only one element to the map, and that element is the most recently added one.

I edited the code to use unordered_map, but the issue remains.

I also tried printing the map's size(). In both cases (map and unordered_map), m1.size() was 1.

Miles Budnek already stated your problems.

If you look at the documentation of your function ( http://www.cplusplus.com/reference/cstdlib/mbstowcs/ ), you will see that the third parameter is not the number of bytes to convert to wchar_t, but the maximum number of wide characters to write, i.e. how many wchar_t the buffer you are pointing to can hold.

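It stops after converting the terminating \0 (the same character strlen looks for) or after writing that many wide characters, whichever comes first. When the limit is hit first, as it is when you pass strlen(s), the terminating L'\0' is never written, so test1 is left unterminated and wcout keeps printing whatever happens to be in the uninitialized rest of the buffer until it reaches a zero. That is where the stray 'a' comes from.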

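So just replace the third parameter of your first mbstowcs call with 260 (or sizeof(test1)/sizeof(wchar_t)) and you're rid of that stray 'a'. The call inside the loop has the same flaw and needs the same fix (sizeof(gr)/sizeof(wchar_t)); without a terminator, a shorter line can also pick up leftover characters from a previous, longer one.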

As has also already been stated, there are no escape sequences when reading from a file. They exist only in character and string literals in source code, where the compiler replaces them with the characters they stand for, typically codes you cannot type directly ( https://www.asciitable.com/ ).

\n, for example, represents the code for 'new line', 0x0A.

So escaping the backslashes in the file is unnecessary and can be skipped.

If you know that your input file will have 'double backslashes' and need to 'unescape' them, you could look at the std::string functions 'find' and 'replace'.

Find "\\\\" (two backslashes in a row, written as a C++ string literal) and replace it with "\\" (one backslash).
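A sketch of that find/replace loop. collapse_double_backslashes is a hypothetical name, and the function only collapses doubled backslashes; it does not interpret any other escape sequence such as \n.

```cpp
#include <string>

// Hypothetical helper: turns every pair of backslashes into a single one,
// using std::string::find and std::string::replace.
std::string collapse_double_backslashes(std::string s)
{
    const std::string doubled = "\\\\";   // two backslash characters
    std::string::size_type pos = 0;
    while ((pos = s.find(doubled, pos)) != std::string::npos) {
        s.replace(pos, doubled.size(), "\\");  // one backslash character
        ++pos;  // skip the backslash just kept, so it is not paired again
    }
    return s;
}
```

Advancing pos by one after each replacement keeps the newly produced backslash from being paired with the next one, so four backslashes in a row become two rather than one.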

In response to your updated question (which is basically a separate question): the problem is the key type you chose for the map. Every map, ordered or unordered, requires unique keys, and in your code you keep inserting the same key.

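LPCWSTR stands for 'Long Pointer to Constant Wide String', i.e. const wchar_t*. So while you probably think you are using 'abc\\def\\ghi' as the key, you are actually using &gr[0], which stays the same in every iteration. std::map compares those pointer values (and std::unordered_map hashes them), never the characters they point to, so every insertion hits the same entry. That is why size() reports 1 and the single entry shows the last line read.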

As a further consequence, once the program leaves the scope of gr, its contents are gone, and accessing the map (which keeps the pointer but not the characters) reads memory that no longer belongs to gr. That is undefined behavior and tends to crash the program. (m1 is a global in your code, so it does outlive gr.)
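The effect can be reproduced without <windows.h> by spelling LPCWSTR out as const wchar_t*, which is what it is defined as. This is a minimal sketch: count_keys_with_pointer_keys is a hypothetical name, and the three sample keys mirror the question's data.

```cpp
#include <cstddef>
#include <cwchar>
#include <map>

// Hypothetical repro of the bug: the key type is a pointer, and every
// insertion uses the address of the same reused buffer (like gr).
std::size_t count_keys_with_pointer_keys()
{
    std::map<const wchar_t*, int> m;       // same shape as map<LPCWSTR, int>
    wchar_t buffer[260];                   // one buffer, reused each iteration
    const wchar_t* lines[] = { L"abc\\def\\ghi", L"jkl\\mnopq\\rstq", L"aqq\\sdsds\\qc" };
    int value = 10;
    for (const wchar_t* line : lines) {
        std::wcscpy(buffer, line);         // new text, same address
        m[buffer] = value;                 // key is &buffer[0] every time
        value += 10;
    }
    return m.size();                       // 1 (holding the last value, 30)
}
```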

The solution is simple enough, though: use the content as the key instead of the pointer, for example by using an owning string type such as std::wstring, i.e. std::map<std::wstring, int>.
