Unable to ignore the escape characters from a text file stream & store in a wchar_t[] in C++

I am trying to read data from a text file using C++ and store the string on each line in a wchar_t[] or LPCWSTR. (These two data types are constraints of the application I am working on, which is why I have to store the data in them.)

The format of the data in the .txt file is, for example:

abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30

I am trying to read the data line by line and save each line as a key-value pair in a map, where the key is of type LPCWSTR or wchar_t[] and the value is of type int. Extracting the int is not a problem; the issue is in reading the strings.

Here is my code:

#include<iostream>

#include<fstream>
#include<windows.h>
#include<cstdlib>

using namespace std;

int main()
{
    wchar_t test1[260];
    const char* s = "Hello\\ABC\\DEF";
    mbstowcs(test1, s, strlen(s));
    wcout<<test1<<endl;


    wchar_t gr[260];
    string gr_temp;
    int percentage;

    ifstream ifs;
    ifs.open("data.txt", ifstream::in);
    if (ifs.is_open()) {
        while (ifs >> gr_temp >> percentage){

            const char* source = gr_temp.c_str();
            mbstowcs(gr, source, strlen(source));

            wcout<<gr<<L" ";
            cout<<percentage<<endl;

        }
        ifs.close();
    }

    return 0;
}

However, it gives the following output:

Hello\ABC\DEFa
abc\\def\\ghi 10
jkl\\mnopq\\rstq 20
aqq\\sdsds\\qc 30
  1. I do not understand why that stray 'a' appeared out of nowhere at the end of the first line of output.

  2. I want the code to process those double backslashes automatically, i.e. I want the output to be:

     Hello\ABC\DEF
     abc\def\ghi 10
     jkl\mnopq\rstq 20
     aqq\sdsds\qc 30
  3. It would be even better if I could write the entries in the .txt file without double backslashes and have them read as-is, without any checking for escape sequences. However, given the issue in point 1 above, I am not sure that is even possible.

  4. Even if I add cout<<gr_temp<<endl; as the first line in the while loop, that also prints the string with double backslashes.

What am I missing or doing wrong?

Update:

Also, when I add these key-value pairs to a std::map<LPCWSTR,int> m1 using the statement m1[gr] = percentage; at the end of each iteration of the while loop, the print statement shows only a single element in the map.

My updated code is:

#include<iostream>

#include<fstream>
#include<windows.h>
#include<cstdlib>
#include<map>

using namespace std;

std::unordered_map<LPCWSTR, int>        m1;

int main()
{
    wchar_t test1[260];
    const char* s = "Hello\\ABC\\DEF";
    mbstowcs(test1, s, strlen(s));
    wcout<<test1<<endl;


    wchar_t gr[260];
    string gr_temp;
    int percentage;

    ifstream ifs;
    ifs.open("data.txt", ifstream::in);
    if (ifs.is_open()) {
        while (ifs >> gr_temp >> percentage){

            const char* source = gr_temp.c_str();
            mbstowcs(gr, source, strlen(source));
            
            m1[gr] = percentage;

        }
        ifs.close();
    }

    for (auto i = m1.begin(); i != m1.end(); i++) {
        wcout<< i->first << L" ";
        cout<< i->second << endl;
    }

    return 0;
}

This code adds only one element to the map, and that is the most recently added one.

I edited the code to use unordered_map, but the issue is still the same.

I also tried printing the size() of the map. In both cases, the size of m1 was displayed as 1.

Miles Budnek already stated your problems.

If you look at the documentation of the function ( http://www.cplusplus.com/reference/cstdlib/mbstowcs/ ), you will see that the third parameter is not the number of bytes to convert to wchar_t, but rather the maximum number of wide characters the destination buffer can hold.

The conversion stops once it finds a \0 (which happens to be what strlen looks for as well). Because you pass strlen(s), the limit is reached just before the terminating \0 would be written, so test1 is not null-terminated and wcout keeps printing whatever happens to follow in the uninitialized buffer.

So just replace the third parameter of your first mbstowcs call with 260 (or sizeof(test1)/sizeof(wchar_t)), and the stray 'a' goes away. The same applies to the mbstowcs call inside the loop.

As has also already been stated, there are no escape sequences when reading from a file. They exist only in source code, where they represent characters you cannot type directly. ( https://www.asciitable.com/ )

\n, for example, represents the 'new line' character, code 0x0A.

So escaping the backslashes in the file is unnecessary and can be skipped.

If you know that your input file will contain double backslashes and you need to 'unescape' them, you could look at the std::string member functions find and replace.

Find "\\\\" (two backslashes in a row) and replace it with "\\" (a single backslash).
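
One way this could look, as a sketch rather than the only possible approach. The function name collapse_double_backslashes is made up for illustration; it uses only std::string::find and std::string::replace.

```cpp
#include <string>

// Hypothetical helper: replaces every pair of consecutive backslashes
// with a single backslash.
std::string collapse_double_backslashes(std::string text)
{
    const std::string doubled = "\\\\";  // two backslash characters
    const std::string single  = "\\";    // one backslash character
    std::string::size_type pos = 0;
    while ((pos = text.find(doubled, pos)) != std::string::npos) {
        text.replace(pos, doubled.size(), single);
        pos += single.size();  // continue after the backslash just written
    }
    return text;
}
```

In the question's loop this would be applied to gr_temp before converting it to wide characters.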

In response to your updated question (which is basically a separate question): the problem is the key you chose for the map. Every map, unordered or not, requires unique keys, and in your scenario you keep using the same key.

LPCWSTR is a pointer to a constant wide-character string (const wchar_t*), so while you probably think you are using 'abc\\def\\ghi' as the key, you are actually using &gr[0], which stays the same across all iterations.

As a further consequence, once the program leaves the scope of gr, its content becomes invalid, and accessing the map (which keeps the pointer but not the content) will read memory that is no longer valid, which tends to crash your program.

The solution is simple enough, though: use the content as the key instead of the pointer, for example by using a container object such as std::wstring.
