
C - wcstok() wrong results

I have a problem with one of the functions in my program. I have a text that consists of sentences. In each sentence I need to find the symbols '@', '#', and '%' and replace them with "(at)", "<решетка>", and "<percent>". I'm doing this with wcstok because I'm working with Russian text. I've run into the following problem.

Input:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without tak%ing a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally sa@lao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fis#h the first week.

Output:

He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without tak<>ing a fish. In the first forty days a boy had been with him. B(at) (at)f(at)er for(at)yd(at)ys wi(at)ho(at) (at) fish (at)he boy's p(at)ren(at)sh(at)d (at)old him (at)h(at) (at)he old m(at)nw(at)s now defini(at)ely (at)nd fin(at)lly s(at)l(at)o, which is (at)he wors(at) form of (at)nl(at)cky, (at)nd (at)he boy h(at)d gone (at) (at)heir orders in (at)no(at)her bo(at) which c(at)gh(at) (at)hree good fis(at)h (at)he firs(at) week.

As you can see, it changes all letters "a" and "t" to "(at)", and I don't understand why this is happening. The same thing happens with Russian letters. These are the two functions responsible for this work.

void changeSomeSymbols(Text *text) {
    wchar_t atSymbol = L'@';
    wchar_t atString[5] = L"(at)";
    wchar_t percentSymbol = L'%';
    wchar_t percentString[10] = L"<percent>";
    wchar_t barsSymbol = L'#';
    wchar_t barsString[10] = L"<решетка>";
    for (int i = 0; i < text->textSize; i++) {
        for (int j = 0; j < text->sentences[i].sentenceSize; j++) {
            switch (text->sentences[i].symbols[j])
            {
            case L'@':
                changeSentence(&(text->sentences[i]), &atSymbol, atString);
                break;
            case L'#':
                changeSentence(&(text->sentences[i]), &barsSymbol, barsString);
                break;
            case L'%':
                changeSentence(&(text->sentences[i]), &percentSymbol, percentString);
                break;
            default:
                break;
            }
        }
    }
}

void changeSentence(Sentence *sentence, wchar_t *flagSymbol, wchar_t *insertWstr) {
    wchar_t *pwc;
    wchar_t *newWcsentence;
    wchar_t *buffer;
    int insertionSize;
    int tokenSize;
    int newSentenceSize = 0;
    insertionSize = wcslen(insertWstr);
    newWcsentence = (wchar_t*)malloc(1 * sizeof(wchar_t));
    newWcsentence[0] = L'\0';
    pwc = wcstok(sentence->symbols, flagSymbol, &buffer);
    do {
        tokenSize = wcslen(pwc);
        newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize + tokenSize + 1) * sizeof(wchar_t));
        newSentenceSize += tokenSize;
        wcscat(newWcsentence, pwc);
        newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize + insertionSize + 1) * sizeof(wchar_t));
        newSentenceSize += insertionSize;
        wcscat(newWcsentence, insertWstr);
        pwc = wcstok(NULL, flagSymbol, &buffer);
    } while (pwc != NULL);
    newSentenceSize -= insertionSize;
    newWcsentence = (wchar_t*)realloc(newWcsentence, (newSentenceSize) * sizeof(wchar_t));
    newWcsentence[newSentenceSize] = '\0';
    free(sentence->symbols);
    sentence->symbols = (wchar_t*)malloc((newSentenceSize + 1) * sizeof(wchar_t));
    wcscpy(sentence->symbols, newWcsentence);
    sentence->sentenceSize = newSentenceSize;
    free(pwc);
    free(newWcsentence);
}
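
A likely cause, judging from the code above: wcstok() expects its second argument to be a null-terminated wide string of delimiters, but changeSentence() is given &atSymbol, a pointer to a single wchar_t with no terminator. wcstok() then keeps reading past that character into whatever follows it in memory (possibly one of the replacement arrays), which is undefined behavior and would explain why extra letters are treated as delimiters. Below is a minimal sketch of passing a proper delimiter string; count_tokens is a hypothetical helper used only for illustration and is not part of the original program.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not from the original post): counts the tokens
   wcstok() finds in `text` when it is split on the single character `delim`. */
size_t count_tokens(const wchar_t *text, wchar_t delim)
{
    /* Build a proper null-terminated delimiter string from the character. */
    wchar_t delims[2] = { delim, L'\0' };

    /* wcstok() modifies its input, so work on a private copy. */
    size_t len = wcslen(text);
    wchar_t *copy = malloc((len + 1) * sizeof *copy);
    if (copy == NULL)
        return 0;
    wmemcpy(copy, text, len + 1);

    size_t count = 0;
    wchar_t *state = NULL;
    for (wchar_t *tok = wcstok(copy, delims, &state);
         tok != NULL;
         tok = wcstok(NULL, delims, &state))
        count++;

    free(copy);
    return count;
}
```

With a terminated delimiter string, a call such as count_tokens(L"sa@lao", L'@') splits only at '@'. In changeSentence() the equivalent change would be to pass a string such as L"@" rather than the address of a lone character.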

Text and Sentence are not defined, so it's unclear what they are supposed to be. Just do it in one function.

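#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

// Appends src to *dst, growing the buffer; *dstlen tracks the length in characters.
void realloc_and_copy(wchar_t** dst, int *dstlen, const wchar_t *src)
{
    if(!src)
        return;
    int srclen = wcslen(src);
    *dst = realloc(*dst, (*dstlen + srclen + 1) * sizeof(wchar_t));
    if (*dstlen)
        wcscat(*dst, src);
    else
        wcscpy(*dst, src);
    *dstlen += srclen;
}

int main()
{
    // Use the environment's locale so wide output of non-ASCII text works
    setlocale(LC_ALL, "");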
    const wchar_t* src = L"He was an old man who fished alone in a skiff \
in the Gulf Stream and he had gone eighty - four days now without tak%ing a fish.\
In the first forty days a boy had been with him.But after forty days without a fish \
the boy’s parents had told him that the old man was now definitely and finally sa@lao, \
which is the worst form of unlucky, and the boy had gone at their orders in another \
boat which caught three good fis#h the first week.";

    wchar_t *buf = wcsdup(src);
    wchar_t *dst = NULL;
    int dstlen = 0;

    wchar_t *context = NULL;
    const wchar_t* delimiter = L"@#%";
    wchar_t *token = wcstok(buf, delimiter, &context);
    while(token)
    {
        const wchar_t* modify = NULL;
        int cursor = token - buf - 1;
        if (cursor >= 0)
            switch(src[cursor])
            {
            case L'@': modify = L"(at)"; break;
            case L'%': modify = L"<percent>"; break;
            case L'#': modify = L"<решетка>"; break;
            }

        //append modified text
        realloc_and_copy(&dst, &dstlen, modify);

        //append token
        realloc_and_copy(&dst, &dstlen, token);

        token = wcstok(NULL, delimiter, &context);
    }

    wprintf(L"%ls\n", dst); // %ls is the standard conversion for wchar_t*

    free(buf);
    free(dst);

    return 0;
}
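
One caveat with the token-based approach: wcstok() collapses runs of adjacent delimiters and drops a trailing one, so input such as L"a@@b" or a text ending in '%' would lose a replacement. If that matters, a single pass over the characters avoids tokenizing altogether. The following is only a sketch of that alternative, not the method used in the code above; replace_symbols and replacement_for are hypothetical names.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns the replacement text for a marked character,
   or NULL if the character should be copied unchanged. */
static const wchar_t *replacement_for(wchar_t c)
{
    switch (c) {
    case L'@': return L"(at)";
    case L'%': return L"<percent>";
    case L'#': return L"<решетка>";
    default:   return NULL;
    }
}

/* Returns a newly allocated copy of src with every '@', '%' and '#'
   replaced. The caller frees the result. Returns NULL if allocation fails. */
wchar_t *replace_symbols(const wchar_t *src)
{
    /* First pass: compute the exact output length. */
    size_t outlen = 0;
    for (const wchar_t *p = src; *p != L'\0'; p++) {
        const wchar_t *rep = replacement_for(*p);
        outlen += rep ? wcslen(rep) : 1;
    }

    wchar_t *dst = malloc((outlen + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    /* Second pass: copy characters, expanding the marked ones. */
    wchar_t *out = dst;
    for (const wchar_t *p = src; *p != L'\0'; p++) {
        const wchar_t *rep = replacement_for(*p);
        if (rep) {
            size_t n = wcslen(rep);
            wmemcpy(out, rep, n);
            out += n;
        } else {
            *out++ = *p;
        }
    }
    *out = L'\0';
    return dst;
}
```

Because every character is examined individually, replace_symbols(L"a@@b") would produce L"a(at)(at)b", and the input string is left unmodified.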
