Create a string from char array in C

I have a piece of code that loops through a char array string to try to detect words. It loops through the string, and if it detects A-Z, a-z, or _ (underscore), it adds that character to a char array. Because these are words, what I need is to put each one into a string, which I can then pass to another function to check and then discard. This is my function:

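#include <stdio.h>
#include <string.h>
#include <ctype.h>   // toascii() (POSIX)

char wholeProgramStr2[20000];
char wordToCheck[100] = "";

void IdentiferFinder(char *tmp){
    //find the identifiers
    int count = 0;
    size_t i, len = strlen(tmp);   // compute the length once, not on every pass
    for (i = 0; i < len; ++i){
        int Ascii = toascii(tmp[i]);
        // 'A'-'Z' (65-90), 'a'-'z' (97-122) or '_' (95)
        if ((Ascii >= 65 && Ascii <= 90) || (Ascii >= 97 && Ascii <= 122) || (Ascii == 95))
        {
            // store at count (position in the word), not i (position in tmp),
            // and never write past the end of wordToCheck
            if (count < (int)sizeof wordToCheck - 1)
                wordToCheck[count] = tmp[i];
            count++;
            printf("%c", tmp[i]);
        }
        else {
            if (count != 0){
                printf("\n");
            }
            count = 0;
        }
    }
    printf("\n");
}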

At the moment I can see all of the words because it prints them out on separate lines.

wholeProgramStr2 holds the entire contents of the file, and it is what gets passed as the tmp argument.

Thank you.

You describe breaking apart a big string into little strings (words).
Assuming you are using normal delimiters to parse, such as spaces, tabs, or newlines:

Here is a three-step approach:
First, get information about your source string (the number of words and the length of the longest word).
Second, create your target array dynamically to fit your size needs.
Third, loop on strtok() to populate your target array of strings (char **).

(A fourth step would be to free the memory you allocated, which you will need to do.)
Hint: the prototype could look like this:
// void Free2DCharArray(char **a, int numWords);

Code example:

#include <stdlib.h>
#include <string.h>

void FindWords(char **words, char *source);
void GetStringParams(char *source, int *longest, int *wordCount);
char ** Create2DCharArray(char **a, int numWords, int maxWordLen);
#define DELIM " \n\t"

int main(void)
{
    int longestWord = 0, WordCount = 0;
    char **words = NULL;
    char string[]="this is a bunch of test words";

    //Get number of words, and longest word, use in allocating memory
    GetStringParams(string, &longestWord, &WordCount);

    //create array of strings with information from source string
    words = Create2DCharArray(words, WordCount, longestWord);
    if (!words) return 1;   //allocation failed

    //populate array of strings with words (strtok() modifies string here)
    FindWords(words, string);

    //Do not forget to free words (left for you to do)
    return 0;
}

void GetStringParams(char *source, int *longest, int *wordCount)
{
    char *tok;
    int Len = 0, KeepLen = 0;
    //strtok() writes '\0' into the string it scans, so tokenize a copy
    //and leave source untouched
    char *cpyString = malloc(strlen(source) + 1);
    if (!cpyString) return;
    strcpy(cpyString, source);
    tok = strtok(cpyString, DELIM);
    while(tok)
    {
        (*wordCount)++;
        Len = (int)strlen(tok);
        if(Len > KeepLen) KeepLen = Len;
        tok = strtok(NULL, DELIM);
    }
    *longest = KeepLen;
    free(cpyString); //the copy is no longer needed
}

void FindWords(char **words, char *source)             
{
    char *tok;
    int i=-1;

    tok = strtok(source, DELIM);
    while(tok)
    {
        strcpy(words[++i], tok);
        tok = strtok(NULL, DELIM);
    }
}

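char ** Create2DCharArray(char **a, int numWords, int maxWordLen)
{
    int i;
    //the incoming value of a is ignored; a fresh array of numWords pointers is allocated
    a = calloc(numWords, sizeof(char *));
    if(!a) return a;
    for(i=0;i<numWords;i++)
    {
        //one buffer per word, +1 for the terminating '\0'
        a[i] = calloc(maxWordLen + 1, 1);
    }
    return a;
}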

If your goal is to look for words in an array of chars, you probably want to first find a valid sequence of characters (which you seem to be trying to do), and once you've found one, do the secondary check to see whether it is a real word. If it is, you may then decide to keep it for later use.

The advantage of this approach is that you don't need to keep a large buffer of potential words; you only need a fixed one, sized to match the longest word in your dictionary. In fact, you might not even need a buffer at all, just a pointer sliding along the char array that points at the start of a possible word, and an int (though a byte might suffice) to keep track of that word's length.

#include <ctype.h>

// structure to store a word match in array
typedef struct token_s {
  int length;
  const char *data;
} token_t;

void nextToken(const char *tmp, int len, token_t *to){
  const char *start = NULL;   // const: it points into the caller's read-only array
  while (len){
    if (start) {
      // search for end of current word
      if (!isalpha((unsigned char)*tmp)) {
        to->data = start;
        to->length = (int)(tmp - start);
        return;
      }
    } else {
      // search for beginning of next word
      if (isalpha((unsigned char)*tmp))
        start = tmp;
    }
    tmp++;
    len--;
  } // while
  // the word ran up to the end of the scanned range
  if (start) {
    to->data = start;
    to->length = (int)(tmp - start);
  }
}

Simply pass:

  • the start of your char array, or to->data + to->length + 1 if that is not beyond the end of the array
  • the remaining length of the char array to scan
  • a pointer to a zeroed token_t

to each call to nextToken, then check the token's data field: if it is non-NULL, nextToken found a candidate; if it is still NULL, you know that the array has been scanned entirely.

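void scanArray(const char *tmp, int len){
  while (len > 0){
    token_t to;
    to.data = NULL;
    to.length = 0;
    nextToken(tmp, len, &to);
    if (to.data) {
      // advance from the token's start, not from tmp: the token may begin
      // after some skipped non-word characters
      int consumed = (int)(to.data - tmp) + to.length;
      if (consumed < len) consumed++;   // also skip the delimiter, if any
      tmp += consumed;
      len -= consumed;
      // process token here (to.data, to.length)...
    } else break;
  } // while
}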

I used isalpha to test for valid characters, but you'll want to replace it with a function of your own. You'll also have to insert your own code for the secondary check in the body of scanArray.
