Best way to read and parse data from a text file in C?

Question

I am working on an assignment that deals with reading data from a text file and parsing that data into various arrays. For example, a portion of my text file looks as follows:

arbrick  pts/26       141.219.210.189  Thu Mar 29 11:23 - 11:24  (00:00)    
rjmcnama pts/27       141.219.205.107  Thu Mar 29 11:02   still logged in   
ajhoekst pts/26       99.156.215.40    Thu Mar 29 10:59 - 11:08  (00:08)    
eacarter pts/31       141.219.162.145  Thu Mar 29 10:50 - 10:51  (00:00)    
kmcolema pts/31       141.219.214.128  Thu Mar 29 09:44 - 09:47  (00:03)

I need to parse the data into the following arrays: user id, terminal, ip address, and event times. How can I do this, considering that there isn't a consistent amount of white space between the columns?

EDIT: I tried using the suggestion that Thiruvalluvar provided, but I just could not get it to work. However, I did switch to sscanf, and that is working quite well, almost ...

while(!feof(myfile)) {
        fgets(buffer, 256, myfile);
        sscanf(buffer, "%s %s %s %s", user_id[i], terminal_id[i], ip_addr[i], events[i]);
    } /*End while not EOF*/

What is working is the user_id, terminal_id, and ip_addr arrays. However, the events array isn't working properly yet. Since each events entry is a string that contains white space, how can I use sscanf to put the remainder of the buffer into the events array?

Answer 1

I think the real part of the question is how to store them in only 4 arrays. E.g.:

arbrick  pts/26       141.219.210.189  Thu Mar 29 11:23 - 11:24  (00:00)

Tokenizing this line on whitespace is going to give many strings. But we are only interested in splitting the entire line into 4 fields, not more.

Solution:

Read the line using fgets().
Tokenize it using strtok(), or strtok_r() if you need thread safety, with whitespace as the delimiter.
Copy the first 3 tokens into the arrays user_id, terminal_id and ip_addr.
Store (and append) the rest of the tokens into the array events.

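int i;
int line_index = 0;
char line[256];
char *p;

while (fgets(line, sizeof line, myfile) != NULL) {   /* loop to read the file */
    events[line_index][0] = '\0';                    /* start with an empty events string */
    i = 0;
    /* strtok must be called again with NULL to advance to the next token */
    for (p = strtok(line, " \t\n"); p != NULL; p = strtok(NULL, " \t\n")) {
        if (i == 0)
            strcpy(user_id[line_index], p);
        else if (i == 1)
            strcpy(terminal_id[line_index], p);
        else if (i == 2)
            strcpy(ip_addr[line_index], p);
        else {                                       /* anything else goes into events */
            if (i > 3)
                strcat(events[line_index], " ");
            strcat(events[line_index], p);
        }
        i++;
    }
    line_index++;
}   /* end of file-reading loop */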
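The steps in this answer can also be packaged as a function that can be tried on a single line. This is only a sketch under some assumptions: the names parse_line and append, and the buffer sizes FIELD_LEN and EVENT_LEN, are made up for illustration and do not come from the answer. Appends are bounded so an over-long line is truncated instead of overflowing a buffer.

```c
#include <stdio.h>
#include <string.h>

#define FIELD_LEN 64    /* assumed size of the user, terminal and ip buffers */
#define EVENT_LEN 128   /* assumed size of the events buffer */

/* Append src to dst (capacity cap bytes), truncating rather than overflowing. */
static void append(char *dst, size_t cap, const char *src)
{
    size_t used = strlen(dst);

    if (used < cap - 1)
        snprintf(dst + used, cap - used, "%s", src);
}

/* Split one line into user, terminal, ip and a space-joined events string.
   strtok() modifies line in place. Returns the number of tokens seen. */
static int parse_line(char *line, char *user, char *term, char *ip, char *events)
{
    int i = 0;
    char *p;

    user[0] = term[0] = ip[0] = events[0] = '\0';
    for (p = strtok(line, " \t\n"); p != NULL; p = strtok(NULL, " \t\n"), i++) {
        if (i == 0)
            append(user, FIELD_LEN, p);
        else if (i == 1)
            append(term, FIELD_LEN, p);
        else if (i == 2)
            append(ip, FIELD_LEN, p);
        else {
            /* tokens 4 and later are joined with single spaces */
            if (events[0] != '\0')
                append(events, EVENT_LEN, " ");
            append(events, EVENT_LEN, p);
        }
    }
    return i;
}
```

Calling parse_line once per line read by fgets(), with row i of each array as the output buffers, fills the four arrays. Note that runs of spaces inside the events part are collapsed to one space, because strtok() discards the delimiters.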

Answer 2

Use fgets to read one line at a time. Operate on the line using sscanf calls to store the information, since the data is not in a consistent form (e.g., "still logged in"). sscanf will read and discard any whitespace between the format specifiers.
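One way to get the remainder of the line into a single string with sscanf is a scanset conversion such as %[^\n], which matches everything up to the newline, spaces included. The sketch below is illustrative only: the function name parse_with_sscanf and the buffer sizes (64 bytes for the first three fields, 128 for the events) are assumptions, not something stated in the question.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line into three whitespace-delimited fields plus the rest of the line.
   user, term and ip are assumed to hold at least 64 bytes, events at least 128.
   Returns what sscanf returns: 4 for a complete line. */
static int parse_with_sscanf(const char *line, char *user, char *term,
                             char *ip, char *events)
{
    size_t len;
    int n;

    events[0] = '\0';
    /* the field widths keep sscanf from writing past the end of each buffer */
    n = sscanf(line, "%63s %63s %63s %127[^\n]", user, term, ip, events);

    /* %[^\n] keeps trailing padding, so trim it */
    len = strlen(events);
    while (len > 0 && (events[len - 1] == ' ' || events[len - 1] == '\t' ||
                       events[len - 1] == '\r'))
        events[--len] = '\0';
    return n;
}
```

Checking that the return value is 4 before using the fields guards against blank or truncated lines. Unlike a token-by-token approach, the spacing inside the events part is kept exactly as it appears in the file.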

Answer 3

Try this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

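/* Split string in place on delim. Returns a NULL-terminated array of
   pointers into string; the caller frees the array (not the tokens). */
char **split(char *string, const char *delim) {
    char *p;
    int i = 0;
    char **array;

    /* a string of n characters holds fewer than n tokens; +1 for the NULL */
    array = malloc((strlen(string) + 1) * sizeof(char *));
    if (array == NULL)
        return NULL;
    p = strtok(string, delim);
    while (p != NULL) {
        array[i++] = p;     /* store the token pointer; no per-token malloc needed */
        p = strtok(NULL, delim);
    }
    array[i] = NULL;
    return array;
}

void parseLine(char *line, char *user, char term[], char ip[], char event[]) {
    char **array = split(line, " \t\n");    /* line is modified in place */
    int n = 0;

    user[0] = term[0] = ip[0] = event[0] = '\0';
    if (array == NULL)
        return;
    while (array[n] != NULL)
        n++;
    if (n >= 8) {   /* user, term, ip, day, month, date, login time, then "-" or "still" */
        strcpy(user, array[0]);
        strcpy(term, array[1]);
        strcpy(ip, array[2]);
        strcpy(event, array[6]);            /* login time; skips "Thu Mar 29" */
        if (strcmp(array[7], "-") != 0) {
            strcat(event, " still logged in");
        } else if (n >= 9) {
            strcat(event, " - ");
            strcat(event, array[8]);        /* logout time */
        }
    }
    free(array);
}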

int main(void) {
    char line[2048];
    char user[64], term[64], ip[64], event[64];

    while (fgets(line, sizeof line, stdin) != NULL) {
        parseLine(line, user, term, ip, event);
        printf("[%s][%s][%s][%s]\n", user, term, ip, event);
        /* use an array to save them ... */
    }
    return 0;
}

and then: ./a.out < file.txt

Answer 4

For what it is worth, here is my suggestion. Roll your own string tokeniser as follows:

static char *string_tok(char **stringp, const char *delim)
{
    char *tok = *stringp + strspn(*stringp, delim);
    char *end = tok + strcspn(tok, delim);

    if (*end) {
        *end++ = '\0';
        end += strspn(end, delim);
    }
    *stringp = end;
    return tok;
}

Then just call it in sequence for each token. After the third call to string_tok, buf points to the start of the remainder of the string (the events). Note that buf must be writable.

static void parse(char * buf)
{
    char * user_id = string_tok(&buf, " \t");
    char * term    = string_tok(&buf, " \t");
    char * ip      = string_tok(&buf, " \t");
    printf("user_id:  %s\n", user_id);
    printf("terminal: %s\n", term);
    printf("ip addr:  %s\n", ip);
    printf("events:   %s\n\n", buf);
}
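To connect this tokeniser to the four fields from the question, something like the following could work. It is a sketch, not part of the answer: struct login_record and parse_record are hypothetical names, and string_tok is repeated only so the block compiles on its own. Trailing padding and the newline left by fgets are trimmed first so the events string ends cleanly.

```c
#include <string.h>

/* string_tok() repeated from the answer so this sketch is self-contained */
static char *string_tok(char **stringp, const char *delim)
{
    char *tok = *stringp + strspn(*stringp, delim);
    char *end = tok + strcspn(tok, delim);

    if (*end) {
        *end++ = '\0';
        end += strspn(end, delim);
    }
    *stringp = end;
    return tok;
}

/* Hypothetical container; every pointer refers to memory inside the line buffer. */
struct login_record {
    char *user_id;
    char *term;
    char *ip;
    char *events;
};

/* Split buf in place into three fields plus the remainder of the line. */
static void parse_record(char *buf, struct login_record *rec)
{
    size_t len = strlen(buf);

    /* drop the trailing newline and padding */
    while (len > 0 && (buf[len - 1] == ' ' || buf[len - 1] == '\t' ||
                       buf[len - 1] == '\r' || buf[len - 1] == '\n'))
        buf[--len] = '\0';

    rec->user_id = string_tok(&buf, " \t");
    rec->term    = string_tok(&buf, " \t");
    rec->ip      = string_tok(&buf, " \t");
    rec->events  = buf;     /* whatever is left after the third token */
}
```

Because the record only holds pointers into the line buffer, each line needs its own buffer (or the fields must be copied out) before the next fgets call overwrites it.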

4 answers

Answer 1: score 4, accepted, 2012-03-29 18:36:12
Answer 2: score 2, 2012-03-29 18:02:39
Answer 3: score 0, 2012-03-29 18:24:01
Answer 4: score 0, 2012-03-30 03:21:22
