Parsing a comma-separated file in C using fscanf()
I have a file with data something like this:
Name, Age, Occupation
John, 14, Student
George, 14, Student
William, 23, Programmer
Now, I want to read the data such that each value (e.g. Name, Age, etc.) is read as a string.
This is my code snippet:
....
if (!(ferror(input_fp) || ferror(output_fp))) {
    while(fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]",
                 name, age_array, occupation) != EOF){
        fprintf(stdout, "%-30s%-30s%-30s\n", name, age_array, occupation);
    }
    fclose(input_fp);
    fclose(output_fp);
}
....
However, this goes into an infinite loop, giving some random output.
This is how I understand my input conversion specifiers:
%30[^ ,\n\t] -> read a string that is at most 30 characters long and that DOES NOT include a space, a comma, a newline, or a tab character.
And I am reading 3 such strings.
Where am I going wrong?
OP's
fscanf(input_fp, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]", ...
does not consume the ',' nor the '\n' in the text file. Subsequent fscanf() attempts also fail and return a value of 0, which, not being EOF, causes an infinite loop.
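A minimal sketch of the failure, using sscanf() on an in-memory line instead of a file so it can be run without one (sscanf() applies the same conversion rules as fscanf()). The function name scan_original is hypothetical. On "John, 14, Student" the first call converts only "John" and returns 1, because the second %30[^ ,\n\t] immediately meets the ','; a call on the unconsumed remainder ", 14, Student" returns 0 without consuming anything.

```c
#include <stdio.h>

/* Hypothetical helper: applies the question's format once to an in-memory
   line and returns the number of conversions. Buffers need 31 chars. */
int scan_original(const char *line, char *name, char *age, char *occ)
{
    /* No directive matches the ',' between fields, so scanning stops there. */
    return sscanf(line, "%30[^ ,\n\t]%30[^ ,\n\t]%30[^ ,\n\t]",
                  name, age, occ);
}
```

With fscanf() on the file, the ',' that stopped the first call stays unread in the stream, so every later call in OP's != EOF loop sees the same ',' and returns 0 again.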
Although OP requested an fscanf() solution, an fgets()/sscanf() combination better handles potential I/O and parsing errors.
FILE *input_fp;   // assumed already opened for reading
FILE *output_fp;  // assumed already opened for writing
char buf[100];
while (fgets(buf, sizeof buf, input_fp) != NULL) {
    char name[30]; // Ensure this size is 1 more than the width in the scanf format.
    char age_array[30];
    char occupation[30];
    #define VFMT " %29[^ ,\n\t]"
    int n; // Used to check for trailing junk
    if (3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n", name, age_array,
                    occupation, &n) && buf[n] == '\0') {
        // Suspect OP really wants this width to be 1 more
        if (fprintf(output_fp, "%-30s%-30s%-30s\n", name, age_array, occupation) < 0)
            break;
    } else
        break; // format error
}
fclose(input_fp);
fclose(output_fp);
Rather than calling ferror(), check the return values of fgets() and fprintf().
I suspect OP's undeclared field buffers were [30], and adjusted the scanf() widths accordingly.
[edit]
Details about if (3 == sscanf(buf, VFMT "," ...
The if (3 == sscanf(...) && buf[n] == '\0') { becomes true when:
1) all 3 "%29[^ ,\n\t]" format specifiers each scan in at least 1 char (so sscanf() returns 3).
2) buf[n] is the end of the string. n is set via the "%n" specifier. The preceding ' ' in " %n" causes any white-space following the last "%29[^ ,\n\t]" to be consumed. When scanf() sees "%n", it assigns the current offset from the beginning of scanning (the number of chars consumed so far) to the int pointed to by &n.
VFMT "," VFMT "," VFMT " %n" is concatenated by the compiler to
" %29[^ ,\n\t], %29[^ ,\n\t], %29[^ ,\n\t] %n".
I find the former easier to maintain than the latter.
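The two conditions can be tried in isolation with a small helper built from the same VFMT macro. This is a sketch, not part of the original answer; line_ok is a hypothetical name, and the field buffers are assumed to be char[30] as in the snippet above. It accepts a well-formed line (with or without its trailing '\n') and rejects one with extra text after the third field (condition 2) or with only two fields (condition 1).

```c
#include <stdio.h>

#define VFMT " %29[^ ,\n\t]"

/* Hypothetical helper: returns 1 only when
   1) all three VFMT fields convert (sscanf() returns 3), and
   2) after any trailing white-space, buf[n] is the terminating '\0'.
   Each field buffer must hold at least 30 chars (29 + '\0'). */
int line_ok(const char *buf, char *name, char *age, char *occupation)
{
    int n = 0;  /* set by %n only if scanning gets that far */
    return 3 == sscanf(buf, VFMT "," VFMT "," VFMT " %n",
                       name, age, occupation, &n)
           && buf[n] == '\0';
}
```

Initializing n is only a precaution: when sscanf() returns 3, the " %n" at the end is always reached and n is set.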
The first space in " %29[^ ,\n\t]" directs sscanf() to scan over (consume and not save) 0 or more white-space characters (' ', '\t', '\n', etc.). The rest directs sscanf() to consume and save 1 to 29 chars, none of which is ' ', ',', '\n' or '\t', then append a '\0'.
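The effect of a single " %29[^ ,\n\t]" directive can be observed by appending "%n" to report how many input chars were consumed. In this sketch (scan_one_field is a hypothetical name), leading blanks and tabs are skipped, the saved field stops at the first excluded character, and at most 29 chars are stored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: applies one " %29[^ ,\n\t]" directive to `in`.
   On success returns the length of the saved field and stores in
   *consumed how many input chars were used (leading white-space
   included); returns -1 if nothing matched. `field` needs 30 chars. */
int scan_one_field(const char *in, char field[30], int *consumed)
{
    *consumed = 0;
    if (sscanf(in, " %29[^ ,\n\t]%n", field, consumed) != 1)
        return -1;
    return (int)strlen(field);
}
```

For example, "  \t John, 14" yields "John" with 8 chars consumed (4 of them leading white-space), a 36-char token is cut to its first 29 chars, and input starting with ',' matches nothing.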
You're not skipping the actual commas and spaces between the values. Once the first %30[^ ,\n\t] specifier has matched, the input probably contains a comma and a space, which aren't matched by the next directive in the format string.
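Add a comma and space to the format string where they are expected in the input, plus a leading space so that each call first skips the newline left over from the previous line (without it, the next call fails on that '\n' and the loop stops after the first line):
while(fscanf(input_fp, " %30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]", name, age_array, occupation) == 3)
                        ^            ^             ^
                        |            |             |
                        add these to make fscanf() skip them
                        in the input!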
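A self-contained sketch of the corrected loop, assuming the records come from any readable FILE * (count_records is a hypothetical name). Besides the ", " between fields, the format starts with a space: that directive skips the '\n' left unread at the end of the previous line. The buffers hold 31 chars because a width of 30 stores up to 30 chars plus '\0'.

```c
#include <stdio.h>

/* Hypothetical helper: reads "a, b, c" records from fp with the corrected
   format, echoes each one to stdout, and returns how many were read.
   Width 30 stores up to 30 chars plus '\0', so each buffer has 31. */
int count_records(FILE *fp)
{
    char name[31], age_array[31], occupation[31];
    int count = 0;

    /* Leading ' ' skips the '\n' left by the previous line;
       ", " skips each comma and the blank after it. */
    while (fscanf(fp, " %30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]",
                  name, age_array, occupation) == 3) {
        printf("%-30s%-30s%-30s\n", name, age_array, occupation);
        count++;
    }
    return count;  /* stops at EOF or at the first malformed record */
}
```

On the sample file from the question this reads 4 records (header included). After the last one, the leading space consumes the final '\n', the first conversion hits end of file, and fscanf() returns EOF, which ends the loop.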
Also, your check of fscanf()'s return value is sub-optimal: before relying on the values having been converted, you should check that the return value equals the number of conversions.
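A sketch of that check, applied with sscanf() to single lines (scan_record and the scan_status values are hypothetical names). Only a result of 3 means all three buffers were written; a short line such as "John, 14" yields 2, which is neither 3 nor EOF, and a blank line yields EOF.

```c
#include <stdio.h>

#define RECORD_FMT " %30[^ ,\n\t], %30[^ ,\n\t], %30[^ ,\n\t]"

/* Hypothetical names for the three outcomes worth distinguishing. */
enum scan_status { SCAN_OK, SCAN_END, SCAN_BAD };

/* Scans one record from line; each buffer must hold 31 chars. */
enum scan_status scan_record(const char *line, char *a, char *b, char *c)
{
    int ret = sscanf(line, RECORD_FMT, a, b, c);

    if (ret == 3)
        return SCAN_OK;   /* all three values were converted */
    if (ret == EOF)
        return SCAN_END;  /* input ran out before the first conversion */
    return SCAN_BAD;      /* 0 to 2 conversions: some buffers were not written */
}
```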
Plus, your use of the backslash line-continuation character is completely pointless and should be removed.