Difference between reading and parsing

I read somewhere that scanf in the C programming language parses data (http://sekrit.de/webdocs/c/beginners-guide-away-from-scanf.html). Can someone explain what that means? I tried searching on Google but could not find a clear answer.

"Parse data" means that you take some raw data and assign a specific meaning to it.

"Read data" is a less specific term, but in this context it can refer to "getting raw data into your program from some device". Here "some device" can be many things: a terminal (stdin), a disk, a network connection and so on. A program can read raw data without knowing what the data means. It just gets a sequence of numbers (bytes) without knowing what these numbers mean. In pseudo-code:

 // READ DATA
 unsigned char array[some-size];
 size_t index = 0;
 // Stop when the array is full so the loop never writes past its end.
 while(index < some-size && input-device-has-data)
 {
     array[index] = get-input-from-device();
     ++index;
 }

After this loop that reads data, the array may contain:

array[0]: 74
array[1]: 79
array[2]: 69
array[3]: 32
array[4]: 52
array[5]: 50
array[6]: 0

A sequence of numbers that doesn't really seem to mean anything. So the next step is to set up some rules for the data to make them meaningful. The first step is the encoding, i.e. what does the number 74 mean? One such encoding is the ASCII table, which defines how numbers are translated into characters. Using the ASCII table, the numbers above become:

array[0]: J
array[1]: O
array[2]: E
array[3]: space
array[4]: 4
array[5]: 2
array[6]: NUL

Then you can set up rules for the data you want your program to receive. In this simple example the rule would be something like: name age

So in your program you would like to convert the raw data into two variables: one variable that holds the name and another variable that holds the age. In pseudo-code:

string name = get_name(array);
int age = get_age(array);

That is "to parse" data, i.e. take some raw data (a sequence of numbers) and put specific meaning (semantics) into the data while following some rules for the data (syntax).

And that is exactly what the scanf-family functions can do for you. In this case, for example:

// Parsing
char name[20];
int age;
// sscanf expects a char pointer, so the unsigned char array is cast.
int result = sscanf((const char *)array, "%19s %d", name, &age);

Here sscanf parses the raw data contained in array and tries to map it to a word (the name, at most 19 characters), followed by whitespace, followed by a number (the age). These are the rules set by the format string, i.e. "%19s %d".

If sscanf can parse the data according to those rules, it returns the value 2 to tell you that the data was parsed into 2 variables: the variable name holds the first word and the variable age holds the number. That's what parsing is about.

You read it wrong. The scanf() function scans information from stdin and converts what it reads according to the conversion specifiers given (e.g. %s, %d, %c). Parsing is a much more sophisticated technique that involves tokenizing the input, validating it against a set of rules in a context-free grammar, and building an abstract syntax tree to verify whether the input is part of a language.
