
Chomsky hierarchy - examples with real languages

Posted 2020-05-22 10:24:48. Tags: nlp, grammar, context-free-grammar, automata, chomsky-hierarchy

I'm trying to understand the four levels of the Chomsky hierarchy by using some real languages as models. Chomsky thought that all natural languages can be generated by a context-free grammar, but Shieber contradicted this theory, proving that languages such as Swiss German can only be generated by a context-sensitive grammar. Since Chomsky is from the US, I guess that the American language is an example of a context-free grammar. My questions are:

Are there languages which can be generated by regular grammars (Type 3)?
Since recursively enumerable grammars can generate all languages, why not use them? Are they too complicated and less linear?
What is the characteristic of Swiss German that makes it impossible to generate with a context-free grammar?

1 Answer

I don't think this is an appropriate question for StackOverflow, which is a site for programming questions. But I'll try to address it as best I can.

I don't believe Chomsky was ever under the impression that natural languages could be described with a Type 2 grammar. It is not impossible for noun-verb agreement (singular/plural) to be represented in a Type 2 grammar, because the number of cases is finite, but the grammar is awkward. But there are more complicated features of natural language, generally involving specific rules about how word order can be rearranged, which cannot be captured in a simple grammar. It was Chomsky's hope that a second level of analysis -- "transformational grammars" -- could usefully capture these rearrangement rules without making the grammar computationally intractable. That would require finding some systematization which fits between Type 1 and Type 2, because Type 1 grammars are not computationally tractable.
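To make the "finite but awkward" point concrete, here is a minimal sketch of a context-free grammar that enforces number agreement by duplicating each nonterminal into a singular and a plural version. The grammar, its vocabulary, and the `generate` helper are all hypothetical and chosen only for illustration.

```python
import itertools

# Hypothetical toy grammar: agreement is encoded by splitting every
# nonterminal into a singular (_sg) and a plural (_pl) copy.
GRAMMAR = {
    "S": [["NP_sg", "VP_sg"], ["NP_pl", "VP_pl"]],
    "NP_sg": [["the", "dog"]],
    "NP_pl": [["the", "dogs"]],
    "VP_sg": [["runs"]],
    "VP_pl": [["run"]],
}

def generate(symbol):
    """Yield every sentence (as a list of words) derivable from symbol."""
    if symbol not in GRAMMAR:  # anything without productions is a terminal word
        yield [symbol]
        return
    for production in GRAMMAR[symbol]:
        # Expand each part of the production, then combine the alternatives.
        expansions = [list(generate(part)) for part in production]
        for combo in itertools.product(*expansions):
            yield [word for piece in combo for word in piece]

sentences = sorted(" ".join(words) for words in generate("S"))
print(sentences)
```

Every additional agreement feature (gender, case, person) multiplies the number of nonterminal copies, which is where the awkwardness comes from.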

Since we do, in fact, correctly parse our own languages, it stands to reason that there is some computational algorithm. But that line of reasoning might not actually be correct, because there is a limit to the complexity of a sentence which we can parse. Any finite language is regular (Type 3); only languages which have an unlimited number of potential sentences require more sophisticated grammars. So a large collection of finite patterns could suffice to understand natural language. These patterns might be a lot more sophisticated than regular expressions, but as long as each pattern only applies to a sentence of limited length, the pattern could be expressed mathematically as a regular expression. (The most obvious one is to just list all possible sentences as alternatives, which is a regular expression if the number of possible sentences is finite. But in many cases, that might be simplified into something more useful.)
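The "list all possible sentences as alternatives" construction can be sketched in a few lines. The sentence list below is made up for illustration; any finite list would work the same way.

```python
import re

# Hypothetical finite "language": every allowed sentence, listed explicitly.
SENTENCES = ["the dog runs", "the dogs run", "the cat sleeps"]

# A finite language is regular: escape each sentence and join with "|".
pattern = re.compile("|".join(re.escape(s) for s in SENTENCES))

def in_language(text):
    """Return True if text is exactly one of the listed sentences."""
    return pattern.fullmatch(text) is not None

print(in_language("the dogs run"))
print(in_language("the dogs runs"))
```

This is the brute-force version; factoring out the shared prefix `the ` would be one example of simplifying it into something more useful.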

As I understand it, modern attempts to deal with natural language using so-called "deep learning" are essentially based on pattern recognition through neural networks, although I haven't studied the field deeply and I'm sure that there are many complications I'm skipping over in that simple description.

Noam Chomsky is an American, but "American" is not a language (y si fuera, podría ser castellano, hablado por la mayoría de los residentes de las Americas). As far as I know, his first language is English, but he is not by any means unilingual, although I don't know how much Swiss German he speaks. Certainly, there have been criticisms over the years that his theories have an Indo-European bias. Certainly, I don't claim competence in Swiss German, despite having lived several years in Switzerland, but I did read Shieber's paper and some of the follow-ups and discussed them with colleagues who were native Swiss German speakers. (Opinions were divided.)

The basic issue has to do with morphological agreement in lists. As I mentioned earlier, many languages (all Indo-European languages, as far as I know) insist that the form of the verb agrees with the form of the subject, so that a singular subject requires a singular verb and a plural subject requires a plural verb. [Note 1]

In many languages, agreement is also required between adjectives and nouns, and this is not just agreement in number but also agreement in grammatical gender (if applicable). Also, many languages require agreement between the specific verb and the article or adjective of the object of the verb. [Note 2]

Simple agreement can be handled by a context-free (Type 2) grammar, but there is a huge restriction. To put it simply, a context-free grammar can only deal with parenthetic constructions. This can work even if there is more than one type of parenthesis, so a context-free grammar can insist that an [ be matched with a ] and not a ). But the grammar must have this "inside-out" form: the matching symbols must be in the reverse order to the symbols being matched.
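As a rough illustration of the "inside-out" restriction, the sketch below checks two kinds of brackets with a stack, which is the mechanism a pushdown automaton (the machine equivalent of a context-free grammar) has available. The function name and the choice of bracket pairs are assumptions made for the example.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "["}

def balanced(text):
    """Return True if every bracket closes the most recently opened one."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)  # remember the opener
        elif ch in PAIRS:
            # The closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # nothing may be left open

print(balanced("[()]"))
print(balanced("[(])"))
```

Because the stack only ever exposes the most recent opener, closers are forced to appear in the reverse order of their openers.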

One consequence of this is that there is a context-free grammar for palindromes -- sentences which read the same in both directions, which effectively means that they consist of a phrase followed by its reverse. But there is no context-free grammar for duplications: a language consisting of repeated phrases. In the palindrome, the matching words are in the reverse order to the matched words; in the duplicate, they are in the same order. Hence the difference.
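The contrast can be sketched with two small recognizers (both function names are hypothetical). The first mirrors the context-free productions S -> "" and S -> x S x for each symbol x; the second tests for a duplicated phrase, which is easy to write as ordinary code but has no equivalent set of context-free productions.

```python
def is_mirror(s):
    """Phrase followed by its reverse; follows S -> "" | x S x."""
    if s == "":
        return True
    # Peel one matching symbol off each end, as the production x S x does.
    return len(s) >= 2 and s[0] == s[-1] and is_mirror(s[1:-1])

def is_copy(s):
    """Phrase followed by the same phrase, in the same order."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] == s[half:]

print(is_mirror("abba"), is_copy("abba"))
print(is_mirror("abab"), is_copy("abab"))
```

Note that `is_mirror` deliberately accepts only even-length strings, matching the "phrase followed by its reverse" description; adding the productions S -> x would also admit palindromes with a middle symbol.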

Agreement in natural languages mostly follows this pattern, and some of the exceptions can be dealt with by positing simple rules for reordering finite numbers of phrases -- Chomsky's transformational grammar. But Swiss German features at least one case where agreement is not parenthetic, but rather in the same order. [Note 3] This involves the feature of German in which many sentences are in the order Subject-Object-Verb, which can be extended to Subject Object Object Object... Verb Verb Verb... when the verbs have indirect objects. Shieber showed some examples in which object-verb agreement is ordered, even when there are intervening phrases.
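A rough way to see why ordered agreement is a problem for a stack-based parser: in the sketch below, each object carries a case label and each verb names the case it requires. The labels `"dat"` and `"acc"`, the `agrees` function, and its `order` parameter are all assumptions for illustration and are not taken from Shieber's paper.

```python
from collections import deque

def agrees(objects, verbs, order):
    """Check that each verb finds an object in the case it requires.

    order="nested": the first verb pairs with the LAST object (stack).
    order="cross":  the first verb pairs with the FIRST object (queue).
    """
    pending = deque(objects)
    for required_case in verbs:
        if not pending:
            return False
        case = pending.pop() if order == "nested" else pending.popleft()
        if case != required_case:
            return False
    return not pending  # every object must have been claimed by a verb

objects = ["dat", "acc"]  # Object Object ...
verbs = ["dat", "acc"]    # ... Verb Verb, in the same order
print(agrees(objects, verbs, "cross"))
print(agrees(objects, verbs, "nested"))
```

The cross-serial check needs first-in, first-out access to the pending objects, while a pushdown automaton only offers last-in, first-out access; that mismatch is the intuition behind the result, not a proof of it.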

In the general case, such "cross-serial agreement" cannot be expressed in a context-free grammar. But there is a huge underlying assumption: that the length of the agreeing series be effectively unlimited. If, on the other hand, there are a finite number of patterns actually in common use, the "deep learning" model referred to above would certainly be able to handle it.

(I want to say that I'm not endorsing deep learning here. In fact, the way "artificial intelligence" is "trained" involves the use of trainers whose cultural biases may well not be sufficiently understood. This could easily lead to the same unfortunate consequences alluded to in my first footnote.)

Notes

1. This is not the case in many native American languages, as Whorf pointed out. In those languages, using a singular verb with a plural noun implies that the action was taken collectively, while using a plural verb would imply that the action was taken separately. Roughly transcribed to English, "The dogs run" would be about a bunch of dogs independently running in different directions, whereas "The dogs runs" would be about a single pack of dogs all running together. Some European "teachers" who imposed their own linguistic prejudices on native languages failed to correctly understand this distinction, and concluded that the native Americans must be too primitive to even speak their own language "correctly"; to "correct" this "deficiency", they attempted to eliminate the distinction from the language, in some cases with success.
2. These rules, not present in English, are one of the reasons some English speakers are tortured by learning German. I speak from personal experience.
3. Ordered agreement, as opposed to parenthetic agreement, is known as cross-serial dependency.