
How can one define a language which does not fit in the Chomsky Hierarchy?

I'm asking this question because I've stumbled across the accepted answer to "Chomsky Language Types".

This quote refers to Type-0 grammars:

This means that if you have a language that is more expressive than this type (e.g. English), you cannot write an algorithm that can list each and every (and only these) words of the language

As far as I know:

  • There is no mathematical description of what English is, so it is meaningless to argue about where it lands in the hierarchy of formal languages.
  • If there were, then English would certainly be recognizable by some Type-0 grammar by virtue of being defined by a finite amount of reasoning, whether that be axioms, a grammar, or anything else. (If not, how could someone have defined it other than in a finite number of steps?)

Hence:

  • We can't start talking about how 'expressive' a grammar needs to be in order to generate precisely an unknown mathematical object.

Therefore my problem:

  • How can one define a language which does not fit in the Chomsky Hierarchy?
  • If (?) it takes a finite number of steps for mathematicians to define sets with cardinalities that do not make them recursively enumerable, then grammars must exist which are more expressive than Type-0, since they (the mathematicians) have followed a finite number of rules (production rules, if you will) to produce a non-RE set. Where are they?

A language is a possibly infinite set of finite words written with some finite alphabet. Since the alphabet is finite and the length of each word is finite, the words of any language are enumerable, in the sense that there exists an enumeration. In other words, the size of any language is at most countably infinite.
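A minimal sketch of such an enumeration, in Python: every finite word over a finite alphabet shows up at some finite position if you list words shortest first. The names `all_words` and `enumerate_language` are made up for this illustration, and the filtering step assumes a membership test you can actually compute, which the existence argument above does not require.

```python
from itertools import count, islice, product

def all_words(alphabet):
    """Yield every finite word over `alphabet`, shortest first."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

def enumerate_language(alphabet, is_member):
    """List the words of a language, assuming `is_member` is computable.

    That assumption is only for the demo: an enumeration exists for every
    language, but nothing guarantees that a program can produce it.
    """
    return (word for word in all_words(alphabet) if is_member(word))

# First few words over {a, b}, and the first few palindromes among them.
first_words = list(islice(all_words("ab"), 7))
palindromes = list(islice(enumerate_language("ab", lambda w: w == w[::-1]), 5))
```

Because every word sits at a finite index in `all_words`, any language over the alphabet is a subsequence of a countable list.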

However, since any subset of the Kleene closure of the alphabet is a language, the set of languages is uncountably infinite: the Kleene closure is countably infinite, and by Cantor's theorem its power set is uncountable. Hence, there is no enumeration of languages.
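The reason no enumeration of languages can exist is Cantor's diagonal argument, which can be sketched in Python. Given any claimed listing of languages, the code builds a language that disagrees with the i-th listed language on the i-th word, so it cannot appear anywhere in the listing. The listing used here (`hypothetical_listing`) is invented purely for the demo, and representing languages as computable membership tests is a simplifying assumption.

```python
def index_of(word, alphabet):
    """Position of `word` when all words are listed shortest first."""
    k = len(alphabet)
    offset = sum(k ** j for j in range(len(word)))  # count of strictly shorter words
    rank = 0
    for ch in word:
        rank = rank * k + alphabet.index(ch)  # rank among words of the same length
    return offset + rank

def diagonal(listing, alphabet):
    """Membership test of a language that differs from every listed language.

    `listing(i)` must return the membership test of the i-th language.
    """
    def is_member(word):
        i = index_of(word, alphabet)
        return not listing(i)(word)  # flip the i-th language's verdict on the i-th word
    return is_member

def hypothetical_listing(i):
    # Made-up example: language number i holds the words whose length
    # is a multiple of i + 1.
    return lambda word: len(word) % (i + 1) == 0

ALPHABET = "ab"
missing = diagonal(hypothetical_listing, ALPHABET)
```

Whatever listing is passed in, `missing` disagrees with language number `index_of(w, ALPHABET)` on the word `w`, so the listing was incomplete.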

The Chomsky hierarchy is based on a formalism in which each grammar can be expressed as a finite sentence with a finite alphabet (the same alphabet as the language being described, plus a couple of extra symbols). [Note 1] So the number of possible Type 0 grammars is countably infinite, and there cannot be a one-to-one correspondence between the set of grammars and the set of languages: most languages have no grammar at all.
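To make "a grammar is a finite sentence" concrete, here is a small Python sketch. It writes a grammar down as one finite string, and it lists the words the grammar generates by a breadth-first search over sentential forms. The conventions are assumptions for the demo only: upper-case letters are non-terminals, lower-case letters are terminals, and `max_len` bounds the search so that it terminates (safe here because no rule shortens a form). The example grammar is the textbook one for a^n b^n c^n.

```python
from collections import deque

# Rewriting rules (left-hand side, right-hand side) for { a^n b^n c^n : n >= 1 }.
RULES = [("S", "aBC"), ("S", "aSBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def serialize(rules):
    """Write the whole grammar as one finite string."""
    return ";".join(f"{lhs}→{rhs}" for lhs, rhs in rules)

def generate(rules, start="S", max_len=9):
    """Collect every terminal word derivable via forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():  # only terminals left: this is a word of the language
            words.add(form)
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:  # try the rule at every position where it applies
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return sorted(words, key=len)

description = serialize(RULES)
words = generate(RULES)
```

Dropping the length bound turns `generate` into a procedure that never stops but eventually reaches every word of the language, which is the sense in which a Type 0 grammar "lists" its language.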

However. The existence of languages (i.e. sets) for which no generative grammar exists does not necessarily mean that there is some other way of describing these languages which is "more expressive" than generative grammars. Any description which can be written as a finite string using a finite alphabet can only describe a countable infinity of sets. Whether or not it is the same countable infinity will depend on the formalisms, and in general there will be no algorithm which can demonstrate that two formalisms are equivalent. But some equivalences are known (such as the equivalence with Turing machines, which is a particularly interesting equivalence).

So, we have an interesting little conundrum, which is (of course) related to Gödel's Incompleteness Theorems. That is, there are more languages than ways of describing a language, no matter what system we use to describe a language. So the question "How do we describe a language for which no description is available?" does not have a good answer (and if we answer it by calling some set "Sue", then there will still be an uncountable infinitude of possible sets for which no name exists).

While all this foraging into infinitudes is interesting, it has a few issues:

  1. It has very little (if anything) to do with programming, so it's questionable whether it's on topic for StackOverflow.

  2. Kurt Gödel and Georg Cantor, the two mathematicians responsible for most of the concepts in this answer, both suffered from severe depression. Just saying.


Notes

  1. Although at first glance it might appear that the alphabet for a Type 0 grammar might be arbitrarily larger than the alphabet of the language being described, that is not actually the case. The grammar's alphabet consists of the target alphabet plus a finite set of non-terminals plus an → symbol; the non-terminals can be written using numbers in any convenient base, say binary. So only three additional symbols are required (0, 1 and →), and you could reduce that to two by arbitrarily designating one of the non-terminal numbers to be the arrow. (It might seem like you need a third symbol to delimit the names of non-terminals, but you can use a Fibonacci encoding to produce codes which always start with a 1 and never include two consecutive 1s, so that you can use an extra 1 at the beginning to unambiguously mark the start of the symbol.)
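A rough Python sketch of the encoding described in Note 1, under my own assumptions about the details: non-terminals are numbered from 1, each number is written in Zeckendorf (Fibonacci) form with the most significant digit first, and an extra 1 is put in front. All function names are invented for the illustration. This is the mirror image of the usual Fibonacci code, so the sketch splits a stream by scanning it from the right, where each "11" marks the start of a name.

```python
def zeckendorf(n):
    """Digits of n >= 1 as a sum of non-adjacent Fibonacci numbers 1, 2, 3, 5, ..."""
    fibs = [1]
    a, b = 1, 2
    while b <= n:
        fibs.append(b)
        a, b = b, a + b
    digits = []
    for f in reversed(fibs):  # greedy choice never picks two adjacent Fibonacci numbers
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits)

def encode(n):
    """Name of non-terminal number n: a marker 1, then the Zeckendorf digits."""
    return "1" + zeckendorf(n)

def from_zeckendorf(digits):
    total, a, b = 0, 1, 2
    for d in reversed(digits):  # least significant digit has weight 1
        if d == "1":
            total += a
        a, b = b, a + b
    return total

def decode_stream(bits):
    """Split a concatenation of encoded names back into numbers."""
    values, word = [], ""
    for ch in reversed(bits):  # read right to left
        word += ch
        if word.endswith("11"):  # leading digit plus marker: a name starts here
            values.append(from_zeckendorf(word[:-1][::-1]))
            word = ""
    if word:
        raise ValueError("trailing bits do not form a complete name")
    return values[::-1]

stream = "".join(encode(n) for n in [4, 1, 1, 7, 2])
```

Here `decode_stream(stream)` recovers `[4, 1, 1, 7, 2]` without any separator symbol between the names.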
