What does “Type” mean, physically?

Original post: 2010-08-10 13:46:23 1 10 .net/ language-agnostic/ types/ strong-typing

I have heard a lot about "type systems", "strongly typed languages" and so on. Currently I am working on a .NET COM interop problem, which involves "marshaling" a lot. And AFAIK, marshaling is largely about conversion between .NET types and COM types.

In many scenarios, such as programming languages, when talking about types we are concerned with their logical meaning.

Now I am wondering: what does "type" mean physically? That is, in a way we can watch and touch.

My current understanding is that a "type" is nothing but the in-memory representation of a computational entity.

Many thanks for your replies.

Adding-1

A quotation from MSDN:

Marshaling simple, blittable structures across the managed/unmanaged boundary first requires that managed versions of each native structure be defined. These structures can have any legal name; there is no relationship between the native and managed version of the two structures other than their data layout. Therefore, it is vital that the managed version contains fields that are the same size and in the same order as the native version. (There is no mechanism for ensuring that the managed and native versions of the structure are equivalent, so incompatibilities will not become apparent until run time. It is the programmer's responsibility to ensure that the two structures have the same data layout.)

So as far as marshaling is concerned, it is the layout that matters.
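The point of the MSDN quotation can be sketched outside .NET as well. The following is only an illustration in Python's `ctypes`, not the actual .NET marshaler: the structure names and fields (`NativePoint`, `ManagedPoint`, `BrokenPoint`) are made up for the example. It shows that two structures with different names and field names are interchangeable as long as field sizes and order match, and that a reordered version silently misreads the same bytes.

```python
import ctypes

# Hypothetical "native" structure, as a C header might declare it:
#   struct Point { int32_t x; int32_t y; double weight; };
class NativePoint(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32),
                ("y", ctypes.c_int32),
                ("weight", ctypes.c_double)]

# Hypothetical counterpart on the other side of the boundary:
# different names, but the same field sizes in the same order.
class ManagedPoint(ctypes.Structure):
    _fields_ = [("col", ctypes.c_int32),
                ("row", ctypes.c_int32),
                ("mass", ctypes.c_double)]

# A mismatched declaration: same fields, different order.
class BrokenPoint(ctypes.Structure):
    _fields_ = [("weight", ctypes.c_double),
                ("x", ctypes.c_int32),
                ("y", ctypes.c_int32)]

def layout(struct_type):
    """Describe a structure by layout only: (offset, size) per field plus total size."""
    fields = [(getattr(struct_type, name).offset, getattr(struct_type, name).size)
              for name, _ in struct_type._fields_]
    return fields, ctypes.sizeof(struct_type)

native = NativePoint(3, 4, 1.5)
raw = bytes(native)  # the raw bytes that would cross the boundary

managed = ManagedPoint.from_buffer_copy(raw)  # same layout: values survive
broken = BrokenPoint.from_buffer_copy(raw)    # different layout: values are garbage

print(layout(NativePoint) == layout(ManagedPoint))  # layouts match despite the names
print(managed.col, managed.row, managed.mass)
print(broken.x, broken.y, broken.weight)            # no error is raised, just wrong data
```

Nothing in the sketch complains about `BrokenPoint`; the mismatch only shows up as wrong values at run time, which is the situation the quotation warns about.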

10 Answers

I think there are three aspects to “types” in programming (and they probably overlap, so don't take this as a hard-and-fast separation):

A type is an element of a set of types, and every program/assembly/unit defines such a set. This is the most theoretical idea I can think of and is probably most useful to logicians and mathematicians. It is very general, and it allows you to define the idea of a type system on top of it. For example, a programming environment might define a relation on those types, e.g. the is-assignable-to relation.
A type is a semantic category. This is a linguistic or cognitive idea; in other words, it is most useful to humans who are thinking about how to program the computer. The type encapsulates what we think of as “things that belong in a category”. A type might be defined by a common purpose of entities. This categorisation according to purpose is, of course, arbitrary, but that's okay, since the declaration of types in programming is arbitrary too.
A type is a specification of how data is laid out in memory. This is the most low-level idea I can think of. Under this point of view, a type says nothing about the purpose or semantics of the data, but only how the computer is going to construct it, process it, etc. In this idea a type is somewhat more like a data encoding or a communications protocol.

Which meaning of type you go by depends on your domain. As already hinted, if you're a logician doing research on how to prove properties of a program, the first definition is going to be more useful than the third because the data layout is (usually) irrelevant to the proof. If you're a hardware designer or the programmer of a low-level system such as the CLR or the JavaVM, then you need the third idea and you don't really care about the first. But to the common programmer who just wants to get on with their task, it is probably the middle one that applies.
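The third view (a type as a data encoding) can be made concrete with a small sketch. It uses Python's standard `struct` module purely as an illustration: the same four bytes are decoded under three different layout specifications, and nothing in the bytes themselves says which one is "right".

```python
import struct

# Four bytes that encode the 32-bit float 1.0 in little-endian order.
raw = struct.pack("<f", 1.0)

as_float = struct.unpack("<f", raw)[0]   # read as a 32-bit float
as_uint = struct.unpack("<I", raw)[0]    # read as an unsigned 32-bit integer
as_bytes = struct.unpack("<4B", raw)     # read as four unsigned bytes

print(as_float)        # 1.0
print(hex(as_uint))    # 0x3f800000
print(as_bytes)        # (0, 0, 128, 63)
```

The format strings (`"<f"`, `"<I"`, `"<4B"`) play the role of the type here: they carry no purpose or semantics, only instructions for how to interpret the bytes.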

In many languages, types physically exist only at compile time. This is especially true of older languages. I would guess that in C, types never exist in memory at all, in any way, while the program is running.

In other languages, specifically those which allow run-time type information access (for example C++ with RTTI, or C#, or any dynamic language like Python), the types are just metadata: a binary description of the type. You know, the kind of stuff you would get if you tried to serialize data into a binary stream.
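As a rough illustration of types as run-time metadata, here is a sketch in Python, one of the dynamic languages mentioned above. The `Point` class is a made-up example; the attributes inspected (`__name__`, `__mro__`) are standard Python.

```python
class Point:
    """A made-up class used only to inspect its run-time type information."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
t = type(p)  # the type is itself an object that exists while the program runs

print(t.__name__)           # Point
print(t.__mro__)            # the inheritance chain: Point, then object
print(sorted(vars(p)))      # ['x', 'y'], the instance's field names
print(isinstance(t, type))  # True: the metadata is an ordinary run-time value
```

Here the type is a value the program can hold, pass around, and query, which is the "metadata" sense described above.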

I would say just the opposite. It is the language representation of the bits and bytes in memory.

Types are metadata about bits and bytes that defines how to manipulate them in a meaningful and safe way.

I would say type can have several meanings.

I tend to prefer its meaning as an interface constraint. (Well-written object-oriented code defines all in-memory data as private.)

In that case, type is absolutely NOT related to in-memory representation. On the contrary, it's only a contract on its member methods.
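A minimal sketch of this "contract on member methods" view, using Python's `typing.Protocol` (Python 3.8+). The names `Speaker`, `Dog`, `Robot` and `Rock` are invented for the example. The two conforming classes store completely different private data, yet both satisfy the same type because only the method matters.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Speaker(Protocol):
    """The contract: anything with a speak() method. No data layout is implied."""

    def speak(self) -> str: ...

class Dog:
    def __init__(self):
        self._lungs = 2              # private data, irrelevant to the contract

    def speak(self) -> str:
        return "Woof"

class Robot:
    __slots__ = ("_firmware",)       # a different in-memory representation

    def __init__(self):
        self._firmware = b"\x01"

    def speak(self) -> str:
        return "Beep"

class Rock:
    pass                             # no speak() method, so it breaks the contract

print(isinstance(Dog(), Speaker))    # True
print(isinstance(Robot(), Speaker))  # True
print(isinstance(Rock(), Speaker))   # False
```

Note that the run-time `isinstance` check only verifies that the method exists, not its signature, so this is a loose approximation of an interface constraint.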

A "type" is a set whose members ("objects") have a discrete finite representation and a useful set of shared attributes.

The actual in-memory representation of an object is not necessarily part of the definition of a type. That is to say that a single object may have multiple in-memory representations. The important thing is that an object may not be infinite or analog.

The shared attributes of a type can be anything. In an object-oriented system, the attributes would include (at a low level) data and behavior. Event notifications are also common. Some attributes may be conditional without violating the type definition (if boolean attribute X is true, then attribute Y also exists), so long as the rules are consistent across all objects in the type.

A "subtype" is a subset of a type whose members have a wider set of shared attributes.

This way of thinking about types is very different from what you pose in the question, and I believe this distinction is important.

If one sees types as an in-memory representation, then that representation will be viewed as the salient feature of the type, and it will be taken for granted. Interop will be achieved through low-level conversions and reinterpretations of existing byte sequences. This could lead to problems in some instances when that representation changes.

If, however, one sees types in terms of their attributes, then conversions from one type system to another will involve high-level conversions of data fields between corresponding objects. A determination of whether objects are compatible will be based on their salient attributes, and problems become less likely.

Even in the world of interop, knowledge of the internal details of types should not be relied upon. That is to say, features of an implementation of a type that are not part of the definition of that type should not be used as though they were a part of that type.

This depends on the programming paradigm you're working with. In OO, types can represent real-world objects: in other words, all of the data of a real-world object that a computer can represent (or the parts you're interested in, anyway).

IIRC strongly typed languages enforce the object types at compile time, e.g. a number must be an int, float, etc. In weakly typed languages you can say giraffe = 1 + frog * $100 / 'May 1' and the types are resolved at run time. And you usually get lots of runtime errors.

In data interchange situations (like COM, CORBA, RPC, etc.) it is very hard to enforce types because of binary compatibility (big endian, little endian) and formats (how do you represent strings and dates when passing from one language to another, each with different compilers?). Hence the marshaling, to try and resolve the types of each parameter. ASN.1 was one of many attempts to build a 'universal types' framework for interchanging data between machines.
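The byte-order and string-format problems mentioned here can be sketched with Python's `struct` module. This is not how COM, CORBA or ASN.1 actually encode data; `marshal` and `unmarshal` are hypothetical helpers that simply fix one convention (big-endian integers, a length-prefixed UTF-8 string) so both sides agree.

```python
import struct

value = 0x12345678
little = struct.pack("<I", value)  # bytes as a little-endian machine stores them
big = struct.pack(">I", value)     # bytes as a big-endian machine stores them

# Reading little-endian bytes with the wrong byte order gives a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))                # 0x78563412

HEADER = ">iH"  # assumed wire convention: big-endian int32, then uint16 string length

def marshal(number: int, text: str) -> bytes:
    """Encode a (number, text) pair using one agreed-upon layout."""
    data = text.encode("utf-8")
    return struct.pack(HEADER, number, len(data)) + data

def unmarshal(buf: bytes):
    """Decode bytes produced by marshal() back into (number, text)."""
    number, length = struct.unpack_from(HEADER, buf, 0)
    start = struct.calcsize(HEADER)
    return number, buf[start:start + length].decode("utf-8")

print(unmarshal(marshal(-42, "May 1")))  # (-42, 'May 1')
```

Because the wire layout is pinned down explicitly, the round trip works regardless of the native byte order or string representation on either end.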

A type is a human-readable logical blueprint for how data should be represented and organized in memory. It is a way of allowing humans to standardize how a concept is rationalized into a digital sequence. The machine and the compiler really don't care about the difference between a string, an integer, or fooClass. These "types" are simply agreed-upon organizational units that all human programmers use to translate logical concepts into rational data structures within memory.

Type is a bundle word. When you know something's type, you know how much memory it takes up and how the pieces of it are stored, but more importantly you also know what you can do with it. For example, there are several integer types that take up the same amount of memory as a pointer. However, you can multiply one integer type by another (e.g. 3 times 4) but you cannot multiply two pointers together. You can call the Foo() method on some user-defined type (struct or class) that has a Foo method, writing x.Foo() for example, but you can't do that for a different user-defined type that doesn't have a Foo method. You can cast between some pairs of types, but not between others, or you can cast an A to a B but not a B to an A. And so on. In some languages there are also distinctions like whether it is const or not.

Compilers and runtimes carry around a large amount of information, all of which adds up to the item's type. The physicality of how many bytes it takes up (or anything else you could plausibly claim to be tangible) is really not the point.