
Alternatives to nullable types in C#

I am writing algorithms that work on series of numeric data, where a value in the series sometimes needs to be null. However, because this application is performance critical, I have avoided nullable types. I have perf tested the algorithms specifically to compare nullable against non-nullable types: in the best case the nullable versions are 2x slower, and often far worse.

The data type used most often is double, and the currently chosen alternative to null is double.NaN. However, I understand this is not exactly the intended use of NaN, so I am unsure whether there are issues with it that I cannot foresee, and what the best practice would be.

I am interested in finding out what the best null alternatives are for the following data types in particular: double/float, decimal, DateTime, int/long (although others are more than welcome).

Edit: I think I need to clarify my requirements about performance. Gigabytes of numerical data are processed through these algorithms at a time, which takes several hours. Therefore, although the difference between e.g. 10ms and 20ms is usually insignificant, in this scenario it really does make a significant impact on the time taken.

Well, if you've ruled out Nullable<T>, you are left with domain values, i.e. a magic number that you treat as null. While this isn't ideal, it isn't uncommon either: for example, a lot of the main framework code treats DateTime.MinValue the same as null. This at least moves the damage far away from common values...

(Edited to stress: only where there is no NaN.)

So where there is no NaN, maybe use .MinValue, but just remember what evils happen if you accidentally use that same value to mean the actual number...

Obviously for unsigned data you'll need .MaxValue (avoid zero!!!).
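As a minimal sketch of the magic-number approach for an integer series: the names `Missing` and `SumDefined` are made up for illustration, and the code assumes that int.MinValue never occurs as real data. For an unsigned type the same pattern would use its .MaxValue instead.

```csharp
using System;

static class SentinelDemo
{
    // Assumption: int.MinValue is never a legitimate value in this series.
    const int Missing = int.MinValue;

    // Sums only the entries that are not the sentinel.
    static long SumDefined(int[] series)
    {
        long sum = 0;
        foreach (int v in series)
        {
            if (v != Missing)
            {
                sum += v;
            }
        }
        return sum;
    }

    static void Main()
    {
        int[] series = { 3, Missing, 4, Missing, 5 };
        Console.WriteLine(SumDefined(series)); // 12
    }
}
```

Every routine that touches the series has to repeat the `v != Missing` check, which is exactly the cost to weigh against Nullable<T>.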

Personally, I'd try to use Nullable<T>, as it expresses my intent more safely... there may be ways to optimise your Nullable<T> code, perhaps. And also: by the time you've checked for the magic number in all the places you need to, perhaps it won't be much faster than Nullable<T>?
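One possible way to keep a Nullable<T> loop lean, offered only as a sketch: test HasValue once and read the payload with GetValueOrDefault(), which does not perform the extra check-and-throw that the Value property does. Whether this makes a measurable difference for your workload is an assumption that would need profiling; `SumDefined` is an illustrative name.

```csharp
using System;

static class NullableLoopDemo
{
    // Sums the entries that have a value, skipping nulls.
    static double SumDefined(double?[] series)
    {
        double sum = 0.0;
        foreach (double? v in series)
        {
            // HasValue is tested once here; GetValueOrDefault() then returns
            // the stored value without the check that Value performs.
            if (v.HasValue)
            {
                sum += v.GetValueOrDefault();
            }
        }
        return sum;
    }

    static void Main()
    {
        double?[] series = { 1.0, null, 2.0 };
        Console.WriteLine(SumDefined(series)); // 3
    }
}
```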

I somewhat disagree with Gravell on this specific edge case: a null-ed variable is considered 'not defined'; it doesn't have a value. So whatever is used to signal that is OK, even magic numbers. With magic numbers, though, you have to take into account that a magic number will always come back to haunt you when it suddenly becomes a 'valid' value. With Double.NaN you don't have to be afraid of that: it's never going to become a valid double. You do have to consider, though, that NaN in a sequence of doubles can then only be used as a marker for 'not defined'; obviously you can't use it as an error code in the same sequences as well.

So whatever is used to mark 'undefined', it has to be clear, in the context of the set of values, that that specific value is considered the value for 'undefined' AND that this won't change in the future.

If Nullable gives you too much trouble, use NaN, or whatever else, as long as you consider the consequences: the value chosen represents 'undefined', and that will stay.
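A small sketch of NaN used strictly as the 'undefined' marker in a series of doubles. `MeanDefined` is a hypothetical helper; returning NaN for a series with no defined entries is a design choice made here for illustration, not something the answer prescribes.

```csharp
using System;

static class NaNMarkerDemo
{
    // Mean of the defined entries; NaN entries are treated as 'undefined'.
    static double MeanDefined(double[] series)
    {
        double sum = 0.0;
        int count = 0;
        foreach (double v in series)
        {
            if (!double.IsNaN(v))
            {
                sum += v;
                count++;
            }
        }
        // With no defined entries the mean itself is undefined.
        return count == 0 ? double.NaN : sum / count;
    }

    static void Main()
    {
        double[] series = { 1.0, double.NaN, 3.0 };
        Console.WriteLine(MeanDefined(series)); // 2
    }
}
```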

I am working on a large project that uses NaN as a null value. I am not entirely comfortable with it, for reasons similar to yours: not knowing what can go wrong. We haven't encountered any real problems so far, but be aware of the following:

NaN arithmetic - While "NaN promotion" is a good thing most of the time, it might not always be what you expect.

Comparison - Comparing values gets rather expensive if you want NaNs to compare equal. Testing floats for equality isn't simple anyway, but ordering (a < b) can get really ugly, because NaNs sometimes need to be smaller and sometimes larger than normal values.

Code infection - I see lots of arithmetic code that requires specific handling of NaNs to be correct. So, for performance reasons, you end up with "functions that accept NaNs" and "functions that don't".

Other non-finites - NaN is not the only non-finite value; the infinities exist too. This should be kept in mind...

Floating-point exceptions - These are not a problem when disabled. Until someone enables them. True story: static initialization of a NaN in an ActiveX control. Doesn't sound scary, until you change the installation to use InnoSetup, which uses a Pascal/Delphi(?) core, which has FPU exceptions enabled by default. Took me a while to figure out.

So, all in all, nothing serious, though I'd prefer not to have to consider NaNs that often.
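The arithmetic, ordering and non-finite points from the list above can be seen in a few lines. This is only an illustrative sketch; the variable names are arbitrary.

```csharp
using System;

static class NaNPitfallsDemo
{
    static void Main()
    {
        double missing = double.NaN;

        // Arithmetic: NaN propagates through the operations it touches.
        Console.WriteLine(double.IsNaN(missing + 1.0));          // True
        Console.WriteLine(double.IsNaN(Math.Max(missing, 1.0))); // True

        // Ordering: relational comparisons involving NaN are always false,
        // so NaN is neither smaller nor larger than a normal value.
        Console.WriteLine(missing < 1.0);                        // False
        Console.WriteLine(missing > 1.0);                        // False

        // Other non-finites: an infinity is not NaN, but can produce one.
        double inf = double.PositiveInfinity;
        Console.WriteLine(double.IsNaN(inf));                    // False
        Console.WriteLine(double.IsNaN(inf - inf));              // True
    }
}
```

Because both `<` and `>` return false, any sorting or min/max routine has to decide explicitly where NaN entries should go.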


I'd use Nullable types as often as possible, unless they are (proven to be) a performance / resource bottleneck. One such case could be large vectors / matrices with occasional NaNs, or large sets of named individual values where the default NaN behavior is correct.


Alternatively, you can use an index vector for vectors and matrices, standard "sparse matrix" implementations, or a separate bool/bit vector.
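The separate bit vector option might look roughly like this, using System.Collections.BitArray as the validity mask alongside a plain double[]. The names `values`, `valid` and `SumValid` are illustrative, and the sketch assumes both containers have the same length.

```csharp
using System;
using System.Collections;

static class MaskDemo
{
    // Sums only the entries whose validity bit is set.
    // Assumes values.Length == valid.Length.
    static double SumValid(double[] values, BitArray valid)
    {
        double sum = 0.0;
        for (int i = 0; i < values.Length; i++)
        {
            if (valid[i])
            {
                sum += values[i];
            }
        }
        return sum;
    }

    static void Main()
    {
        double[] values = { 1.5, 0.0, 2.5 };
        // The middle entry is marked as missing; its stored 0.0 is ignored.
        var valid = new BitArray(new[] { true, false, true });
        Console.WriteLine(SumValid(values, valid)); // 4
    }
}
```

This keeps the numeric array free of magic values, at the price of maintaining two containers in step.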

Partial answer:

Float and Double provide NaN (Not a Number). NaN is a little tricky since, per spec, NaN != NaN. If you want to know whether a number is NaN, you'll need to use Double.IsNaN().
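A short illustration of why the `==` operator cannot be used to detect NaN, and how Double.IsNaN() behaves instead. Note that the Equals method, unlike `==`, treats two NaN values as equal.

```csharp
using System;

static class IsNaNDemo
{
    static void Main()
    {
        double a = double.NaN;
        double b = double.NaN;

        // The == and != operators follow the floating-point spec.
        Console.WriteLine(a == b);          // False
        Console.WriteLine(a != b);          // True

        // The reliable test for NaN.
        Console.WriteLine(double.IsNaN(a)); // True

        // Equals is defined differently from ==: two NaNs are equal here.
        Console.WriteLine(a.Equals(b));     // True
    }
}
```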

See also Binary floating point and .NET.

One can avoid some of the performance degradation associated with Nullable<T> by defining your own structure:

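struct MaybeValid<T>
{
    // True when Value holds real data; false (the default) means "no value".
    public bool isValue;
    // The payload; only meaningful when isValue is true.
    public T Value;
}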

If desired, one may define a constructor, or a conversion operator from T to MaybeValid<T>, etc., but overuse of such things may yield sub-optimal performance. Exposed-field structs can be efficient if one avoids unnecessary data copying. Some people may frown upon the notion of exposed fields, but they can be massively more efficient than properties. If a function that returns a T would need a variable of type T to hold its return value, using a MaybeValid<Foo> simply increases by 4 the size of the thing to be returned. By contrast, using a Nullable<Foo> would require that the function first compute the Foo and then pass a copy of it to the constructor for the Nullable<Foo>. Further, returning a Nullable<Foo> requires that any code that wants to use the returned value make at least one extra copy to a storage location (variable or temporary) of type Foo before it can do anything useful with it. By contrast, code can use the Value field of a MaybeValid<Foo> variable about as efficiently as any other variable of type Foo.
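A sketch of how such a struct might be filled in and consumed. `SafeRatio` is a hypothetical function invented for the example; the struct is repeated so the block compiles on its own.

```csharp
using System;

struct MaybeValid<T>
{
    public bool isValue;
    public T Value;
}

static class MaybeValidDemo
{
    // Hypothetical helper: yields a / b, or an invalid result when b is zero.
    static MaybeValid<double> SafeRatio(double a, double b)
    {
        // default(...) leaves isValue false, i.e. "no value".
        MaybeValid<double> result = default(MaybeValid<double>);
        if (b != 0.0)
        {
            // The fields are written in place; no constructor call is involved.
            result.Value = a / b;
            result.isValue = true;
        }
        return result;
    }

    static void Main()
    {
        MaybeValid<double> ok = SafeRatio(6.0, 3.0);
        if (ok.isValue)
        {
            Console.WriteLine(ok.Value);   // 2
        }

        MaybeValid<double> bad = SafeRatio(1.0, 0.0);
        Console.WriteLine(bad.isValue);    // False
    }
}
```

Nothing stops a caller from reading Value while isValue is false, so the safety that Nullable<T> provides is traded for the direct field access.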

Maybe the significant performance decrease happens when calling one of Nullable's members or properties (boxing).

Try using a struct with the double plus a boolean that tells whether the value is specified or not.

