
When to use std::size_t?

I'm just wondering: should I use std::size_t for loops and the like instead of int? For instance:

#include <cstddef>

int main()
{
    for (std::size_t i = 0; i < 10; ++i) {
        // std::size_t OK here? Or should I use, say, unsigned int instead?
    }
}

In general, what is the best practice regarding when to use std::size_t?

A good rule of thumb is to use it for anything that you need to compare in the loop condition against something that is naturally a std::size_t itself.

std::size_t is the type of any sizeof expression and is guaranteed to be able to express the maximum size of any object (including any array) in C++. By extension, it is also guaranteed to be big enough for any array index, so it is a natural type for a loop by index over an array.
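
As a minimal sketch of the two points above, the following checks at compile time that sizeof yields a std::size_t and then indexes a built-in array with it. The helper name sum_array is made up for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// sizeof yields a std::size_t, and std::size_t is an unsigned integer type.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields std::size_t");
static_assert(std::is_unsigned<std::size_t>::value,
              "std::size_t is unsigned");

// Hypothetical helper: sum a built-in array, indexing with std::size_t.
// N is deduced from the array type, so the index can never exceed it.
template <std::size_t N>
long sum_array(const int (&values)[N]) {
    long total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];
    }
    return total;
}
```

Calling sum_array on an array declared as const int v[] = {1, 2, 3, 4}; deduces N as 4.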

If you are just counting up to a number then it may be more natural to use either the type of the variable that holds that number, or an int or unsigned int (if large enough), as these should be a natural size for the machine.

size_t is the result type of the sizeof operator.

Use size_t for variables that model a size or an index in an array. size_t conveys semantics: you immediately know it represents a size in bytes or an index, rather than just another integer.

Also, using size_t to represent a size in bytes helps make the code portable.

The size_t type is meant to specify the size of something, so it's natural to use it for, say, getting the length of a string and then processing each character:

for (size_t i = 0, max = strlen (str); i < max; i++)
    doSomethingWith (str[i]);

You do have to watch out for boundary conditions of course, since it's an unsigned type. The boundary at the top end is not usually that important since the maximum is usually large (though it is possible to get there). Most people just use an int for that sort of thing because they rarely have structures or arrays that get big enough to exceed the capacity of that int.

But watch out for things like:

for (size_t i = strlen (str) - 1; i >= 0; i--)

which will cause an infinite loop due to the wrapping behaviour of unsigned values (although I've seen compilers warn against this). This can be avoided with the following form (slightly harder to understand, but at least immune to wrapping problems):

for (size_t i = strlen (str); i-- > 0; )

By shifting the decrement into a post-check side-effect of the continuation condition, this does the check for continuation on the value before the decrement, but still uses the decremented value inside the loop. That is why i starts at len rather than len-1: the condition tests the values len .. 1, while the loop body sees len-1 .. 0.
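
A small sketch of that idiom applied to a std::string, assuming a made-up helper called reversed. It also behaves correctly for an empty string, where the loop body never runs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build the reverse of s by walking the index downwards.
std::string reversed(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    // The condition tests i before the decrement (s.size() .. 1);
    // the body sees the decremented value (s.size() - 1 .. 0).
    for (std::size_t i = s.size(); i-- > 0; ) {
        out.push_back(s[i]);
    }
    return out;
}
```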

By definition, size_t is the result type of the sizeof operator. size_t was created to refer to sizes.

The number of times you do something (10, in your example) is not about sizes, so why use size_t? int, or unsigned int, should be OK.

Of course, what you do with i inside the loop is also relevant. If you pass it to a function which takes an unsigned int, for example, pick unsigned int.

In any case, I recommend avoiding implicit type conversions. Make all type conversions explicit.
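
One way to make such a conversion explicit is sketched below. The helper to_int is an assumption made for illustration, not something from the question; it narrows a size to int and refuses values that would not fit.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a size to int explicitly,
// throwing instead of silently truncating when the value does not fit.
int to_int(std::size_t n) {
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        throw std::out_of_range("size does not fit in int");
    }
    return static_cast<int>(n);
}
```

Whether to throw, clamp, or assert on an out-of-range value is a design choice; the point is only that the conversion is visible at the call site.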

Short answer:

Almost never.

Long answer:

Whenever you need a vector of char bigger than 2 GB on a 32-bit system. In every other use case, using a signed type is much safer than using an unsigned type.

Example:

std::vector<A> data;
[...]
// calculate the index that should be used;
size_t i = calc_index(param1, param2);
// doing calculations close to the underflow of an integer is already dangerous

// do some bounds checking
if( i - 1 < 0 ) {
    // always false, because 0-1 on unsigned creates an underflow
    return LEFT_BORDER;
} else if( i >= data.size() - 1 ) {
    // if i already had an underflow, this becomes true
    return RIGHT_BORDER;
}

// now you have a bug that is very hard to track, because you never 
// get an exception or anything anymore, to detect that you actually 
// return the false border case.

return calc_something(data[i-1], data[i], data[i+1]);
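
For comparison, if the index has to stay unsigned, the same border test can be written without ever subtracting from an unsigned value. This is only a sketch: the Border enum and the classify function are hypothetical stand-ins for LEFT_BORDER, RIGHT_BORDER and the surrounding code.

```cpp
#include <cstddef>

enum class Border { Left, Right, None };

// Hypothetical rewrite of the bounds check that avoids unsigned subtraction.
Border classify(std::size_t i, std::size_t size) {
    if (i == 0) {
        return Border::Left;   // there is no element at i - 1
    }
    // i >= size is tested first, so i + 1 cannot wrap in the second test.
    if (i >= size || i + 1 == size) {
        return Border::Right;  // there is no element at i + 1
    }
    return Border::None;       // i - 1, i and i + 1 are all valid
}
```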

The signed equivalent of size_t is ptrdiff_t, not int. But using int is still much better in most cases than size_t. ptrdiff_t is typically 32 bits wide on 32-bit systems and 64 bits wide on 64-bit systems.

This means that you always have to convert to and from size_t whenever you interact with the std:: containers, which is not very pretty. But at a GoingNative conference the authors of C++ mentioned that designing std::vector with an unsigned size_t was a mistake.

If your compiler gives you warnings on implicit conversions from ptrdiff_t to size_t, you can make it explicit with constructor syntax:

calc_something(data[size_t(i-1)], data[size_t(i)], data[size_t(i+1)]);

If you just want to iterate over a collection, without bounds checking, use a range-based for:

for(const auto& d : data) {
    [...]
}

Here are some words from Bjarne Stroustrup (C++ author) at GoingNative.

For some people this signed/unsigned design error in the STL is reason enough not to use std::vector, and to use their own implementation instead.

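size_t is a very readable way to specify the size dimension of an item: the length of a string, the number of bytes a pointer takes, and so on. It is also portable across platforms: you will find that 64-bit and 32-bit systems both behave nicely with system functions and size_t, something that unsigned int might not do (for example, when should you use unsigned long?).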

Use std::size_t for indexing/counting C-style arrays.

For STL containers, you'll have (for example) vector<int>::size_type, which should be used for indexing and counting vector elements.

In practice, they are usually the same unsigned integer type, but that isn't guaranteed, especially when using custom allocators.
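
A brief sketch of indexing with the container's own size_type follows; the function name count_even is invented for the example.

```cpp
#include <vector>

// Hypothetical helper: count the even elements, using the vector's own
// size_type for both the index and the running count.
std::vector<int>::size_type count_even(const std::vector<int>& values) {
    std::vector<int>::size_type count = 0;
    for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
        if (values[i] % 2 == 0) {
            ++count;
        }
    }
    return count;
}
```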

Soon most computers will be 64-bit architectures with 64-bit OSes, running programs that operate on containers of billions of elements. Then you must use size_t instead of int as the loop index, otherwise your index will overflow once it passes 2^31 - 1 (int is typically 32 bits wide on both 32- and 64-bit systems).

Prepare for the future!

size_t is returned by various libraries to indicate that the size of that container is non-negative. You use it when you get one back.

However, in your example above, looping on a size_t is a potential bug. Consider the following:

for (size_t i = thing.size(); i >= 0; --i) {
  // this will never terminate because size_t is an unsigned
  // integer type, which cannot be negative by definition;
  // therefore i will always be >= 0
  printf("the never ending story. la la la la");
}

The use of unsigned integers has the potential to create these kinds of subtle issues. Therefore, IMHO, I prefer to use size_t only when I interact with containers/types that require it.

When using size_t, be careful with the following expression:

size_t i = container.find("mytoken");
size_t x = 99;
if (i-x>-1 && i+x < container.size()) {
    cout << container[i-x] << " " << container[i+x] << endl;
}

You will get false in the if expression regardless of what value you have for x, because -1 is converted to the largest size_t value, which i-x can never exceed. It took me several days to realize this (the code is so simple that I did not write a unit test), although it only takes a few minutes to find the source of the problem. I am not sure whether it is better to do a cast or to compare against zero.

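if ((int)(i-x) > -1)    // cast the difference to a signed type
if (i >= x)             // or compare before subtracting

Both ways should work; note that testing (i-x) >= 0 directly would not, because an unsigned value is never negative. Here is my test run: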

size_t i = 5;
cerr << "i-7=" << i-7 << " (int)(i-7)=" << (int)(i-7) << endl;

The output: i-7=18446744073709551614 (int)(i-7)=-2

I would welcome others' comments.
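
One possible way to write that check without a cast is to compare before subtracting or adding, as in the sketch below. It assumes the container is a std::string (so that find() reports failure as std::string::npos), and the helper name has_both_neighbours is made up.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when both text[pos - offset] and
// text[pos + offset] are valid elements of text.
bool has_both_neighbours(const std::string& text,
                         std::size_t pos, std::size_t offset) {
    if (pos == std::string::npos) {
        return false;  // find() reported "not found"
    }
    // pos >= offset guards the subtraction; pos < text.size() guards
    // text.size() - pos, which replaces the overflow-prone pos + offset.
    return pos >= offset && pos < text.size() && offset < text.size() - pos;
}
```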

It is often better not to use size_t in a loop. For example,

vector<int> a = {1,2,3,4};
for (size_t i=0; i<a.size(); i++) {
    std::cout << a[i] << std::endl;
}
size_t n = a.size();
for (size_t i=n-1; i>=0; i--) {
    std::cout << a[i] << std::endl;
}

The first loop is OK. But for the second loop:
When i=0, the result of i-- will be ULLONG_MAX (assuming size_t = unsigned long long), so the condition i>=0 never becomes false and the loop reads out of bounds.
Moreover, if a is empty then n=0 and n-1=ULLONG_MAX, which is not good either.
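
One way to sidestep the index arithmetic altogether is to walk the vector with reverse iterators. The sketch below uses a made-up helper, reversed_copy, and handles the empty case without any special treatment.

```cpp
#include <vector>

// Hypothetical helper: copy the elements in reverse order
// without any unsigned index that could wrap around.
std::vector<int> reversed_copy(const std::vector<int>& a) {
    std::vector<int> out;
    out.reserve(a.size());
    for (auto it = a.rbegin(); it != a.rend(); ++it) {
        out.push_back(*it);
    }
    return out;
}
```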

size_t is an unsigned type wide enough to hold the size of any object on your architecture, so it avoids the overflow problems of signed or narrower types (incrementing a signed int that holds 0x7FFFFFFF is undefined behaviour and typically yields a negative value; incrementing an unsigned short int that holds 0xFFFF gives 0).
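
The unsigned half of that statement can be checked directly, since unsigned arithmetic is defined to wrap modulo 2^N. The sketch below uses two invented helpers and std::uint16_t as a stand-in for a 16-bit unsigned short; the signed case is left out on purpose because signed overflow is undefined behaviour.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: increment a 16-bit unsigned value; 0xFFFF wraps to 0.
std::uint16_t next_u16(std::uint16_t v) {
    return static_cast<std::uint16_t>(v + 1u);
}

// Hypothetical helper: decrement a size; 0 wraps to the largest size_t value.
std::size_t previous(std::size_t v) {
    return v - 1;
}
```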

It is mainly used in array indexing, loops, address arithmetic and so on. Functions such as memset() take their size argument as size_t, because in theory you may have a block of memory of size 2^32-1 (on a 32-bit platform).

For such simple loops, don't bother; just use int.

I have been struggling myself with understanding what it is and when to use it. But size_t is just an unsigned integral data type which is defined in various header files such as <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h>, <wchar.h> etc.

It is used to represent the size of objects in bytes, hence it is the type yielded by the sizeof operator. The maximum permissible size depends on the platform and compiler: on a 32-bit target it is typically a typedef (alias) for unsigned int, while on a 64-bit target it is typically a typedef for unsigned long or unsigned long long. The size_t data type is never negative (unlike the separate POSIX type ssize_t). Therefore many C library functions such as malloc, memcpy and strlen declare their arguments or return type as size_t.

// Declaration of various standard library functions.

// Here 'n' is the number of bytes to allocate,
// which is guaranteed to be non-negative.
void *malloc(size_t n);
  
// While copying 'n' bytes from 's2' to 's1'
// n must be non-negative integer.
void *memcpy(void *s1, void const *s2, size_t n);
  
// the size of any string or `std::vector<char> st;` will always be at least 0.
size_t strlen(char const *s);
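
A short sketch of these functions working together, with size_t carried from strlen into the buffer size and the memcpy byte count. The helper copy_c_string is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a C string, including its terminator,
// into a freshly allocated byte buffer.
std::vector<char> copy_c_string(const char* s) {
    const std::size_t length = std::strlen(s);  // strlen returns a size_t
    std::vector<char> buffer(length + 1);       // room for the '\0'
    std::memcpy(buffer.data(), s, length + 1);  // the byte count is a size_t
    return buffer;
}
```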

size_t, or any unsigned type, is often used as a loop variable, since loop variables are typically greater than or equal to 0.

size_t is an unsigned integral type that can represent the size of the largest object on your system. Only use it if you need very large arrays, matrices, etc.

Some functions return a size_t, and your compiler will warn you if you try to compare it with a signed value.

Avoid that by using the appropriate signed/unsigned data type, or simply typecast for a quick hack.
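
Instead of a bare cast, the negative case can be handled first and the cast applied only once it is known to be safe. The function less_than below is a hypothetical sketch of that approach; C++20 also offers std::cmp_less in <utility> for the same purpose.

```cpp
#include <cstddef>

// Hypothetical helper: compare a signed value with a size
// without the usual sign-conversion surprise.
bool less_than(long long lhs, std::size_t rhs) {
    if (lhs < 0) {
        return true;  // every negative value is below any size
    }
    // lhs is non-negative here, so the conversion preserves its value.
    return static_cast<unsigned long long>(lhs) < rhs;
}
```

By contrast, the plain expression -1 < v.size() is false, because -1 is converted to a very large unsigned value before the comparison.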

size_t is an unsigned integer type (not necessarily unsigned int), so you can use it whenever you want an unsigned integer for a size or a count.

I use it when I want to specify the size of an array, a counter, etc.

void * operator new (size_t size); is a good use of it.

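Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.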
