When a C char array is initialized from a string, are the remaining elements all '\0'? And why did C design strings to be zero-terminated?

Posted: 2017-12-27 14:00:12

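While copying a string into a char array, I noticed that nothing went wrong even when I did not copy the final '\0', although as I remember it, copying one string to another requires copying the final '\0'. I searched around and found that everyone initializes strings in their examples like this: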

char abc[] = "abc";

But what I want to ask is: when the array is longer than the string, what fills the remaining part?

I ran an experiment:

char abc[10] = "abc";

Then abc[3] should hold '\0', so I changed it:

abc[3] = 'd';

Even so, the program still printed abcd correctly, so I made one more change:

abc[8] = 'e';

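The output was still abcd. Does this mean that when an array is initialized with a length greater than the string, the entire remainder is filled with '\0'?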

The compiler deliberately fills the six chars after the terminator with 0, so I assumed this was required by the standard.

I checked C99, and it is: see 6.7.8, paragraph 21.

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

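Change the 10 to 100 and it becomes even more obvious: clang -O0 goes out of its way to call memset to zero the array, which shows the behavior is not incidental.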

P.S. I recall that C textbooks say something similar, for example The C Programming Language, 2nd ed., Appendix A8.7:

The initializer for an array is a brace-enclosed list of initializers for its members. ... If the array has fixed size, the number of initializers may not exceed the number of members of the array; if there are fewer, the trailing members are initialized with 0.

Reference:

This may well be a design mistake in C. But culture arises from accident (Asimov). What's done is done.

=============

Update: a comment on Slashdot: Interesting, but I think this article largely misses the point.

Firstly, it makes it seem like the address+length format is a no-brainer, but there are quite a lot of problems with that. It would have had the undesirable consequence of making a string larger than a pointer. Alternatively, it could be a pointer to a length+data block, but then it wouldn't be possible to take a suffix of a string by moving the pointer forward. Furthermore, if they chose a one-byte length, as the article so casually suggests as the correct solution (like Pascal), it would have had the insane limit of 255-byte strings, with no compatible way to have a string any longer. (Though a size_t length would make more sense.) Furthermore, it would be more complex for interoperating between languages -- right now, a char* is a char*. If we used a length field, how many bytes would it be? What endianness? Would the length be first or last? How many implementations would trip up on strings > 128 bytes (treating it as a signed quantity)? In some ways, it is nice that getaddrinfo takes a NUL-terminated char* and not a more complicated monster. I'm not saying this makes NUL-termination the right decision, but it certainly has a number of advantages over addr+length.

Secondly, this article puts the blame on the C language. It misses the historical step of B, which had the same design decision (by the same people), except it used ASCII 4 (EOT) to terminate strings. I think switching to NUL was a good decision ;)

Hardware development, performance, and compiler development costs are all valid. But on the security costs section, it focuses on the buffer overflow issue, which is irrelevant. gets is a very bad idea, and it would be whether C had used NUL-terminated strings or addr+len strings. The decision which led to all these buffer overflow problems is that the C library tends to use a "you allocate, I fill" model, rather than an "I allocate and fill" model (strdup being one of the few exceptions). That's got nothing to do with the NUL terminator.

What the article missed was the real security problems caused by the NUL terminator. The obvious fact that if you forget to NUL-terminate a string, anything which traverses it will read on past the end of the buffer for who knows how long. The author blames gets, but this isn't why gets is bad -- gets correctly NUL-terminates the string. There are other, sneaky subtle NUL-termination problems that aren't buffer overflows. A couple of years back, a vulnerability was found in Microsoft's crypto libraries (I don't have a link unfortunately) affecting all web browsers except Firefox (which has its own). The problem was that it allowed NUL bytes in domain names, and used strcmp to compare domain names when checking certificates. This meant that "" and "" compared equal, so if I got a certificate for "*.com\0.malicioushacker.com" I could use it to impersonate any legitimate .com domain. That would have been an interesting case to mention rather than merely equating "NUL pointer problem" with "buffer overflow".
