1. Changelog
1.1. Revision 5 - October 1st, 2024
- Additional minor wording improvements.
- Revision 5.1 (Homework+):
  - We were supposed to have `[[unsequenced]]` on everything, following a CD Ballot comment arguing for this on all the other `<stdbit.h>` functions. We added it to the rotate functions, but it was not supposed to be on the `stdc_memreverse8` and `stdc_load*`/`stdc_store*` functions, as those touch memory that may exist in places where synchronization is required (and also can type-pun because they are accessed via `unsigned char`). The functions which take and return values directly can have it, though.
  - The "`nullptr` and zero is allowed" paper seems to necessitate changes to how `static n` for an `n == 0` is handled. Right now this is not necessarily allowed by the language (`nullptr` is not allowed for any `static n` marking, as far as we’re aware), so a paper should be written to allow `static n` with `N == 0` to say "a `nullptr` is fine here".
1.2. Revision 4 - September 16th, 2024
- Additional minor wording improvements.
1.3. Revision 3 - September 4th, 2024
- Minor typo and wording fixes in preparation for C2y/C3a.
- Additional updates for Minneapolis meeting for wording.
1.4. Revision 2 - February 7th, 2023
- The wrong paper was sent in for Revision 1, so this paper was re-published as revision 2.
1.5. Revision 1 - January 31st, 2023
- Change specification for `stdc_rotate_left`/`stdc_rotate_right` to account for potential integer promotion issues.
- Add footnote about preventing promotion issues from using `unsigned char`/`unsigned short` being promoted to `int`.
- Make sure constraints on `generic_count_type` are applied to the value, not the type.
1.6. Revision 0 - December 14th, 2022
- Initial release. ✨
- Removed all of the components from [n3022] that were accepted, and migrated all of the non-accepted parts to this in response to feedback from many individuals.
- Changed the conditions on the 8-bit memory reversal functions to be `CHAR_BIT == 8`, rather than a multiple of 8. This excludes many DSP, FPGA, ASIC, and other specialized processors and hardware and implementations on that hardware.
  - Users who want this functionality on those platforms should begin to converge on a design that works for them, preferably in the fashion as described by this paper and its predecessors.
  - Those designs should be implemented by those users on those platforms and brought forward for standardization at another point in time.
2. Polls
No polls were taken, except for the ones noted in the older version of this paper for C23 ([n3022]).
3. Introduction & Motivation
A lot of proposals and work go into figuring out the "byte order" of integer values that occupy more than 1 octet (8 bits). This is nominally important when dealing with data that comes over network interfaces and is read from files, where the data can be laid out in various orders of octets for 2-, 3-, 4-, 6-, or 8-tuples of octets. The most well-known endian structures on existing architectures include "Big Endian", where the least significant byte comes "last" and which is featured prominently in network protocols and file protocols; and "Little Endian", where the least significant byte comes "first" and which is typically the orientation of data for processor and user architectures most prevalent today.
In more legacy architectures (Honeywell, PDP), there also exist other orientations called "mixed" or "middle" endian. The uses of such endianness are of dubious benefit and are vanishingly rare amongst commodity and readily available hardware today, but nevertheless still represent an applicable ordering of octets.
In other related programming interfaces, the C functions/macros `ntoh` ("network to host") and `hton` ("host to network") (usually suffixed with `l` or `s` or others to specify which native data type it was being performed on, such as `long` or `short`) were used to change the byte order of a value ([ntohl]). This became such a common operation that many compilers - among them Clang and GCC - optimized the code down to use a byte swap intrinsic (for example, `_byteswap_ulong` for MSVC and `__builtin_bswap32` for Clang and GCC). These intrinsics often compiled into binary code representing cheap, simple, and fast byte swapping instructions available on many CPUs for 16, 32, 64, and sometimes 128 bit numbers. The byte swap intrinsics were used as the fundamental underpinning for the `ntoh` and `hton` functions, where a check for the translation-time endianness of the program determined if the byte order would be flipped or not.
This proposal puts forth the fundamentals that make a homegrown implementation of `ntoh`, `hton`, and other endianness-based functions possible in Standard C code. It also addresses many of the compiler-based intrinsics found to generate efficient machine code, with a few simpler utilities layered on top of it.
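As a rough illustration of the pattern described above (a byte swap primitive plus an endianness check), the following sketch builds a `hton`-style function in portable C. The names `sketch_bswap32`, `sketch_host_is_little_endian`, and `sketch_hton32` are hypothetical and not part of this proposal. The check is done at run time here for simplicity, and the sketch assumes the host is either big or little endian.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the four 8-bit bytes of a 32-bit value. */
static uint32_t sketch_bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: look at the first byte in memory of a known value.
   Assumption: the host is either little endian or big endian. */
static int sketch_host_is_little_endian(void) {
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* "Host to network" (big endian): swap only when the host is little endian. */
static uint32_t sketch_hton32(uint32_t host_value) {
    return sketch_host_is_little_endian() ? sketch_bswap32(host_value)
                                          : host_value;
}
```

The important part is the conditional in `sketch_hton32`: the swap is only correct when the host order differs from the target order.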
4. Design
This is a library addition. It is meant to expose both macros and functions that can be used for translation time-suitable checks. It provides a way to check endianness within the preprocessor, and gives definitive names that allow for knowing whether the endianness is big, little, or neither. We state big, little, or neither, because there is no settled-upon name for the legacy endianness of "middle" or "mixed", nor any agreed upon ordering for such a "middle" or "mixed" endianness between architectures. This is not the case for big endian or little endian, where one is simply the reverse of the other, always, in every case, across architectures, file protocols, and network specifications.
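A preprocessor-level endianness check might look like the sketch below. It relies on the C23 `<stdbit.h>` macros `__STDC_ENDIAN_LITTLE__`, `__STDC_ENDIAN_BIG__`, and `__STDC_ENDIAN_NATIVE__`; the function name `sketch_endianness_name` and the "unknown" fallback for toolchains without that header are assumptions made for illustration.

```c
/* Include <stdbit.h> only when the toolchain provides it (C23). */
#if defined(__has_include)
#  if __has_include(<stdbit.h>)
#    include <stdbit.h>
#  endif
#endif

/* Hypothetical helper: report the translation-time byte order as text. */
static const char *sketch_endianness_name(void) {
#if defined(__STDC_ENDIAN_NATIVE__) && defined(__STDC_ENDIAN_LITTLE__) && \
    defined(__STDC_ENDIAN_BIG__)
#  if __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__
    return "little";
#  elif __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__
    return "big";
#  else
    return "neither"; /* e.g. a "mixed" or "middle" endian target */
#  endif
#else
    return "unknown"; /* assumption: header or macros not available */
#endif
}
```

Note the three-way split: the "neither" branch exists precisely because there is no agreed-upon name or ordering for the legacy byte orders.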
The next part of the design is functions for working with groupings of 8 bits. They are meant to communicate with network or file protocols and formats that have become ubiquitous in computing for the last 30 years.
Finally, this design adds new left/right bit rotation functions, all within the `<stdbit.h>` header.
4.1. Preliminary: Why the stdc_ prefix?
We use the `stdc_` prefix for these functions so that we do not have to struggle with taking common words away from the end user. Because we now have 31 bytes of linker name significance, we can afford to have some sort of prefix rather than spend all of our time carving out reserved words or header-specific extensions. This will let us have good names that very clearly map to industry practice, without replacing industry code or being forced to be compatible with existing code that already has taken the name with sometimes-conflicting argument conventions.
4.2. Charter: unsigned char const ptr[static sizeof(uintN_t)] and More?
There are 2 choices on how to represent sized pointer arguments. The first is a `void*` convention for function arguments in this proposal. The second is an `unsigned char*`/`unsigned char ptr[static n]` convention.
To start, we still put any size + pointer arguments in the proper "size first, pointer second" configuration so that implementation extensions which allow a `static n` bound on such a pointer parameter can exist no matter what choice is made here. That part does not change. The `void*` argument convention means that pointers to structures, or similar, can be passed to these functions without needing a cast. This represents the totality of the ease of use argument. The `unsigned char*` argument convention can produce both better compile-time safety and articulate requirements using purely the function declaration, without needing to look up prose from the C Standard or implementation documentation. The cost is that any use of the function will require a cast in strictly conforming code.
One of the tipping arguments in favor of our choice of `unsigned char*` is that `void*` can be dangerous, especially since, before C23’s `nullptr`, there was no dedicated null pointer constant in the language and `0` can still be used for both the size and the pointer argument. (Which is, very sadly, an actual bug that happens in existing code. Especially when users mix two similarly shaped library calls and pass the wrong argument because of writing one and meaning the other, and copying values over a large part of their 0-pointer in their low-level driver code.) Using an `unsigned char*` (or its statically-sized array function argument form) means that usage of the functions below would require explicit casting on the part of the user. This is, in fact, the way it is presented in [portable-endianness]: as far as existing practice is concerned, users of the code would rather cast and preserve safety than easily use something like `void*` with the guts of their structure.
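The trade-off can be seen in a small sketch. The function `sketch_reverse_bytes` and the structure `sketch_header` are hypothetical names used only for illustration; the point is that the `unsigned char ptr[static n]` form documents its size requirement in the declaration, at the cost of a cast at the call site.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function using the "size first, pointer second" convention,
   so that the array bound can refer to the size parameter. */
static void sketch_reverse_bytes(size_t n, unsigned char ptr[static n]) {
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = ptr[i];
        ptr[i] = ptr[n - 1 - i];
        ptr[n - 1 - i] = tmp;
    }
}

/* Hypothetical structure standing in for "the guts" of user data. */
struct sketch_header {
    uint16_t kind;
    uint16_t length;
};

static void sketch_use(struct sketch_header *h) {
    /* Passing &h->length directly would not be accepted without a cast:
       there is no implicit conversion from uint16_t* to unsigned char*. */
    sketch_reverse_bytes(sizeof h->length, (unsigned char *)&h->length);
}
```

With a `void*` parameter the cast would disappear, but so would the compiler’s ability to reject an accidentally swapped or mistyped argument.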
4.3. Signed vs. Unsigned
This paper has gone back and forth between signed vs. unsigned `count` offsets for the rotate (`ROL`/`ROR`) instruction-based functions, and similarly the return types for many of the functions which return purely a "count"-style value. Some important properties and facts follow:
- All of the values returned from the functions here are conceptually unsigned/natural numbers (0 to potentially infinity, but not negative).
- Some existing practice (e.g., C++) has in recent years struggled against unsigned integers and tried to move towards signed. "Anything that is a count should just be an `int`", and similar guidance, grows from these functions and their types.
- Conversely, some of C’s most fierce proponents use unsigned numbers almost exclusively until they have a proper justification for a signed number. For them, `unsigned`/`size_t` is the default.
- Whatever decision we make for one (e.g., for the second argument type of `rotate_left` or `rotate_right`), we must make the identical decision for the return values of other functions (e.g., `count_ones`/`popcount`, or similar). This will avoid sign conversion errors.
This brings up a lot of questions about whether or not the functions here should be signed or unsigned. We will analyze this primarily from the standpoint of `stdc_rotate_left` and `stdc_rotate_right`, as that has the greatest impact on the portability and semantics of the code presented here.
4.3.1. In Defense of Signed Integers
Let us consider a universe where `stdc_rotate_left` and friends take a signed `count`. This allows negative numbers to be passed to the count value for the rotate left. So, when (e.g.) `stdc_rotate_left(value, -1)` is called, it behaves as a call to `stdc_rotate_right(value, 1)`; if (e.g.) `stdc_rotate_right(value, -1)` is called, it behaves as a call to `stdc_rotate_left(value, 1)`. This is because, specification-wise, these functions are symmetric and cyclical in what they are meant to do. This matches the behavior from C++ and avoids undefined behavior for negative numbers, while also avoiding too-large shift errors from signed-to-unsigned conversions.
SDCC and several other compilers optimize for left and right shifts ([sdcc]). Texas Instruments and a handful of other specialist architectures also have "variable shift" instructions (SSHVL), which use the sign of the argument to shift in one direction or the other ([ti-tms320c64x]). Having a `stdc_rotate_left` where a negative number produces the opposite `stdc_rotate_right` cyclic operation (and vice-versa) means that both of these architectures can optimize efficiently in the case of hardcoded constants, and still produce well-defined behavior otherwise (`SSHVL` instructions just deploy a "negated by default" for the count value or not, depending on whether the left or right variant is called; other architectures propagate the information to shift left or right). This also follows existing practice with analogous functions from the C++ standard library.
To test code generation for using a signed integer and 2’s complement arithmetic, we used both C++ and C code samples. The C++ code is based on its accepted C++20 proposal, [p0463]. It’s a fairly accurate predictor of how notable compilers handle this kind of specification. The generated assembly for the compilers turns out to be optimal, so long as an implementation does not do a literal copy-paste of the specification’s text.
Using non-constant offset, with generated x86_64 assembly:

    #include <bit>

    extern unsigned int x;
    extern int offset;

    int main() {
        int l = std::rotl(x, offset);
        int r = std::rotr(x, offset);
        return l + r;
    }

    main: # @main
        mov eax, dword ptr [rip + x]
        mov cl, byte ptr [rip + offset]
        mov edx, eax
        rol edx, cl
        ror eax, cl
        add eax, edx
        ret

And, using constant offset, with generated x86_64 assembly:

    #include <bit>

    extern unsigned int x;

    int main() {
        int l = std::rotl(x, -13);
        int r = std::rotr(x, -13);
        return l + r;
    }

    main: # @main
        mov eax, dword ptr [rip + x]
        mov ecx, eax
        rol ecx, 19
        rol eax, 13
        add eax, ecx
        ret
The generated code shows that the compiler understands the symmetric nature of the operations (from the constant code) and also shows that it will appropriately handle it even when it cannot see through constant values. The same can be shown when writing C code using a variety of styles, as shown here:

    /* Assumes a 32-bit unsigned int. (32 - c) % 32 keeps the second
       shift in range when c == 0. */
    #if UNSIGNED_COUNT == 1
    static unsigned int rotate_right(unsigned int value, unsigned int count);

    inline static unsigned int rotate_left(unsigned int value, unsigned int count) {
        unsigned int c = count % 32;
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, unsigned int count) {
        unsigned int c = count % 32;
        return value >> c | value << ((32 - c) % 32);
    }
    #elif TWOS_COMPLEMENT_CAST == 1
    static unsigned int rotate_right(unsigned int value, int count);

    inline static unsigned int rotate_left(unsigned int value, int count) {
        unsigned int c = (unsigned int)count;
        c = c % 32;
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, int count) {
        unsigned int c = (unsigned int)count;
        c = c % 32;
        return value >> c | value << ((32 - c) % 32);
    }
    #else
    static unsigned int rotate_right(unsigned int value, int count);

    inline static unsigned int rotate_left(unsigned int value, int count) {
        int c = count % 32;
        if (c < 0) {
            /* negative counts rotate in the opposite direction */
            return rotate_right(value, -c);
        }
        return value << c | value >> ((32 - c) % 32);
    }

    inline static unsigned int rotate_right(unsigned int value, int count) {
        int c = count % 32;
        if (c < 0) {
            return rotate_left(value, -c);
        }
        return value >> c | value << ((32 - c) % 32);
    }
    #endif

    #if UNSIGNED_COUNT == 1
    unsigned int f(unsigned int x, unsigned int offset) {
    #else
    unsigned int f(unsigned int x, int offset) {
    #endif
        unsigned int l = rotate_left(x, offset);
        unsigned int r = rotate_right(x, offset);
        return l + r;
    }

When using the various definitions, we find that the generated assembly for `f` is identically good using either the internal unsigned "two’s complement" cast, or by just using an unsigned number. Because of how poorly basic mathematics with unsigned numbers happens, we want to avoid a situation where negation or subtraction with unsigned qualities may yield undesirable results or promotions. Therefore, we used signed integers for both the offset count and the return values of these functions. Note that even in purely standard C, converting from a signed integer to an unsigned integer is perfectly well-defined behavior and does not raise any signals:
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
— §6.3.1.3, ¶2, ISO/IEC 9899:202x "C2x" Standard
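The conversion rule quoted above can be seen in a small sketch. The helper name `sketch_reduce_count` is hypothetical, and the 32-bit rotate width is an assumption carried over from the earlier examples.

```c
#include <limits.h>

/* Hypothetical helper: turn a possibly negative rotate count into an
   equivalent left-rotate count in the range [0, 31]. */
static unsigned int sketch_reduce_count(int count) {
    /* Well-defined: the value wraps modulo UINT_MAX + 1. */
    unsigned int c = (unsigned int)count;
    /* UINT_MAX + 1 is a power of two no smaller than 32, so the wrapped
       value keeps the same remainder modulo 32. */
    return c % 32u;
}
```

For example, a count of `-13` reduces to `19`, which matches the `rol ecx, 19` seen in the constant-offset assembly earlier in this section.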
Finally, the vast majority of existing practice takes the offset value in as a signed integer, and all the return types are also still some form of signed integer (unless the intrinsic is returning the exact same unsigned value put in that was manipulated). It also allows "plain math" being done on the type to naturally manifest negative numbers without accidentally having roundtripping or signed/unsigned conversion issues.
4.3.2. In Defense of Unsigned
Unsigned, on the other hand, has existing practice in hardware. While the intrinsics defined by glibc, C++'s standard libraries, and many more use signed integers, they are conceptually unsigned in their implementations. For example, for a 32-bit rotate, most standard libraries taking an `int` offset parameter perform:

    count = count & 31;

This is critical for optimization here. Note that, if we were to provide a specification using a signed offset, our specification has to very deliberately specify that we are going to negate the value and then pass it to the rotate of the opposite direction. This is, effectively, the same as obliterating the sign value and then calling the (symmetrical, cyclical) rotate: a 32-bit rotate therefore can get identical codegen as a signed variant by using a bit AND (`&`) (NOT a normal modulus (`%`), as that preserves the sign and we do NOT want that). For an unsigned variant, no such trickery is necessary. Simply truncating the value using:

    count = count % 32;

produces optimal code generation for most compilers, as they understand that bit AND for hexadecimal `0x1F` (decimal `31`) is identical to modulus of decimal `32`. This means that, by default, unsigned values are the same here. Abusing 2’s complement, one can save this by simply doing `(unsigned)count` and then performing modulus to get the same behavior as performing bit AND with `31`. The "obvious" code is the efficient code here, as shown by the example of the assembly above.
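To make the equivalence concrete, here is a minimal sketch of unsigned-count rotates. The names `sketch_rotl32` and `sketch_rotr32` are hypothetical, and the code assumes `uint32_t` exists and is not promoted to `int` (true wherever `int` is 32 bits or narrower).

```c
#include <stdint.h>

/* Hypothetical 32-bit rotate left with an unsigned count. */
static uint32_t sketch_rotl32(uint32_t value, unsigned int count) {
    unsigned int c = count & 31u; /* same result as count % 32 for unsigned */
    /* (32 - c) & 31 keeps the second shift in range when c == 0. */
    return (value << c) | (value >> ((32u - c) & 31u));
}

/* Hypothetical 32-bit rotate right with an unsigned count. */
static uint32_t sketch_rotr32(uint32_t value, unsigned int count) {
    unsigned int c = count & 31u;
    return (value >> c) | (value << ((32u - c) & 31u));
}
```

Passing a converted negative count, such as `(unsigned int)-13`, to `sketch_rotl32` gives the same result as `sketch_rotr32` with `13`, without any branch on the sign.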
Rust is one of the few languages that provides optimal versions of this code using an unsigned (`u32`) count ([rust-rotate_left]). Their code is optimal under both optimizations and a lack thereof, compared to C and C++ code which struggles with function call elision and similar. This may be aided in the future by having this paper put into the C standard, which would allow compilers to treat standard-specific rotate calls as intrinsics to be replaced with the instructions directly.
All in all, unsigned naturally optimizes better and matches the size type of C. It has no undefined behavior on overflow and produces better assembly in general when it comes to bit intrinsics. Shifting behavior is also well-defined for unsigned types and not signed types, further compounding unsigned types as far better than their signed counterparts.
4.3.3. Which Does This Paper Choose?
Ultimately, this paper chooses unsigned integer types. The core of the reasoning is that this avoids undefined behavior in the specification and is also morally and spiritually what architectures expect (at least for 2’s complement). Since C23 embraces 2’s complement, using unsigned here would work more or less identically to a signed integer anyways, and therefore it just does not matter enough to want to write an exception for values such as `INT_MIN` (whose negation overflows) into the specification for handling e.g. the `count` parameter of `stdc_rotate_left` or `stdc_rotate_right`.
4.4. Generic 8-bit Memory Reverse and Exact-width 8-bit Memory Reverse
In order to accommodate both a wide variety of architectures but also support minimum-width integer optimized intrinsics, this proposal takes from the industry 2 forms of byteswap:
- one generic `mem_` version which takes a pointer and the number of bytes to perform a reverse operation; and,
- a sequence of exact-width byte swapping instructions which (typically) map directly to intrinsics available in compilers and instructions in hardware.
These end up inhabiting the `<stdbit.h>` header and have the following interface:

    #include <stdbit.h>
    #include <limits.h>
    #include <stdint.h>

    #if (CHAR_BIT) == 8
    void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
    uintN_t stdc_memreverse8uN(uintN_t value) [[unsequenced]];
    #endif

where `N` is the width of one of the exact-width integer types, such as `8`, `16`, `32`, `64`, and others. On most architectures, this matches the builtins (MSVC, Clang, GCC) and the result of compiler optimizations that produce instructions for many existing architectures as shown in the README of this portable endianness function implementation. We use the exact-width types for the `uN`-suffixed functions because we expect that C compilers would want to lower the `stdc_memreverse8uN` call to the existing practice of byte swap instructions and compiler intrinsics. Using `uint_leastN_t` reduces the ability to match these existing optimizations in the case where `uintN_t` types are not defined. As a result, the following code does not trigger a failure on most implementations:

    // On typical CHAR_BIT == 8, i686/x86_32/x86_64/ARM32/AARCH64 implementations
    const uint32_t normal = 0xAABBCCDDu;
    const uint32_t reversed = 0xDDCCBBAAu;
    assert(stdc_memreverse8u32(normal) == reversed);

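For readers without an implementation at hand, the relationship between the two forms can be sketched as follows. The `sketch_` names are hypothetical stand-ins, not the proposed functions, and the value-based form is deliberately written in terms of the pointer-based form rather than as an intrinsic.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if CHAR_BIT == 8
/* Hypothetical stand-in for the generic, pointer-based form: reverses
   the order of the n bytes starting at ptr, in place. */
static void sketch_memreverse8(size_t n, unsigned char ptr[static n]) {
    for (size_t i = 0; i < n / 2; ++i) {
        unsigned char tmp = ptr[i];
        ptr[i] = ptr[n - 1 - i];
        ptr[n - 1 - i] = tmp;
    }
}

/* Hypothetical stand-in for the exact-width, value-based 32-bit form. */
static uint32_t sketch_memreverse8u32(uint32_t value) {
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);     /* view the value as bytes */
    sketch_memreverse8(sizeof bytes, bytes); /* reverse their order */
    memcpy(&value, bytes, sizeof value);     /* read the value back */
    return value;
}
#endif
```

A real implementation would more likely lower the value-based form to a single byte swap instruction; this sketch only shows that both forms describe the same operation.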
4.4.1. But Memory Reverse Is Dangerous?
Byte swapping, by itself, is absolutely dangerous in terms of code portability. Users often program strictly for their own architecture when doing serialization, and do not take into consideration that their endianness can change. This means that, while `stdc_memreverse8uN` functions can compile down to intrinsics, those intrinsics get employed to change "little endian" to "big endian" without performing the necessary "am I already in the right endianness" check. Values that are already in the proper byte order for their target serialization get swapped, resulting in an incorrect byte order for the target network protocol, file format, or other binary serialization target.
The inclusion of the `<stdbit.h>` header reduces this problem by giving access to the `__STDC_ENDIAN_NATIVE__` macro definition, but does not fully eliminate it. This is why many Linux and BSD distributions include functions which directly transcribe from one endianness to another. This is why the Byte Order Fallacy has spread so far in Systems Programming communities, and why many create their own versions of this both in official widespread vendor code ([linux-endian]) and in more personal code used for specific distributions ([portable-endianness]). Thus, this proposal includes some endianness functions, specified further below.
4.4.2. Vetting the Implementation / Algorithm for memreverse
In previous iterations of the paper, there were various off-by-one errors in transcribing the algorithm used to get the job done. Therefore, we more directly lifted the code for the algorithm from the example implementation here. To further prove that it works on "bytes" that may be larger than 8 bits, we also took the following steps.
- Implemented it as a macro (as shown from the link above).
- Used that macro implementation in the normal `unsigned char`-based implementation.
- Used that macro implementation with all unsigned integer types that are larger than `unsigned char` to test if it deals with sub-8-bit-groups correctly.
- Applied `-fno-strict-aliasing` or an equivalent flag to the compiler and tested across platforms.
All of the tests pass across the three major compilers (MSVC, GCC, and Clang) and across platforms (Windows, Linux, Mac OS). We find this to be compelling enough to ensure that the implementation and the algorithm in the wording is suitably correct. Nevertheless, any wording failures present here represent the authors' collective inability to properly serialize wording, not that an implementation is not possible or too inventive.
Regardless of how well it’s tested and checked, the paper still restricts the content to 8-bit bytes (`CHAR_BIT == 8`), as we know that this has both wide implementation experience and is implementable.
4.5. stdc_load8_* / stdc_store8_* Endian-Aware Functions
Functions meant to transport bytes to a specific endianness need 3 pieces of information:
- the sign of the input/output;
- the byte order of the input; and,
- the desired byte order of the output.
To represent any operation that goes from/to the byte order that things like `int`s are kept in, the Linux/BSD/etc. APIs use the term "host", represented by `h`. Every other operation is represented by explicitly naming it, particularly as `be` or `le` for "big endian" or "little endian". Again, because of the severe confusion that comes from what the exact byte order a "mixed endian" multi byte scalar is meant to be in, there seems not to exist any widely available practice regarding what to call a PDP/Honeywell endian configuration. Therefore, mixed/bi/middle-endian is not included in this proposal. It can be added at a later date if the community ever settles on a well-defined naming convention that can be shared between codebases, standards, and industries.
The specification for the endianness functions borrows from many different sources listed above, and is as follows:

    #include <stdbit.h>
    #include <limits.h>
    #include <stdint.h>

    #if (CHAR_BIT) == 8
    void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
    void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
    uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static (N / 8)]);
    uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static (N / 8)]);
    void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
    void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
    uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static (N / 8)]);
    uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static (N / 8)]);

    void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
    void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
    int_leastN_t stdc_load8_lesN(const unsigned char ptr[static (N / 8)]);
    int_leastN_t stdc_load8_besN(const unsigned char ptr[static (N / 8)]);
    void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
    void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
    int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static (N / 8)]);
    int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static (N / 8)]);
    #endif

Thanks to some feedback from implementers and librarians, this first implementation also needs signed variants of the load and store functions, as well as aligned and unaligned loads and stores. While C23 mandates a two’s complement representation for integers, because we are using the `int_leastN_t` types (which may be wider than the intended `N`-bit specification), it is important for the sign bit to be properly serialized and transported. Therefore, during `stdc_load8_*`/`stdc_store8_*` operations, the sign bit will be directly serialized into the resulting signed value or byte array where necessary.
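A shift-based sketch of one store and two loads for `N == 32` is given below. The `sketch_` names are hypothetical stand-ins for the proposed functions, and the signed load shows one possible way (an assumption, not the paper’s wording) to rebuild a negative value without relying on implementation-defined conversions.

```c
#include <stdint.h>

/* Hypothetical stand-in: store the low 32 bits of value, big endian. */
static void sketch_store8_beu32(uint_least32_t value,
                                unsigned char ptr[static 4]) {
    ptr[0] = (unsigned char)((value >> 24) & 0xFFu);
    ptr[1] = (unsigned char)((value >> 16) & 0xFFu);
    ptr[2] = (unsigned char)((value >> 8) & 0xFFu);
    ptr[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical stand-in: load 32 bits stored in little endian order. */
static uint_least32_t sketch_load8_leu32(const unsigned char ptr[static 4]) {
    return (uint_least32_t)(ptr[0] & 0xFFu)
         | ((uint_least32_t)(ptr[1] & 0xFFu) << 8)
         | ((uint_least32_t)(ptr[2] & 0xFFu) << 16)
         | ((uint_least32_t)(ptr[3] & 0xFFu) << 24);
}

/* Hypothetical signed load: bit 31 is treated as the sign bit even if
   int_least32_t is wider than 32 bits. */
static int_least32_t sketch_load8_les32(const unsigned char ptr[static 4]) {
    uint_least32_t u = sketch_load8_leu32(ptr);
    if (u & 0x80000000u) {
        /* Negative: the value is -((~u & mask) + 1), computed so that no
           intermediate result leaves the range of int_least32_t. */
        uint_least32_t magnitude_minus_one = (~u) & 0xFFFFFFFFu;
        return (int_least32_t)(-(int_least32_t)magnitude_minus_one - 1);
    }
    return (int_least32_t)u;
}
```

None of these functions inspect the host byte order: the byte positions are fixed by the shifts, which is what makes this style robust against the unconditional-swap mistake described in 4.4.1.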
This specification is marginally more complicated than the `stdc_memreverse8uN` functions because they operate on `uint_leastN_t`, where `N` is the minimum-width bit value. These functions, on most normal implementations, will just fill in the exact number of 8, 16, 32, 64, etc. bits.
We are fine with not making these precisely `uintN_t`/`intN_t` because the C23 Standard includes a specific requirement that if `uintN_t`/`intN_t` exist, then `uint_leastN_t`/`int_leastN_t` must match their exact-width counterparts exactly, which has been existing practice on almost all implementations for quite some time now.
4.5.1. Vetting the Implementation / Algorithm for 8-bit loads and stores
In previous iterations of the paper, getting the algorithm written down properly in a way that does not rely on any kind of implementation-defined behavior for signed and unsigned endian-aware loads and stores was tough and resulted in many errors in the wording. Still, we know that the implementation is solid because we have tested it (both theoretically and factually) by writing implementations whose base "unit" for writing into has a width greater than 8 bits. The process is similar to the one used for the memory reverse design:
- Implemented the core bodies of the functions as macros whose base unit is not necessarily `unsigned char` (as shown here).
- Used that macro implementation in the normal `unsigned char`-based implementation.
- Used that macro implementation with all unsigned integer types that are larger than `unsigned char` to test if it deals with sub-8-bit-groups correctly.
- Applied `-fno-strict-aliasing` or an equivalent flag to the compiler and tested across platforms.
All of the tests pass across the three major compilers (MSVC, GCC, and Clang) and across platforms (Windows, Linux, Mac OS). We find this to be compelling enough to ensure that the implementation is suitably correct, even if the wording may not be proper or ideal. Therefore, we hope this can serve as a good basis in establishing that, at the very least, this is both implementable and usable. This also corroborates additional materials outside of compilers that always target `CHAR_BIT == 8`, such as the F2838x/F28069 series, and C28x series, of chips from Texas Instruments. For example, the TMS320C28x reference guide gives a listing for how to properly and effectively swap 8-bit bytes of a 32-bit integer, despite being a 16-bit architecture (Page 292). It is, at least in some cases, important enough to include in reference material and programming guides for these chips, even if the authors could not personally find implementations of publicly-discussable compilers which provided a C-style intrinsic for a `CHAR_BIT == 16` platform.
Regardless of how well it’s tested and checked, the paper still restricts the content to 8-bit bytes (`CHAR_BIT == 8`), as we know that this has both wide implementation experience and is implementable.
4.6. Additional Modern Bit Utilities
In addition, upon first review of the paper there was a strong supporting groundswell for bit operations that have long been present both in hardware and as compiler intrinsics. This idea progressed naturally from the byte swap and endianness discussion. As indicated in [p0553] (merged into C++20 already), here’s a basic rundown of some common architectures and their support for various bit functionality:
operation | Intel/AMD | ARM | PowerPC |
---|---|---|---|
`stdc_rotate_left` | ROL | - | rldicl |
`stdc_rotate_right` | ROR | ROR, EXTR | - |
Many of the bit functions below are defined to ease portability to these architectures. For places where specific compiler idioms and automatic detection are not possible, similar assembly tricks or optimized implementations can be provided by C. Further bit functions were also merged into C++, resulting in the current state of the C++ bit header.
There is further a bit of an "infamous" page amongst computer scientists for Bit Twiddling Hacks. These may not all map directly to instructions but they provide a broad set of useful functionality commonly found in not only CPU-based programming libraries, but GPU-based programming libraries and other high performance computing resources as well.
We try to take the most useful subset of these functions that most closely represent functionality on both old and new CPU architectures as well as common, necessary operations that have been around in the last 25 years for various industries. We have left out operations such as sign extension, parity computation, bit merging, clear/setting bits, fast negation, bit swaps, lexicographic next bit permutation, and bit interleaving. The rest are more common and appear across a wide range of industries from cryptography to graphics to simulation to efficient property lookup and kernel scheduling.
4.6.1. Type-Generic Macros and Counts for Types
All of the functions below have type-generic macros associated with them. This can bring up an interesting question: if the return value depends on the type of the argument going into the function, is it bad for literal arguments? The answer to this question, however, is the same as it has always been when dealing with literal values in C: use the suffix for the appropriate type, or cast, or put it in a variable of the intended type so that it can be used with the expected semantics. We cannot sink macro-based generic code use cases on the off-chance that someone calls one of these macros with an unsuffixed literal and thinks it returns a dependable answer. Integers (and their literals) are the least portable part of Standard C code: use the exact-width types if you are expecting exact-width semantics. Or, call the fundamental-type suffixed versions to get answers dependable for that given fundamental type (e.g., the `_ull`-suffixed variants for `unsigned long long`), even if it means the size of that fundamental type might change.
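The effect of the argument type on the result can be sketched with a small `_Generic` macro. Everything named `sketch_` here is hypothetical, the macro covers only three types for brevity, and the width is computed as `sizeof * CHAR_BIT`, which assumes the types have no padding bits.

```c
#include <limits.h>

/* Shared worker: counts leading zero bits of v within the given width.
   Assumes width does not exceed the width of unsigned long long. */
static unsigned int sketch_lz(unsigned long long v, unsigned int width) {
    unsigned int n = 0;
    while (n < width && ((v >> (width - 1u - n)) & 1u) == 0) {
        ++n;
    }
    return n;
}

/* Fundamental-type suffixed versions (hypothetical names). */
static unsigned int sketch_lz_uc(unsigned char v) {
    return sketch_lz(v, (unsigned int)(sizeof v * CHAR_BIT));
}
static unsigned int sketch_lz_ui(unsigned int v) {
    return sketch_lz(v, (unsigned int)(sizeof v * CHAR_BIT));
}
static unsigned int sketch_lz_ull(unsigned long long v) {
    return sketch_lz(v, (unsigned int)(sizeof v * CHAR_BIT));
}

/* Type-generic front end: the selected function, and therefore the
   answer, depends on the type of the argument. */
#define sketch_leading_zeros(x)          \
    _Generic((x),                        \
        unsigned char: sketch_lz_uc,     \
        unsigned int: sketch_lz_ui,      \
        unsigned long long: sketch_lz_ull)(x)
```

With this sketch, `sketch_leading_zeros(1u)` and `sketch_leading_zeros(1ull)` give different answers on platforms where the two types have different widths, and an unsuffixed `1` (an `int`) does not match any association at all.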
4.6.2. Argument Types
Many of the functions below are defined over the fundamental unsigned integer types, rather than their minimum width or exact width counterparts. This is done to provide maximum portability: users can combine information from the recently-introduced `*_WIDTH` macros to determine the width of the types at translation time as well as enjoy a disjoint and distinct set of fundamental types over which generic selection will always work.
The
types also have
macros, but those macros are not exactly guaranteed to cover a wide range of actual bit sizes either (if the
types do not exist, then a conforming implementation can simply just name all of the types as typedefs for
and call it a day). While an implementation could also define each of the distinct fundamental types from
to
to all be the same width as well, we are at the very least guaranteed that they are, in fact, distinct types. This makes selection over types in
predictable and usable (i.e.
is not guaranteed to compile since those types are not required to form a mutually exclusive or disjoint set).
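As a small sketch of why distinctness matters: generic selection over the fundamental unsigned types always compiles, because no two of them are the same type even when they share a width. The macro name `demo_type_name` is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Every association names a different type, so this selection is valid on
   any implementation, even one where several of these types share a width. */
#define demo_type_name(x) _Generic((x),              \
    unsigned char:      "unsigned char",             \
    unsigned short:     "unsigned short",            \
    unsigned int:       "unsigned int",              \
    unsigned long:      "unsigned long",             \
    unsigned long long: "unsigned long long")

/* By contrast, a selection listing uint_least8_t, uint_least16_t,
   uint_least32_t and uint_least64_t is only valid when those typedefs name
   four different types. If two of them alias the same fundamental type, the
   generic selection has duplicate associations and is rejected. */

void demo_distinct_types_self_check(void) {
    assert(strcmp(demo_type_name(0ul), "unsigned long") == 0);
    assert(strcmp(demo_type_name(0ull), "unsigned long long") == 0);
    assert(strcmp(demo_type_name((unsigned short)0), "unsigned short") == 0);
}
```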
The exact-width types suffer from non-availability on specific platforms, which makes little sense for functions which do not depend on a no-padding bits requirement. As long as the values read from the array only involve
bits (including the sign bit), and the rest are zero-initialized, we can have predictable semantics.
Extended integer types, least-width integer types, and exact-width integer types can all be used with the type-generic macros, since the type-generic macros are required to work over all standard (unsigned) integer types and extended (unsigned) integer types, while excluding
and bit-precise (
) integer types that do not match pre-existing type widths. This provides a complete set of functionality that is maximally portable while also allowing for precise semantic control with exact or least-width types.
Finally, in general
objects are disallowed from the above functions. There is just not a meaningful body of functionality that can be provided, and there is a fundamental difference between something that is expected to be a boolean value and something that is expected to be a 1-bit number (even if they can both serve similar purposes). It is also questionable to compute things such as rotation for
objects. If we can grow a consistent set of answers for these operations across the industry, then we can weaken the requirements and add the behavior in. (Note that if we put it in now and choose a behavior, we cut off any improvements made in the future, so it is best to be conservative here.)
4.6.3. Return Types
Ostensibly, part of the motivation to capture here should be that the types used to do things such as rotations should be identical to the return type used to do things like count zeros, e.g.
. This is mostly non-problematic until someone uses
: Clang already supports several megabyte-large
. On platforms where
is actually 16 bits, this is far too small to accommodate even a 1 MB
.
At the moment, the functions do not accept all bit-precise integer types (just ones that are bit-width equivalent to the existing standard and extended integer types), so this is technically a non-issue. But, if and when bit-precise integer types are given better handling in
macros or similar features that make them more suitable for type-generic macro implementations, this could become a problem. At the moment, we use wording to defer the issue by saying that type generic macros return a type suitably large for the range of the computed value. This allows us forward compatibility while fixing non-type-generic macro return types to
. The type-generic macros will have the flexibility from the specification to return larger signed integer types to aid in a smooth transition once bit-precise integer types see more standard support.
4.6.4. stdc_rotate_left/stdc_rotate_right
/
are common CPU instructions and the usual forms of circular shifts, with applications in cyclic codes. They are typically expressed (for 32-bit numbers) as
(rotate left) or
(rotate right).
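For readers who want the 32-bit circular shifts spelled out, here is a minimal sketch. The names `demo_rotl32` and `demo_rotr32` are hypothetical, and the masking of the count is one common way to keep every shift amount in range; it is not claimed to be the form used by any particular implementation. The sketch assumes `uint32_t` exists and is `unsigned int` or `unsigned long`, as on mainstream platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left: bits shifted out at the top re-enter at the bottom.
   Masking with 31 keeps both shift amounts in [0, 31], so a count of 0 or
   of 32 never produces an out-of-range shift. */
uint32_t demo_rotl32(uint32_t value, unsigned int count) {
    const unsigned int r = count & 31u;
    return (uint32_t)((value << r) | (value >> ((32u - r) & 31u)));
}

/* Rotate right: the mirror image of the above. */
uint32_t demo_rotr32(uint32_t value, unsigned int count) {
    const unsigned int r = count & 31u;
    return (uint32_t)((value >> r) | (value << ((32u - r) & 31u)));
}

void demo_rotate32_self_check(void) {
    assert(demo_rotl32(UINT32_C(0x80000001), 1) == UINT32_C(0x00000003));
    assert(demo_rotr32(UINT32_C(0x00000003), 1) == UINT32_C(0x80000001));
    assert(demo_rotl32(UINT32_C(0x12345678), 8) == UINT32_C(0x34567812));
    assert(demo_rotl32(UINT32_C(0x12345678), 32) == UINT32_C(0x12345678));
}
```

Compilers commonly recognize this pattern and emit a single rotate instruction where the target has one.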
#include <stdbit.h>

unsigned char stdc_rotate_left_uc(unsigned char value, unsigned int count) [[unsequenced]];
unsigned short stdc_rotate_left_us(unsigned short value, unsigned int count) [[unsequenced]];
unsigned int stdc_rotate_left_ui(unsigned int value, unsigned int count) [[unsequenced]];
unsigned long stdc_rotate_left_ul(unsigned long value, unsigned int count) [[unsequenced]];
unsigned long long stdc_rotate_left_ull(unsigned long long value, unsigned int count) [[unsequenced]];
unsigned char stdc_rotate_right_uc(unsigned char value, unsigned int count) [[unsequenced]];
unsigned short stdc_rotate_right_us(unsigned short value, unsigned int count) [[unsequenced]];
unsigned int stdc_rotate_right_ui(unsigned int value, unsigned int count) [[unsequenced]];
unsigned long stdc_rotate_right_ul(unsigned long value, unsigned int count) [[unsequenced]];
unsigned long long stdc_rotate_right_ull(unsigned long long value, unsigned int count) [[unsequenced]];
// type-generic macro
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count) [[unsequenced]];
generic_value_type stdc_rotate_right(generic_value_type value, generic_count_type count) [[unsequenced]];
They cover all of the built-in unsigned integer types. A discussion of signed vs. unsigned integer types for the count type and the return type can be found in § 4.3 Signed vs. Unsigned.
As for choosing a single function like
that chooses left / right based on the value, it unfortunately imposes the worst code generation properties of all the options. When using entirely runtime values, unless you have a variable-rotate/shift instruction, you are required to emit a branch or a conditional select in order to handle the two cases, as rotate left and rotate right, despite being symmetric, need separate handling. Here is the assembly for a technically optimal left/right rotate:
f:                                      # @f
        mov     r8d, edi
        mov     ecx, esi
        rol     r8d, cl
        mov     edx, edi
        ror     edx, cl
        mov     ecx, esi
        neg     ecx
        mov     eax, edi
        rol     eax, cl
        ror     edi, cl
        test    esi, esi
        cmovs   edx, r8d
        cmovle  eax, edi
        add     eax, edx
        ret
This is more than double the size of the rotates found using left/right directly in § 4.3 Signed vs. Unsigned. Due to this, we decided that it was not advantageous to have a signed count with an unknown left/right: it is important to be capable of biasing the optimizer to whether a given rotate is left/right oriented.
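To make the cost concrete, here is a hedged sketch of what a single signed-count rotate has to do at run time. The function names are hypothetical, and the sign convention (positive rotates left, negative rotates right) is an assumption made for this example only; it is not necessarily the source that produced the assembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Direction-specific rotates: each is a single, branch-free expression. */
static uint32_t demo_left32(uint32_t value, unsigned int count) {
    const unsigned int r = count & 31u;
    return (uint32_t)((value << r) | (value >> ((32u - r) & 31u)));
}

static uint32_t demo_right32(uint32_t value, unsigned int count) {
    const unsigned int r = count & 31u;
    return (uint32_t)((value >> r) | (value << ((32u - r) & 31u)));
}

/* A combined rotate must inspect the sign of the count before it can pick a
   direction, which is the extra work the optimizer cannot remove when the
   count is only known at run time. */
uint32_t demo_rotate_either32(uint32_t value, int count) {
    if (count < 0) {
        /* 0u - (unsigned)count negates without signed overflow,
           even when count is INT_MIN. */
        return demo_right32(value, 0u - (unsigned int)count);
    }
    return demo_left32(value, (unsigned int)count);
}

void demo_rotate_either_self_check(void) {
    assert(demo_rotate_either32(UINT32_C(0x80000001), 1) == UINT32_C(0x00000003));
    assert(demo_rotate_either32(UINT32_C(0x00000003), -1) == UINT32_C(0x80000001));
    assert(demo_rotate_either32(UINT32_C(0x12345678), 0) == UINT32_C(0x12345678));
}
```

Calling `demo_left32` or `demo_right32` directly tells the compiler the direction up front, which is the bias the paragraph above argues for.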
5. Wording
The following wording is relative to the latest draft standard.
5.1. Add new §7.18.✨17 and §7.18.✨18 sub-sub-clauses for "8-bit Memory Reversal" in §7.18
7.18.✨17 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#if (CHAR_BIT) == 8
void stdc_memreverse8(size_t n, unsigned char ptr[static n]);
#endif
Description
The stdc_memreverse8 function reverses the order of the bytes pointed to by ptr. For every unsigned char at offset index in the range 0 <= index && index < n / 2, swap the values at ptr[index] and ptr[n - index - 1].
7.18.✨18 Exact-width 8-bit Memory Reversal
Synopsis
#include <stdbit.h>
#include <limits.h>
#include <stdint.h>
#if (CHAR_BIT) == 8
uintN_t stdc_memreverse8uN(uintN_t value) [[unsequenced]];
#endif
Description
The stdc_memreverse8uN functions provide an interface to swap the bytes of a corresponding uintN_t object, where N matches one of the exact-width integer types (7.20.1.1). If an implementation provides the corresponding uintN_t typedef, it shall define the corresponding exact-width memory reversal function for that value of N.
Returns
The stdc_memreverse8uN functions return the 8-bit memory reversed uintN_t value, as if by invoking stdc_memreverse8(sizeof(value), (unsigned char*)&value) and then returning value.
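The following is an illustrative sketch only and is not part of the proposed wording. It transcribes the swap loop described for the memory reversal function into C under the hypothetical names `demo_memreverse8` and `demo_memreverse8u32`, assuming `CHAR_BIT == 8` and that `uint32_t` exists. A plain pointer parameter is used in place of the `static n` array parameter so that the sketch also builds with compilers lacking variable-length array parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* For every index in [0, n / 2), swap ptr[index] with ptr[n - index - 1]. */
void demo_memreverse8(size_t n, unsigned char *ptr) {
    for (size_t index = 0; index < n / 2; ++index) {
        const unsigned char tmp = ptr[index];
        ptr[index] = ptr[n - index - 1];
        ptr[n - index - 1] = tmp;
    }
}

/* Value-based form: reverse the object representation of a copy of the
   argument, then return that copy. */
uint32_t demo_memreverse8u32(uint32_t value) {
    demo_memreverse8(sizeof(value), (unsigned char *)&value);
    return value;
}

void demo_memreverse_self_check(void) {
    unsigned char bytes[5] = {1, 2, 3, 4, 5};
    const unsigned char expected[5] = {5, 4, 3, 2, 1};
    demo_memreverse8(sizeof(bytes), bytes);
    assert(memcmp(bytes, expected, sizeof(bytes)) == 0);
    assert(demo_memreverse8u32(UINT32_C(0x11223344)) == UINT32_C(0x44332211));
}
```

Because the whole object representation is reversed, the value-based result is the same on little-endian and big-endian hosts.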
5.2. Add new §7.18.✨19 and §7.18.✨20 sub-sub-clauses for "Endian-Aware" functions in §7.18
7.18.✨19 Endian-Aware 8-bit Load
Synopsis
#include <stdbit.h>
#if (CHAR_BIT) == 8
uint_leastN_t stdc_load8_leuN(const unsigned char ptr[static (N / 8)]);
uint_leastN_t stdc_load8_beuN(const unsigned char ptr[static (N / 8)]);
uint_leastN_t stdc_load8_aligned_leuN(const unsigned char ptr[static (N / 8)]);
uint_leastN_t stdc_load8_aligned_beuN(const unsigned char ptr[static (N / 8)]);
int_leastN_t stdc_load8_lesN(const unsigned char ptr[static (N / 8)]);
int_leastN_t stdc_load8_besN(const unsigned char ptr[static (N / 8)]);
int_leastN_t stdc_load8_aligned_lesN(const unsigned char ptr[static (N / 8)]);
int_leastN_t stdc_load8_aligned_besN(const unsigned char ptr[static (N / 8)]);
#endif
Description
The 8-bit load family of functions read an int_leastN_t or uint_leastN_t object from the provided ptr in an endian-aware (7.18.2) manner, where N matches an existing minimum-width integer type (7.20.1.2). If an implementation provides the corresponding uint_leastN_t typedef, it shall define the corresponding endian-aware load function for that value of N. If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N for a signed or unsigned variant of the function, respectively. If the function name contains the sN suffix in the name, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Returns
Let the computed value $result$ be:
$$\sum_{index = 0}^{(N \div{} 8) - 1} b_{index} \times{} 2^{8 \times{} index}$$
where $b_{index}$ is:
— ptr[index], if the function is the little-endian variant;
— otherwise, ptr[(N / 8) - index - 1], if the function is the big-endian variant.
If the function is an unsigned variant, return $result$. Otherwise, if the function is a signed variant, return:
— $result$, if $result$ is less than $2^{N-1}$;
— otherwise, $result - 2^{N}$.
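The following is an illustrative sketch only and is not part of the proposed wording. It transcribes the summation above for N = 32 under the hypothetical names `demo_load8_leu32`, `demo_load8_beu32`, and `demo_load8_les32`, assuming `CHAR_BIT == 8`. The signed variant subtracts $2^{N}$ in two steps so that no intermediate value leaves the range of the signed type; it assumes the signed type can represent $-2^{31}$, as two's complement guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: byte `index` contributes ptr[index] * 2^(8 * index). */
uint_least32_t demo_load8_leu32(const unsigned char *ptr) {
    uint_least32_t result = 0;
    for (unsigned int index = 0; index < 4; ++index) {
        result |= (uint_least32_t)ptr[index] << (8 * index);
    }
    return result;
}

/* Big-endian: byte `index` contributes ptr[(N / 8) - index - 1]. */
uint_least32_t demo_load8_beu32(const unsigned char *ptr) {
    uint_least32_t result = 0;
    for (unsigned int index = 0; index < 4; ++index) {
        result |= (uint_least32_t)ptr[4 - index - 1] << (8 * index);
    }
    return result;
}

/* Signed variant: result if result < 2^31, otherwise result - 2^32. */
int_least32_t demo_load8_les32(const unsigned char *ptr) {
    const uint_least32_t result = demo_load8_leu32(ptr);
    if (result < UINT32_C(0x80000000)) {
        return (int_least32_t)result;
    }
    /* (result - 2^31) fits in the signed type; subtract the other 2^31 as
       0x7FFFFFFF followed by 1 to stay in range at every step. */
    return (int_least32_t)(result - UINT32_C(0x80000000)) - INT32_C(0x7FFFFFFF) - 1;
}

void demo_load_self_check(void) {
    const unsigned char bytes[4] = {0x78, 0x56, 0x34, 0x12};
    const unsigned char minus_two[4] = {0xFE, 0xFF, 0xFF, 0xFF};
    assert(demo_load8_leu32(bytes) == UINT32_C(0x12345678));
    assert(demo_load8_beu32(bytes) == UINT32_C(0x78563412));
    assert(demo_load8_les32(minus_two) == -2);
}
```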
7.18.✨20 Endian-Aware 8-bit Store
Synopsis
#include <stdbit.h>
#if (CHAR_BIT) == 8
void stdc_store8_leuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_beuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_aligned_leuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_aligned_beuN(uint_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_lesN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_besN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_aligned_lesN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
void stdc_store8_aligned_besN(int_leastN_t value, unsigned char ptr[static (N / 8)]);
#endif
Description
The 8-bit store family of functions write an int_leastN_t or uint_leastN_t object into the provided ptr in an endian-aware (7.18.2) manner, where N matches an existing minimum-width integer type (7.20.1.2). If an implementation provides the corresponding uint_leastN_t typedef, it shall define the corresponding endian-aware store function for that value of N. If this function is present, N shall be a multiple of 8 and CHAR_BIT shall be 8. The functions containing _aligned in the name shall assume that ptr is suitably aligned to access a signed or unsigned integer of width N. If the function name contains the sN suffix in the name, it is a signed variant. Otherwise, the function is an unsigned variant. If the function name contains the lesN or leuN suffix, it is a little-endian variant. Otherwise, if the function name contains the besN or beuN suffix, it is a big-endian variant.
Let value_unsigned be value if the function is an unsigned variant. Otherwise, let value_unsigned be the conversion of value to its corresponding unsigned type, if the function is a signed variant.
Let index be an integer in a sequence that
— starts from 0 and increments by 8 in the range of [0, N), if the function is a little-endian variant;
— starts from N - 8 and decrements by 8 in the range of [0, N), if the function is a big-endian variant.
Let ptr_index be an integer that starts from 0. For each index in the order of the above-specified sequence:
— Sets the 8 bits in ptr[ptr_index] to (value_unsigned >> index) & 0xFF.
— Increments ptr_index by 1.
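The following is an illustrative sketch only and is not part of the proposed wording. It follows the index and ptr_index sequences described above for N = 32, under the hypothetical names `demo_store8_leu32`, `demo_store8_beu32`, and `demo_store8_les32`, assuming `CHAR_BIT == 8`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Little-endian: index runs 0, 8, 16, 24 while ptr_index runs 0, 1, 2, 3. */
void demo_store8_leu32(uint_least32_t value, unsigned char *ptr) {
    unsigned int ptr_index = 0;
    for (unsigned int index = 0; index < 32; index += 8) {
        ptr[ptr_index] = (unsigned char)((value >> index) & 0xFFu);
        ++ptr_index;
    }
}

/* Big-endian: index runs 24, 16, 8, 0 while ptr_index runs 0, 1, 2, 3. */
void demo_store8_beu32(uint_least32_t value, unsigned char *ptr) {
    unsigned int ptr_index = 0;
    for (unsigned int index = 32; index != 0;) {
        index -= 8;
        ptr[ptr_index] = (unsigned char)((value >> index) & 0xFFu);
        ++ptr_index;
    }
}

/* Signed variant: convert to the corresponding unsigned type first. */
void demo_store8_les32(int_least32_t value, unsigned char *ptr) {
    demo_store8_leu32((uint_least32_t)value, ptr);
}

void demo_store_self_check(void) {
    unsigned char out[4];
    const unsigned char le[4] = {0x78, 0x56, 0x34, 0x12};
    const unsigned char be[4] = {0x12, 0x34, 0x56, 0x78};
    const unsigned char minus_two[4] = {0xFE, 0xFF, 0xFF, 0xFF};

    demo_store8_leu32(UINT32_C(0x12345678), out);
    assert(memcmp(out, le, sizeof(out)) == 0);
    demo_store8_beu32(UINT32_C(0x12345678), out);
    assert(memcmp(out, be, sizeof(out)) == 0);
    demo_store8_les32(-2, out);
    assert(memcmp(out, minus_two, sizeof(out)) == 0);
}
```

The byte sequence written depends only on the value and the chosen variant, never on the byte order of the host.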
5.3. Add new §7.18.✨21 and §7.18.✨22 sub-sub-clauses for Rotate Left and Rotate Right Bit Utilities in §7.18
7.18.✨21 Rotate Left
Synopsis
unsigned char stdc_rotate_left_uc(unsigned char value, unsigned int count) [[unsequenced]];
unsigned short stdc_rotate_left_us(unsigned short value, unsigned int count) [[unsequenced]];
unsigned int stdc_rotate_left_ui(unsigned int value, unsigned int count) [[unsequenced]];
unsigned long stdc_rotate_left_ul(unsigned long value, unsigned int count) [[unsequenced]];
unsigned long long stdc_rotate_left_ull(unsigned long long value, unsigned int count) [[unsequenced]];
generic_value_type stdc_rotate_left(generic_value_type value, generic_count_type count) [[unsequenced]];
Description
The stdc_rotate_left functions perform a bitwise rotate left. This operation is typically known as a left circular shift.
Returns
Let N be the width corresponding to the type of the input value. Let no_promote_value be (unsigned _BitInt(N))value. Let r be count % N.
— If r is 0, returns value;
— otherwise, returns (no_promote_value << r) | (no_promote_value >> (N - r)) FN0✨).
The type-generic function (marked by its generic_value_type argument) returns the above described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches any standard or extended integer type, excluding bool.
The generic_count_type count argument shall be a non-negative value of signed or unsigned integer type, or char.
FN0✨) Bit-precise integer types do not perform integer promotion in these expressions and therefore prevent potential undefined behavior from a literal implementation which would use value in place of no_promote_value (e.g., from a function call such as stdc_rotate_left_us(0xFFFF, 15) with an unsigned short width of 16 and an int width of 24).
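The following is an illustrative sketch only and is not part of the proposed wording. It shows one way to sidestep the promotion hazard described in the footnote without relying on bit-precise integer types: the value is widened to `unsigned int` before shifting, a swapped-in technique rather than the `unsigned _BitInt(N)` cast used by the wording. The names `demo_ushort_width` and `demo_rotate_left_us` are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned short, counted from its maximum value. */
static unsigned int demo_ushort_width(void) {
    unsigned int width = 0;
    for (unsigned int max = USHRT_MAX; max != 0; max >>= 1) {
        ++width;
    }
    return width;
}

unsigned short demo_rotate_left_us(unsigned short value, unsigned int count) {
    const unsigned int width = demo_ushort_width();
    const unsigned int r = count % width;
    if (r == 0) {
        return value;
    }
    /* Shifting `value` directly could promote it to (signed) int and overflow.
       An unsigned int operand keeps the shift well defined, and the final
       conversion discards any bits above the width of unsigned short. */
    const unsigned int wide = value;
    return (unsigned short)((wide << r) | (wide >> (width - r)));
}

void demo_rotate_left_us_self_check(void) {
    const unsigned short top_bit = (unsigned short)(USHRT_MAX / 2 + 1);
    assert(demo_rotate_left_us(USHRT_MAX, 15) == USHRT_MAX);
    assert(demo_rotate_left_us(1, 1) == 2);
    assert(demo_rotate_left_us(top_bit, 1) == 1);
    assert(demo_rotate_left_us(top_bit, demo_ushort_width()) == top_bit);
}
```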
7.18.✨22 Rotate Right
Synopsis
unsigned char stdc_rotate_right_uc(unsigned char value, unsigned int count) [[unsequenced]];
unsigned short stdc_rotate_right_us(unsigned short value, unsigned int count) [[unsequenced]];
unsigned int stdc_rotate_right_ui(unsigned int value, unsigned int count) [[unsequenced]];
unsigned long stdc_rotate_right_ul(unsigned long value, unsigned int count) [[unsequenced]];
unsigned long long stdc_rotate_right_ull(unsigned long long value, unsigned int count) [[unsequenced]];
generic_value_type stdc_rotate_right(generic_value_type value, generic_count_type count) [[unsequenced]];
Description
The stdc_rotate_right functions perform a bitwise rotate right. This operation is typically known as a right circular shift.
Returns
Let N be the width corresponding to the type of the input value. Let no_promote_value be (unsigned _BitInt(N))value. Let r be count % N.
— If r is 0, returns value;
— otherwise, returns (no_promote_value >> r) | (no_promote_value << (N - r)) FN1✨).
The type-generic function (marked by its generic_value_type argument) returns the above described result for a given input value so long as the generic_value_type is
— a standard unsigned integer type, excluding bool;
— an extended unsigned integer type;
— or, a bit-precise unsigned integer type whose width matches any standard or extended integer type, excluding bool.
The generic_count_type count argument shall be a non-negative value of signed or unsigned integer type, or char.
FN1✨) Bit-precise integer types do not perform integer promotion in these expressions and therefore prevent potential undefined behavior from a literal implementation which would use value in place of no_promote_value (e.g., from a function call such as stdc_rotate_right_us(0xFFFF, 15) with an unsigned short width of 16 and an int width of 24).
6. Appendix
A collection of miscellaneous and helpful bits of information and implementation.
6.1. Example Implementations in Publicly-Available Libraries
Optimized routines following the naming conventions present in this paper can be found in the Shepherd’s Oasis Industrial Development Kit (IDK) library, compilable with a conforming C11 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:
Optimized routines following the basic principles present in this paper and used as motivation to improve several C++ Standard Libraries can be found in the Itsy Bitsy Bit Libraries, compilable with a conforming C++17 compiler and tested on MSVC, GCC, and Clang on Windows, Mac, and Linux:
-
Bit Intrinsics (Declarations) (Sources)
Endianness routines and original motivation that spawned this proposal came from David Seifert’s Portable Endianness library and its deep dive into compiler optimizations and efficient code generation when alignment came into play:
-
Endian Load/Store (Declarations) (Sources)
7. Acknowledgements
Many thanks to David Seifert, Aaron Bachmann, Jens Gustedt, Tony Finch, Erin AO Shepherd, and many others who helped fight to get the semantics and wording into the right form, providing motivation, giving example code, pointing out existing libraries, and helping to justify this proposal.