Title: Improved Normal Enumerations
Shortname: 3029
Revision: 3
!Previous Revisions: N2997 (r2), N2964 (r1), N2908 (r0), Derived from N2575 (r2)
Status: P
Date: 2022-07-19
Group: WG14
!Proposal Category: Feature Request
!Target: C23
Editor: JeanHeyd Meneide (https://thephd.dev), phdofthehouse@gmail.com
Editor: Shepherd (Shepherd's Oasis LLC), shepherd@soasis.org
URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Improved%20Normal%20Enumerations.html
!URL: https://thephd.dev/_vendor/future_cxx/papers/C - Improved Normal Enumerations.html
!Paper Source: GitHub ThePhD/future_cxx
Issue Tracking: GitHub https://github.com/ThePhD/future_cxx/issues
Metadata Order: Previous Revisions, Editor, Latest, Paper Source, Issue Tracking, Project, Audience, Proposal Category, Target
Markup Shorthands: markdown yes
Toggle Diffs: no
Abstract: Enumerations should allow values greater than INT_MAX and smaller than INT_MIN, in order to provide a value-preserved set of integer constants.
path: resources/css/bikeshed-wording.html
# Changelog # {#changelog}

## Revision 3 - July 19th, 2022 ## {#changelog-r3}

- All insertions and deletions are highlighted specially so they can be observed, as well as being listed below this bullet in this part of the changelog.
- Minor/editorial wording changes:
  - Removed implementation-defined from Paragraph 5. (Paragraph 6 already explains that the enumerated type is compatible with some implementation-defined integer type.)
  - Typo fix: `enumeration member types` -> `enumeration member type`.
  - Editorial fixes to FN✨1).
- Wrote down additional motivation for why we have the long explanation of type/math computation for the values in [[#design-preservation]].

## Revision 2 - June 17th, 2022 ## {#changelog-r2}

- Address the use of **both** overflow and wraparound for signed and unsigned types in the "add `1` to previous constant" warning.
- Separate out two terms:
  - *enumerated type* for the type of the enumeration itself; and,
  - *enumeration member type* for the type of the enumeration members. (Thanks, Joseph Myers!)
- Various typo and grammar fixes.
- Added a new section, [[#design-miscompiles]], describing whether or not this is a breaking change for the compilers we surveyed.

## Revision 1 - April 12th, 2022 ## {#changelog-r1}

- More directly specify the algorithm for selecting the types of enumeration constants, both after and midway through the definition of an enumeration.
- Move all of the specification for the new algorithm into §6.7.2.2 in the [[#wording-specification-6.7.2.2]].
- Add more rationale in [[#design-midway]] for the problems found in current implementation extensions.

## Revision 0 - January 1st, 2022 ## {#changelog-r0}

- Initial release 🎉!

# Introduction and Motivation # {#intro}

C always designates `int` as the type for the enumerators of its enumerations, but it's entirely unspecified what the (underlying compatible) type for the `enum` will end up being.
These constants (and the initializers for those constants) must fit within an `int`, otherwise it is a constraint violation. For decades, compilers have been silently providing extensions in their default build modes for enumerations to be larger than `int`, even if `_Generic` and friends always detect the type of such an enumerator to be `int`. It is problematic to only have enumerators which are `int`, especially since `int` is only guaranteed to be at least 16 bits wide. Portability breaks happen between normal 32-bit `int` environments like typical GCC and Clang x86 platforms vs. 16-bit `int` environments like SDCC microcontroller targets, which is not desirable.

This proposal provides for enumerations with enumerators of values greater than `INT_MAX` and smaller than `INT_MIN` to have enumerators that are of a different type than `int`, allowing the underlying type and the enumeration constants themselves to be of a different type. It does not change behavior for any enumeration constants which were within the `[INT_MIN, INT_MAX]` range already.

# Prior Art # {#prior}

The design of this feature is to enable what has long been existing practice on implementations, including GCC, SDCC, Clang, and several other compilers. These compilers have allowed values greater than `INT_MAX` and values less than `INT_MIN` in their default compilation modes. We capture this as part of the design discussion below, for how we structure these proposed changes to the C Standard.

# Design # {#design}

This is a very small change that only makes previously ill-formed code now well-formed. It does not provide any other guarantees from the new wording besides allowing constants larger than `int` to be used with enumerations. Better enumerated types and values are left to the sister paper [on Enhanced Enumerations](https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Improved%20Normal%20Enumerations.html).
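As a minimal sketch of the kind of code this change is meant to make well-formed, consider a flag enumeration with one enumerator beyond `INT_MAX`. The names `permission`, `permission_all`, and `permission_all_is_preserved` are hypothetical and chosen only for illustration, and the sketch assumes an implementation where `int` is at most 32 bits wide and which accepts such enumerators (as an extension today, or under the wording proposed here).

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical flag set; the names are illustrative only. */
enum permission {
	permission_read  = 0x1,
	permission_write = 0x2,
	/* Not representable by a 16-bit or 32-bit int: a constraint
	   violation before this proposal, accepted by many compilers
	   as an extension. */
	permission_all   = 0xFFFFFFFF
};

/* The enumerated type has to be able to represent every member. */
static_assert(sizeof(enum permission) * CHAR_BIT >= 32,
	"enumerated type must be able to hold 0xFFFFFFFF");

/* Returns 1 if the enumerator kept its value (value-preserving),
   and 0 if the value was truncated to fit an int. */
int permission_all_is_preserved(void) {
	return (unsigned long long)permission_all == 0xFFFFFFFFULL;
}
```

A strictly conforming pre-C23 implementation may reject this translation unit outright; that rejection is the portability gap the proposal aims to close.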
## Type-preserving, Value-Preserving ## {#design-preservation}

More specifically:

- values for enumerators that are outside of the range `[INT_MIN, INT_MAX]` are allowed and change the type of the enumerators from `int`;
- and, the underlying type for enumerations with such types may be larger than `int`, but is still implementation-defined for backwards compatibility.

As put more eloquently by Aaron Ballman, this paper aims to be **value-preserving** wherever possible for enumeration constants without breaking backwards compatibility. The sister paper aims to provide **type-preserving** properties, and both papers are meant to complement the existing ecosystem. Particularly, this code:

```cpp
enum a {
	a0 = 0xFFFFFFFFFFFFFFFFULL
};

int main () {
	return _Generic(a0,
		unsigned long long: 0,
		int: 1,
		default: 2);
}
```

should produce a value of `0` on a normal implementation (but can give other values, so long as the underlying type is big enough to fit the number 2<sup>64</sup> - 1). It shall also not produce a diagnostic on even the most strict implementations.

## Using the Enumerators Midway in the Definition List ## {#design-midway}

Given the following code snippet:

```cpp
#include <limits.h>

#define GET_TYPE_INT(x) _Generic(x, \
	char: 1,\
	unsigned char: 2,\
	signed char: 3,\
	short: 4,\
	unsigned short: 5,\
	int: 6,\
	unsigned int: 7,\
	long: 8,\
	unsigned long: 9,\
	long long: 10,\
	unsigned long long: 11,\
	default: 0xFF\
	)

enum x {
	a = INT_MAX,
	b = ULLONG_MAX,
	a_type = GET_TYPE_INT(a),
	b_type = GET_TYPE_INT(b)
};

#include <stdio.h>

int main () {
	printf("sizeof(long)=%d\n", (int)sizeof(long));
	printf("sizeof(long long)=%d\n", (int)sizeof(long long));
	printf("a_type=%d\n", (int)a_type);
	printf("b_type=%d\n", (int)b_type);
	printf("GET_TYPE_INT(a), outside=%d\n", (int)GET_TYPE_INT(a));
	printf("GET_TYPE_INT(b), outside=%d\n", (int)GET_TYPE_INT(b));
	return 0;
}
```

Compilers [are not consistent](https://godbolt.org/z/qe1fzTbYr), depending on how far they like to go with extensions.
Anyone who was depending on a specific type was relying on neither (a) compilable C code, according to the C standard, nor (b) widely-existing cross-platform C code, according to what implementation extensions do. Therefore, we attempt to enshrine the best of the available existing practice and improve the status quo.

### Midway Type: Whatever the Compiler Chooses ### {#design-midway-during}

During the definition of an enumerated type, the enumeration constants themselves have whatever type the enumeration wants if they are not representable by `int`. This is as consistent as we can be for existing code that relies on using existing enumeration constants at definition time.

### Final Type: the Enumerated Type ### {#design-midway-after}

After the enumerated type's definition is completed (after the `}`), the enumeration constants have the enumerated type. Because this could be a breaking change, it only applies when not all of the constants are representable by `int`, as the previous version of the standard required. The reason we want to specify it this way is that implementations vary wildly in how they handle this today in their extensions, with no clear consensus on how it is done. That is, using existing extensions today in various compilers, adding 3 extra lines to modify the snippet from the section above:

```cpp
#include <limits.h>

#define GET_TYPE_INT(x) _Generic(x, \
	char: 1,\
	unsigned char: 2,\
	signed char: 3,\
	short: 4,\
	unsigned short: 5,\
	int: 6,\
	unsigned int: 7,\
	long: 8,\
	unsigned long: 9,\
	long long: 10,\
	unsigned long long: 11,\
	default: 0xFF\
	)

enum x {
	a = INT_MAX,
	b = ULLONG_MAX,
	a_type = GET_TYPE_INT(a),
	b_type = GET_TYPE_INT(b)
};

extern enum x e_a;
extern __typeof(b) e_a;
extern __typeof(a) e_a;

// …
```

results in various failures on today's implementations.
This is because `a` and `b` are of different types (`a` is an `int` and not compatible with `enum x` or `typeof(b)`, since those are both compatible with `long` or `long long` depending on the selection done by the implementation). We feel it is worthwhile to clarify this and make it more consistent. There is no way to make such code portable, as it was (A) already using an extension that was not standardized before C23 and (B) already relying on an implementation detail that could change not only between implementations, but also **within** a given implementation (e.g., `-fshort-enums`). The above code is utterly non-portable, and outside of what we can possibly concern ourselves with when trying to provide at least some degree of standardized behavior.

We can fix this by providing a consistent choice of the underlying integer type of `enum x` for the integer constants when used after the closing brace (`}`). We expect this to be the best possible and most consistent choice for enumerations going forward.

## Mis-Compiles ## {#design-miscompiles}

Additionally, some compilers produce what could probably be considered a "mis-compile" of the code. For example, when given this code snippet:

```cpp
enum a {
	a0 = 0xFFFFFFFFFFFFFFFFULL
};

_Bool e () {
	return a0;
}

int f () {
	return a0;
}

unsigned long g () {
	return a0;
}

unsigned long long h () {
	return a0;
}

int main () {
	return f();
}
```

One compiler will do the expected behavior of converting implicitly to each integer type and `_Bool` (returns `true`/`1`), but for the case of `f` returns `0` (SDCC version 4.1.0). It only does this for integer returns in this specific case. Others will not compile the code at all (Tendra, GCC and Clang configured with `-pedantic` + `-Werror` mode). (See a sampling of such compilations at the [link here](https://godbolt.org/z/h5efjs1j1).)
All the others that work properly manifest the behavior described in this proposal, insofar as we checked (some 15 different compilers for various architectures). It seems like the behavior here is the culmination of the best existing practice we have. For the cases where it is not, the compilers are either mis-compiling (producing demonstrably unexpected and wrong output) or not accepting it (strictly standards compliant). Therefore, this proposal is not - insofar as understood by the author and the various folks asked to test for their compilers - breaking any code that was not already broken.

## "Why not use the C++ Wording?" (Or: How We Ended Up With a Very Big ¶5 of Steps) ## {#design-enumeration-steps}

During the July 2022 Virtual C Meeting, it was repeatedly asked why we did not use the C++ wording, or just use the words "adds one to the previous enumerator" and leave all the type computation of the intermediate values to the implementation. This was what the original wording in [Revision 0](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2908.htm) did. Unfortunately, this is not compatible with

- (a) how math and operations are performed in C, wherein you need a type to determine the properties of a given operation; and,
- (b) what implementers were comfortable with when they read the wording.

While I personally am glad C++ was able to achieve an unspoken consensus on how that worked for their compilers and everyone has more or less solved that problem in their compilers in a portable fashion with untyped enumerations, that is not where C is between its implementations. Given that C enjoys far more implementations (several dozen more) than C++, in some cases we need to be more strict and more explicit.
This is especially true for cases like the following:

```cpp
#include <limits.h>

enum big_enum {
	a = LONG_MAX,
	b = a + 1,
	c = ULLONG_MAX
};
```

The implementation needs to know, definitively, how to handle what `a + 1` means, and given that C has a much more restricted constant expression apparatus than C++, the type semantics of such an operation matter much more for standards-compliant code.

# Proposed Wording # {#wording}

The following wording is [relative to N2912](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf).

## Intent ## {#wording-intent}

The intent of the wording is to provide the ability to express enumerations with the underlying type present. In particular:

- enumerations without an underlying type can have enumerators initialized with integer constant expressions whose type is `int` or some implementation-defined type capable of representing the constant expression's value;
- bit-precise types cannot be used for enumerations without an underlying type;
- the type of the enumeration constants during definition may differ from their type after the enumeration is completed;
- it is an error (constraint violation) to make an enumeration with a range of values that cannot be represented as a single signed or unsigned type; and,
- it is an error (constraint violation) to make an enumeration with a value

## Proposed Specification ## {#wording-specification}

### Modify Section §6.4.4.3 Enumeration constants ### {#wording-specification-6.4.4.3}
6.4.4.3 Enumeration constants
Syntax
*enumeration-constant:*
:: *identifier*
Semantics
An identifier declared as an enumeration constant for an enumeration has either type `int` or the enumerated type, as defined in 6.7.2.2.
Forward references: enumeration specifiers (6.7.2.2).
### Modify Section §6.7.2.2 Enumeration specifiers ### {#wording-specification-6.7.2.2}
6.7.2.2 Enumeration specifiers
Syntax
*enum-specifier:*
:: **enum** attribute-specifier-sequence<sub>opt</sub> identifier<sub>opt</sub> **{** *enumerator-list* **}**
:: **enum** attribute-specifier-sequence<sub>opt</sub> identifier<sub>opt</sub> **{** *enumerator-list* **,** **}**
:: **enum** *identifier*

*enumerator-list:*
:: *enumerator*
:: *enumerator-list* **,** *enumerator*

*enumerator:*
:: *enumeration-constant* attribute-specifier-sequence<sub>opt</sub>
:: *enumeration-constant* attribute-specifier-sequence<sub>opt</sub> **=** *constant-expression*
Constraints
The expression that defines the value of an enumeration constant shall be an integer constant expression<del> that has a value representable as an `int`</del>. For all the integer constant expressions which make up the values of the enumeration constant, there shall be <del>an implementation-defined</del><ins>a</ins> signed or unsigned integer type (excluding the bit-precise integer types) capable of representing all of the values.
Semantics
The identifiers in an enumerator list are declared as constants and may appear wherever such are permitted.<sup>138)</sup> An enumerator with **=** defines its enumeration constant as the value of the constant expression. If the first enumerator has no **=**, the value of its enumeration constant is 0. Each subsequent enumerator with no **=** defines its enumeration constant as the value of the constant expression obtained by adding 1 to the value of the previous enumeration constant. (The use of enumerators with **=** may produce enumeration constants with values that duplicate other values in the same enumeration.) The enumerators of an enumeration are also known as its members.
The type for the members of an enumeration is called the <del>*enumeration member types*</del><ins>*enumeration member type*</ins>.
During the processing of each enumeration constant in the enumerator list, the type of the enumeration constant shall be:
:: — `int`, if there are no previous enumeration constants in the enumerator list and no explicit **=** with a defining integer constant expression; or,
:: — `int`, if given explicitly with **=** and the value of the integer constant expression is representable by an `int`; or,
:: — the type of the integer constant expression, if given explicitly with **=** and if the value of the integer constant expression is not representable by `int`; or,
:: — the type of the value from <del>last</del><ins>the previous</ins> enumeration constant with `1` added to it. If such an integer constant expression would overflow or wraparound the value of the previous enumeration constant from the addition of `1`, the type takes on either:
:: :: — a suitably sized signed integer type (excluding the bit-precise signed integer types) capable of representing the value of the previous enumeration constant plus `1`; or,
:: :: — a suitably sized unsigned integer type (excluding the bit-precise unsigned integer types) capable of representing the value of the previous enumeration constant plus `1`.
:: A signed integer type is chosen if the previous enumeration constant being added is of signed integer type. An unsigned integer type is chosen if the previous enumeration constant is of unsigned integer type. If there is no suitably sized integer type described previously which can represent the new value, then the enumeration has no type which is capable of representing all of its values<sup>FN0✨)</sup>.
Each enumerated type shall be compatible with `char`, a signed integer type, or an unsigned integer type (excluding the bit-precise integer types). The choice of type is implementation-defined<sup>139)</sup>, but shall be capable of representing the values of all the members of the enumeration.
The enumerated type is incomplete until immediately after the **}** that terminates the list of enumerator declarations, and complete thereafter. The enumeration member type upon completion is:
:: — `int` if all the values of the enumeration are representable as an `int`; or,
:: — the enumerated type<sup>FN1✨)</sup>.
**EXAMPLE** The following fragment:

```cpp
enum hue { chartreuse, burgundy, claret=20, winedark };
enum hue col, *cp;
col = claret;
cp = &col;
if (*cp != burgundy)
	/* ... */
```

makes `hue` the tag of an enumeration, and then declares `col` as an object that has that type and `cp` as a pointer to an object that has that type. The enumerated values are in the set {0, 1, 20, 21}.
<sup>138)</sup> Thus, the identifiers of enumeration constants declared in the same scope are all required to be distinct from each other and from other identifiers declared in ordinary declarators.

<sup>139)</sup> An implementation can delay the choice of which integer type until all enumeration constants have been seen.

<sup>FN0✨)</sup> Therefore, a constraint has been violated.

<sup>FN1✨)</sup> The integer type selected during processing of the enumerator list (before completion) of the enumeration may not be the same as the selected compatible implementation-defined integer type selected for the completed enumeration.
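As an informal illustration only, and not part of the proposed wording, the completion rule above can be sketched with two enumerations. The names `small`, `wide`, and the helper functions are hypothetical. The sketch assumes an implementation that accepts enumerators beyond `INT_MAX`, and it checks only properties on which the proposed wording and common existing extensions are expected to agree.

```c
#include <assert.h>
#include <limits.h>

/* Every value is representable by int, so the enumeration member
   type upon completion is int. */
enum small { small_a, small_b = 10, small_c };

/* wide_b is not representable by int, so upon completion the
   enumeration member type is the enumerated type. Under the proposed
   wording wide_a takes that type too; existing extensions disagree
   on wide_a, so this sketch does not test it. */
enum wide { wide_a = 1, wide_b = ULLONG_MAX };

/* The enumerated type has to represent all of its members. */
static_assert(sizeof(enum wide) * CHAR_BIT >= 64,
	"enumerated type must be able to hold ULLONG_MAX");

/* Returns 1 if the members of `enum small` have type int. */
int small_members_are_int(void) {
	return _Generic(small_c, int: 1, default: 0);
}

/* Returns 1 if wide_b has the same size as its enumerated type. */
int wide_member_has_enum_size(void) {
	return sizeof(wide_b) == sizeof(enum wide);
}
```

Comparing sizes rather than naming a type in `_Generic` is deliberate: the wording leaves the exact compatible integer type implementation-defined, so a check against `unsigned long` or `unsigned long long` specifically would not be portable.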
### Add implementation-defined enumeration behavior to Annex J ### {#wording-specification-annex-j}

# Acknowledgements # {#acknowledgements}

Thanks to:

- Aaron Ballman for help with the initial drafting;
- Aaron Ballman, Aaron Bachmann, Jens Gustedt & Joseph Myers for questions, suggestions and offline discussion;
- Joseph Myers, for detailed review of the many components and fine-tuning of the way enumeration constants are handled;
- Jens Gustedt, for detailed review and suggestions helping to shape this paper;
- Robert Seacord for editing suggestions; and,
- Clive Pygott for the initial draft of this paper.

We hope this paper serves you all well.