Title: _Record types
H1: _Record types
Shortname: 3332
Revision: 0
!Previous Revisions: None
Status: P
Date: 2024-09-06
Group: WG14
!Proposal Category: Change Request, Feature Request
!Target: C2y
Editor: JeanHeyd Meneide, phdofthehouse@gmail.com
Editor: Shepherd (Shepherd's Oasis LLC), shepherd@soasis.org
URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20_Record%20types.html
!Paper Source: GitHub
Issue Tracking: GitHub https://github.com/ThePhD/future_cxx/issues
Metadata Order: Previous Revisions, Editor, This Version, Paper Source, Implementation, Issue Tracking, Project, Audience, Proposal Category, Target
Markup Shorthands: markdown yes
Toggle Diffs: no
Abstract: User-controlled ways of handling compatibility for types in C, as a means to strengthen Type-Based Alias Analysis but also give users an explicit handle on increasing type diversity in C.
path: resources/css/bikeshed-wording.html

# Changelog # {#changelog}

## Revision 0 ## {#changelog-r0}

- Initial release. ✨

# Introduction and Motivation # {#intro}

The need for greater and more expansive compatibility due to various aspects of C programming, including macro-based generic programming for anonymous, in-line defined structures, has been increasingly high in C software. Furthermore, software that has been unable to get the compatibility and aliasing rules to accommodate its code has largely worked around them by simply turning off strict type-based alias analysis with flags such as `-fno-strict-aliasing`. (One major compiler vendor has simply decided to not implement any serious type-based aliasing analysis and forgo all of it.) This has put C in a tenuous situation, where its potentially rich type system is deeply at odds with some of its more serious and prominent users.

## Type Compatibility Issues with Anonymous Types ## {#intro-anonymous}

During the discussion of improving tag compatibility from Martin Uecker's [[N3037]], it was shown in a previous version of the paper that making all unnamed types compatible within the same translation unit would create a serious problem for type safety in C. Consider the following snippet of code:

```cpp
typedef struct { int value; } fahrenheit;
typedef struct { int value; } celsius;
```

Under previous iterations of [[N3037]], these two would be considered compatible. This was seen as overreaching: although current C rules already treat these two types as compatible when they are accessed from outside the translation unit they are defined in, making them compatible within the translation unit itself violated type safety. That is neither intended nor helpful, and could very well violate fundamental safety guarantees in a large body of software, particularly simulation and other units-heavy software. Therefore, this provision was removed from [[N3037]] before the paper was approved for C23.
Speculation as to other ways of solving this problem were brought forward, such as making all anonymous structures identical except in the case where they are nested within a `typedef` declaration. None have been formally proposed yet, but the authors believe this subtle interaction would result in a greater complexity burden than is reasonably advisable. It is also unintuitive to go about things in this manner, as it would result in different behavior if a structure is named or not and specifically only in cases where a `typedef` is present or not. ## Lack of Structural Typing ## {#intro-structural} Additionally, other kinds of code contain repetitive definitions of the same structure which logically, spiritually, and for all intents and purposes are exactly identical. Take these various range definitions in the headers of Andre Weissflog's sokol, a library that allows various different programming languages to interoperate with graphics in WASM, Nim, Zig, and others through C: - `typedef struct sfetch_range_t { const void* ptr; size_t size; } sfetch_range_t;` in [sokol_fetch.h](https://github.com/floooh/sokol/blob/df71cc24cb273c0cf68ccef91932c09893006b18/sokol_fetch.h#L1002-L1005) - `typedef struct sspine_range { const void* ptr; size_t size; } sspine_range;` in [sokol_spine.h](https://github.com/floooh/sokol/blob/df71cc24cb273c0cf68ccef91932c09893006b18/util/sokol_spine.h#L996) - `typedef struct sdtx_range { const void* ptr; size_t size; } sdtx_range;` in [sokol_debugtext.h](https://github.com/floooh/sokol/blob/df71cc24cb273c0cf68ccef91932c09893006b18/util/sokol_debugtext.h#L592-L595) Due to the current compatibility rules, these types are not compatible. And yet, the author of sokol has stated the only reason they are different types is for compilation time optimization: > They're all meant to be interchangeable. The reason they are separate is because I want the various sokol headers to be as standalone as possible (e.g. 
not require a shared "sokol_common.h" header). > > — [September 2nd, 2024](https://twitter.com/FlohOfWoe/status/1830557635822723503) Before this, Weissflog has also gone on to state: > I sometimes wish C had optional structurally typed structs (so that two differently named structs with the same interior are assignable to each other, would help to send data from one library to another without ‘entangling’ them via shared types. > > — [August 23rd, 2024](https://twitter.com/FlohOfWoe/status/1827000924377698395) This has also been a routine problem for developers who end up being the user of larger libraries or coordinating bigger projects, where disparate mathematics libraries and the like can be common. An example includes a frustration from a Doctor of Computer Science and Autodesk Meshmixer Creator Ryan Schmidt, offhandedly remarking on the current state of programming languages: > an utterly insane thing in (most? all?) programming languages is that you can have two separate math libraries, that each define their own vector-math types in exactly the same way, and there is no way to make `MyVector3f = YourVector3f` work transparently > > — [July 30th, 2024](https://twitter.com/rms80/status/1818307726960710139) Indeed, given the different definitions of even a 2 or 3 dimensional vector in [[linmath-h|Datenwolf's linmath.h]] versus [[cglm|recp's cglm]] versus older libraries like [[mathc|feselva's mathc]], it can be frustrating coordinating these structures and types to work together with one another. Certain languages like [[ocaml-types|OCaml]] have types that use what is known as [[structural-typing|"structural typing"]], as is alluded to by Weissflog in his yearning for a better type system for C. These types only consider their members/fields/properties in order to determine type compatibility and identity. 
There are various "sub-flavors" of structural typing, but it effectively behaves exactly like C type compatibility with the caveat that the top-level structure or union name is not considered relevant when performing compatibility checks.

## Macro-generic Data Structure Issues ## {#intro-macro}

Martin Uecker's [[N3037]] allowed type-generic data structures and macros that produce identical names and identical inner contents, at file and function scope, to be considered compatible types in C23. This meant that defining, e.g., a dynamic array macro as:

```cpp
#define array(T) struct array_##T##_t { size_t size; T* values; }
```

worked very well and no longer required a single "stable" pre-declaration of a given use of this type before using e.g. `array(int)` at function scope. However, this became slightly problematic with types that had spaces in them or were otherwise not a single identifier, such as with `array(unsigned char)`, because the pasted tag is no longer a valid identifier. The workaround was to use `typedef`s, but it still left a problem: why was it not possible to make an unnamed type that is, within a single translation unit, compatible with other types like it? While the "strong typing" case is indeed important, it seemed that we had to sacrifice one use case for another: this is, ultimately, not a good place to be in the ecosystem.

## A Solution ## {#intro-solution}

We are proposing a new keyword to allow **users** to explicitly annotate structures and unions which may have their top-level name ignored for compatibility purposes, as well as additional opt-in changes that are specified by the standard, and then further by implementations. The spelling of the keyword is `_Record` and `_Record( record-attributes … )`, and it creates new record modifiers which change how types are considered compatible.
Changing this allows explicit opt-in support for:

- assigning between types that are fundamentally identical but differ in their top-level given tag name, as is the desire from the sokol example and the mathematics library example;
- pointer-casting between types whose fields have identical types and names, to allow for the long-standing practice of type-punning between two often "identical" types without an explicit `memcpy`;
- passing such objects to morally, spiritually, and intentfully identical function calls without needing an explicit cast or conversion function;
- protecting existing code which relies on the current and well-supported interpretation that nameless structures are all uniquely named and different within a single translation unit;
- and many, many more use cases covered in the [[#design]] section.

This alleviates the pressure from having to find a precise formulation of `typedef` or empty structures which could disrupt and negatively impact existing code pointed out in previous minuted discussion of [[N3037]], while also giving users better explicit control of compatibility and sharing between different disparate and unconnected libraries. The design for `_Record` types is as below.

# Design # {#design}

The design of `_Record` and its parenthesized counterpart is meant to accomplish a few critical goals that benefit both end-users and implementation vendors alike:

- allow the user to explicitly opt a type into a shared space with similar types;
- strengthen the case for type-based alias analysis by making the compatibility for many near-identical classes explicitly understood from explicit user opt-in;
- promote ease of use between morally, spiritually, and semantically identical types **without** impacting existing code;
- and, give vendors mechanisms to provide additional, separate-from-the-standard behaviors for compatibility.
Note: This does NOT provide standards-blessed semantics for aliasing or assigning disparate types of different field types. That is covered under the [[#design-vendor.impl|vendor-provided record modifiers]] portion, and we hope that in providing that level of space we can further strengthen the case for type-based alias analysis by giving users and vendors more controls over layout-based compatibility. However, this proposal at this time does not have any mechanisms for allowing the punning of e.g. a structure with a single `int32_t` member and a structure with a single `float` member. A quick example looks as follows: ```cpp typedef struct _Record range { void* ptr; size_t size; } range; typedef struct _Record slice { void* ptr; size_t size; } slice; void slice_func(slice value); int main () { unsigned char data[1]; struct range r = { .ptr = data, .size = sizeof(data) }; struct slice s = { .ptr = nullptr, .size = 0 }; slice_func(r); // ok struct range* slice_ptr_thru_range = &s; // ok r = s; // ok return 0; } ``` This design achieves those goals, in various ways. First, let us review the syntax. ## Syntax ## {#design-syntax} Syntactically, `_Record` is a new keyword that is part of the declarator for a type definition. It goes between the tag type of `struct` or `union` and the identifier, and before the attribute specifier sequence: ```cpp struct _Record meow { char __padding; }; // ok struct _Record [[some, attribs]] bark { char __padding; }; // ok struct _Record { char __padding; }; // ok struct _Record [[other_attrib]] { char __padding; }; // ok union _Record purr { char __padding; }; // ok union _Record [[some, attribs]] woof { char __padding; }; // ok union _Record { char __padding; }; // ok union _Record [[other_attrib]] { char __padding; }; // ok ``` Every definition of a type must agree and either have `_Record` on it or not have `_Record` on it. 
`_Record` does not need to be placed on the type when forward declaring or referencing the type: once defined with it, the type will always be considered a `_Record` type. A type that was previously defined without `_Record` cannot be redefined with the `_Record` keyword on it. A forward declaration also cannot contain `_Record`, because it only carries meaning on the defining declaration:

```cpp
struct _Record meow; // constraint violation
struct _Record bark { char __padding; }; // ok

int main () {
	struct bark b = {}; // ok
	struct _Record bark b2 = {}; // constraint violation;
	struct _Record woof { char __padding; } w0 = {}; // ok
	struct _Record purr* p0; // constraint violation
	return 0;
}
```

## `_Record` for Macro-Generic Datastructures ## {#design-generics}

Given the example in [[#intro-macro]], we can now sidestep any issues of non-identifier type names such as `int*` or `unsigned char` or similar by simply defining the structure to be an unnamed struct that is marked with `_Record`:

```cpp
#define array(T) struct _Record { size_t size; T* values; }
```

The structure remains unnamed, which means no extra effort needs to be taken to avoid colliding with user namespaced entities either. It works at file and function scope without issue. And there are no compatibility problems either, which preserves all of the intended effects of [[N3037]].

## Shared Space vs. Fully Closed ## {#design-shared}

The proposed semantics for `_Record` types are meant to be lenient and shared (also sometimes known as "*viral*"); that is, rather than needing both structures on both sides of a comparison, argument pass, or assignment to be annotated with `_Record`, only one of the structures or unions must be. This is very important because of preexisting code. Requiring that both sides of an assignment or argument pass be annotated would demand the arduous task of modifying every existing library to have better semantics.
It is against the charter and general nature of C over the last 40 years to require sweeping changes or steep investment in existing code to make these things work. Many fundamental libraries can be perfectly valid and usable, even if not well-maintained or locked into a specific era VIA contractual obligation. To bring more immediate usability outside of closely-knit ecosystems, the design of this system is so that only **one** of the two types for an assignment is record modified. Similarly, if there are two different kinds of record modified types, the standard defines the "order" in which record modifications take priority. In general, this priority can be considered as simply being from "most lenient" to "least lenient". Note that this does not subject any piece of code using known and well-understood mechanisms such as incomplete types / private source file definitions to suddenly be more or less compatible than they used to be. Record types are a property only of types with source-available definitions marked with `_Record` in a translation unit. This does, however, add a new way to access an old problem in C. ### Type-Based Alias Analysis under "Viral" Compatibility ### {#design-shared-tbaa} Consider the following (probably illegal, but nonetheless compiling and running) code: ```cpp // alice.h typedef struct alice_vec { int x; int y; } alice_vec; ``` ```cpp // bob.h typedef struct bob_vec { int x; int y; } bob_vec; ``` ```cpp // TU #1 #include
## Modify Section §6.7.3.2 "Structure and union specifiers" ## {#wording-6.7.3.2}6.2.7 Compatible type and composite typeTwo types are *compatible types* if they are the same. Additional rules for determining whether two types are compatible are described in 6.7.3 for type specifiers, in 6.7.4 for type qualifiers, and in 6.7.7 for declarators.45) Moreover, two complete structure, union, or enumerated types declared with the same tag are compatible if members satisfy the following requirements: :: — there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; :: — if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; :: — and, if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two unions declared in the same translation unit, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values; if one has a fixed underlying type, then the other shall have a compatible fixed underlying type. For determining type compatibility, anonymous structures and unions are considered a regular member of the containing structure or union type, and the type of an anonymous structure or union is considered compatible to the type of another anonymous structure or union, respectively, if their members fulfill the preceding requirements. 
Furthermore, two structure, union, or enumerated types declared in separate translation units are compatible in the following cases:

:: — both are declared without tags and they fulfill the preceding requirements;
:: — both have the same tag and are completed somewhere in their respective translation units and they fulfill the preceding requirements;
:: — both have the same tag and at least one of the two types is not completed in its translation unit.

Additionally, if one of two structure or union types is a standard record type, then the types are compatible in the following additional cases:

:: — if one of the types is a types-only record type (✨6.7.3.3), both the tag of the structures or unions and the names of its corresponding members are not considered while fulfilling the preceding requirements;
:: — otherwise, if one of the types is a basic record type (✨6.7.3.3), the tag of the structures or unions is not considered while fulfilling the preceding requirements.

Otherwise, the structure, union, or enumerated types are incompatible.

6.7.3.2 Structure and union specifiers

Syntax

*struct-or-union-specifier:*
:: *struct-or-union* *record-modifier*opt *attribute-specifier-sequence*opt *identifier*opt **{** *member-declaration-list* **}**
:: *struct-or-union* *attribute-specifier-sequence*opt *identifier*

…

## Add a new Section §6.7.3.3 "Record modifiers" ## {#wording-6.7.3.3}

## Modify Section §6.7.3.4 Tags ## {#wording-6.7.3.4}

6.7.3.3 Record modifiers

Syntax

*record-modifier:*
:: **_Record**
:: **_Record** **(** *attribute-list* **)**

A structure or union type with a record modifier is a *record type*. A record type with a record modifier of the form `_Record`, `_Record ( )`, or `_Record` followed by a parenthesized standard attribute listed in this subclause is a *standard record type*. Otherwise, it is an *implementation record type*.

Constraints

If present, the identifier in a standard attribute for a record modifier shall be `types`.
Standard attributes shall only be present once in the attribute list for a record modifier.

A structure or union shall contain identical record modifiers on all of its definitions, if present. If a structure or union in two different translation units does not contain identical record modifiers, the behavior is undefined.

A record modifier shall not contain an attribute unrecognized by the implementation.

Semantics

Record types provide additional ways to specify the compatibility of types that would otherwise be incompatible.

The use of standard attributes in record modifiers is defined by this document. The use of attribute prefixed token sequences in record modifiers is implementation-defined. The order in which attribute tokens appear in a record modifier is not significant.

A record modifier of the form `_Record` or `_Record ( )` classifies a standard record type as a *basic record type*. Basic record types modify their compatibility with other types as specified in 6.2.7.

A record modifier which contains the attribute token `types` classifies a standard record type as a *types-only record type*.
Types-only record types modify their compatibility with other types as specified in 6.2.7.

Implementation record types, if any, have implementation-defined semantics.

**EXAMPLE 1** Record modifiers allow for assignment between types that are meant to be related but otherwise would not be considered compatible:

```cpp
typedef struct _Record catlib_range {
	void* ptr;
	size_t size;
} catlib_range;

typedef struct _Record doglib_slice {
	void* ptr;
	size_t size;
} doglib_slice;

void doglib_func(doglib_slice value);
void catlib_func(catlib_range value);
void doglib_ptr_func(doglib_slice *mem);
void catlib_ptr_func(catlib_range *mem);

int main () {
	unsigned char data[1];
	catlib_range cats = { .ptr = data, .size = sizeof(data) };
	doglib_slice dogs = { .ptr = data, .size = sizeof(data) };
	// dogs and cats, working together
	doglib_func(cats); // ok
	catlib_func(cats); // ok
	doglib_func(dogs); // ok
	catlib_func(dogs); // ok
	doglib_ptr_func(&cats); // ok
	catlib_ptr_func(&cats); // ok
	doglib_ptr_func(&dogs); // ok
	catlib_ptr_func(&dogs); // ok
	return 0;
}
```

**EXAMPLE 2** Types-only record types allow for compatibility even if the names of members are different, and even if only one of the two types is a types-only record type:

```cpp
typedef struct liba_vec2 {
	float x;
	float y;
} liba_vec2;

typedef struct _Record(types) libd_vec2 {
	float mx;
	float my;
} libd_vec2; // compatible with liba_vec2

void f(liba_vec2 v);

int main () {
	liba_vec2 vec_a = {};
	libd_vec2 vec_d = { 1.0f, 1.0f };
	libd_vec2* d_thru_a = &vec_a; // ok
	vec_a = vec_d; // ok
	f(vec_d); // ok
	return 0;
}
```

**EXAMPLE 3** Compatibility between types with different record modifiers works by checking, in order: whether either of the types is a types-only record type; then, whether either is a basic record type; then, the typical non-record type compatibility rules.
```cpp struct meow { char a; }; struct _Record miaou { char b; }; struct _Record(types) nya { char c; }; int main () { struct meow ecat = {}; struct miaou fcat = {}; struct nya jcat = {}; ecat = fcat; // constraint violation: incompatible types (basic record type), // tag names ignored, member names are different ecat = jcat; // ok: compatible types (types-only record type), // tag names ignored, member names ignored fcat = ecat; // constraint violation: incompatible types (basic record type), // tag names ignored, member names are different fcat = jcat; // ok: compatible types (types-only record type), // tag names ignored, member names ignored jcat = ecat; // ok: compatible types (types-only record type), // tag names ignored, member names ignored jcat = fcat; // ok: compatible types (types-only record type), // tag names ignored, member names ignored return 0; } ```Recommended PracticeImplementations interested in blessing specific forms of assignment, casting, and so-called "type-punning" between types typically not considered related should use implementation record types as a means of providing such behaviors to their end-users.6.7.3.4 TagsConstraints…A type specifier of the form :: *struct-or-union* *record-modifier*opt *attribute-specifier-sequence*opt *identifier*opt **{** *member-declaration-list* **}** …## Automatically Update Annex J entries for implementation-defined behavior ## {#wording-annex.j} # Acknowledgements # {#acknowledgements} Many thanks to the individuals who voiced their frustration with C's current system to help spur this proposal along. 
Thanks to Martin Uecker for addressing the original problem, and Jens Gustedt for the compelling counterexample.

{
	"N3037": {
		"authors": [ "Martin Uecker" ],
		"title": "N3037 - Improved Rules for Tag Compatibility",
		"href": "https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3037.pdf",
		"date": "July 7th, 2022"
	},
	"N3260": {
		"authors": [ "Aaron Ballman" ],
		"title": "N3260 - Generic selection expression with a type operand",
		"href": "https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3260.pdf",
		"date": "May 12th, 2024"
	},
	"structural-typing": {
		"authors": [ "Wikipedia" ],
		"title": "Structural type system",
		"date": "September 2nd, 2024",
		"href": "https://en.wikipedia.org/wiki/Structural_type_system"
	},
	"ocaml-types": {
		"authors": [ "OCaml Contributors" ],
		"title": "OCaml Basic Data Types and Pattern Matching",
		"date": "September 2nd, 2024",
		"href": "https://ocaml.org/docs/basic-data-types"
	},
	"linmath-h": {
		"authors": [ "Datenwolf" ],
		"title": "linmath.h",
		"date": "September 2nd, 2024",
		"href": "https://github.com/datenwolf/linmath.h"
	},
	"cglm": {
		"authors": [ "recp" ],
		"title": "cglm",
		"date": "September 2nd, 2024",
		"href": "https://github.com/recp/cglm"
	},
	"mathc": {
		"authors": [ "feselva" ],
		"title": "mathc",
		"date": "September 2nd, 2024",
		"href": "https://github.com/felselva/mathc"
	}
}