N2927
Not-so-magic - typeof for C

Draft Proposal,

Previous Revisions:
N2899 (r4), N2724 (r3), N2685 (r2), N2619 (r1), N2593 (r0)
Authors:
Paper Source:
GitHub
Issue Tracking:
GitHub
Project:
ISO/IEC JTC1/SC22/WG14 9899: Programming Language — C
Proposal Category:
Change Request, Feature Request
Target:
General Developers, Library Developers

Abstract

Getting the type of an expression in Standard C code.

1. Changelog

1.1. Revision 5 - February 2nd, 2022

1.2. Revision 4 - January 1st, 2022

1.3. Revision 3 - May 15th, 2021

Keyword Options:

Use _Typeof keyword, with <stdtypeof.h> header. 6/7/5

Use typeof keyword, no header. 16/2/1

Use some other spelling (qualified_typeof, or similar). 1/14/3

This was very strong direction to use the keywords directly, and not use an alternate spelling.

On the subject of using Expressions / types within typeof/remove_quals.

typeof with type names going in, in addition to expressions (voting "No" means no type names, just expressions) 17/1/4

remove_quals applied to expressions, in addition to type names (voting No means no expressions are allowed) 11/2/5

This was very strong direction to allow both types and expressions in both constructs.

1.4. Revision 2 - March 7th, 2021

1.5. Revision 1 - December 5th, 2020

1.6. Revision 0 - October 25th, 2020

2. Introduction & Motivation

typeof is an extension featured in many implementations of the C standard to get the type of an expression. It works similarly to sizeof, which analyzes the expression in an "unevaluated context" to determine its final type, and thus produce a size. typeof stops before producing a byte size and instead just yields a type name, usable in all the places a type currently is in the C grammar.
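As a minimal sketch of the behavior described above, the following uses the `__typeof__` spelling that GCC and Clang already provide as a stand-in for the proposed typeof keyword. That spelling is an assumption of this example, as are the hypothetical names `next_id` and `typeof_does_not_evaluate`. The operand is used only for its type, just as with sizeof:

```c
static int call_count = 0;

/* Hypothetical helper with a visible side effect. */
static int next_id(void) {
	return ++call_count;
}

/* Returns how many times next_id actually ran. The typeof operand is
   not evaluated, so only the explicit call below is counted. */
int typeof_does_not_evaluate(void) {
	__typeof__(next_id()) id = 0; /* id has type int; next_id() is not called */
	id = next_id();               /* the only real call */
	(void)id;
	return call_count;
}
```

On a compiler with this extension, the first call to `typeof_does_not_evaluate()` should report a single call to `next_id`.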

There are many uses for typeof that have come up over the intervening decades since its first introduction in a few compilers, most notably GCC. It can, for example, help produce a type-safe generic printing function that even has room for user extension (see example implementation). It can also help write code that can use the expansion of a macro expression as the return type for a function, or used within a macro itself to correctly cast to the desired result of a specific computation’s type (for width and precision purposes). The use cases are vast and endless, and many people have been locking themselves into implementation-specific vendorship. This keeps their code out of other compilers (for example, Microsoft’s Visual C Compiler) and weakens the ISO C ecosystem overall.
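One common shape of the macro use case is a generic swap, sketched below. It assumes the GCC/Clang `__typeof__` spelling in place of the proposed keyword; `SWAP`, `swap_tmp`, and `swap_demo` are hypothetical names chosen for illustration.

```c
/* Hypothetical generic swap: __typeof__ names the type of `a`
   so the temporary matches whatever the caller passes in. */
#define SWAP(a, b) \
	do { \
		__typeof__(a) swap_tmp = (a); \
		(a) = (b); \
		(b) = swap_tmp; \
	} while (0)

double swap_demo(void) {
	double x = 1.5, y = 4.0;
	SWAP(x, y);   /* x is now 4.0, y is now 1.5 */
	return x - y;
}
```

Without a way to name the type of `a`, a macro like this would need the type passed in as an extra argument.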

3. Implementation & Existing Practice

Every implementation in existence since C89 has an implementation of typeof. Some compilers (GCC, Clang, EDG, tcc, and many, many more) expose this with the implementation extension typeof. But, the Standard already requires typeof to exist. Notably, with emphasis (not found in the standard) added:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. — [N2596, Programming Languages C - Working Draft, §6.5.3.4 The sizeof and _Alignof operators, Semantics](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf)

Any implementation that can process sizeof("foo") is already doing sizeof(typeof("foo")) internally. This feature is the most "existing practice"-iest feature to be proposed to the C Standard, possibly in the entire history of the C standard. The feature was also mentioned in an "extension round up" paper that went over the state of C Extensions in 2007[^N1229]. typeof was also considered an important extension during the discussion of that paper, but nobody previously brought forth a paper to make it a reality.

3.1. Corner cases: Variably Modified Types and VLAs

Passing a normal or VLA-typed expression to typeof results in an idempotent type computation that simply yields that type in most implementations that support the feature. If a compiler supports Variable Length Arrays and its __typeof behaves like that of GCC, Clang, tcc, and others, then VLAs are already supported with these semantics. These semantics also match how sizeof would behave (computing the expression or having an internal placeholder "VLA" type), so we propagate that same ability in an identical manner.

Notably, this is how current implementations evaluate the semantics as well. The standard states that whether or not any computation for Variably Modified Types -- including side effects -- is performed is unspecified, so there are no additional guarantees about the evaluation for such types.
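A small sketch of the VLA case, assuming a compiler that supports both Variable Length Arrays and the `__typeof__` extension spelling (the function name `vla_copy_size` is hypothetical):

```c
#include <stddef.h>

/* Hypothetical helper: declares a second array through typeof of a VLA. */
size_t vla_copy_size(int n) {
	int original[n];           /* variable length array; n must be > 0 */
	__typeof__(original) copy; /* same variably modified type: int[n] */
	(void)original;
	return sizeof(copy);       /* size computed at execution time */
}
```

Here `vla_copy_size(5)` is expected to return `5 * sizeof(int)`, matching what sizeof would report for `original` itself.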

3.2. Taking both expressions and types

The goal was to be compatible with sizeof(...), which takes both expressions and types. Existing __typeof(...) implementations also make this design choice. We see this as a good thing, since it is compatible with the usage of typeof(...) extensions in existing macros and code, where programmers occasionally pass type names directly into these macros with the foreknowledge that they will be used exclusively in __typeof(...) or sizeof(...) operations.
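The kind of macro described above might look like the following sketch, which accepts either a type name or an expression because its argument only ever appears inside `__typeof__(...)` and `sizeof(...)`. The `__typeof__` spelling is the GCC/Clang extension, and `NEW_LIKE`, `struct point`, and `new_like_demo` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical macro: X may be a type name or an expression. */
#define NEW_LIKE(X) ((__typeof__(X) *)calloc(1, sizeof(X)))

struct point { int x, y; };

int new_like_demo(void) {
	struct point origin = { 3, 4 };
	struct point *from_type = NEW_LIKE(struct point); /* type name */
	struct point *from_expr = NEW_LIKE(origin);       /* expression */
	int ok = (from_type != NULL) && (from_expr != NULL);
	if (ok) {
		*from_expr = origin;
		/* calloc zero-fills, so from_type->x is 0 */
		ok = (from_type->x == 0) && (from_expr->y == 4);
	}
	free(from_type);
	free(from_expr);
	return ok;
}
```

If typeof accepted only expressions, the first use of `NEW_LIKE` would need a separate macro.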

3.3. Why not "decltype"?

C++ has a feature it calls decltype(...), which serves most of the same purpose. "Most" is because it has a subtle difference which would wreak havoc on C code if it was employed in shared header code:

int value = 20;

#define GET_TARGET_VALUE (value)

inline decltype(GET_TARGET_VALUE) g () {
	return value;
}

int main () {
	int& r = g();
	return r;
}

The return type of g would be int& in C++, and int in C. Other expressions, such as array indexing and pointer dereferencing, also have this same issue. This is due to the parentheses in the expression. Macros in both languages frequently see extra parentheses employed around expressions to prevent mixing of precedence or other shenanigans from token-based macro expansion and subsequent language parsing; this would be a footgun of large proportions for C and C++ users, and create a divergence in standard use that would rise to the level of a liaison issue that may become unfixable. This is also part of the reason why decltype was given that keyword in C++, and not typeof: they did not want this kind of subtle and brutal change to afflict C and C++ code. typeof does not have this problem because -- if a Sister Paper ever proposes it for C++ -- it will have identical behavior to std::remove_reference_t<decltype(T)>.

This was also addressed when C++ was itself trying to introduce decltype and competing with typeof in WG21 for C++.
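For comparison, the C side of the example at the start of this section can be sketched with the GCC/Clang `__typeof__` spelling (an assumption, standing in for the proposed keyword). The extra parentheses from the macro do not change the resulting type; `paren_demo` is a hypothetical checking function.

```c
int value = 20;

#define GET_TARGET_VALUE (value)

/* The parentheses in the macro expansion have no effect here:
   the return type is plain int. */
__typeof__(GET_TARGET_VALUE) g(void) {
	return value;
}

int paren_demo(void) {
	/* _Generic selects on the type of the call expression. */
	int returns_int = _Generic(g(), int: 1, default: 0);
	return returns_int && (g() == 20);
}
```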

3.4. C++ Compatibility

A similar feature should be proposed in C++, albeit it will likely take the keyword name typeof rather than _Typeof. This paper intends to have a similar paper brought before the C++ Committee -- WG21 -- through its Liaison Study Group, if this paper is successful.

3.5. Qualifiers

There is some discussion about what happens with qualifiers, both standard and implementation-defined. For example, "Named Address Space" qualifiers are subject to issues with GCC's typeof extension, as shown here. The intention of one of the GCC maintainers from that thread is:

Well, I think we should fix typeof to not retain the address space. It’s probably our implementation detail of having those in TYPE_QUALS that exposes the issue... — Richard Biener, GCC Maintainer, November 5th, 2020

There is also some disagreement between implementations about what qualifiers are worth keeping with respect to _Atomic. Therefore, typeof as proposed does not strip qualifiers from the computed type result. The reason for this is that a user can add specifiers and qualifications to a type, but cannot take them away once they are part of the expression. For example, consider the specification of <complex.h> that contains macro-provided constants like _Imaginary_I. These constants have the type const float _Imaginary: should all typeof(_Imaginary_I) expressions therefore result in a const float _Imaginary, or a float _Imaginary? What about volatile? And so on, and so forth.

There is an argument to strip all type qualifiers (_Atomic, const, restrict, and volatile) from the final type expression, because they can be added back by the programmer easily. However, the opposite is not true: you cannot add back qualifiers or create macros where those qualifiers can be taken in as parameters and re-applied to the function. This does leave some room to be desired: some folk may want to deliberately propagate the const-ness, volatile-ness, or _Atomic-ness of an expression to its end users.
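The two directions discussed here can be sketched with the GCC/Clang `__typeof__` extension (assumed in place of the proposed keyword). The sketch shows a qualifier being preserved and a qualifier being added by the user; it does not show removal, since the compilers assumed here have no equivalent of the proposed remove_quals. The names `limit` and `qualifiers_demo` are hypothetical.

```c
const int limit = 10;

int qualifiers_demo(void) {
	__typeof__(limit) same = 20;        /* const int: qualifier preserved */
	const __typeof__(1 + 1) added = 30; /* int, with const added by the user */

	/* Taking the address keeps the qualifier visible to _Generic,
	   which would otherwise drop it from the controlling expression. */
	int same_is_const  = _Generic(&same,  const int *: 1, int *: 0);
	int added_is_const = _Generic(&added, const int *: 1, int *: 0);

	return same_is_const && added_is_const && (same + added == 50);
}
```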

3.5.1. Qualifiers - The Solution

Originally, the idea of a _Typeof and an _Unqual_typeof was explored. This was a tempting direction but ultimately unsuitable as it duplicated functionality with a slight caveat and did not have a targeted purpose. A much better set of names for the functionality is typeof and remove_quals. typeof is an all-qualifier-preserving type reproduction of the expression (or pass-through if a type is given). It suitably envelops the total space of existing practice. The only reason _Unqual_typeof would exist is to... well, remove qualifiers. It only makes sense to just name it appropriately by using remove_quals as a keyword. The benefits of choosing this name are also clear:

This means that we need not entertain the idea of needing a header or some other choice and can simply directly name remove_quals as a keyword in the code instead, saving ourselves a massive debate about what should and should not be a keyword.

3.5.2. In General

Separately, we should consider a Macro Programming facility for C that can address larger questions. This paper strives to focus on the material gains from existing practice and the pitfalls of said existing practice. Therefore, this paper proposes only typeof and remove_quals.

After this paper is handled, further research should be given to handling qualifiers, function types, and arrays in Macros for generic programming. This paper focuses only on what we can find existing practice for.

4. Proposed Changes

The below changes are for adding the two keywords.

4.1. Proposed Wording

The following wording is relative to [N2596].

4.1.1. Modify §6.3.2.1 Lvalues, arrays, and function designators, paragraphs 3 and 4 with footnote 68:

Except when it is the operand of the sizeof, or typeof operators, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

A function designator is an expression that has function type. Except when it is the operand of the sizeof operator, a typeof operator,69) or the unary & operator, a function designator with type "function returning type" is converted to an expression that has type "pointer to function returning type".

69)Because this conversion does not occur, the operand of the sizeof operator remains a function designator and violates the constraints in 6.5.3.4.

4.1.2. Add keywords to §6.4.1 Keywords:

    _Thread_local
    typeof
    remove_quals

4.1.3. Modify §6.6 Constant expressions, paragraphs 6 and 8:

An integer constant expression125) shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, _Alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the typeof operators, sizeof operator, or _Alignof operator.

...

An arithmetic constant expression shall have arithmetic type and shall only have operands that are integer constants, floating constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and _Alignof expressions. Cast operators in an arithmetic constant expression shall only convert arithmetic types to arithmetic types, except as part of an operand to the typeof operators, sizeof operator, or _Alignof operator.

4.1.4. Adjust the footnote 131) in §6.7.1 Storage-class specifiers:

131) Thus, the only operator that can be applied to an array declared with storage-class specifier register is sizeof and the typeof operators.

4.1.5. Adjust the Syntax grammar of §6.7.2 Type specifiers, the paragraph 2 list, and paragraph 4 Semantics:

type-specifier:
    void
    ...
    typedef-name
    typeof-specifier

...

Specifiers for structures, unions, enumerations, atomic types, and typeof specifiers are discussed in 6.7.2.1 through 6.7.2.5. Declarations of typedef names are discussed in 6.7.8. The characteristics of the other types are discussed in 6.2.5.

4.1.6. Adjust the footnote 133) in §6.7.2.1 Structure and union specifiers:

133)As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int, then it is implementation-defined whether the bit-field is signed or unsigned. This includes an int type specifier produced by the use of the typeof specifier (6.7.2.5).

4.1.7. Add a new §6.7.2.5 The Typeof specifiers:

§6.7.2.5     The Typeof specifiers

Syntax

typeof-specifier:
    typeof ( typeof-specifier-argument )
    remove_quals ( typeof-specifier-argument )

typeof-specifier-argument:
    expression
    type-name

The typeof and remove_quals tokens are collectively called the typeof operators.

Constraints

The typeof operators shall not be applied to an expression that designates a bit-field member.

Semantics

The typeof-specifier applies the typeof operators to an expression (6.5) or a type-name. If the typeof operators are applied to an expression, they yield the type-name representing the type of their operand110). Otherwise, they produce the type-name with any nested typeof-specifier evaluated111). If the type of the operand is a variably modified type, the operand is evaluated; otherwise, the operand is not evaluated.

All qualifiers (6.7.3) on the type from the result of a remove_quals operation are removed, including the _Atomic qualifier112). Otherwise, for typeof operations, all qualifiers are preserved.

110) When applied to a parameter declared to have array or function type, the typeof operator yields the adjusted (pointer) type (see 6.9.1).

111) If the typeof-specifier-argument is itself a typeof-specifier, the operand will be evaluated before evaluating the current typeof operation. This happens recursively until a typeof-specifier is no longer the operand.

112) _Atomic ( type-name ), with parentheses, is considered an _Atomic-qualified type.

4.1.8. Add the following examples to new §6.7.2.5 The Typeof specifier:

EXAMPLE 1 Type of an expression.
The following program:
typeof(1+1) main () {
	return 0;
}
is equivalent to this program:
int main() {
	return 0;
}
EXAMPLE 2 Types and qualifiers.
The following program:
const _Atomic int purr = 0;
const int meow = 1;
const char* const mew[] = {
	"aardvark",
	"bluejay",
	"catte",
};

remove_quals(meow) main (int argc, char* argv[]) {
	remove_quals(purr)           plain_purr;
	typeof(_Atomic typeof(meow)) atomic_meow;
	typeof(mew)                  mew_array;
	remove_quals(mew)            mew2_array;
	return 0;
}
is equivalent to this program:
const _Atomic int purr = 0;
const int meow = 1;
const char* const mew[] = {
	"aardvark",
	"bluejay",
	"catte",
};

int main (int argc, char* argv[]) {
	int               plain_purr;
	const _Atomic int atomic_meow;
	const char* const mew_array[3];
	const char*       mew2_array[3];
	return 0;
}
EXAMPLE 3 Equivalence of sizeof and typeof.
int main (int argc, char* argv[]) {
	// this program has no constraint violations

	_Static_assert(sizeof(typeof('p')) == sizeof(int));
	_Static_assert(sizeof(typeof('p')) == sizeof('p'));
	_Static_assert(sizeof(typeof((char)'p')) == sizeof(char));
	_Static_assert(sizeof(typeof((char)'p')) == sizeof((char)'p'));
	_Static_assert(sizeof(typeof("meow")) == sizeof(char[5]));
	_Static_assert(sizeof(typeof("meow")) == sizeof("meow"));
	_Static_assert(sizeof(typeof(argc)) == sizeof(int));
	_Static_assert(sizeof(typeof(argc)) == sizeof(argc));
	_Static_assert(sizeof(typeof(argv)) == sizeof(char**));
	_Static_assert(sizeof(typeof(argv)) == sizeof(argv));

	_Static_assert(sizeof(remove_quals('p')) == sizeof(int));
	_Static_assert(sizeof(remove_quals('p')) == sizeof('p'));
	_Static_assert(sizeof(remove_quals((char)'p')) == sizeof(char));
	_Static_assert(sizeof(remove_quals((char)'p')) == sizeof((char)'p'));
	_Static_assert(sizeof(remove_quals("meow")) == sizeof(char[5]));
	_Static_assert(sizeof(remove_quals("meow")) == sizeof("meow"));
	_Static_assert(sizeof(remove_quals(argc)) == sizeof(int));
	_Static_assert(sizeof(remove_quals(argc)) == sizeof(argc));
	_Static_assert(sizeof(remove_quals(argv)) == sizeof(char**));
	_Static_assert(sizeof(remove_quals(argv)) == sizeof(argv));
	return 0;
}
EXAMPLE 4 Nested typeof(...).
The following program:
int main (int argc, char*[]) {
	float val = 6.0f;
	return (typeof(remove_quals(typeof(argc))))val;
}
is equivalent to this program:
int main (int argc, char*[]) {
	float val = 6.0f;
	return (int)val;
}
EXAMPLE 5 Variable Length Arrays and typeof operators.
#include <stddef.h>

size_t vla_size (int n) {
	typedef char vla_type[n + 3];
	vla_type b; // variable length array
	return sizeof(
		remove_quals(b)
	); // execution-time sizeof, translation-time typeof operation
}

int main () {
	return (int)vla_size(10); // vla_size returns 13
}
EXAMPLE 6 Nested typeof operators, arrays, and pointers.
int main () {
	typeof(typeof(const char*)[4]) y = {
		"a",
		"b",
		"c",
		"d"
	}; // 4-element array of "pointer to const char"
	return 0;
}
EXAMPLE 7 Function types, pointer types, and array types.
#include <stdbool.h> // true
#include <stddef.h>  // NULL
#include <stdio.h>   // printf

void f(int);

typeof(f(5)) g(double x) {         // g has type "void(double)"
	printf("value %g\n", x);
}

typeof(g)* h;                      // h has type "void(*)(double)"
typeof(true ? g : NULL) k;         // k has type "void(*)(double)"

void j(double A[5], typeof(A)* B); // j has type "void(double*, double**)"

extern typeof(double[]) D;         // D has an incomplete type
typeof(D) C = { 0.7, 99 };         // C has type "double[2]"

typeof(D) D = { 5, 8.9, 0.1, 99 }; // D is now completed to "double[4]"
typeof(D) E;                       // E has type "double[4]" from D’s completed type

4.1.9. Modify §6.7.3 Type qualifiers, paragraph 6:

If the same qualifier appears more than once in the same specifier-qualifier list or as declaration specifiers, either directly, via one or more typeof specifiers, or via one or more typedefs, the behavior is the same as if it appeared only once. If other qualifiers appear along with the _Atomic qualifier the resulting type is the so-qualified atomic type.

4.1.10. Modify §6.7.6.2 Array declarators, paragraph 5:

If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a typeof or sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated. Where a size expression is part of the operand of an _Alignof operator, that expression is not evaluated.

4.1.11. Modify §6.9 External definitions, paragraphs 3 and 5:

There shall be no more than one external definition for each identifier declared with internal linkage in a translation unit. Moreover, if an identifier declared with internal linkage is used in an expression, there shall be exactly one external definition for the identifier in the translation unit, unless it is:

  • part of the operand of a sizeof operator whose result is an integer constant;
  • part of the operand of a _Alignof operator whose result is an integer constant;
  • or, part of the operand of any typeof operator whose result is not a variably modified type.

...

An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as a part of the operand of a typeof operator whose result is not a variably modified type, or a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.173)

5. Appendix

The following are old sections or references related to older parts of the proposal that have since been superseded and other interesting, but not critical, information.

5.1. Prior Art in Standardization

The C99 rationale states that:

A proposed typeof operator was rejected on the grounds of insufficient utility.

The times have since changed drastically: typeof(...) has become powerfully useful and has proved its worth. Therefore, we are happy to include it. Another paper closer to the release of C11/C17 also came out: [N1229], an omnibus that listed all of the different extensions and evaluated them. There, support was greater for typeof, but nobody came forward with a paper to follow up on Nick Stoughton’s work.

This paper closes the loop on the request that Nick Stoughton made in that analysis as well as many user requests over the intervening more-than-a-decade of time.

5.2. Keyword Name Ideas (from Revision 2)

There are 3 options for names. We have wording for the options using find-and-replace on the TYPEOF_KEYWORD_TOKEN as well as the REMOVE_QUALIFIERS_KEYWORD_TOKEN. The option that provides the most consensus will be what is chosen:

5.2.1. Option 1: _Typeof keyword, <stdtypeof.h> header

This is the relatively conservative option that uses a _Typeof keyword plus <stdtypeof.h> to get access to the convenient spelling. It prevents implementations that have already settled on the typeof() keyword in their extension modes from having to warn users of breakage or deal with that problem. Many have raised issues with this, annoyed at the constant spelling of keywords in fundamentally awkward and strange ways while requiring headers to fix up usage. This is consistent with other new keywords introduced in the Standard to avoid breakage at all costs, but suffers from strong lamentations in needing a header to access a common spelling.

This is the authors' status quo and compromise position.

5.2.2. Option 2: typeof keyword

This is the relatively aggressive (but still milquetoast, overall) option. It takes over the extension that is used in non-conforming C modes in a few compilers, such as XL C and GCC. Maintainers/implementers from GCC and Clang have noted their approval for this option, but e.g. XL C maintainers and implementers are less enthused.

The reason some folks are against this change is because there are "bugs" in the implementation where some qualifiers are preserved, but other implementation-defined qualifiers are not. Most implementations agree that things like _Atomic and volatile should be preserved (and the compiler that did not implement it this way acknowledged that it was, more or less, a mistake). There are also qualifiers that are dropped on some implementations for their vendor-specific extensions. An argument can be made that implementations can continue to do whatever they want with implementation-defined qualifiers as far as typeof is concerned, as long as they preserve the standard qualifiers.

This option is the authors' overwhelmingly strong preference.

5.2.3. Option 3: Use a completely new keyword spelling

This uses a completely novel name to avoid the problem altogether. These names take no interesting space from users or implementers and it is the safest option, though it risks obscurity in what is a commonly anticipated feature. Names for this include:

Choosing this option means picking one of these novel keywords and substituting it for the TYPEOF_KEYWORD_TOKEN spelling in the wording above (not applicable any longer).

This is the authors' least favorite option.

References

Informative References

[N1229]
Nick Stoughton. Potential Extensions For Inclusion In a Revision of ISO/IEC 9899. March 26, 2007. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1229.pdf
[N1267]
ISO/IEC JTC1 SC22 WG14. Meeting Minutes April 2007. November 4th, 2020. URL: https://gcc.gnu.org/pipermail/gcc/2020-November/234119.html
[N1607]
Jaakko Järvi; Bjarne Stroustrup. Decltype and auto (revision 3). February 17th, 2004. URL: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1607.pdf
[N2596]
ISO/IEC JTC1 SC22 WG14 - Programming Languages, C; JeanHeyd Meneide; Freek Wiedijk. N2596: ISO/IEC 9899:202x - Programming Languages, C. December 11th, 2020. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
[NAMED-ADDRESS-SPACE-BUG]
Uros Bizjak. typeof and operands in named address spaces. December 11th, 2020. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf