1. Changelog
1.1. Revision 0 - December 12th, 2024
- Initial Release! 🎉
2. Introduction and Motivation
During the standardization discussion of #embed in WG21 over the last year for [P1967] and [P3540], several adjustments were requested to the behavior of #embed in niche cases. This paper synchronizes the behavior between what WG21 is going to see and what is contained in the current C23/C Working Draft.
The requested synchronizations are as follows:
- No potential double-expansion of preprocessor parameters allowed in any way.
- Preprocessor expansion of parameters always happens, not just for limit, and it is performed at the point of matching the directive and not during parameter processing.
- Make it clear we’re producing a sequence of (preprocessor) tokens, and not necessarily (post-processor, Phase 7) tokens.
- Adding the extremely-popular and already-implemented gnu::offset and clang::offset parameters.
The wording below attempts to accomplish all of these things.
3. Wording
This wording is relative to C’s latest working draft.
📝 Editor’s Note: The ✨ characters are intentional. They represent stand-ins to be replaced by the editor.
3.1. Modify §6.10.4.1 to change the expansion behavior of macros
1 A resource is a source of data accessible from the translation environment. An embed parameter is a single preprocessor parameter in the embed parameter sequence. It has a resource width, which is the implementation-defined size in bits of the located resource.
Constraints
2 An embed parameter sequence is a whitespace-delimited list of preprocessor parameters which can modify the result of the replacement for the #embed preprocessing directive.
Let embed element width be either:
- an integer constant expression greater than zero determined by an implementation-defined embed parameter; or,
- CHAR_BIT (5.3.5.3.2).
Let implementation resource count be (resource width) / (embed element width). Let resource count initially be (implementation resource count). Let resource offset initially be zero. The result of (resource width) % (embed element width) shall be zero.
...
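The relationship between resource width, embed element width, and implementation resource count can be sketched in ordinary C. This is only an illustrative model, not part of the proposed wording; the function names below are hypothetical, and widths are assumed to be expressed in bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: the embed element width when no
   implementation-defined embed parameter overrides it. */
size_t default_embed_element_width(void) {
    return (size_t)CHAR_BIT;
}

/* Hypothetical model of the constraint that
   (resource width) % (embed element width) shall be zero. */
bool resource_width_is_valid(size_t resource_width_bits,
                             size_t embed_element_width) {
    assert(embed_element_width > 0);
    return resource_width_bits % embed_element_width == 0;
}

/* Hypothetical model of
   (resource width) / (embed element width). */
size_t implementation_resource_count(size_t resource_width_bits,
                                     size_t embed_element_width) {
    assert(resource_width_is_valid(resource_width_bits,
                                   embed_element_width));
    return resource_width_bits / embed_element_width;
}
```

Under these assumptions, a 40-bit resource with an embed element width of 8 has an implementation resource count of 5, while a 41-bit resource would violate the constraint.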
📝 IMPORTANT Editor’s Note: Replace all instances of "implementation resource width" with simply "resource width".
📝 IMPORTANT Editor’s Note: Delete all of ❡5 and ❡6, as their constraints have been moved up to just after ❡1 and redundant text has been eliminated, while a new definition of "empty" is given in the semantics.
Semantics
5✨ A resource is considered empty in one of the following cases:
- its resource count is zero;
- or, its resource offset is greater than the implementation resource count.
6✨ The replacement of a #embed directive is a preprocessor token sequence in the form of a comma-delimited list of integer constant expressions, unless otherwise modified by embed parameters. If the resource is empty, then the directive is not replaced by the comma-delimited list of integer constant expressions representing the resource. Otherwise, the resource offset indicates that the first (resource offset) values (which would have been placed in the comma-delimited list had the resource offset been equivalent to zero) are discarded, ignored, and not part of the list. There shall be $max(0, min((resource\ count), (implementation\ resource\ count) - (resource\ offset)))$ integer constant expressions in the comma-delimited list, where $max$ and $min$ select the maximum and minimum of the two provided values, respectively. The value of each integer constant expression is determined in an implementation-defined manner, and is in the range from $0$ to $2^{embed\ element\ width} - 1$, inclusive. FOOTNOTE(For example, an embed element width of 8 will yield a range of values from 0 to 255, inclusive.) If:
- the list of integer constant expressions is used to initialize an array of a type compatible with unsigned char, or compatible with char if char cannot hold negative values; and,
- the embed element width is equal to CHAR_BIT (5.3.5.3.2),
then the contents of the initialized elements of the array are as-if the resource’s binary data represented by the resource offset and the resource count, as a file, is fread (7.23.8.1) into the array at translation time.
...
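As a rough illustration of the emptiness rule and the element-count formula in this paragraph, the following C sketch models both with unsigned arithmetic. The function names are hypothetical, and the explicit emptiness check stands in for the max(0, ...) part of the formula, since size_t cannot go negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of "a resource is considered empty". */
bool resource_is_empty(size_t resource_count, size_t resource_offset,
                       size_t impl_resource_count) {
    return resource_count == 0 || resource_offset > impl_resource_count;
}

/* Hypothetical model of
   max(0, min(resource count, impl resource count - resource offset)). */
size_t embed_list_length(size_t resource_count, size_t resource_offset,
                         size_t impl_resource_count) {
    if (resource_is_empty(resource_count, resource_offset,
                          impl_resource_count)) {
        return 0;
    }
    /* Safe: the offset is known to be <= impl_resource_count here. */
    size_t remaining = impl_resource_count - resource_offset;
    return resource_count < remaining ? resource_count : remaining;
}

/* Largest value an element may take: 2^(embed element width) - 1.
   The sketch only handles widths that fit in unsigned long long. */
unsigned long long max_element_value(unsigned embed_element_width) {
    assert(embed_element_width > 0 && embed_element_width < 64);
    return (1ULL << embed_element_width) - 1ULL;
}
```

Under this reading, a resource of 5 elements with a resource offset of 2 yields 3 values. A resource offset equal to the implementation resource count is not "empty" by the definition above, but the formula still yields a list of zero values.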
10✨ Either form of the #embed directive shall process the preprocessor balanced token sequence of any embed parameter in the optional embed parameter sequence as in normal text, unless otherwise specified further in this subclause.
11✨ ... The preprocessing tokens after embed in the directive are processed just as in normal text. (Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.) The directive resulting after all replacements shall match one of the two previous forms. If the directive matches one of the two previous forms after the directive is processed as in normal text, any further processing as in normal text described for the two previous forms is not performed. The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single resource name preprocessing token is implementation-defined.
12✨ NOTE If the directive is processed as in normal text because it doesn’t match the first two forms but matches the third, processing as in normal text happens once and only once for the entire directive, including its parameters.
13✨ EXAMPLE If the directive matches one of the first two forms, then processing as in normal text only applies to the preprocessor balanced token sequence of any embed parameters. If the directive matches the third form, then processing as in normal text applies to the entire directive:
#define offset(ARG) limit(ARG)
#define prefix(ARG) suffix(ARG)
#define THE_ADDITION "teehee"
#define THE_RESOURCE ":3c"
#embed ":3c" offset(2) prefix(THE_ADDITION)
#embed THE_RESOURCE offset(2) prefix(THE_ADDITION)
is equivalent to:
#embed ":3c" offset(2) prefix("teehee")
#embed ":3c" limit(2) suffix("teehee")
3.2. Modify §6.10.4.1 Semantics, ❡12 (now ❡13) to add a new embed parameter
An embed parameter with a preprocessor parameter token that is one of the following is a standard embed parameter:
- limit
- prefix
- suffix
- if_empty
- offset
3.3. Modify §6.10.4.2 "limit parameter"'s macro expansion rules in Semantics, ❡3 and ❡4
...
3 The standard embed parameter with a preprocessor parameter token limit denotes a balanced preprocessing token sequence whose integer constant expression becomes the new value for the resource’s resource count defined in 6.10.4.1. The integer constant expression is evaluated using the rules specified for conditional inclusion (6.10.2), but without doing any further processing as in normal text.
4✨ The resource count is set to:
- 0, if the integer constant expression evaluates to 0;
- or, the implementation resource count if the integer constant expression is greater than the implementation resource count;
- or, the integer constant expression, if it is less than or equal to the implementation resource count.
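The three cases for the resource count amount to clamping the limit value to the implementation resource count. A minimal C sketch of that rule follows; apply_limit is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of how the limit parameter sets the resource count.
   limit_value is the evaluated integer constant expression, which the
   constraints require to be non-negative. */
size_t apply_limit(size_t limit_value, size_t impl_resource_count) {
    size_t resource_count;
    if (limit_value == 0) {
        resource_count = 0;                    /* first case */
    } else if (limit_value > impl_resource_count) {
        resource_count = impl_resource_count;  /* second case */
    } else {
        resource_count = limit_value;          /* third case */
    }
    /* The resource count never exceeds what the resource can provide. */
    assert(resource_count <= impl_resource_count);
    return resource_count;
}
```

For a resource with an implementation resource count of 5, limit(9) gives a resource count of 5 and limit(3) gives 3.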
3.4. Add a new section §6.10.4.3 "offset parameter"
6.10.4.3 offset parameter
Constraints
The offset standard embed parameter may appear zero times or one time in the embed parameter sequence. Its preprocessor argument clause shall be present and have the form:
( constant-expression )
and shall be an integer constant expression. The integer constant expression shall not evaluate to a value less than 0.
The token defined shall not appear within the preprocessor balanced token sequence.
Semantics
The offset standard embed parameter denotes a balanced preprocessing token sequence whose integer constant expression becomes the value of the resource’s resource offset as defined in 6.10.4.1. The integer constant expression is evaluated using the rules specified for conditional inclusion (6.10.2), but without doing any further processing as in normal text.
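To see how offset and limit interact, the following C sketch models the selection of elements from an in-memory byte buffer standing in for a resource. It is a simulation under stated assumptions, not an implementation of the directive: model_embed and its parameters are hypothetical, the embed element width is assumed to equal CHAR_BIT, and a directive without a limit parameter is modeled by passing the implementation resource count as the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: copies into out the values a directive with the
   given offset and limit would produce, and returns how many there are. */
size_t model_embed(const unsigned char *resource, size_t impl_resource_count,
                   size_t offset, size_t limit,
                   unsigned char *out, size_t out_capacity) {
    /* limit sets the resource count, clamped to what the resource has. */
    size_t resource_count =
        limit > impl_resource_count ? impl_resource_count : limit;

    /* An empty resource contributes no values. */
    if (resource_count == 0 || offset > impl_resource_count) {
        return 0;
    }

    /* The first `offset` values are discarded; at most resource_count
       of the remaining values are kept. */
    size_t remaining = impl_resource_count - offset;
    size_t n = resource_count < remaining ? resource_count : remaining;

    assert(n <= out_capacity);
    if (n > 0) {
        memcpy(out, resource + offset, n);
    }
    return n;
}
```

With a six-byte resource containing "abcdef", an offset of 2 and a limit of 3 select the three bytes "cde" in this model, while an offset of 4 with a limit of 10 selects the last two bytes "ef".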