1. Revision History
1.1. Revision 0 - March 23rd, 2025
- Initial release. ✨
2. Introduction & Motivation
It is impossible to use anything other than an `int` for the precision (size) of a string specifier. Normally, this should not be a problem because `fprintf` and many other I/O functions in C only ever return `int`. The problem is, most:
- containers
- strings
- size calculations
- stream offsets
- large buffer indices
- `countof(...)` and `sizeof(...)` operations
and so much more are not `int`-typed. This results in a lot of excessive (and, in some ways, dangerous) casting when working with the I/O output functions. The simple, easy-integration fix is to allow a precision given with `*` to include a size modifier, such that while `%.*s` is a string sized by an `int`, `%.z*s` represents a string sized by a `size_t`.
It is also important for strings that are not null terminated, such as those produced by substring functionality and parsing/searching. Needing to make sure things are null terminated is a huge burden, and while the `.*` precision modifier helps, the constant casting hides potential overflow errors from high quality-of-implementation libraries and makes its use dubious.
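As a minimal sketch of the status quo described above, printing a slice that is not null terminated currently means narrowing a `size_t` length to `int`, so any range check has to be written by hand. The helper name `format_substring` is hypothetical and only serves this illustration:

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats the first `len` bytes of `data` into `out`.
   `data` does not need to be null terminated.
   Returns whatever snprintf returns, or -1 when `len` cannot be represented
   as an int (the check that a bare `(int)len` cast silently skips). */
int format_substring(char *out, size_t out_size, const char *data, size_t len) {
    if (len > (size_t)INT_MAX) {
        return -1;
    }
    return snprintf(out, out_size, "%.*s", (int)len, data);
}
```

Calling `format_substring(buf, sizeof buf, text + 7, 5)` on `"hello, world"` formats only `world`, without copying or terminating the slice first.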
This proposal is to allow the typical integer length modifiers (`hh`, `h`, `l`, `ll`, `j`, `z`, `t`, `wN`, and `wfN`, as covered in the Wording section) to be applied to the precision modifier when the precision modifier uses an asterisk (i.e., `.*`).
3. Design
Given the following grammar (using the notation from POSIX, where things enclosed in `[]` are optional):
    %[argument$][flags][width][.precision][length modifier]conversion-specifier
(`argument$` is a POSIX extension), then the logical place in the grammar to place the `length modifier` that applies specifically to the `precision` argument is:
    %[argument$][flags][width][.[length modifier]precision][length modifier]conversion-specifier
This is the easiest place for this to be where it won't be ambiguous. In particular, placing it in other locations could have it confused for the conversion's own `length modifier`, and putting it up ahead of the `width`/`.` but having it apply to the `precision` itself means that we would preclude having such a modifier on `width` itself. (This paper does not propose this for `width`, just for asterisk-based `.*` precision.)
Therefore, this design slots it into the one place it can have no negative impact and would be unambiguous: after the `.`, but before the `*` of precision:
    #include <stdio.h>
    #include <stdlib.h>

    extern size_t big_honkin_number;

    int main() {
        char *str = malloc(big_honkin_number);
        // ...
        int result = printf("%.z*s", big_honkin_number, str); // no cast needed
        // ...
        free(str);
        return 0;
    }
3.1. "But `fprintf` and friends only return `int`, isn't this a problem?"
Thankfully, this is actually less of a problem than was previously surmised. In fact, this proposal actively makes it less of a problem than the cast-based solution. Consider the existence of a `/dev/null` file that can be written to and this program:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        enum { COUNT = 10, BYTESIZE = INT_MAX / COUNT };
        char *str = (char *)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE *f = fopen("/dev/null", "w+");
        [[maybe_unused]] int write_value = fprintf(f, "%.*s", BYTESIZE, str);
        // eleven strings of BYTESIZE bytes each: the running total exceeds INT_MAX
        [[maybe_unused]] int large_write_value = fprintf(f,
            "%.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s %.*s",
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str, BYTESIZE, str,
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str, BYTESIZE, str,
            BYTESIZE, str, BYTESIZE, str, BYTESIZE, str);
        free(str);
        assert(write_value == BYTESIZE); // Well.
        assert(large_write_value < 0);   // ... Okay.
        return 0;
    }
For both `write_value` and `large_write_value`, the individual sizes of the strings are not what is ultimately the problem here. In fact, each of these sizes is an `int`-typed value (as per the rules for `enum` constants and their values in both old and new C) and is fully within bounds. But the `large_write_value` call effectively creates a situation where, over the course of the 11 strings written, the running total of bytes grows past `INT_MAX` and triggers overflow.
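The arithmetic behind that overflow can be checked directly. The following is a small sketch, assuming each of the eleven conversions writes exactly `BYTESIZE` bytes and the conversions are separated by single spaces; the function names are hypothetical:

```c
#include <limits.h>

/* Same constants as the program above. */
enum { COUNT = 10, BYTESIZE = INT_MAX / COUNT };

/* Total bytes the eleven-string call would write: eleven strings of
   BYTESIZE bytes each plus the ten separating spaces. Computed in
   long long (assumed here to be wider than int) so the sum itself
   cannot overflow. */
long long large_write_total(void) {
    return 11LL * BYTESIZE + 10LL;
}

/* Nonzero when a byte count cannot be reported through an int return value. */
int exceeds_int_return(long long total) {
    return total > (long long)INT_MAX;
}
```

Every individual precision is a valid `int`, yet `exceeds_int_return(large_write_total())` is nonzero: only the accumulated count is out of range.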
No standard mandates rigorous checking here: there is no constraint or recommended practice to check for overflow. Still, most implementations do check whether the write will eventually overflow the `int` return value and return a negative value, typically with an appropriate `errno` value; glibc, musl-libc, and many more can and do check for this case and report it. We see here that even with purely `int`-typed writes, the same error happens on these platforms: all of them return a negative integer value.
What this means, ultimately, is that it is not the type length that matters more, but the actual value!
This proposal cannot change the return value's type (as that is an ABI break), but allowing a `size_t` type for the length modifier is actually an improvement to security. Since most implementations are doing value/overflow checking here, being able to pass in a (too-large) `size_t` directly lets the overflow checks inherent in most implementations catch it and return a negative number. For example, observe the following (too large) string being written, but written in the "typical" way that string sizes get passed to formatted I/O functions like `fprintf`:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        const size_t BYTESIZE = ((size_t)INT_MAX) + 1;
        char *str = (char *)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE *f = fopen("/dev/null", "w+");
        // the cast makes the types match, but changes the value
        [[maybe_unused]] int write_value = fprintf(f, "%.*s", (int)BYTESIZE, str);
        free(str);
        assert(write_value < 0); // might not trigger, actually!
        return 0;
    }
This is an error. But we will never see it as an error anymore: the explicit cast inserted into the code for the express purpose of matching the type means that the error is now hidden from us. With this proposal, we can avoid hard-to-detect truncation errors that arise from such `(int)`-cast code. Rather than (erroneously) casting and truncating the value of a `size_t` into an `int` type or similar, the value will instead actually be checked by `printf`, `fprintf`, and similar.
This is a notable improvement because an `(int)` cast is seen as an explicit choice on the part of the developer, made to silence warnings. Casting is too big of a hammer and too large of a club for this feature set; supplying the size without truncation directly to the function allows existing quality of implementation to catch this error:
    #include <stdio.h>
    #include <stdlib.h>
    #include <limits.h>
    #include <assert.h>

    int main() {
        const size_t BYTESIZE = ((size_t)INT_MAX) + 1;
        char *str = (char *)malloc(BYTESIZE + 1);
        for (size_t i = 0; i < BYTESIZE; ++i) {
            str[i] = 'a';
        }
        str[BYTESIZE] = '\0';
        FILE *f = fopen("/dev/null", "w+");
        // proposed syntax: the untruncated size reaches the library
        [[maybe_unused]] int write_value = fprintf(f, "%.z*s", BYTESIZE, str);
        free(str);
        assert(write_value < 0); // triggers on high quality-of-implementation again!!
        return 0;
    }
This forms the basis of this proposal.
3.2. Other Positions?
There were a couple of other choices for where to put the "length modifier". Unfortunately, for both of these:
(1) "%z.*s"
(2) "%.*zs"
there can be minor conflicts in the grammar or ambiguity of application. For (1), it's unclear whether that is meant to apply to a potential `width` argument or the desired `precision` argument (which determines whether it should be a formatting error or not). This could block future improvements or modifications to the `width` syntax that would allow for different types for the `width` argument. That is not being proposed in this paper, however; this paper is concerned mostly with enabling the use case of typical string and substring data.
For (2), the problem is that it's unclear when parsing certain things, such as `%.*ls`, whether the `l` is a modifier on the size for the `*` or keeps its traditional, current meaning, where the precision is of `int` type and the `l` applies to the conversion (e.g., an `int`-specified precision on a `wchar_t*` argument). Given the grammar, having it appear before the `*` is both the most grammatically safe and implementable choice (without disambiguation and backwards-compatibility break rules). It also appears before what it modifies, the `*`, which allows a future where some other position can be chosen to modify a potential `width` modifier or other `printf` extensions.
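The disambiguation argument can be sketched as a tiny parser for the text that follows the `.`. This is only an illustration under simplifying assumptions: the type `precision_spec` and function `parse_precision` are hypothetical, and the sketch leaves out the `wN`/`wfN` modifiers and the `u` form that appears in the wording.

```c
#include <stddef.h>
#include <string.h>

typedef struct precision_spec {
    int from_argument;   /* 1 when the precision is given by '*' */
    char modifier[3];    /* length modifier in front of '*', or "" */
} precision_spec;

/* `p` points at the character right after the '.' of a conversion
   specification. Fills `out` and returns a pointer to the first
   character that is not part of the precision. */
const char *parse_precision(const char *p, precision_spec *out) {
    static const char *const modifiers[] = { "hh", "ll", "h", "l", "j", "z", "t" };
    out->from_argument = 0;
    out->modifier[0] = '\0';

    /* Plain decimal precision such as the "10" in "%.10s". */
    if (*p >= '0' && *p <= '9') {
        while (*p >= '0' && *p <= '9') {
            ++p;
        }
        return p;
    }
    /* Existing form: ".*" takes an int argument. */
    if (*p == '*') {
        out->from_argument = 1;
        return p + 1;
    }
    /* Proposed form: a length modifier only counts as part of the
       precision when '*' follows it immediately. */
    for (size_t i = 0; i < sizeof modifiers / sizeof modifiers[0]; ++i) {
        size_t n = strlen(modifiers[i]);
        if (strncmp(p, modifiers[i], n) == 0 && p[n] == '*') {
            out->from_argument = 1;
            memcpy(out->modifier, modifiers[i], n + 1);
            return p + n + 1;
        }
    }
    /* Lone '.': the precision is zero, and whatever follows belongs to
       the conversion's own length modifier and specifier. */
    return p;
}
```

With this shape, `.z*s`, `.*ls`, and `.ls` each parse one way only: a modifier directly in front of `*` belongs to the precision, and anything after the `*` (or after a lone `.`) belongs to the conversion.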
4. Wording
The following wording is against the latest draft of the C standard.
4.1. Modify §7.23.6.2 "The `fprintf` function"
7.23.6.2 The `fprintf` function
Synopsis
    #include <stdio.h>
    int fprintf(FILE * restrict stream, const char * restrict format, ...);
Description
...
Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
...
- An optional precision that gives the minimum number of digits to appear for the `b`, `B`, `d`, `i`, `o`, `u`, `x`, and `X` conversions, the number of digits to appear after the decimal-point character for `a`, `A`, `e`, `E`, `f`, and `F` conversions, the maximum number of significant digits for the `g` and `G` conversions, or the maximum number of bytes to be written for `s` conversions. The precision takes the form of a period (`.`) optionally followed [-either by an asterisk `*` (described later) or by an optional nonnegative decimal integer;-] {+by one of:
  - an optional length modifier followed by an asterisk `*` (described later);
  - an optional length modifier followed by a `u` and an asterisk `*` (described later);
  - or, a nonnegative decimal integer.+}
  If only the period is specified, the precision is taken as zero. If a precision appears with any other conversion specifier, the behavior is undefined.
...
As noted previously, a field width[-, or precision, or both,-] may be indicated with an asterisk. {+A precision may be indicated with an asterisk or a lowercase `u` followed by an asterisk.+} [-In this case, an-] {+An asterisk means an+} `int` argument supplies the field width[- or precision-]. {+If the precision is an asterisk, an `int` argument or an argument of signed integer type (indicated by an optional length modifier) supplies the precision. If the precision is a `u` followed by an asterisk, an `unsigned int` argument or an argument of unsigned integer type (indicated by an optional length modifier) supplies the precision.+} The arguments specifying field width, or precision, or both, shall appear (in that order) before the argument (if any) to be converted. A negative field width argument is taken as a `-` flag followed by a positive field width. A negative precision argument is taken as if the precision were omitted.
...
The length modifiers and their meanings are:
`hh` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `signed char` or `unsigned char` argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to `signed char` or `unsigned char` before printing); or that a following `n` conversion specifier applies to a pointer to a `signed char` argument. {+If it is followed by an asterisk, then it specifies that the corresponding argument is of type `signed char`. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of type `unsigned char`.+}
`h` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `short int` or `unsigned short int` argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to `short int` or `unsigned short int` before printing); or that a following `n` conversion specifier applies to a pointer to a `short int` argument. {+If it is followed by an asterisk then it specifies that the corresponding argument is of type `short int`. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of type `unsigned short int`.+}
`l` (ell) Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `long int` or `unsigned long int` argument; that a following `n` conversion specifier applies to a pointer to a `long int` argument; that a following `c` conversion specifier applies to a `wint_t` argument; that a following `s` conversion specifier applies to a pointer to a `wchar_t` argument; or has no effect on a following `a`, `A`, `e`, `E`, `f`, `F`, `g`, or `G` conversion specifier. {+If it is followed by an asterisk then it specifies that the corresponding argument is of type `long int`. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of type `unsigned long int`.+}
`ll` (ell-ell) Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `long long int` or `unsigned long long int` argument; or that a following `n` conversion specifier applies to a pointer to a `long long int` argument. {+If it is followed by an asterisk then it specifies that the corresponding argument is of type `long long int`. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of type `unsigned long long int`.+}
`j` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to an `intmax_t` or `uintmax_t` argument; or that a following `n` conversion specifier applies to a pointer to an `intmax_t` argument. {+If it is followed by an asterisk then it specifies that the corresponding argument is of type `intmax_t`. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of type `uintmax_t`.+}
`z` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `size_t` or the corresponding signed integer type argument; or that a following `n` conversion specifier applies to a pointer to a signed integer type corresponding to `size_t` argument. {+If it is followed by an asterisk then it specifies that the corresponding argument is of the corresponding signed type of `size_t`. If it is followed by a `u` and then an asterisk, then it specifies that the corresponding argument is of type `size_t`.+}
`t` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a `ptrdiff_t` or the corresponding unsigned integer type argument; or that a following `n` conversion specifier applies to a pointer to a `ptrdiff_t` argument. {+If it is followed by an asterisk then it specifies that the corresponding argument is of type `ptrdiff_t`. If it is followed by a `u` and then an asterisk, then it specifies that the corresponding argument is of the corresponding unsigned type of `ptrdiff_t`.+}
`wN` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to an integer argument with a specific width where `N` is a positive decimal integer with no leading zeros (the argument will have been promoted according to the integer promotions, but its value shall be converted to the unpromoted type); or that a following `n` conversion specifier applies to a pointer to an integer type argument with a width of `N` bits. {+If it is followed by an asterisk then it specifies that the corresponding argument is of `N`-bit integer type. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of `N`-bit unsigned integer type.+} All minimum-width integer types (7.22.2.3) and exact-width integer types (7.22.2.2) defined in the header `<stdint.h>` shall be supported. Other supported values of `N` are implementation-defined.
`wfN` Specifies that a following `b`, `B`, `d`, `i`, `o`, `u`, `x`, or `X` conversion specifier applies to a fastest minimum-width integer argument with a specific width where `N` is a positive decimal integer with no leading zeros (the argument will have been promoted according to the integer promotions, but its value shall be converted to the unpromoted type); or that a following `n` conversion specifier applies to a pointer to a fastest minimum-width integer type argument with a width of `N` bits. {+If it is followed by an asterisk then it specifies that the corresponding argument is of `N`-bit fastest minimum-width integer type. If it is followed by a `u` and then an asterisk, it specifies that the corresponding argument is of `N`-bit fastest minimum-width unsigned integer type.+} All fastest minimum-width integer types (7.22.2.4) defined in the header `<stdint.h>` shall be supported. Other supported values of `N` are implementation-defined.
If a length modifier appears with any conversion specifier other than as specified previously, the behavior is undefined.
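The existing rules that this wording builds on can be exercised in today's C. The sketch below uses only standard `snprintf` behavior (precision and field width supplied through `*`, and length modifiers on conversion specifiers); it does not use the proposed syntax, the helper names are hypothetical, and the `hh` result assumes an 8-bit `unsigned char`:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Precision supplied by an int argument through ".*".
   A negative precision is taken as if the precision were omitted. */
int format_with_precision(char *out, size_t out_size, int precision, const char *s) {
    return snprintf(out, out_size, "%.*s", precision, s);
}

/* Field width supplied by an int argument through "*".
   A negative width is taken as a '-' flag followed by a positive width. */
int format_with_width(char *out, size_t out_size, int width, int value) {
    return snprintf(out, out_size, "[%*d]", width, value);
}

/* Existing length modifiers applied to conversion specifiers. */
int format_with_modifiers(char *out, size_t out_size) {
    size_t size = 12345;
    ptrdiff_t diff = -7;
    intmax_t big = -1;
    /* With hh, the promoted argument is converted to unsigned char
       before printing: 300 - 256 = 44 when unsigned char has 8 bits. */
    return snprintf(out, out_size, "%hhu %zu %td %jd", 300, size, diff, big);
}
```

The proposed `.z*` and `.zu*` forms would extend the first helper's `.*` so that the precision argument itself can carry one of these types.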