1. Changelog
1.1. Revision 0 - December 12th, 2024
-
Initial release. ✨
2. Introduction and Motivation
Variable Length Arrays (VLAs) are a feature for potentially accessing an implementation-detail of where memory is stored to generate more efficient programs by utilizing a variable amount of stack space. The feature was conceived in the early 1990s with inspiration from Fortran, and eventually brought to C standardization by Tom MacDonald of the Cray Supercomputing group[N317]. Despite getting multiple implementations from 1993 onwards and achieving all of the necessary details for C standardization — including a detailed rationale answering open questions during its standardization process in subsequent N-documents — the feature went into the standard in C99 and received deeply mixed response.
Many numerical communities and other processing groups found VLAs to be of great use; kernel communities and application programmers found it fraught with peril as user-controlled variables snuck into the VLAs and threatened to blow out their stacks or — when paired with logic errors — much worse. Others still found their performance characteristics hard to control and inaccessible; different individuals could not get their compilers to provide them with the kind of memory guarantees that they wanted[making-c-less-dangerous].
Implementations withdrew support for even implementing variable length arrays as a required feature, citing its several unspecified corners and its total silence on how to handle memory errors or other issues. This resulted in variable length arrays being made optional — along with many other features — in C11, and since then VLAs have been in a state of flux. The GNU Compiler Collection (GCC) implemented "stack probing" (checking the size of the stack and letting the operating system fail where possible) rather than just blindly blowing out the stack in 2017 (its
option, a much improved version of its initial support). It took until 2020 for the Clang C compiler to support stack probing with their VLAs. Other implementations just aliased VLA usage as closely as possible to
and stack pointer manipulation as they could before phoning the effort in. Others still simply decided to call
for every VLA declared, with a simple crash/
if such an allocation failed.
It is clear now that there is a wide variety of implementation techniques and ways of handling VLAs, but in every attempt this has always been tackled from an implementer perspective. In this paper, we propose allowing the user to control the semantics of the the allocation and deallocation of a VLA. It is much more likely that the user knows the memory profile, binary footprint, and performance characteristics they would like to target, as opposed to the implementation. There are thousands — sometimes, hundreds of thousands — of developers as opposed to compiler developers; it makes sense for users to have control of the feature where they deem it important to their code.
Similarly, we do not infringe upon an implementation’s right to refuse implementing a VLA:
can still be defined by an implementation, and an implementation can still outright reject VLAs if and only if this new, proposed, opt-in extension mechanism is not defined by the user. This allows implementers to implement VLAs in terms of a quick code rewrite rather than needing to worry about stack probing, a source of memory, or any of the other things implementations have cited as problematic for the last two decades.
3. Design
The design of this feature is simple; we need to provide users with the ultimate level of control so that implementations need not concern themselves with the details of allocation and deallocation of variable length Arrays. This eliminates the sole reason that VLAs were made optional in C11 to begin with; implementations of varying sizes struggled with the ability to compute the allocation. If the user is able to control how a VLA is allocated, then all of the most serious technical concerns for the implementation and proliferation of VLAs disappears as the user is stating they will be responsible. This means that even implementations that have
defined and set to
will still be able to access the feature, provided the user declares 2 functions visible at the point of the creation of the VLA:
void * stdc_vla_alloc ( size_t n , size_t align , size_t * out_n ); void stdc_vla_free ( void * p , size_t n , size_t align );
The way this feature works is simple. If both function declarations are present, the compiler is required to use those as the source of memory for the VLA (subject to the usual as-if rules for optimization). The initial memory comes from the return value from
. The memory is freed at the end of the scope (or later) by a matching call to
, as was done before by the implementation. Every exit of the scope, including use of
to a label that is outside of the scope, shall be preceded by a matching call to
if a
was hit. Similarly, the behavior is undefined if the scope is left by using
/
or similar constructs.
If only one of the two functions is visible, then it is a constraint violation. If either function is visible but not of a compatible function signature, then it is a constraint violation.
If none of the two functions are visible, then the behavior is implementation-defined/unspecified (subject to
, and the location from where the data comes from is unspecified as it is in today’s C standard). Therefore, even if
is defined, the following program will evaluate and execute:
#include <stddef.h>#include <string.h>#include <stdlib.h>void * stdc_vla_alloc ( size_t n , size_t align , size_t * out_n ) { ( void ) align ; void * p = malloc ( n ); if ( ! p ) abort (); * out_n = n ; return p ; } void stdc_vla_free ( void * p , size_t n , size_t align ) { free ( p ); } int main ( int argc , char * argv []) { // works even if __STDC_NO_VLA__ is defined and non-zero. int vla [ argc ]; }
Importantly (for the purposes of optimization), the only hard semantic requirement is that for every one call to
, there is a matching call to
. There does not have to be a call for each VLA, if the implementation wants to e.g. re-use the memory of a previous allocation. See § 3.4 Memory Reuse for more details.
3.1. Can we return a null pointer value from stdc_vla_alloc
?
The pointer returned by the function is always non-null. This is why the
annotation should be used on the function call:
void stdc_vla_alloc ( size_t n , size_t align , size_t * out_n )[ static 1 ]; void stdc_vla_free ( void p [ static 1 ], size_t n , size_t align );
Unfortunately, this syntax is not allowed for
pointers. We do not solve this problem for this paper, but would note that such a fix would be of general interest to make these annotations more useful to a wide variety of functions, especially infallible (impossible to fail) memory and allocation functions.
If someone wishes to handle an error, they must do so inside of the function and handle it by either aborting or jumping out before the function returns. Once the function returns, the program is entirely valid in assuming that the returned pointer is a valid address for the purposes of the VLA. Any error reporting or error checking must be done exclusively by the user. This is because there is simply no syntax when working with a variable length Array to check for success, as e.g.:
int main ( int argc , char * argv []) { int vla [ argc ]; if ( vla ) { return 0 ; } return 1 ; }
will always create a program that returns
as the
check can never fail if execution reaches that point. There is simply no post-facto way to handle a failed VLA allocation in the language today, and that is a oversight we will have to live with for the rest of the duration of the existence of VLAs.
3.2. Allocation Size?
The size passed to the allocation is implementation defined. This is because the implementation controls the ABI for its VLA; it may ask for more memory and different alignment than may be implied by the declaration of the variable length array. For example, given the following program:
#include <stddef.h>#include <string.h>void * stdc_vla_alloc ( size_t n , size_t align , size_t * out_n ); void stdc_vla_free ( void * p , size_t n , size_t align ); int compute_sum ( size_t data_size , int * data ); int main ( int argc , char * argv []) { /* 0. … */ int vla [ argc ]; /* 1. use `vla` here … */ for ( int i = 0 ; i < ( sizeof ( vla ) / sizeof ( vla [ 0 ])); ++ i ) { vla [ i ] = strlen ( argv [ i ]); if ( vla [ i ] > 255 ) { return -1 ; } } int sum = compute_sum (( sizeof ( vla ) / sizeof ( vla [ 0 ])), vla ); /* 2. … */ return sum ; }
The equivalent de-sugared program may look as follows:
#include <stddef.h>#include <string.h>void * stdc_vla_alloc ( size_t n , size_t align , size_t * out_n ); void stdc_vla_free ( void * p , size_t n , size_t align ); int compute_sum ( size_t data_size , int * data ); int main ( int argc , char * argv []) { /* 0. … */ int $__vla_size ; int ( * vla )[ argc ]; vla = ( typeof ( vla )) stdc_vla_alloc ( sizeof ( vla [ 0 ]) + sizeof ( size_t ) /* implementation-defined */ , alignof ( size_t ) /* implementation-defined */ , & $__vla_size ); /* 1. use `vla` here … */ for ( int i = 0 ; i < ( sizeof (( vla [ 0 ])) / sizeof (( vla [ 0 ])[ 0 ])); ++ i ) { vla [ i ] = strlen ( argv [ i ]); if (( vla [ 0 ])[ i ] > 255 ) { stdc_vla_free ( vla , $__vla_size , alignof ( size_t )); return -1 ; } } int sum = compute_sum (( sizeof (( vla [ 0 ])) / sizeof ( * ( vla [ 0 ]))), ( vla [ 0 ])); /* 2. … */ stdc_vla_free ( vla , $__vla_size , alignof ( size_t )); return sum ; }
As shown in this de-sugared program, the VLA may have a size that is, practically and conceptually, different from the length of the variable length array retrieved by
. Therefore, it is important that there is an output parameter for
that can adequately track such information if necessary.
3.3. What If A VLA Is Lowered To A C Array?
A frequent question in the face of this proposal is "what happens if an implementation is smart enough to lower the VLA to a C-style, fixed-size array?". The answer here is simple: it is already implementation-defined if an array is considered a VLA or not, due to the nature of the rules of Constant Expressions. This was clarified in C23[n3138] that such arrays may or may not be VLAs. Implementations can continue to lower VLAs into fixed-size, C-style arrays and declare them non-VLAs and thus avoid needing to invoke any of this infrastructure. This proposal does not prevent any class of optimizations, as the implementation is still within full control when it needs to be.
3.4. Memory Reuse
It is important to note that, for this feature, the only hard requirement is that for every call to
, there is a matching call to
. It does not mean there must be one call to
for the start and end of every VLA’s object lifetime. For example, given the for loop in the below program:
const int vla_n = 30 ; int main () { const int n = 50 ; for ( int i = 0 ; i < n ; ++ i ) { double vla [ vla_n ] = {}; /* use VLA */ } return 0 ; }
The program may opt to only allocate the VLA once, resulting in a de-sugaring that looks something close to:
const int vla_n = 30 ; int main () { const int n = 50 ; size_t $__vla_size ; double ( * vla )[ vla_n ]; vla = ( typeof ( vla )) stdc_vla_alloc ( sizeof ( vla [ 0 ]) + sizeof ( size_t ) /* implementation-defined */ , alignof ( size_t ) /* implementation-defined */ , & $__vla_size ); for ( int i = 0 ; i < n ; ++ i ) { for ( size_t __init_vla = 0 ; __init_vla < sizeof (( vla [ 0 ])); ++ __init_vla ) { ( vla [ 0 ])[ __init_vla ] = ( double ){}; } /* use VLA */ } stdc_vla_free ( vla , $__vla_size , alignof ( size_t )); return 0 ; }
This results in a single call to
paired with a single call to
, instead of 30 paired function calls. This is legal so long as the semantics remain identical. Users are subject to unspecified behavior if they attempt to rely on the number of times
is called, as the compiler may unroll, fuse, or otherwise reuse memory allocations for whatever purposes it sees fit.
4. Interaction with Other (Potential) Language Features / Implementations
There are many other (future) language features and implementation-specific features that can allow this feature to really blossom. Below are a select sampling of such techniques and a brief discussion of each such.
4.1. alloca
and stack-probing
Below is an implementation of manual stack checking that works on MSVC-based platforms as well as GCC and Clang-based platforms with the pthreads library (with
(non-portable) extensions present).
/////////////////////// // platform boilerplate /////////////////////// #define _GNU_SOURCE #define WIN32_LEAN_AND_MEAN #if defined(_MSC_VER) #define MY_FLATTEN __forceinline #define MY_OUTLINE __declspec(noinline) #include <malloc.h>#include <windows.h>#elif defined(__clang__) || defined(__GNUC__) #define MY_FLATTEN [[gnu::flatten]] #define MY_OUTLINE [[gnu::noinline]] #else #define MY_FLATTEN #define MY_OUTLINE #error "unsupported platform: do not how to inline " \ "function call into parent function call on this vendor" #endif #if defined(_REENTRANT) && (_REENTRANT == 1) && \ __has_include(<pthread.h>) #define MY_PTHREAD_H 1 #include <pthread.h>#else #define MY_PTHREAD_H 0 #endif #include <stddef.h>#include <stdint.h>MY_OUTLINE bool my_is_stack_available ( size_t amount , size_t alignment ) { // TODO: support alignment #if defined(_MSVC_VER) // https://devblogs.microsoft.com/oldnewthing/20200610-00/?p=103855 ULONG_PTR low = 0 , high = 0 ; GetCurrentThreadStackLimits ( & low , & high ); ptrdiff_t remaining = ( ULONG_PTR )( & low ) - low ; ptrdiff_t available = high - low ; if ( remaining > available ) { // Ssssshhhooould not be possible?! // Something is horrifically wrong here...! __fastfail ( FAST_FAIL_INCORRECT_STACK ); } return remaining >= amount ; #elif MY_PTHREAD_H char * low_stack_addr ; size_t stack_size ; pthread_attr_t attr ; int getattr_res = pthread_getattr_np ( pthread_self (), & attr ); if ( getattr_res != 0 ) { return false; } int getstack_res = pthread_attr_getstack ( & attr , ( void ** ) & low_stack_addr , & stack_size ); if ( getstack_res != 0 { return false; } // some nerd will scream about provenance or whatever, I'm sure char * local_address_guess = (( char * )( void * ) & low_stack_addr ); ptrdiff_t remaining = local_address_guess - low_stack_addr ; if ( remaining > stack_size ) { // Absolutely should NOT be possible?! abort (); } return remaining >= amount ; #else # error "cannot determine current stack size: insufficient hacks" #endif } /////////////////////////// // User-Defined VLA Control /////////////////////////// #include <stddef.h>#include <stdlib.h>MY_FLATTEN inline void * stdc_vla_alloc ( size_t size , size_t alignment , size_t * actual_size ) { if ( ! my_is_stack_available ( size , alignment )) { abort (); return nullptr ; } * actual_size = size ; #if defined(_MSC_VER) return __alloca ( size ); #elif defined(__clang__) || defined(__GNUC__) return __builtin_alloca_with_align ( size , alignment ); #endif } MY_FLATTEN inline void stdc_vla_free ( void * ptr , size_t size , size_t alignment ) { // nothing, it's alloca } /////////////// // main program /////////////// extern int n ; int main () { // we are in compiler that doesn't support VLAs (e.g., MSVC) static_assert ( __STDC_NO_VLA__ != 0 , "this will work even if VLAs are not present" ); // because both stdc_vla_alloc and stdc_vla_free are available, // VLA will use that to retrieve memory // and ignore whatever implementation does int vla [ n ] = {}; // use as normal... /* … */ return 0 ; }
4.2. With Transparent Aliases
With Transparent Aliases, a future feature that is not yet in Standard C, the VLA allocation can provide a locally-specific allocation function that is not visible to other scopes. For example:
static_assert ( __STDC_NO_VLA__ != 0 , "this will work even if VLAs are not present" ); void * my_vla_alloc ( size_t n , size_t align , size_t * out_n ); void my_vla_free ( void * p , size_t n , size_t align ); int main ( int argc , char * argv []) { // aliases the 2 required function calls _Alias stdc_alloc_vla = my_alloc_vla ; _Alias stdc_free_vla = my_free_vla ; // uses my_vla_alloc int meow [ argc ]; // calls my_vla_free return 0 ; } int f ( int n ) { // Constraint violation: implementation does not support VLAs. int meow [ n ]; return 0 ; }
The VLA in
will compile, link, and run (provided a definition of
and
are in the final program). Conversely,
is a constraint violation and thus the program may not compile, link, or run.
5. Wording
Wording is relative to the latest draft of the C standard.
6.✨ Variable length array storage controlDescriptionA program can control the storage of variable length arrays by having two identifiers visible in the current scope with the names
and
stdc_vla_alloc that are function designators or function pointers compatible with a specific function type.
stdc_vla_free For any variable length array object, instead of utilizing the conditionally-supported and otherwise unspecified allocation method for that object, an implementation shall instead allocate its storage using the
function. For every invocation to
stdc_vla_alloc made, an invocation of
stdc_vla_alloc is paired with it to allow a program to free the storage of the variable length array.
stdc_vla_free ConstraintsIf one of
or
stdc_vla_alloc is visible at a given scope when a variable length array is declared, then the other shall also be visible at that same scope.
stdc_vla_free
shall have a (pointer to) function type compatible with
stdc_vla_alloc .
void * ( size_t n , size_t align , size_t actual_size [ static 1 ]) shall have a (pointer to) function type compatible with
stdc_vla_free . If any of them are function pointers, those function pointers shall not be a null function pointer.
void ( void * ptr , size_t n , size_t align ) The
function must return a valid pointer that has at least
stdc_vla_alloc bytes and has the requested or compatible alignment
n . The actual usable size of the allocation must be written into the
alignment parameter, representing the usable size in bytes of the allocation.
actual_size NOTE A valid implementation of
can terminate the program on failure to uphold the requirements. An implementation is allowed to use the provided allocation for any purpose or reason consistent with the usable size, alignment, and storage duration of the corresponding variable length array object.
stdc_vla_alloc The implementation shall pair every successful execution of
with a corresponding invocation of
stdc_vla_alloc , unless a non-local jump (7.13) or program termination occurs.
stdc_vla_free SemanticsRegardless of the presence or value of
, if all of the previously-specified constraints and requirements in this subclause are met in a given scope, a program can declare, initialize, and otherwise use a variable length array object as specified in this document for that scope. If the constraints and requirements are not met, than this clause does not apply and the program is subject to the conditional support for variable length arrays and the unspecified nature of its allocation.
__STDC_NO_VLA__ For the duration of its lifetime (6.2.4), a variable length array object shall have its storage provided through a successful call to
, and at the end of its lifetime will be freed with
stdc_vla_alloc .
stdc_vla_free The behavior is undefined if execution returns from a call to
and the returned value is a null pointer, does not provide enough storage, does not appropriately set the size of the allocation in
stdc_vla_alloc , or is not suitably aligned.
* actual_size The
function must free memory previously allocated by
stdc_vla_free .
stdc_vla_alloc NOTE Parameters of variable-length array type are equivalent to pointers, and do not need to use
or
stdc_vla_alloc .
stdc_vla_free EXAMPLE An implementation is not guaranteed to call
along with the corresponding
stdc_vla_alloc for every single variable length array object if it has access to suitable storage already available, including if it has access to data previously provided by
stdc_vla_free . The following:
stdc_vla_alloc #include <stddef.h>const int vla_n ; int main () { extern void * stdc_vla_alloc ( size_t n , size_t align , size_t actual_size [ static 1 ]); extern void stdc_vla_free ( void * p , size_t n , size_t align ); const int n = 50 ; for ( int i = 0 ; i < n ; ++ i ) { double vla [ vla_n ] = {}; /* use VLA */ } return 0 ; } has a valid interpretation equivalent to:
#include <stddef.h>extern const int vla_n ; int main () { extern void * stdc_vla_alloc ( size_t n , size_t align , size_t actual_size [ static 1 ]); extern void stdc_vla_free ( void * p , size_t n , size_t align ); const int n = 50 ; size_t vla_size ; double ( * pvla )[ vla_n ]; pvla = ( typeof ( pvla )) stdc_vla_alloc ( sizeof ( pvla [ 0 ]) /* hypothetical */ , alignof ( double ) /* hypothetical */ , & vla_size ); for ( int i = 0 ; i < n ; ++ i ) { for ( size_t init_vla = 0 ; init_vla < sizeof ( pvla [ 0 ]); ++ init_vla ) { ( pvla [ 0 ])[ __init_vla ] = ( double ){}; } double * vla = pvla [ 0 ]; /* use VLA */ } stdc_vla_free ( pvla , vla_size , alignof ( double )); return 0 ; }
is only called once for the entire duration of this program, despite the variable length array being declared and used within a
stdc_vla_alloc loop 50 times during program execution.
for EXAMPLE Assuming an implementation where
is defined and a non-zero value: let there be some function named
__STDC_NO_VLA__ that provides storage that is automatically reclaimed by the implementation when provided a size and an alignment; and, let there be some attribute named
alloca that behaves as-if the marked function, when called, has its function body written directly in the calling location. The following is a valid program that will translate and execute without error:
some :: flatten #include <stddef.h>[[ some :: flatten ]] inline void * stdc_vla_alloc ( size_t size , size_t alignment , size_t actual_size [ static 1 ]) { * actual_size = size ; return alloca ( size , alignment ); } [[ some :: flatten ]] inline void stdc_vla_free ( void * ptr , size_t size , size_t alignment ) { // nothing; implementation automatically reclaims this memory } int main ( int argc , char * argv []) { int vla [ argc + 1 ] = {}; return vla [ 0 ]; } EXAMPLE Assuming an implementation where
is defined and a non-zero value: the following will translate successfully until the body of function
__STDC_NO_VLA__ .
f #include <stddef.h>void * my_vla_alloc ( size_t n , size_t align , size_t actual_size [ static 1 ]); void my_vla_free ( void * p , size_t n , size_t align ); int main ( int argc , char * argv []) { static_assert ( __STDC_NO_VLA__ != 0 , "this will still work" ); typeof ( my_vla_alloc ) * stdc_alloc_vla = my_alloc_vla ; typeof ( my_vla_free ) * stdc_free_vla = my_free_vla ; // uses my_vla_alloc int meow [ argc ]; // ok // calls my_vla_free return 0 ; } int f ( int n ) { static_assert ( __STDC_NO_VLA__ != 0 , "this will fail" ); int meow [ n ]; // constraint violation: // implementation does not support VLAs return 0 ; }