N37XA: Functions with Data - Closures in C (A Comprehensive Proposal Overviewing Blocks, Nested Functions, and Lambdas)

1. Changelog

1.1. Revision 2 - December 28^th, 2025

Talk more directly about __builtin_call_with_static_chain, which is sometimes used to call Go-style functions from C (§ 2.6 __builtin_call_with_static_chain).
Add benchmarks for various existing Closures solutions from the Man or Boy test into § 3.4 Measuring Solution Spaces. Discusses performance and design impacts thoroughly for the Man or Boy tests and how they relate to other proposals in the conclusions section (§ 3.4.1.3 Conclusions, Comparisons to Other Proposals, and Inferences).
Talk about how type names and non-identified declarators are silly for Capture Functions (§ 3.2.7 Forward Declarations without a name are a bit useless).

1.2. Revision 1 - October 6^th, 2025

Integrate __self_func ([__self_func]) into this paper. Hopefully the reasoning for this is self-evident and nobody asks me to justify the obvious in this version of the proposal.
Comment on Function Literals / Local Functions in § 2.5 Function Literals and Local Functions.
Initial wording. ✨
- There is no wording for 6.7.3.4 "Tags" because the type is not accessible from the declarator; the type is generated by the implementation and the entire apparatus of declaration types is covered under the existing rules for declarations of the same identifier, same as they are for functions in a way.
- There is currently a constraint on accessing a capture function’s members with . and -> if there is no definition available at that point. This makes it easier to prevent accessing a value at the wrong time. However, it might not be elegant or useful enough in certain situations. This is paired with § 3.2.6 Forward Declarations Work.
- Unfortunately, the wording is effectively a major surgery to parts of the standard to enable an elegant use. The good news is that it will enable using a "wide function pointer type" easily by just stating it is an invocable type and, particularly, a closure type. Even if we never accept Capture Functions or Lambdas, this is a useful rewrite of the core part of how functions/callables work.
- There is no Clause 7 wording. This will explicitly be a separate paper.

1.3. Revision 0 - July 24^th, 2025

Initial release. ✨
This paper has no wording. It is not fit for standardization at the moment and only tries to thoroughly discuss the motivation and design behind this work.

2. Introduction and Motivation

A colloquial overview (but with a bit less technical depth) of these options is available as a writeup here ([lambdas-nested-functions-block-expressions-oh-my]), though it was written in 2021, before Heap-based trampolines and some of the other proposals discussed here existed. We keep it as a gentler, milder introduction to the problem space written in much less serious prose.

C has had an extremely long and somewhat complicated history of wanting to pair a set of data with a given function call. Early problems first started with the standardization of C89’s qsort, which only took a single function pointer argument and offered no way to pass data through for additional constraints:

void qsort(
  void* ptr, size_t count, size_t size,
  int (*comp)(const void* left, const void* right)
);

This worked, until -- occasionally -- people wanted to provide modifications to certain behaviors based on local data rather than static data. For example, to modify the sort order of a call to qsort, one would originally have to program it like this in standard C:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int compare(const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (in_reverse) ? *right - *left : *left - *right;
}

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

There were multiple limiting factors in getting data into the function outside of using static. Accessing data in the local block was impossible for a compare function that, necessarily, had to be defined outside of main. This necessitated a way of transporting data to the compare function to do the work in a way that would work for qsort. Since it offered no way to transmit a user data parameter, other forms of data transfer became commonplace:

static data was the most popular ISO Standard C way of handling this in qsort-style APIs, resulting in a large number of (inline) static global variables throughout early programs, which used clever tricks to duplicate variables via fork-style approaches or by loading several translation units with different static variables;
and, _Thread_local in the case of avoiding contention on variables when multithreading was formally introduced circa C11.
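As a minimal sketch of the second option, the global flag from the earlier example can be given thread storage duration so that each thread sorts with its own copy. The helper name sort_ints is hypothetical and only exists for illustration; the comparison is written to avoid signed overflow rather than to match the subtraction used above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One copy of the flag per thread, so concurrent sorts do not interfere.
   C11 spells this _Thread_local. */
static _Thread_local int in_reverse = 0;

static int compare(const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  /* (a > b) - (a < b) yields -1, 0, or 1 without risking signed overflow. */
  int order = (*left > *right) - (*left < *right);
  return in_reverse ? -order : order;
}

/* Hypothetical helper: sorts `list` using the calling thread's flag. */
void sort_ints(int* list, size_t count, int reverse) {
  assert(list != NULL);
  in_reverse = reverse; /* only visible to the calling thread */
  qsort(list, count, sizeof(*list), compare);
}
```

The data still travels through a variable outside the call, so the function remains non-reentrant within a single thread; only the cross-thread contention is removed.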

Still, for all of their benefits and ease-of-use, these techniques are not perfect. Concerns and problems around (potentially false) sharing of data only grew in time as each program had to manage larger and larger swaths of 1980’s-style global variable soups, something that has famously ended up being used as a sign of negative code quality in legal matters. This particular problem was combatted by "reentrant" functions or "reentrancy" requirements, which is where the family of qsort_s and qsort_r-style functions (from Annex K implementations or BSD libraries) came from:

errno_t qsort_s(
  void* ptr, rsize_t count, rsize_t size,
  int (*comp)(const void* left, const void* right, void* user),
  void* user
);

void qsort_r(
  void* ptr, size_t count, size_t size,
  int (*comp)(const void* left, const void* right, void* user),
  void* user
);

And they are used like so:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int compare(const void* untyped_left, const void* untyped_right, void* user) {
  const int* in_reverse = (const int*)user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (*in_reverse) ? *right - *left : *left - *right;
}

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare, &in_reverse);
	
  return list[0];
}

While this example just has a single int as the type, other instances of using callbacks in this manner have resulted in Type Confusion bugs, where the void* pointer is cast to the wrong type or the wrong callback is paired with that void* user data. The lack of type safety occasionally bites people, and given that the data is ferreted through a void*, it’s hard to tackle this problem when dozens of little "helper" functions have to litter source code at file-scope. Finally, we add this frivolous example of ISO C (that does not make much sense upon first read):

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right, void* user);

int compare(const void* untyped_left, const void* untyped_right, void* user) {
  const int* in_reverse = (const int*)user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (*in_reverse) ? *right - *left : *left - *right;
}

compare_fn_t* make_compare(int argc, char* argv[], int* in_reverse) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  return compare;
}

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv, &in_reverse);
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare, &in_reverse);
	
  return list[0];
}

The point of the above code is to show that it’s legal to return a pointer to a normal function, because the function has a duration for the lifetime of the program (static storage duration, basically). This serves as a good proxy for the various Closure types we will be looking at: can they be returned safely from a function? Is there a way to make it return safely from the function? If we didn’t pass &in_reverse in as an argument, but created a local variable inside of make_compare, can it survive the return? These are all important questions to answer, especially as global variables became more discouraged towards C95 and C99 to prevent unintended data clobbering or sharing, and as reentrancy started to become more important. Because of void*’s lack of type safety which enabled unfortunate type confusion bugs, sharing issues, and the lack of locality of the writing of functions, a new extension was cooked up by GCC to handle this problem. Similar to Ada and Algol features but spun directly for C at the time, it was a feature lovingly dubbed Nested Functions.

2.1. Preamble

Before we talk about GNU Nested Functions, Apple Blocks, C++ Lambdas, Function Literals, or anything else, there are a few examples, similar to the qsort ones above, that will be used throughout. The purpose of these examples will be to talk about:

whether or not there’s a way to use these things in expressions by themselves;
whether or not there’s a way to manage the lifetime of an object in a dynamic or safe fashion;
and, whether or not there’s a way to make these things work with a single function pointer, function pointer + void* user data, or otherwise.

The one returning a function pointer seems utterly frivolous in the above for make_compare, but is meant as a proxy to determine how to handle dynamic lifetime of functions, where a closure needs to outlive the scope in which it was created or returned. This serves as a stand-in for code similar to returning things "up" the stack, or where the closure is meant to be invoked at some later point like in asynchronous code. That is how the current landscape is going to be evaluated.
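To make the dynamic lifetime question concrete, here is a minimal sketch of how plain C answers it today: the captured data is copied into a heap allocation that travels with the function pointer, so it survives the return of the function that created it. The struct compare_closure type and the call_compare helper are hypothetical names used only for this illustration, and the three-argument compare_fn_t mirrors the qsort_r-style callback rather than any standard typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int compare_fn_t(const void* left, const void* right, void* user);

/* Hypothetical hand-rolled closure: the function plus the data it needs. */
struct compare_closure {
  compare_fn_t* function;
  int in_reverse; /* a copy, so it no longer depends on the creator's frame */
};

static int compare(const void* untyped_left, const void* untyped_right, void* user) {
  const struct compare_closure* self = user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return self->in_reverse ? -order : order;
}

/* Returns NULL on allocation failure; the caller owns the result and must free() it. */
struct compare_closure* make_compare(int in_reverse) {
  struct compare_closure* closure = malloc(sizeof(*closure));
  if (closure != NULL) {
    closure->function = compare;
    closure->in_reverse = in_reverse;
  }
  return closure;
}

/* Invokes the closure, passing it to its own function as the user data. */
int call_compare(struct compare_closure* closure, int left, int right) {
  assert(closure != NULL);
  return closure->function(&left, &right, closure);
}
```

The cost of this approach is visible in the signatures: the result is no longer a single function pointer, the environment is reached through an untyped void*, and someone has to remember to call free().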

To aid in this evaluation, in the final section of this introduction, we will be filling out this table of features to see what each ecosystem brings to the table in terms of features. There are explanations below for each one in the first column:

Feature	GNU Nested Functions	Apple Blocks	C++-Style Lambdas in C	Function Literals	Local Functions
Capture By-Name
Capture By-Value
Selective Capture
Safe to Return Closure
Relocatable to Heap (Lifetime Management)
Usable Directly as Expression
Forward-Declarable
Immediately Invokable
Convertible to Function Pointer
Convertible to "Wide" Function Type
Access to Non-Erased Object/Type
Access to Captures through Object/Type
Recursion Possible

Capture By-Name: can refer to an object directly, as if working with it as an l-value or dereferencing a pointer to that object. Sometimes also called "capture by reference".
Capture By-Value: can refer to a copy of an object from within the function, at some fixed point in time. The lifetime of the copied object from the capture is tied to the closure rather than the scope it comes from.
Selective Capture: can pick and choose what to capture and how it gets captured.
Safe to Return Closure: part of the dynamic lifetime problem, but is there a way to write this closure type such that it is safe to return?
Relocatable to Heap (Lifetime Management): if it is possible to extend or otherwise change the lifetime of a closure so that it can last longer, either by copying or relocating the closure.
Usable Directly as Expression: whether or not the closure can appear as a function argument or something else.
Forward-Declarable: whether or not the closure can be forward-declared and perhaps used in some ways (e.g., callable as a function but perhaps any related object definition might not be usable).
Immediately Invokable: whether or not the closure can be immediately invoked, usually without naming it.
Convertible to Function Pointer: whether or not the closure can be converted to a function pointer of an identical signature to be called.
Convertible to "Wide" Function Type: whether or not the closure can be converted to a "wide" function pointer type, now or into the future.
Access to Non-Erased Object/Type: can use and store the closure object without erasing the type or the object first.
Access to Captures through Object/Type: access the closure’s captures outside of the closure’s invocable body.
Recursion Possible: can refer to itself in order to create a recursive algorithm in the normal fashion, without needing a special feature like C++'s Deducing This or the proposed __self_func ([__self_func]).
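The difference between the first two rows can be sketched in plain C by writing the captures out by hand. This is only an emulation under assumed names (struct captures, capture, and the read_ helpers are all hypothetical); none of the closure features discussed in this paper are spelled this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-in for a closure's captures (hypothetical names). */
struct captures {
  int* by_name;  /* points at the original object: sees later changes */
  int  by_value; /* snapshot taken when the "closure" was created */
};

struct captures capture(int* object) {
  assert(object != NULL);
  struct captures c = { .by_name = object, .by_value = *object };
  return c;
}

/* The "closure body": reports what each kind of capture observes. */
int read_by_name(const struct captures* c) { return *c->by_name; }
int read_by_value(const struct captures* c) { return c->by_value; }

/* Creates the captures, then modifies the original object afterwards. */
int by_name_minus_by_value(void) {
  int counter = 1;
  struct captures c = capture(&counter);
  counter = 5; /* only the by-name capture observes this store */
  return read_by_name(&c) - read_by_value(&c);
}
```

The by-name member is only valid while the original object is alive, which is why the "Safe to Return Closure" and "Relocatable to Heap" rows matter: a by-value capture can leave the scope, a by-name capture cannot.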

2.2. GNU Nested Functions

Nested Functions ([nested-functions]) were the logical extension of function definition syntax pulled down into the local level. Despite being the oldest functions-with-data attempt (30+ years), it has only one recent proposal, by Dr. Martin Uecker ([n2661]), and is not widely adopted as an extension across C compilers. The goal was very simple and, during the C89 timeframe, eminently doable: security implications were not yet deeply understood, because machines were not yet being actively exploited in both civilian and military contexts. The syntax of a function definition within a block scope created a function that could be called in the obvious way but still reference surrounding block scope objects by name:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The notable benefits of this approach were:

looks, feels, and smells like a regular function definition;
can access local data instead of static data, preventing issues with forked data or shared data;
produced a single function pointer and so did not need an additional user data pointer (typically a void*);
and, places the function doing the comparison work closer to the actual usage site.

Unfortunately, the early and enduring design of this feature -- in order to enable some of the benefits listed above -- very quickly ran afoul of early security concerns, and soon earned itself a big CVE due to the way it worked. In particular, the implementation strategy for a nested function is a brilliant piece of engineering that runs afoul of one of the only enduring security mitigations that have not fallen by the wayside: Non-Executable Stacks. To understand why this matters, a brief description of what (Non-)Executable Stacks are, and why they are important, follows.

2.2.1. Non-Executable Stacks

In C -- using common practice to leverage lots of hot-patching, assembly hot-fixing, direct opcode injection, and more -- programs frequently made use of stack-based data buffers where they also dumped their programs. This meant that binaries would read from their stack with the instruction pointer, allowing programs to dynamically inject behavior into a currently running program either in parts or -- if that data came from outside sources -- a fully dynamic sort of live-machine coding. The problem with this approach was that if a normal user could use the stack to make the program do a set of behaviors, so could a malicious actor: this was the easiest and widest branch of attacks upon C programs, and it was an endemic issue given the extremely large number of stack-based, fixed-sized buffers or -- after the advent of C99 -- variable-length arrays. Commandeering programs was as easy as finding a place where naïve programmers and hackers employed the now-deprecated gets, or where input was loaded in a too-relaxed way into a stack buffer. Overrunning the buffer and gaining control of the program by getting a jmp instruction to jump not back to some expected place in the stack but instead to a piece of a new program written by malicious inputs into the program made it easy to exploit C programs day in and day out, over and over again.

After quite a few attempts and retries on mitigations and several years of evolution, one of the lowest-hanging and easiest mitigations for a large class of direct attacks using stack buffers was to simply make the stack non-executable. This meant directly writing shellcode and jumping to it was no longer a valid way to attack many programs, and as a simple mitigation it has endured as one of many different mitigation techniques that prevent exploits. Executable stacks are, to this day, still one of the easiest-to-exploit properties of programs on a large share of modern computing platforms.

NOTE: This does not mitigate ALL stack-based exploits completely. It just eliminated the bottom-of-the-barrel, 40% ones; instead, folks now had to use what was already on the stack in terms of data to try and trick the program’s own logic into entering an unexpected state and, summarily, put it into a vulnerable position. From that vulnerable position, abuse of another vulnerability (e.g. an improperly checked heap pointer) can then be elevated to the point of illicit code execution ([solar-non-executable-stack-exploits]). Both formally and informally, this is known as Return-Oriented Programming (ROP).

Most programs, whether using Variable or Fixed-Length Arrays, loved to keep small(-ish) buffers on the stack that they constantly wrote data into in response to user input or read data (such as configuration files or network input). Preventing a poorly written program that did not guard against all forms of malicious input from turning into an easy Remote Code Execution issue was a gigantic security win. The overwhelming majority of exploits running at the time were effectively halted, even if they contained the perfect shell code to do so.

NOTE: Similar exploit-prevention techniques such as inserting security cookies on the stack, using Address Space Layout Randomization (ASLR), Control-Flow Integrity (CFI) checks, and more have also been developed further to shore up other issues from exploitation and vulnerabilities in C code since then, after ROP, Return to Libc, Heap-based Exploits (Write-What-Where primitives), and Type-Confusion became the new dominating exploit techniques.

This means that as a "table stakes" or entry-level bit on security, avoiding an Executable Stack is important for most C features.

2.2.2. Early Design Flaw: Nested Functions turn the stack Executable!

The original design of Nested Functions suffered for its compact and brilliant design, unfortunately. Let’s remind ourselves of the previous example relating to qsort: somehow, without marking the variable as static or _Thread_local, a nested function is able to access all of the objects by name from its enclosing scope.

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    /* HOW is `in_reverse` usable here?! */ 
    return (in_reverse) ? *right - *left : *left - *right;
  }
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The secret of this is in the brilliance of the original design: since it was so commonplace at the time, Nested Functions decided that the best way to be able to find the variables in the enclosing scope was to take a location associated with the enclosing block scope -- the Stack -- and turn it into an executable piece of code. This executable piece of code had an address. Because the executable code and its address came from the program’s stack, it meant that there was no need to provide a pointer to do reference-based / "by-name" object capturing. The function pointer itself served as both the jump to the code to execute, and a location that could then have a number of pre-determined, fixed offsets applied to it to reach all of the variables in that piece of code. This meant that rather than needing an object with static storage or _Thread_local storage, the block-scope variable could be accessed directly!
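A rough way to picture this in standard C is to write the pieces out by hand: the nested function receives the enclosing frame as a hidden extra argument (often called the static chain), and the trampoline is a tiny two-argument entry point that supplies that argument before jumping to the real code. The sketch below is an assumption-laden emulation, not what GCC emits: standard C cannot generate code at run time, so a file-scope variable stands in for the pointer a real trampoline embeds in its instructions, and struct enclosing_frame, compare_lowered, and sort_with_static_chain are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the enclosing function's stack frame (hypothetical layout). */
struct enclosing_frame {
  int in_reverse;
};

/* The nested function after lowering: the frame arrives as a hidden argument. */
static int compare_lowered(const void* untyped_left, const void* untyped_right,
                           struct enclosing_frame* static_chain) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return static_chain->in_reverse ? -order : order;
}

/* A real trampoline stores this pointer inside run-time generated code;
   this sketch parks it in a file-scope variable instead. */
static struct enclosing_frame* current_static_chain = NULL;

/* Has the two-argument signature qsort expects, then forwards with the frame. */
static int compare_trampoline(const void* left, const void* right) {
  return compare_lowered(left, right, current_static_chain);
}

void sort_with_static_chain(int* list, size_t count, int reverse) {
  struct enclosing_frame frame = { .in_reverse = reverse };
  assert(current_static_chain == NULL); /* single slot: not reentrant */
  current_static_chain = &frame;
  qsort(list, count, sizeof(*list), compare_trampoline);
  current_static_chain = NULL; /* the frame is about to disappear */
}
```

The original GNU design replaces the file-scope variable with a few instructions written into the enclosing function's own stack frame at run time, which is what lets every invocation have its own independent trampoline.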

Unfortunately, this required the stack, still, to be executable in order to function.

This is the critical issue that has resulted in most other vendors not picking up the feature. Compilers want to work flawlessly with GCC-compiled code: this means that compiling a nested function and passing it to a function pointer has to result in a (somewhat) similar Application Binary Interface (ABI) that can be used as applications expect. This means the decision to make GCC nested functions trampoline from the stack is a non-ignorable detail. This is the primary reason Clang and other vendors have refused to implement Nested Functions, chiefly citing security concerns and the inability to develop a worthwhile ABI that is compatible. Just how prevalent are the security issues that Clang and other vendors avoided by refusing to implement the GNU Nested Functions extension? Well...

2.2.2.1. The Prevalence of Executable Stack, and GNU Nested Functions

A ton of users relied on or otherwise kept using Executable Stacks, even into today. While the non-executable stack fix was deployed to great success in the early 2000s, some platforms -- particularly, Linux and similar POSIX-based platforms -- kept allowing executable stacks. Tightening that default clashed with users who had become far too comfortable with the issues related to Executable Stack:

Year of Linux desktop will never happen as long as glibc is de-facto standard libc. As far as I know, this regression is wontfix, so many games just won’t work anymore.

Imagine being a game dev from 5 years ago and making a native Linux version of your game in a good faith, that’s 100% a net-negative money-wise, only for it to stop working in a couple years. Backward compatibility on Linux does not exist as far as regular users concerned, and the only way to make software that works in years to come is to make it for Windows, and hopefully let Valve and Wine teams handle the rest.

-- February 19, 2025, Valentin Ignatev

Briefly ignoring the extreme hyperbole: note the dates and the mentioned time of this tweet. 5 years ago; i.e. developed in 2020, with the complaint happening in 2025. The tweet included a screenshot from the article titled "The glibc 2.41 update has been causing problems for Linux gaming" ([gamingonlinux-dawe]). Windows and MacOS have been disabling executable stacks globally, by default, for years and refusing to load such applications. This complaint happened when glibc stopped forcefully setting executable stack, even if the tweet that garnered the public push back did not mention WHY these games were breaking. This speaks to the severe need for a solution in this space that is not security breaking, and even telling folks to simply "turn on executable stack" for themselves is not wonderful:

... Don’t just take my advice on it though if you’re a developer or gamer reading this, always look up what you’re doing fully. Run at your own risk. ...

-- February 13, 2025, Article Author Liam Dawe

Unfortunately, this lax approach to security -- especially for video games -- has resurfaced quite a few truly unfortunate bugs. Popular Windows games such as Dark Souls: PREPARE TO DIE Edition and Call of Duty: WWII have exploitable Remote Code Execution (RCE) bugs in their code. One of them seems to be getting patched, but the other -- in Dark Souls -- is not going to be patched out. The idea that glibc -- or platforms in general -- can take a lax position to security on devices, even for something as "frivolous" as the (multi-billion dollar revenue) video game industry is simply not tenable. From a highly skilled Rendering Engineer commenting on the controversy caused by the glibc 2.41 update:

glibc is in the right here. iirc windows and mac DEP policy disabled executable stack by default for the past ~20 years or so. shocked this was not already the case in linux userland

-- February 21, 2025, A. W. R.

The engineer in charge of pushing the change and looking over the situation commented on the article and general attitude represented by Mr. Ignatev:

It is interesting that the headline did not get into details why I made this change: https://sourceware.org/pipermail/libc-alpha/2024-December/163146.html

In a short: the old behavior was used in a know RCE described in CVE-2023-38408.

-- February 20, 2025, Adhemerval Zanella

It is important to know that many other developers do not share this perception. Even as far back as 2018, when Microsoft kept up its Security Posture in its Windows Subsystem for Linux, people railed against the high-security default of refusing to work with programs that allowed executable stack ([WSL-no-executable-stack]):

The accepted trade-off is to have a non-executable stack be default but have an executable stack for programs which need them. Not supporting this is just a deficiency in WSL.

-- August 7, 2018, Dr. Martin Uecker

NOTE: This does not necessarily mean that Dr. Uecker wanted an executable stack in all programs. It’s just that for backwards compatibility purposes, it should be allowed to happen rather than fully banned, which is the opposite opinion of how WSL1, SELinux, several BSDs, and many other operating system loaders work.

Whose perspective is correct?

2.2.2.2. The Standard Is To Blame

This proposal agrees with both Mr. Zanella and Ms. A.W.R., and disagrees with Dr. Uecker and Mr. Ignatev; it is impossible to pretend like this is not a problem with more and more exploits taking advantage of not only executable stack, but directly targeting "harmless" software that makes use of it (such as video games). But, even more importantly, the real culprit here is ISO C. There were many alternatives to executable stack-based GNU Nested Functions, and other such entrapments. However, because of how convenient GNU Nested Functions are and how accepted they are in the GNU ecosystem, it has led to multiple security vulnerabilities that should have never existed in the first place (§ 5.4 Executable Stack CVEs).

Unlike Address Space Layout Randomization (ASLR) and several other run-time mitigations, non-executable stacks have been both one of the cheapest and one of the most enduring security wins in the last 50 years of computing. Marking sections of memory as unable to be run as part of the program permanently shifted the landscape of targeted exploits to be focused almost exclusively on buffer overrun-into-heap-exploitation bugs or on ROP gadgets, as well as on trying to find sequences of logic to put programs in a state of disrepair that ultimately grant attackers either a Denial of Service (DoS) attack or full control via Remote Code Execution (RCE). Exploits became far harder and required orchestration of multiple carefully-crafted scenarios to hit the "Weird Machine" state so coveted by exploiters. This is not the case for executable stack.

2.2.3. Alternative Nested Function Implementations

We have already discussed the first and most popular attempt which leaves the stack executable. Because of the negative security properties of this, there were two more attempts, one still on-going (§ 2.2.3.2 Attempt 3) and one with limited success that requires the entire world to be recompiled or face potential ABI breaks (§ 2.2.3.1 Attempt 2).

2.2.3.1. Attempt 2

An Ada-style Function Descriptors implementation was attempted. The problem with this change for GCC is that it uses a bit (the lowest bit, which traditionally is always 0) in the function pointer itself to mark the function pointer as one that is relying on the Function Descriptor technology. Setting the lowest bit on the function pointer means that it is unsafe to call directly, and therefore every function pointer must first be masked with func_ptr & ~0b1ull before being called. This is a runtime cost and a general-purpose pessimization that applies to ALL function pointers, making the Function Descriptor approach unsuitable for solving the ABI problem both internally with existing GCC code and to the satisfaction of other developers.

NOTE: This is also inadvisable for specific ABIs such as ARMv7, where the lowest bit in function pointers is already used to select between the ARM and Thumb instruction sets.

2.2.3.2. Attempt 3

A third attempted implementation of Nested Functions attempts to use a separately-allocated trampoline. It can come from either: a stack that is set up at program load time in coordination with the compiler and whose exclusive purpose is to be a memory region for both trampolines and a slot for a void* environment/context; OR, a dynamic allocation that serves as the function pointer plus a void* environment/context pointer. These approaches simply do not work in the general case because it is unclear when, if ever, the function pointer will stop being used. However, one part of Nested Functions -- the fact that they refer to everything by name / "capture by reference" all of the things they use -- means that this can be sufficiently approximated by simply stating that GNU Nested Functions will deallocate such a trampoline (or shrink the stack it was cut from) when the enclosing block that was used for the nested function is gone. (Of course, this means that the function pointer will exhibit the exact same lifetime issues as with the current stack, so it solves some problems but leaves others on the table.)
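The "dedicated region of trampolines, each with a slot for a void* environment" idea can be approximated in portable C with a fixed pool of pre-written entry points. This is a sketch under stated assumptions rather than how any compiler implements it: real implementations produce the entry points at load time or run time, whereas here each one is written by hand, and make_trampoline, destroy_trampoline, compare_with_env, and POOL_SIZE are hypothetical names (not the fictional __gnu_ intrinsics, and not a GCC API).

```c
#include <assert.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);
typedef int compare_env_fn_t(const void* left, const void* right, void* env);

#define POOL_SIZE 2

/* One slot per pre-built trampoline: the target function and its environment. */
static struct slot {
  compare_env_fn_t* target;
  void* env;
  int in_use;
} pool[POOL_SIZE];

/* Each hand-written entry point is permanently bound to one slot. */
static int trampoline0(const void* l, const void* r) { return pool[0].target(l, r, pool[0].env); }
static int trampoline1(const void* l, const void* r) { return pool[1].target(l, r, pool[1].env); }
static compare_fn_t* const trampolines[POOL_SIZE] = { trampoline0, trampoline1 };

/* Returns a plain function pointer bound to (target, env), or NULL if the pool is full. */
compare_fn_t* make_trampoline(compare_env_fn_t* target, void* env) {
  assert(target != NULL);
  for (size_t i = 0; i < POOL_SIZE; ++i) {
    if (!pool[i].in_use) {
      pool[i] = (struct slot){ .target = target, .env = env, .in_use = 1 };
      return trampolines[i];
    }
  }
  return NULL;
}

/* Must be called before the block that owns `env` exits. */
void destroy_trampoline(compare_fn_t* trampoline) {
  for (size_t i = 0; i < POOL_SIZE; ++i) {
    if (trampolines[i] == trampoline) {
      pool[i].in_use = 0;
    }
  }
}

/* Example target: orders ints, reversed when the int behind `env` is non-zero. */
int compare_with_env(const void* untyped_left, const void* untyped_right, void* env) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return *(const int*)env ? -order : order;
}
```

The sketch shows both halves of the trade-off described above: nothing on the stack needs to be executable, but a function pointer kept after destroy_trampoline silently refers to a recycled slot, which is the same dangling-lifetime hazard as before.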

There is also the slight issue that using a separated trampoline that is on a separate heap or a separate stack might need (but does not necessarily require) a secondary level of indirection. The original, executable stack implementation of GNU Nested Functions avoids this because both the code and the variables are in-line: having a separated trampoline may require such a trampoline to first load the right function call with __builtin_call_with_static_chain (§ 2.6 __builtin_call_with_static_chain) within the code of the trampoline to have the code work properly.

NOTE: As of early 2025 in GCC 14, GCC provided a heap-based implementation that got rid of the executable stack. This requires some amount of dynamic allocation in cases where the compiler cannot prove that the function is only passed down or that its address is not taken in a meaningful way, or where the function is not used immediately (as determined by the optimizer). It can be turned on with -ftrampoline-impl=heap.

2.2.3.3. A Possible 4th Attempt: Explicit User Control

An experimental technique for allocating a trampoline can be built on an _Any_func* pointer, which is being standardized through an in-progress proposal ([_Any_func]). Then, rather than implicitly creating a trampoline on usage, a user can instead request one, control its allocation explicitly, and hand it back when finished. A fictional example of such an intrinsic -- called __gnu_make_trampoline -- is seen here:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  typedef int compare_fn_t(const void* left, const void* right);

  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  // explicitly make a single-function-pointer trampoline, without an executable stack
  // __gnu_make_trampoline takes a function identifier, returns a _Any_func*
  compare_fn_t* compare_fn_ptr = __gnu_make_trampoline(compare);

  // use it
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_fn_ptr);

  // explicitly free a single-function-pointer trampoline, without an executable stack
  __gnu_destroy_trampoline(compare_fn_ptr);

  return list[0];
}

This is discussed further in § 5.3 Make Trampoline and Singular Function Pointers.

2.2.4. The Nature of Captures

There’s a final issue with Nested Functions: they are not suitable for use with asynchronous code or code that returns "up". Consider the same bit of code as before, but slightly modified:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t* make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv);
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

In this example, we have simply moved the in_reverse and compare generation into a function, for ease-of-use. One can imagine that we need to create this sort of function multiple times, from perhaps different sources of data. GNU Nested Functions allow us to do this and to return the function "up" the call stack. The problem, of course, is that a Nested Function (in either a heap-based implementation or the current executable stack implementation) points to the "function frame" that it was created in. That is, while make_compare -- once it has returned -- is no longer alive and all of its objects with automatic storage duration have died, the compare function pointer is still there and has been passed up the stack. This means that all accesses to in_reverse are accessing memory that is no longer alive, and it is effectively Undefined Behavior.

The actual manifestation of the undefined behavior in this program is very clear: adding the -r argument to make in_reverse turn to 1 does not have any effect on the program anymore:
https://godbolt.org/z/81d7Tqn1E

This is a critical failure of Nested Functions: they only ever "capture" values by-name / by-reference. There is no option to capture by-value, and therefore transporting these function pointers to asynchronous code or "up" the call stack and using them there is fundamentally dangerous. This was not a huge problem in the early days of C, where programs were very flat and it was easy to always "move" function calls up. Unfortunately, we now have asynchronous programming, coroutine libraries / green threading models, callbacks that are saved and invoked much, much later in a program, and all sorts of models for shared code. This is the part of Nested Functions that cannot be saved; it is an intrinsic part of the design that unfortunately will always lead to Undefined Behavior because there is no way to get around that limitation in the GNU Nested Functions design. This is another serious problem that ultimately makes it impossible to consider Nested Functions as THE solution for all of the C ecosystem.

NOTE: As a general rule of thumb, if the entity being designed has the ability to be transferred out of its current scope (Blocks, Nested Functions, Lambdas, ...) then it must as a rule allow for determining how it interacts with the scope it is nested in. The answer for function calls in most languages (without a garbage collector or other memory-preserving solution) is "cannot interact with its surrounding scope", which simplifies the problem. But, the whole point of these features IS to interact with the surrounding scope, and so care must be taken to make it work better.
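For contrast, here is a minimal sketch of what a by-value capture buys, written in plain ISO C with no extensions. The names compare_closure, make_compare_by_value, compare_call, and compare_after_return are hypothetical and exist only for this illustration; they are not part of any proposal or compiler. The captured in_reverse is copied into an object that is returned by value, so nothing refers back to the dead function frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical by-value "closure": the captured state travels inside the object. */
struct compare_closure {
  int in_reverse; /* a copy, not a reference into the creating function's frame */
};

/* Safe to return "up": the struct is copied out before the frame dies. */
struct compare_closure make_compare_by_value(int want_reverse) {
  int in_reverse = (want_reverse != 0);
  struct compare_closure closure = { in_reverse };
  return closure;
}

/* The body of the closure, receiving its captured state explicitly. */
int compare_call(const struct compare_closure* self, int left, int right) {
  assert(self != NULL);
  /* (x > y) - (x < y) yields -1, 0, or 1 without risking overflow */
  int ascending = (left > right) - (left < right);
  return self->in_reverse ? -ascending : ascending;
}

/* Usage: the closure is only invoked after make_compare_by_value has returned. */
int compare_after_return(int want_reverse, int left, int right) {
  struct compare_closure compare = make_compare_by_value(want_reverse);
  return compare_call(&compare, left, right);
}
```

The cost of this shape is visible in the signature of compare_call: the state has to be passed explicitly, which is exactly what a plain function pointer callback such as the one taken by qsort cannot do.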

2.2.5. GNU Nested Functions By-Name Captures Cannot Be Worked Around Normally

The deeply unfortunate part of GNU Nested Functions is that even if someone realizes the issue with by-name capture of the surrounding scope and tries to escape it, there is no successful way to actually provide that escape. Consider, briefly, the make_compare-style example from before but with a slight modification to "heapify" the in_reverse variable for safety reasons:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t* make_compare(int argc, char* argv[]) {
  /* LOCAL, heap-allocated variable.... */
  int* in_reverse = malloc(sizeof(int));
  *in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (*in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv);
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

Ignoring for a moment that free is never called in this scenario, the bigger and more pressing problem here is that this code does not even work. Despite the memory now properly living on the heap, a by-name reference to the in_reverse pointer means that once the make_compare function exits, that pointer object is no longer alive. Doing *in_reverse is just dereferencing a pointer whose value may or may not have changed and is effectively a form of stack-based use-after-free. This means that even if you want to try to make local variable references "safe" by relocating objects in the local arena to the heap, it is still not enough. You would need to deploy static or _Thread_local data in order to solve the problem, still, which brings us back to the original problems at the beginning of this introduction (§ 2 Introduction and Motivation).
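A minimal sketch of that _Thread_local fallback, in plain C11 without Nested Functions, shows what it looks like in practice. The identifiers tls_in_reverse, make_compare_tls, and first_after_sort are hypothetical names chosen for this illustration, and the comparison is written with the overflow-free (x > y) - (x < y) idiom rather than the subtraction used in the examples above.

```c
#include <assert.h>
#include <stdlib.h>

typedef int compare_fn_t(const void* left, const void* right);

/* The "capture" lives in thread-local storage, so it outlives make_compare_tls. */
static _Thread_local int tls_in_reverse = 0;

static int compare(const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  int ascending = (*left > *right) - (*left < *right);
  return tls_in_reverse ? -ascending : ascending;
}

/* Returns an ordinary function pointer; the state sits in one per-thread slot. */
compare_fn_t* make_compare_tls(int want_reverse) {
  tls_in_reverse = (want_reverse != 0);
  return compare;
}

/* Usage: sorts the running example's list and reports the first element. */
int first_after_sort(int want_reverse) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  compare_fn_t* compare_fn = make_compare_tls(want_reverse);
  assert(compare_fn != NULL);
  qsort(list, sizeof(list) / sizeof(*list), sizeof(*list), compare_fn);
  return list[0];
}
```

This works, but only because there is exactly one slot of state per thread: a second call to make_compare_tls silently changes the behavior of every function pointer handed out by an earlier call.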

2.2.6. Additional Modifications for Nested Functions

The paper [n3654] discusses various potential modifications and directions for GNU Nested Functions. It contains many assertions and future directions, which are discussed in the Appendix (§ 5.1 Accessing Context in Nested Functions).

2.3. Apple Blocks

Blocks are an approach to having functions and data that originate from Objective-C ([apple-blocks]) and are associated with Apple’s Clang. They were proposed a long time ago in an overview of Apple Extensions for C by Garst in 2009 ([n1370]), refined into a specific proposal in 2010 ([n1451] [n1457]). It was later further refined into a proper "Closures" proposal, and the name changed ([n2030]). However, none of these papers made the standard and despite a brief moment where GCC maintained a Blocks runtime, the extension has not been adopted outside of the C / Objective-C ecosystem (and the Blocks extension is no longer allowed or used at the moment).

Because there is a wealth of proposals and literature talking about Blocks, their implementation, their runtime, and more (see an answer discussing the intrinsics and implementation bits for Blocks), rather than inform proposal readers how they work and why they are not suitable for ISO C, this proposal will focus exclusively on why they are not usable for the whole C ecosystem.

2.3.1. Expression

Unlike GNU Nested Functions, Block definitions are an expression. That means that -- given appropriate typing -- one can use blocks directly within a function call:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  qsort_b(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    ^(const void* untyped_left, const void* untyped_right) {
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

The special function_type^ and return_type (^identifier)(argument0, argumentN...) are Block types, which are special types that act as a handle to a Block. Blocks are not just a simplistic combination of a function and a context, however: much more effort is put into making them safe at execution time, and that is done by putting everything related to Blocks behind a hefty runtime.

2.3.2. Runtime Required

Apple Blocks are, at their core, dynamic objects that engage in type-erasure at the top-most level. Where C++ lambdas are completely non-type-erased and each contains a unique type, and where Nested Functions are completely type-erased behind normal function pointers with executable stacks + trampolines, Blocks are callbacks that are unique but have all of their type information erased and carted around in a new function pointer-like construct: the block. Block types are denoted by the ^ in their function type name, and typically are cheaply copiable handles to a heap-stored callable.

NOTE: The layout of the heap-stored callable is dictated by Apple, and has been reverse-engineered and deconstructed many times. Some of this work has been done by the Clang Team, and the most up-to-date, thorough specification for it is stored with the Clang documentation ([clang-blocks-spec]).

Of course, all of this is just to illustrate the problem: while Microsoft as a platform may not need to care, GCC and Clang both tend to occupy the same hardware and software spaces. Even if one compiler or another figured out how to be clever, the base layout -- and the premise of blocks being a handle to a heap, and not a compile-time sized object -- means that some form of dynamic allocation or heap is required. This is a net-negative for memory-constrained environments, and implementations that attempt to be ABI-compatible with the original Apple Blocks implementation will be forced to lay their blocks out in the way that Apple has already specified.

2.3.2.1. More Complications: Generally Unsafe to Return

Block literals are not initially placed on the heap or allocated through the run-time; this is a blanket optimization applied to all block literals. They start out on the stack! Which means that while the above code using a literal works just fine, this code is actually Undefined Behavior, in a way that’s just as bad as GNU Nested Functions:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return compare;
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
	
  return list[0];
}

This code does not work. Despite having a runtime, it will NOT perform the lift to the heap automatically. The return from make_compare into compare_fn_t^ compare is a dangling reference, and is explicitly discouraged by reference materials and documentation from Apple. It must be modified to use Block_copy on the return:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);

  return list[0];
}

Annoyingly, every call to Block_copy must be paired with a call to Block_release. This means that there’s now an invisible (from the perspective of main) block copy that now needs to be managed with a specific call to Block_release. One can imagine that every function that returns a Block type should just be assumed to need releasing, but this isn’t always the case: this means there’s an invisible lifetime tracking that even the runtime and the heap does not solve for us! Truly, unfortunate.
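The ownership problem can be modelled in plain C without the Blocks runtime. The sketch below is an assumption-laden stand-in, not Apple's implementation: counted_closure, closure_create, closure_retain, closure_release, make_counted_compare, and balanced_use are all hypothetical names. It only illustrates that nothing in the return type tells the caller that a release is owed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap-lifted Block: captured data plus a count. */
struct counted_closure {
  int references;
  int in_reverse;
};

static int live_closures = 0; /* bookkeeping so an unbalanced release is observable */

struct counted_closure* closure_create(int in_reverse) {
  struct counted_closure* closure = malloc(sizeof(*closure));
  if (closure == NULL) return NULL;
  closure->references = 1;
  closure->in_reverse = in_reverse;
  ++live_closures;
  return closure;
}

/* Plays the role of an extra copy of the handle. */
struct counted_closure* closure_retain(struct counted_closure* closure) {
  assert(closure != NULL);
  ++closure->references;
  return closure;
}

/* Plays the role of the release: frees the object once the last owner lets go. */
void closure_release(struct counted_closure* closure) {
  assert(closure != NULL && closure->references > 0);
  if (--closure->references == 0) {
    --live_closures;
    free(closure);
  }
}

/* The signature alone does not say that the caller now owns one reference. */
struct counted_closure* make_counted_compare(int want_reverse) {
  return closure_create(want_reverse != 0);
}

/* Usage: every acquire is matched by a release; returns the live count at the end. */
int balanced_use(void) {
  struct counted_closure* compare = make_counted_compare(1);
  if (compare == NULL) return -1;
  struct counted_closure* second_owner = closure_retain(compare);
  closure_release(second_owner);
  assert(live_closures == 1); /* still owned by `compare` */
  closure_release(compare);
  return live_closures;
}
```

Forgetting either closure_release call in balanced_use compiles without complaint, which is the same invisible bookkeeping burden described above.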

There is also a small gotcha in the make_compare example above that only shows up depending on where the compare block is created. This has to do with how Captures work under the Blocks feature.

2.3.3. Captures

Captures in Apple Blocks work in one of two ways.

referring to the existing object by-name in your code, which will make a copy of the object inside of the Block’s implicit capture;
or, annotating a variable with __block, which will load the object into a sort of intrusive pointer / automatic reference-counted place in memory (managed by the Blocks runtime) to allow it to be referred to by both the surrounding code and the block’s inner guts by-name.

The reason Apple used this technique, as talked about before, is because it’s safer than Nested Functions in the particular regard of using variables and carting them around.

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

The first thing to note is that, because this cannot be translated to a single function pointer, we cannot use qsort. This means that using such a function without creating some kind of special trampoline is off-limits to us. Again, this is something that could be solved by the introduction of an explicit heap-based trampoline creator (§ 5.3 Make Trampoline and Singular Function Pointers). Or, one would need to introduce a "wide function pointer" type -- which is exactly what function_type^ is -- and change qsort’s signature to use it for the callback.

NOTE: Simply upgrading qsort with a Block type is an ABI break that would cause old, already-compiled libraries and programs mixed with new programs to combust in painful, hard-to-detect ways. To solve this problem one would need to rely on existing C extensions like Assembly Labels, or work towards Transparent Aliases ([transparent-aliases]).

But, if someone uses qsort_s -- which takes a void* user data parameter -- one can use it without altering the signature of qsort directly and create a void*-kick off point as a trampoline. However, there is a bit of an interesting conundrum. Take, for example, moving the creation of the block function further upwards inside of make_compare:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

This code will sort the list incorrectly even if in_reverse is set to 1. That’s because Blocks will capture the variables that get used at the point-of-creation. The value at the point-of-creation is 0, therefore, in_reverse is 0 when the function is called later. Even though the in_reverse variable is copied into the block, the block is now sensitive to where it is being created without any indication that it behaves that way. This is safe, but the behavior would throw someone who uses Nested Functions religiously off completely.

This is, of course, easily fixable by just... moving it down, so it’s hardly a problem:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;
	
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }

  // Compare function, with block copy: copies the right value
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

If you don’t want to move the creation of the Block for some particular reason, it is not the only way to capture a variable for Blocks! The second way it can capture variables is by using __block, which works like so:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* BLOCK-QUALIFIED variable.... */
  __block int in_reverse = 0;
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);

  return list[0];
}

This will work as expected, even though the creation of the block comes after in_reverse is declared and before modification of the variable. __block lifts the variable being declared up into a heap (or heap-like space) that is managed by either a garbage collector or an automatic reference-counted memory implementation. This both reverses the onus of capturing / handling the variable onto the surrounding scope and makes it safe-by-design. This does leave an unfortunate gap in that there’s no way to do the dangerous thing or opt into a direct reference without making an explicit declaration of the pointer and then using the pointer by-copy instead in that block, which can leave some memory footprint and program speed on the table without aggressive optimization.

NOTE: The address of in_reverse might actually change, depending on if the Apple intrinsic Block_copy is used to copy the block itself before being run. This code does not depend on it, but a hash map that stores the address of variables might experience the address of any __block-annotated variables changing between creation and the invocation of Block_copy. This was changed later on to always set the variable in a location so that a steady address exists, but how conforming it is to keep the old behavior is likely an implementation choice. While there was previously a Blocks runtime for GCC, it has fallen off. It may make a comeback in order to be more compatible with the C and C++ code on Apple platforms, but whether it will choose the same implementation technique is not known as of the writing of this paper.

All in all, however, these two things are safe: either a copy is happening and is stored along with the creation of the block on the heap, or the variable itself is having its lifetime prolonged by a sort of automatic-tracking. The second of these is very against the typical properties of C, but that matters little in the face of the obvious safety it brings to the table. Unfortunately, because all of this happens magically and in a mostly-unspecified manner, it’s very detrimental to the proliferation of the C ecosystem and having several loosely-connected implementations working towards the same improved implementations.
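The difference between the two capture modes can be modelled in plain C. This is only a sketch under stated assumptions: copy_capture and shared_capture are hypothetical structures standing in for what the Blocks runtime manages, and the explicit malloc/free pair stands in for the runtime's automatic lifetime tracking of a __block variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Model of the default capture: a snapshot taken at the point of creation. */
struct copy_capture {
  int in_reverse;
};

/* Model of a __block-style capture: both sides share one heap cell. */
struct shared_capture {
  int* in_reverse;
};

int read_copy(const struct copy_capture* capture) {
  assert(capture != NULL);
  return capture->in_reverse;
}

int read_shared(const struct shared_capture* capture) {
  assert(capture != NULL && capture->in_reverse != NULL);
  return *capture->in_reverse;
}

/* Creates the "block" first, writes afterwards, then reports what the block sees. */
int copy_sees_later_write(void) {
  int in_reverse = 0;
  struct copy_capture block = { in_reverse }; /* snapshot holds 0 */
  in_reverse = 1;                             /* too late for the snapshot */
  (void)in_reverse;
  return read_copy(&block);
}

int shared_sees_later_write(void) {
  int* in_reverse = malloc(sizeof(*in_reverse));
  if (in_reverse == NULL) return -1;
  *in_reverse = 0;
  struct shared_capture block = { in_reverse }; /* shares the cell */
  *in_reverse = 1;                              /* visible through the block */
  int seen = read_shared(&block);
  free(in_reverse); /* a real runtime would do this through reference counting */
  return seen;
}
```

The by-copy model returns the value from the point of creation, while the shared-cell model observes the later write, mirroring the two make_compare variants shown earlier in this section.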

2.3.4. Optimization: Folding Escapes

As a matter of optimization, Blocks do not necessarily have to pollute the heap. And indeed, most immediate invocations of a block, or pass-down (rather than pass-up) invocations that are visible to the compiler, will be optimized into a direct call. Unfortunately, this is not something that is encouraged by the general design of Apple Blocks and Objective-C or Objective-C++. Because taking the address or passing the function along leaves it open how far the handle-to-some-heap Block-typed object might be passed, compilers have to correctly (and pessimistically) generate the full, indirect-function-call representation as a matter of course. Block_copy also needs to be used, explicitly, in many cases, making it not much better than regular C code that uses malloc.

As a brief aside in Programming Language design, this sort of optimization problem is mitigated by changing how the defaults are applied and giving the user explicit control. For example, the Swift Programming Language solves this problem while still being compatible with Objective-C and C++ by making it so every "Block" or "Closure" type must be annotated if it "escapes" beyond the compiler’s knowledge, otherwise the program just refuses to compile ([swift-escapes]). This allows aggressive optimization to be applied by-default, with weaker static analysis-based optimizations or escape analysis optimization only acting as a fallback in the @escaping-annotated case. The annotation also makes it so Block_copy does not need to be invoked and rather than having to copy it to a heap version of itself, it is simply put in the right format and place, ready to interoperate cleanly with C, C++, Objective-C or Objective-C++.

NOTE: Swift’s native function type and ABI is not identical to the Blocks type at all. This is just an example of how designing with no-escape as the default, and then explicitly lifting that restriction to allow taking a closure’s address or returning it from a function, can open the door to better optimization.

In the opposite direction, there is a C-style attribute that can be used to say "the closure never escapes", which enables optimizations for crunching the object down and assuming that it is never invoked outside of the function. This is available through the NS_NOESCAPE macro, which expands to __attribute__((noescape)) and can be used to gain better binary size and reduce indirection for code execution speed where possible.

2.3.5. (Explicit) Trampolines: Page-based Non-Executable Implementation

Objective-C has the ability to create a C-style function pointer from a Block ([objective-c-block-trampoline]), and this implementation can be wielded from regular C code on Apple platforms too. This implementation is an entirely different design that does not use a heap but instead a separate "stack" (a single page). That page is a writable (but NOT executable) slab of memory with trampolines and Block pointer data in it. The actual trampoline function pulls one of the data pointers from this single page and then sets it up to call; because the trampoline is separate from the actual data, there is much less security risk from writing and reading a pointer that otherwise might be local to executable memory.

However, because of the implementation, Objective-C can occasionally run out of trampolines: if enough are created before any are deallocated, the entire page can be filled up with trampolines. This will trigger errors and failures to create the block. Therefore, even compared to the heap-based GNU Nested Function implementation (§ 2.2.3.2 Attempt 3), there exists a potential tradeoff in the designs. This is why trampolines need to be explicit and should be handled in a separate paper, with an interface that can be generalized and can report potential errors (§ 5.3 Make Trampoline and Singular Function Pointers).

NOTE: Modern designs of the Page-based Non-Executable Implementation include a growable array / linked list of pages, however, so the growth problem no longer exists on modern Objective-C (read: Apple) platforms. There is still an allocation problem, in needing to have space to grow, and a potential failure point if obtaining the combination of writable-but-not-executable and readable-and-executable pages for this design happens to hit a problem.
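The split between trampoline code and writable data, along with the exhaustion failure mode, can be approximated in portable ISO C by pre-building a fixed number of thunks at compile time. This is a sketch under stated assumptions and not how Apple's runtime is implemented: thunk_slot, make_trampoline, destroy_trampoline, and THUNK_COUNT are hypothetical names, the pool is deliberately tiny, and no attempt is made at thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int compare_fn_t(const void* left, const void* right);
typedef int compare_ctx_fn_t(const void* left, const void* right, void* context);

/* One slot of writable data per pre-built thunk, kept apart from the code. */
struct thunk_slot {
  compare_ctx_fn_t* target;
  void* context;
};

#define THUNK_COUNT 2
static struct thunk_slot thunk_slots[THUNK_COUNT];

/* Each thunk is ordinary compiled code that only reads its own slot. */
#define DEFINE_THUNK(INDEX)                                              \
  static int thunk_##INDEX(const void* left, const void* right) {        \
    return thunk_slots[INDEX].target(left, right,                        \
                                     thunk_slots[INDEX].context);        \
  }
DEFINE_THUNK(0)
DEFINE_THUNK(1)

static compare_fn_t* const thunk_code[THUNK_COUNT] = { thunk_0, thunk_1 };

/* Returns NULL when every slot is taken: the failure an interface must report. */
compare_fn_t* make_trampoline(compare_ctx_fn_t* target, void* context) {
  assert(target != NULL);
  for (size_t i = 0; i < THUNK_COUNT; ++i) {
    if (thunk_slots[i].target == NULL) {
      thunk_slots[i].target = target;
      thunk_slots[i].context = context;
      return thunk_code[i];
    }
  }
  return NULL;
}

/* Frees the slot behind a trampoline; passing NULL is a harmless no-op. */
void destroy_trampoline(compare_fn_t* trampoline) {
  for (size_t i = 0; i < THUNK_COUNT; ++i) {
    if (thunk_code[i] == trampoline) {
      thunk_slots[i].target = NULL;
      thunk_slots[i].context = NULL;
    }
  }
}

static int compare_with_flag(const void* untyped_left, const void* untyped_right,
                             void* context) {
  const int* in_reverse = context;
  const int* left = untyped_left;
  const int* right = untyped_right;
  int ascending = (*left > *right) - (*left < *right);
  return *in_reverse ? -ascending : ascending;
}

/* Usage: a single function pointer that still reaches a local variable. */
int first_after_sort(int want_reverse) {
  int in_reverse = (want_reverse != 0);
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  compare_fn_t* compare = make_trampoline(compare_with_flag, &in_reverse);
  if (compare == NULL) return -1;
  qsort(list, sizeof(list) / sizeof(*list), sizeof(*list), compare);
  destroy_trampoline(compare);
  return list[0];
}

/* With two slots, a third simultaneous request has nowhere to go. */
int third_request_fails(void) {
  int unused_context = 0;
  compare_fn_t* first = make_trampoline(compare_with_flag, &unused_context);
  compare_fn_t* second = make_trampoline(compare_with_flag, &unused_context);
  compare_fn_t* third = make_trampoline(compare_with_flag, &unused_context);
  int failed = (first != NULL && second != NULL && third == NULL);
  destroy_trampoline(first);
  destroy_trampoline(second);
  return failed;
}
```

Because the pool is finite, make_trampoline has to be able to fail, and the caller has to say when a slot may be reused; both are reasons the creation and destruction steps are better left explicit.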

2.4. C++-Style Lambdas

C++ lambdas, despite coming from C++ initially, are the only solution that does not apply an executable stack, a separate stack, Function Descriptors, or other dynamic/runtime data to the problem. They were detailed in a large collection of proposals from Jens Gustedt and almost made C23, but ambition in trying to allow for type-generic programming through lambdas with auto parameters stalled progress and ultimately halted everything ([n2923] [n2924] [n2892] [n2893]). The design makes a unique type for each and every lambda object that gets created using the syntax [ captures ... ] ( args ... ) { /* code ... */ }, and that object has an implementation-defined but compile-time known size.

NOTE: While type inference for function returns is not yet in, a section of [n2923] was broken off for just variable definitions, which ultimately succeeded. Type-inferred variable declarations work properly in C23 and are an important part of Lambdas being able to work in the manner envisioned by Gustedt’s proposals and by C++.

If a lambda has no captures, it can reduce to a function pointer like so:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  auto compare = [](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  // "compare" below becomes a function pointer
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The one thing that makes it different from GNU Nested Functions -- and somewhat similar to Blocks -- is that it is also an expression. That means that it can also be used inside of a function call expression as an argument, or as part of any other complicated chain of additions:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    [](const void* untyped_left, const void* untyped_right) {
      const int* left = (const int*)untyped_left;
      const int* right = (const int*)untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

Again, this only works because there are no captures. When captures get involved, the above code will simply stop compiling because the conversion to a function pointer stops working. This can be solved in the ways discussed previously, such as:

a version of qsort which takes a void* user data parameter, like qsort_s does;
a version of qsort which takes a "wide" function pointer type, similar to how Blocks do with the qsort_b defined by Apple/BSDs;
or, allowing a separate facility which lifts the complex closure type into a single function pointer through a trampoline or similar (§ 5.3 Make Trampoline and Singular Function Pointers).

This gives Lambdas a heavy weakness similar to all of the other solutions: there must be either a trampoline, a wide function pointer type, a void* user data/context pointer, or something else to accommodate the lack of transformation into a singular function pointer.

NOTE: Because C++ as a language has a more robust set of core primitives, it does not have to worry about this problem. std::function<FUNCTION_TYPE> is a type-erased way to transport a whole function object, as a copy, through API boundaries that is defined entirely in their library mechanisms. For a view into any old function, there is std::function_ref<FUNCTION_TYPE>, which is like a function pointer but allows pointing into many different kinds of functions that exist in C++. This makes the integration of lambdas into C++ easy; in C, it is much more difficult to have this all participate in the system automatically. There is no std::function-alike in C, and there’s no std::function_ref-alike in C either for new function types like GNU Nested Functions, Apple Blocks, or otherwise. This problem has to be solved separately, with a Wide Function Pointer type (§ 5.2 Wide Function Pointer Type) or with explicit user trampoline-making capability such as § 5.3 Make Trampoline and Singular Function Pointers.
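To make the gap concrete, here is a minimal sketch of the kind of hand-rolled "function view" that C code tends to write today in place of something like std::function_ref. It assumes nothing beyond ISO C. The names compare_ref, insertion_sort_ref, compare_ints, and first_after_sort are hypothetical and exist only for illustration; they are not part of this proposal or of any library.

```c
#include <stddef.h>

// Hypothetical "wide" callable: a function pointer plus the data it needs.
typedef struct compare_ref {
  int (*call)(const void* left, const void* right, void* context);
  void* context;
} compare_ref;

// A sorting routine that accepts the wide callable instead of a bare pointer.
void insertion_sort_ref(int* data, size_t count, compare_ref compare) {
  for (size_t i = 1; i < count; ++i) {
    int value = data[i];
    size_t j = i;
    while (j > 0 && compare.call(&data[j - 1], &value, compare.context) > 0) {
      data[j] = data[j - 1];
      --j;
    }
    data[j] = value;
  }
}

// The "captured" state lives in an ordinary object owned by the caller.
int compare_ints(const void* untyped_left, const void* untyped_right, void* context) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  const int* in_reverse = context;
  // written without subtraction so that it cannot overflow
  int ascending = (*left > *right) - (*left < *right);
  return *in_reverse ? -ascending : ascending;
}

// Sorts a fixed list and returns its first element.
int first_after_sort(int in_reverse) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  compare_ref compare = { compare_ints, &in_reverse };
  insertion_sort_ref(list, sizeof(list) / sizeof(*list), compare);
  return list[0];
}
```

Every API that wants to accept such a callable has to agree on the struct by hand, which is the coordination problem a standard Wide Function Pointer type would remove.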

2.4.1. Captures and Data

Lambdas do not capture any context by default, unlike both GNU Nested Functions and Apple Blocks. Instead, every capture must be manually annotated as either being taken by value or by reference, or a "universal capture" must be used to set a default method of capturing all visible, in-scope, block objects.

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred"
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  // explicitly capture in the "[ ]" of the lambda
  auto compare = [in_reverse](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

This, unfortunately, also makes them susceptible to location just like Blocks: with unadorned identifier captures such as in_reverse and with = captures, the state they capture is the state at the moment of creation during execution. The following code would capture in_reverse before any important modifications happen.

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred" (`auto`)
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  // "capture all variables" annotation `=` -- same as writing the flat name of
  // every object currently in-scope.
  auto compare = [=](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  // lambdas and captures do not reflect any changes beyond this point,
  // including the `in_reverse`

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

Like Apple Blocks, a lambda with captures is safe to return with the by-value capture (if one briefly ignores the need for Block_copy to reseat the memory of a Block). Additionally, it is better here because no use of the heap is needed to do this.

ASSERTION: This is PRIMARILY due to C++-style Lambdas just being normal objects. They have a compile-time sizeof(...), a compile-time alignof(...), can have their unique type inspected with typeof(...) (decltype(...) in C++), and are generally autonomous. There’s no erasure happening here as is with Blocks (the function_type^ type) or Nested Functions (the function_type* type); each Lambda has a unique type, similar to the unique type gained by pairing a function with a hand-made struct.
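The comparison to "pairing a function with a hand-made struct" can be sketched in ISO C today. This is only a model: no implementation is required to lay out a lambda this way, and compare_closure, compare_closure_call, and make_compare_closure are hypothetical names introduced for illustration.

```c
#include <string.h>

// Hand-written stand-in for the unique type of `[in_reverse](...) { ... }`.
typedef struct compare_closure {
  int in_reverse; // by-value capture: a copy taken at creation time
} compare_closure;

// The body of the "lambda", with the capture object passed explicitly.
int compare_closure_call(const compare_closure* self,
                         const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  return self->in_reverse ? *right - *left : *left - *right;
}

// Returning the object by value needs no heap: it is just a struct copy.
compare_closure make_compare_closure(int argc, char* argv[]) {
  compare_closure closure = { 0 };
  // same condition as the earlier examples: the first argument is exactly "-r"
  if (argc > 1 && strcmp(argv[1], "-r") == 0) {
    closure.in_reverse = 1;
  }
  return closure;
}

// Size is an ordinary compile-time constant, checked here at translation time.
_Static_assert(sizeof(compare_closure) >= sizeof(int), "captures are stored inline");
```

The difference with a real lambda is that the compiler would generate the struct and the call function, and would pick the unique type name itself.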

The only problem is that this requires a feature that was proposed for C23 but didn’t make it (along with Lambdas not making it): deduced return types. There was consensus to have the feature, but the feature was bundled with the set of Lambda proposals, and thus fell through during the final stretch of C23. Therefore, a proposal for inferring the type of a function return should be separated from the previous proposals (such as [n2923]) in order to accommodate such behavior in C.

One of the things that’s better about Lambdas over Apple Blocks is that they also allow for by-name capture, just like Nested Functions do. So, this code -- despite having the lambda defined in main and before in_reverse is changed -- will work as expected:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  // & is by-reference capture
  auto compare = [&](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
	
  return list[0];
}

This will invoke undefined behavior in the case of moving this by-name capturing lambda into a function and then returning it. Capturing a name with &some_identifier (or using the "default capture" of & by itself) always captures a pointer to the variable.
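In terms of a hand-made struct model, a by-reference capture stores a pointer where a by-value capture stores a copy. The sketch below assumes only ISO C and uses hypothetical names (by_value_closure, by_reference_closure, capture_demo); it shows why a later write is visible through one kind of capture and not the other.

```c
// By-value capture: the closure object holds its own copy.
typedef struct by_value_closure { int x; } by_value_closure;
// By-reference capture: the closure object holds a pointer to the original.
typedef struct by_reference_closure { int* x; } by_reference_closure;

int call_by_value(const by_value_closure* self) { return self->x * 3; }
int call_by_reference(const by_reference_closure* self) { return *self->x * 4; }

int capture_demo(void) {
  int x = 3;
  by_value_closure triple_it = { x };         // like `[x]`
  by_reference_closure quadruple_it = { &x }; // like `[&x]`
  x = 5; // only the by-reference closure observes this write
  // (3 * 3) + (5 * 4)
  return call_by_value(&triple_it) + call_by_reference(&quadruple_it);
}
```

Copying quadruple_it out of capture_demo would copy the pointer and not the int it points to, so the copy would dangle as soon as x goes out of scope.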

Even if it is more explicit than in the Nested Functions case, the danger is still present and so care must still be exercised. That is, the following is undefined behavior because of the explicit by-reference & capture:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred" (`auto`)
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  // capture just one variable, and capture it "by-name" / "by-reference"
  auto compare = [&in_reverse](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        // lambda will reflect this change
        in_reverse = 1;
      } 
    }
  }

  // uh oh...
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

This means that lambdas can be made to be unsafe, by capturing things whose lifetime dies even as the lambda itself is passed around or returned. This allows for a sleeker representation and no runtime-heap, but with the OBVIOUS drawback that no automated reference-counted variable also means no implicit lifetime safety like with the __block variables. Thankfully, since captures can be done both ways, the user can either choose to capture by reference, choose to capture by value, or -- if needed -- choose to allocate and then capture the new allocated pointer by value themselves. Of course, any explicit allocation will need to be freed, just as it would in a Blocks scenario. This usually implies waiting for a signaling callback from the API that it is done, or elevating the lifetime to a higher level to be deleted at a later time.

2.4.2. What About Lifetime / Destructors?

A common criticism of Lambdas and their unique type, whole-object approach is that such an approach with captures requires C++-style destructors to work well. We are completely unsure why this is the case or why this criticism keeps being levied specifically at Lambdas. In the previous section on Apple Blocks (§ 2.3.2.1 More Complications: Generally Unsafe to Return), coordinated function calls and documentation are the only way to communicate that a user has Block_copy’d an object and therefore requires Block_release. Similarly with GNU Nested Functions, returning them up the stack at all is pure undefined behavior, that has tangible effects on the program (§ 2.2.4 The Nature of Captures): these are problems endemic to C. Using a complex data structure like a binary tree or allocating memory requires that it is documented and communicated to the user: capturing such complex types and having it called over a longer period of time simply means the user has the responsibility to clean up or free the resources.

C APIs that accept user-provided functions will always give the user a way to know when something must be cleaned up. For example, the Lua C API has an allocation function for which a "new size" parameter of 0 means the memory passed in must be freed; that’s how it communicates what the current action is. Similarly, ev/libev -- with ev_set_allocator -- provides a hook to allow a user to manage the memory of the library, while also providing several statuses in callbacks for watchers (initialized/pending/running/stopped/etc.). Even for standard C, thrd_create passes a void* user data to the func that gets run on the new thread: a user must allocate and then pass the user data to the thread, and it becomes the thread’s responsibility to manage the lifetime of that type in a manner that is thread safe.
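As a sketch of the first convention, the function below has the same parameter shape as Lua's lua_Alloc (user data, old pointer, old size, new size) but is written without the Lua headers so that it stands alone. The alloc_stats type and the names counting_alloc and alloc_demo are hypothetical and only serve the illustration.

```c
#include <stdlib.h>
#include <stddef.h>

// Hypothetical bookkeeping passed through the `void*` user data parameter.
typedef struct alloc_stats {
  size_t live_blocks;
} alloc_stats;

// A new size of 0 is the "free this memory" notification;
// any other new size is an allocate-or-resize request.
void* counting_alloc(void* user_data, void* ptr, size_t old_size, size_t new_size) {
  alloc_stats* stats = user_data;
  (void)old_size; // unused in this sketch
  if (new_size == 0) {
    if (ptr != NULL) {
      stats->live_blocks -= 1;
    }
    free(ptr);
    return NULL;
  }
  void* result = realloc(ptr, new_size);
  if (result != NULL && ptr == NULL) {
    stats->live_blocks += 1; // a fresh block, not a resize
  }
  return result;
}

// Allocates one block, then releases it through the same entry point.
// Returns (live blocks while held) * 10 + (live blocks after release).
size_t alloc_demo(void) {
  alloc_stats stats = { 0 };
  void* block = counting_alloc(&stats, NULL, 0, 64);
  size_t live_while_held = stats.live_blocks;
  counting_alloc(&stats, block, 64, 0); // new size 0: release
  return live_while_held * 10 + stats.live_blocks;
}
```

The callee never has to guess about ownership: the API tells it, through an ordinary parameter, when the resource is finished. A closure passed to such an API can use the same signal to release whatever it captured.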

Any C API worth its salt, when dealing with convoluted lifetimes, provides to a given callback (through its parameters) a notification that the memory is not usable anymore, OR a separate callback (for more full-fledged APIs) that notifies the API that it’s finished or done with a specific operation, and therefore safe to close things out. The secondary alternative is, of course, statically sourcing lifetime until the user themselves guarantees all resources can be safely freed using some outside knowledge (e.g., an explicit set of calls after the start of the library to cleanup/close/stop the library). This is not just a point with lambdas: every attempt at solving this problem has to engage with this. Whether it’s using Block_copy to ensure the lifetime of an Apple Block, or malloc to make sure a struct type being pointed to by a void* is accessible after a dispatch to an asynchronous function call. This is simply not a problem unique to lambdas: lifetime tracking and safety will always be a problem in C because C has no extended concept of lifetime duration beyond Effective Types.

Any problem with lifetime is going to be present in every single iteration of the solution to this problem, and is going to manifest in different ways:

UB if you return a GNU Nested Function pointer that works with variables in the function scope being exited;
UB if you return an Apple Blocks pointer without Block_copy;
UB if you capture things that go out of scope in general (safe defaults for Apple Blocks, safe defaults for Lambdas with [=] or just ident captures);
UB if you convert to the wrong struct inside of an ISO C function to access a user data void* pointer;
UB if you by-name capture in a lambda and leave the scope;

and so on, and so forth. That’s a C-intrinsic problem, and the only thing any design in this space can do is offer better tools or better control to manage or avoid such problems where possible. Lifetime management is not solvable in C as it stands, and no amount of features or tinkering or attributes will really change the intrinsic language design flaw that is pointers and references that do not have any compile-time trackable properties aside from what can be inferred with (potentially strenuous) static analysis.

One way to alleviate this -- which would be beyond what is currently within C++ and what has been proposed previously -- is to allow for lambdas (beyond their initialization/creation) to have "accessor" syntax for any of its captures using the lambda.identifier syntax. This is explored in the later design for the solutions, within § 3.2.5 NEW: Data Captures are Accessible.

2.5. Function Literals and Local Functions

There is not much to say about Function Literals ([n3679]) and Local Functions ([n3678]) as they deliberately do not engage with the problem of trying to capture and use data. The syntax for Function Literals is based on Compound Literals, wherein the ( abstract-declarator ){ ... } syntax is repurposed. Currently, abstract-declarator being a function type is just a constraint violation in C, so it is safe to repurpose this syntax. The syntax for Local Functions is identical to GCC’s Nested Functions, and thus, all the criticisms and flaws of Nested Functions apply (§ 2.2 GNU Nested Functions) with the additional problem that they refuse to engage with Captures at all, which makes them unsuitable for both the GNU Nested Function use case and the Apple Blocks use case.

By not engaging with the closure/capture issue, Function Literals seek to just be a prettier form of ISO C regular functions. This provides the benefits of not needing to have a wide function pointer type immediately (albeit one is still needed for the general ecosystem), and it allows code to be read in a much more friendly format by localizing the function pointer and, potentially, any user data structures that go with a void*. As a compound literal, it also immediately works since it can be created/used as an expression, meaning it can be passed to function arguments:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    (int(const void* untyped_left, const void* untyped_right)) {
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

Unfortunately, not having any solution or future direction for captures and repurposing the compound literal syntax for it means that it seems more like a dead end. In the above example, we still have to transfer the in_reverse with a static for the qsort call. It gets slightly better if the API has a void* user data carveout, like for qsort_r:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    (int(const void* untyped_left, const void* untyped_right, void* user)) {
      const int* in_reverse = (const int*)user;
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (*in_reverse) ? *right - *left : *left - *right;
    },
    &in_reverse
  );
	
  return list[0];
}

A more pronounced example that uses more than an int shows that one can write both the struct and callback locally:

#include <stdlib.h>

typedef void async_callback_t(int result, void* data);
void async(async_callback_t* callback, void* data);

int main() {
	
  // struct and callback are next to each other
  struct { int value; }* capture = calloc(1, sizeof(*capture));  
  auto function  = (void (int result, void * data)) {
    // anonymous struct only identified by `typeof`,
    // keeps the exact type and helps reduce Type Confusion errors
    typeof(capture) captured = data;
    free(captured);
  };
  async(function, capture); // used immediately: hard to lose track
  return 0;
}

2.6. `__builtin_call_with_static_chain`

This is an intrinsic that is available both in GCC and in Clang ([builtin_call_with_static_chain_gcc]). It is the thinnest possible wrapper around the idea of a function call and an associated environment; effectively, it picks a location to store a "context" pointer and "chains" that with the expression that will be the function call without needing to pass that context pointer as an argument.

This kind of intrinsic is effectively directly relatable to what a Wide Function Pointer’s (§ 5.2 Wide Function Pointer Type) internals would look like. It is also either directly compatible with or cheaply convertible to the inner workings of the (vast) majority of cheap closure pointers in other languages, such as Go, Lua, C++, and more. This is a nice intrinsic to have but it is extremely low-level; it is also deeply implementation-defined and underspecified/undocumented even in the implementations that have it. It also does not translate universally, though plenty of individuals remake the same conceptual thing in other languages (Lua with its "environment" tables for execution of certain functions, Zig and Odin contexts for function bodies, and more).

Standardizing this directly is likely not worth the hassle of trying to pin down a formal specification of what the "static chain" ultimately is. It is also only a single kind of implementation technique for the closure problem: it is not necessarily required to be implemented in the manner that a static chain call would provide or imply. Nevertheless, this is an intrinsic so deep into implementation internals that it could be done separately; we think it would be more applicable and worthwhile to get the better-defined primitives on top of such an intrinsic (such as a Wide Function Pointer type and manual Trampoline creation).

2.7. Solution

This proposal is going to work to standardize Capture Functions as a C extension-familiar way of working with data that is based on existing practice. It is also going to standardize Lambdas for the technical differences between them and Capture Functions, in particular their ability to be used for macros (small but important) and their ability to be C++-compatible (unifying more header and in-line code).

A different proposal is going to work on the "Make Trampoline" aspect, to allow interoperation with old code. Another different proposal is going to work on providing a "wide function pointer" type. As used in the examples here, we hope to see % as a pointer-like modifier for a "wide function pointer" type, and if not that perhaps a _Closure(function-type) spelling to make it directly accessible by most.

3. Design

Given the following properties from all of the extensions and proposals for this in the wild:

Feature	GNU Nested Functions	Apple Blocks	C++-Style Lambdas in C	Function Literals	Local Functions
Capture By-Name	✅ (default, use-based)	✅ (`__block ident;`)	✅ (`[&]`, `[&ident]`)	❌	❌
Capture By-Value	❌	✅ (default, use-based)	✅ (`[=]`, `[ident]`)	❌	❌
Selective Capture	❌ (use-based, by-name only)	✅ (for by-name) ❌ (for by-value, use-based)	✅	❌	❌
Safe to Return Closure	❌	⚠️ (requires `Block_copy`)	✅	✅ (never unsafe)	✅ (never unsafe)
Relocatable to Heap (Lifetime Management)	❌	✅ (`Block_copy`/`Block_release`)	✅ (`malloc`/`memcpy`/`free`)	✅ (not needed)	✅ (not needed)
Usable Directly as Expression	❌	✅	✅	✅	❌
Forward-Declarable	✅	❌	❌	❌	✅
Immediately Invokable	❌	✅	✅	✅	❌
Convertible to Function Pointer	✅	❌	⚠️ (only capture-less)	✅	✅
Convertible to "Wide" Function Type	✅	✅	✅	✅	✅
Access to Non-Erased Object/Type	❌ ((wide) function pointer only)	❌ (Block type/wide function pointer only)	✅ (unique type/size)	❌ (no object to access)	❌
Access to Captures through Object/Type	❌	❌	❌	❌	❌
Recursion Possible	✅ (use the identifier of the nested function)	❌ (`__self_func` required)	❌ (`__self_func` required)	❌ (`__self_func` required)	✅ (use the identifier of the local function)

This proposal is going to propose two distinct options for standardization, with the recommendation to do both. It is critical to do both for the approval of the C ecosystem in general (with § 3.2 Capture Functions: Rehydrated Nested Function), and for the maximum amount of external language compatibility (C++ in particular, with § 3.3 Lambdas). The necessary and core goals of this proposal are focused on

compile-time knowable size of the function object (can be treated as a regular object);
that has a unique type with its own size;
does not require a heap or a special stack (unless type-erased or otherwise relocated by (explicit) user action);
can be interacted with like a normal function (but not necessarily as a normal function pointer);
and, does not compromise the security of all or a portion of the program.

Lambdas already have usage experience with well-known properties that can be directly translated to C and are easy enough to understand, despite the unfortunate syntax. Capture Functions are a simple modification of Nested Functions that produce a sized object (similar to Lambdas) and make their captures explicit, allowing for a degree of control and additional safety that was not present in the original Nested Functions design.

We are not focused on interoperability with singular function pointers. We believe that should be left to a separate, explicit mechanism in the language, capable of allowing the user to choose where the memory comes from and setting it up appropriately. This way, a user can make the decision on their own if they want to use e.g. executable stack (with the consequences that it brings) or just have a part of (heap) memory they set with e.g. Linux mprotect(...) or Win32 VirtualProtect to be readable, writable, and executable. Such a trampoline-maker (as briefly talked about in § 5.3 Make Trampoline and Singular Function Pointers) can also be applied across implementations in a way that the secret sauce powering Nested Functions cannot be: this is much more appealing as an approach.

We DO NOT take any of the design from Blocks because the Blocks design is, as a whole, unsuitable for C. While its deployment of a blocks "type" to fulfill the necessary notion of a "wide function pointer" type is superior to what Nested Functions have produced, the implementation details it imposes for __block variables and the excessive reliance on an (underspecified) runtime/heap are detrimental to a shared & unified approach to C.

NOTE: The Blocks runtime/heap layout has changed (at least) once in its history: the only reason this worked is because Apple owned every part of the Blocks ecosystem. Apple can do whatever they want with it, however they want, whenever they want: this does not work in a language with diverse, loosely coordinated implementations like C, as opposed to Objective-C, Objective-C++, or Swift.

As the heap is (typically) repulsive to some freestanding implementations, we do not want to standardize something that will have technological drawbacks similar to those of VLAs, where -- even if no syntactical or language-design issues exist from the way blocks are written -- the presence of an unspecified source of memory (stack or heap) produces uncertainty in the final code generation of a program.

The feature table for these two looks like this:

Feature	C Lambdas	Capture Functions
Capture By-Name	✅ (`[&]`, `[&ident]`)	✅ (`_Capture(&)`, `_Capture(&ident)`)
Capture By-Value	✅ (`[=]`, `[ident]`)	✅ (`_Capture(=)`, `_Capture(ident)`)
Selective Capture	✅	✅
Safe to Return Closure	✅	✅
Relocatable to Heap (Lifetime Management)	✅ (`malloc`/`memcpy`/`free`)	✅ (`malloc`/`memcpy`/`free`)
Usable Directly as Expression	✅	❌
Forward-Declarable	❌	✅
Immediately Invokable	✅	✅
Convertible to Function Pointer	⚠️ (only capture-less)	⚠️ (only capture-less)
Convertible to "Wide" Function Type	✅	✅
Access to Non-Erased Object/Type	✅ (unique type/size)	✅ (unique type/size)
Access to Captures through Object/Type	✅	✅
Recursion Possible	❌ (`__self_func` required)	✅

3.1. What is NOT Being Proposed!

While we would like to standardize them in the future, this proposal is NOT looking to standardize statement expressions, a "make trampoline" compiler intrinsic, or a wide function pointer type.

3.1.1. Statement Expressions?

Statement Expressions should be standardized. While it is related to these efforts, it is entirely separate and has a full, robust set of constraints and concerns in standardizing. It has more existing implementation experience, deployment experience, and implementer practice than Blocks and Nested Functions combined. Therefore, it will be pursued in a different proposal. This was briefly noted in a proposal collecting existing extensions in 2007 by Stoughton ([n1229]); while there was enthusiastic support at the time, nothing materialized of the mention nor the in-meeting enthusiasm. Some attempts are being made at standardizing it, but it is notably difficult to standardize due to the large number of corner cases that arise from needing to clarify semantics of constructs that normally cannot appear in certain places being able to suddenly appear there, like a break; being placed in the initializer expression of a for loop.

Another advantage of Statement Expressions is that, unlike any of Apple Blocks / C++ Lambdas / GNU Nested Functions, there is no separating function body. This is critical for writing macros that coordinate with one another, AND is critical in writing reusable macros that have no additional cost and do not set up extra individual entry points. For example, there are hundreds of permutations of the functions in C2y’s <stdmchar.h> that could be written to make them easier to use, to make them not require double-pointers, to infer the size from a C-style string, and so on, and so forth. The choice of having a bunch of macros which simply repeat the same code means not having to add hundreds of permutations of the <stdmchar.h> functions (5 different character types across 5 different encoding types with 6 forms of "pointer and length, just pointer" for input/output, and typical skip/ignore/replace-character error handling strategies, pairwise with one another where order matters).

Another place that statement expressions come in handy is with RESULT/TRY/etc. macros, primarily used for low-level code where handling (and possibly enforcing error handling) is desirable through error codes and return types, as demonstrated by jade and lak here: https://gist.github.com/LAK132/0d264549745e8196df1e632d5b518c37. Being able to error and jump out or error and stop if things do not work is a very common (and powerful) idiom for writing straightforward code, and is employed heavily in many different ways across the C ecosystem in various forms.
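A minimal sketch of such a TRY macro follows. It relies on the GNU Statement Expression extension, `({ ... })`, as implemented by GCC and Clang, so it is not ISO C. The int_result type, the TRY macro, and the parse_digit and add_digits functions are hypothetical and are not taken from the linked gist.

```c
// Hypothetical result type: an error code paired with a value.
typedef struct int_result {
  int error; /* 0 on success */
  int value;
} int_result;

/* Hypothetical TRY macro built on a GNU Statement Expression.
   On error it returns from the *enclosing* function; otherwise the
   whole expression evaluates to the unwrapped value. */
#define TRY(...) ({                         \
    int_result try_result_ = (__VA_ARGS__); \
    if (try_result_.error != 0) {           \
      return try_result_;                   \
    }                                       \
    try_result_.value;                      \
  })

int_result parse_digit(char c) {
  if (c < '0' || c > '9') {
    return (int_result){ .error = 1, .value = 0 };
  }
  return (int_result){ .error = 0, .value = c - '0' };
}

int_result add_digits(char a, char b) {
  int left = TRY(parse_digit(a));  // leaves add_digits early on failure
  int right = TRY(parse_digit(b));
  return (int_result){ .error = 0, .value = left + right };
}
```

The return inside the macro leaves add_digits itself. A Lambda, Block, or Nested Function body could not do that, because a return inside one of those only leaves the inner function.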

This paper does not standardize Statement Expressions, and leaves that to a future paper similar to n3643 ([n3643]).

3.1.2. Wide Function Pointer Type?

We do hope that another paper creates a new "Wide Function Pointer" type of some kind. Some suggestions can be found in § 5.2 Wide Function Pointer Type.

3.2. Capture Functions: Rehydrated Nested Function

Capture Functions are a slight modification of the design of Nested Functions. We start from the base of Nested Functions with three goals in mind.

Implementers are not comfortable with the implementation baggage associated with Nested Functions or maintaining potential ABI compatibility with those choices (heap/stack trampolines versus separate-page allocations).
We want to allow a way to access captured values explicitly, and control how those captures work.
We want them to be safe to move around and relocate, whether to the heap or copied into static memory or otherwise.

A brief demonstration of all of the well-defined behavior:

auto make_seven (int x) {
  int y = 7;
  int seven_fn() _Capture(x, y) {
    return x * y;
  }
  return seven_fn; // OK: unique type which
  // is a complete object
}

typedef int eight_fn_t();

eight_fn_t* make_eight () {
  int eight_fn () _Capture() {
    return 8;
  }
  return eight_fn; // OK: empty capture converts to function pointer
}

#if 0
typedef int nine_fn_t();

nine_fn_t* make_nine () {
  int val = 30;
  int nine_fn () _Capture(val) {
    return val;
  }
  return nine_fn; // constraint violation: cannot convert
  // captures to function pointer
}
#endif

int main () {
  int x = 3;
  int zero () {
    // OK, no external variables used
    return 0;
  }
  int also_zero () _Capture() {
    // same as above, just explicit
    return 0;
  }
#if 0
  int double_it () {
    return x * 2; // constraint violation
  }
#endif
  int triple_it () _Capture(x) {
    return x * 3; // OK, x = 3 when called
  }
  int quadruple_it () _Capture(&x) {
    return x * 4; // OK, x = 5 when called
  }
  int quintuple_it () _Capture(=) {
    return x * 5; // OK, x = 3 when called
  }
  int sextuple_it () _Capture(&) {
    return x * 6; // OK, x = 5 when called
  }
  x = 5;
  auto seven_tuple_it = make_seven(x);
  eight_fn_t* eight = make_eight();
  return zero() + triple_it() + quadruple_it()
    + quintuple_it() + sextuple_it() + seven_tuple_it()
    + eight();
  // same as
  // return 117;
  // 0 + (3 * 3) + (5 * 4)
  // + (3 * 5) + (5 * 6) + (5 * 7)
  // + 8
}

We go over the purpose of the design of this and the reasons for that design here.

3.2.1. Capture Functions are Complete Objects (unless only Declared)

The most important change from typical GNU Nested Functions, mirroring the behavior of C++ Lambdas, is that a capture function -- the identifier itself introduced by the definition of the function -- is a regular, normal, complete C object. This enables it to be:

returned, if the type is knowable at the time of function definition (or auto return types are incorporated into the language);
passed to a function, if the function is defined after the creation of the capture functions;
and, stored elsewhere through static/_Thread_local data with assignment or memcpy, or even on the heap.

These are important qualities to allow these functions with data to be used with asynchronous code, as (stored) callbacks, and in other scenarios. The size and alignment of the object is implementation-defined, and its layout is also entirely implementation-defined, much like the properties of a regular struct or union type. This allows implementations to not have to figure out how to squash everything into a single erased type, and instead enforce the Single Responsibility Principle; they already know how to create unique types, they already know how to create and fill structure types, and now separately a wide function pointer type or a "make trampoline" compiler feature (§ 5.3 Make Trampoline and Singular Function Pointers) can be developed.

Given an extremely simple example:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef void work_fn_t(void* user);
void add_work(work_fn_t* work, void* user);
bool work_done();

void kickoff(int start, int limit) {
  void work() _Capture(start, limit) {
    printf("doing work for %d to %d\n", start, limit);
    for (int i = start; i < limit; ++i) {
      printf("sooo much work - %d\n", i);
    }
  }
  void work_trampoline(void* user) {
    (*((typeof(work)*)user))();
    // free the relocated capture function after work is done
    free(user);
  };
  // elevate to higher lifetime to survive async function call time
  void* work_ptr = malloc(sizeof(work));
  memcpy(work_ptr, &work, sizeof(work));
  add_work(work_trampoline, work_ptr);
}

int main (int argc, char* argv[]) {
  int start = 0;
  int limit = 30;
  if (argc > 1)
    start = atoi(argv[1]);
  if (argc > 2)
    limit = atoi(argv[2]);

  kickoff(start, limit);

  while (!work_done());
  // no memory leaks at the end of the program
  return 0;
}

There are caveats about this, but they are related to forward declarations (§ 3.2.6 Forward Declarations Work).
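For comparison, the relocation pattern above can be approximated in today's standard C by writing the capture object out by hand. This is only a sketch of the idea, not the proposed feature. The names `work_closure`, `work_call`, `work_trampoline`, and `run_relocated_work` are hypothetical, the work itself is reduced to a sum, and the asynchronous queue is replaced with a direct call through a plain function pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical hand-written stand-in for the object that
   `void work() _Capture(start, limit)` would generate. */
struct work_closure {
  int start; /* captured by value */
  int limit; /* captured by value */
};

/* Body of the capture function; the captures arrive through `self`. */
static int work_call(const struct work_closure* self) {
  int sum = 0;
  for (int i = self->start; i < self->limit; ++i) {
    sum += i;
  }
  return sum;
}

/* Plain function pointer signature, as a C callback API would expect. */
typedef int work_fn_t(void* user);

static int work_trampoline(void* user) {
  struct work_closure* work = user;
  int result = work_call(work);
  free(user); /* release the relocated copy after the work is done */
  return result;
}

/* Copies the closure to the heap, then invokes it through the
   trampoline once the original object is out of scope.
   Returns -1 on allocation failure (a real API would report
   errors separately from the result). */
int run_relocated_work(int start, int limit) {
  void* work_ptr = NULL;
  {
    struct work_closure work = { start, limit };
    work_ptr = malloc(sizeof(work));
    if (work_ptr == NULL) {
      return -1;
    }
    memcpy(work_ptr, &work, sizeof(work));
  } /* `work` is gone here; the heap copy lives on */
  work_fn_t* callback = work_trampoline;
  return callback(work_ptr);
}
```

Because every capture is held by value inside the object, a byte-wise copy is enough to move it to longer-lived storage. That is the property the proposal relies on when it describes capture functions as complete objects.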

3.2.2. Deduced Return Types, Unique Types

Reusing an example from the above code, the make_seven function needs to have a special, inferred/deduced return type. This is because the type of a capture function is not known until it is defined:

auto make_seven (int x) {
  int y = 7;
  int seven_fn() _Capture(x, y) {
    return x * y;
  }
  return seven_fn; // OK: unique type which
  // is a complete object
}

The auto return type here just means "the first return expression is the return type of the function". This only works with in-line function definitions, and does not allow for a separated function declaration/definition (as the separated declaration would not have a material, real type until the definition could be read). This only applies to functions with inferred return types like this, where the first declaration of such a function must also be its definition.

If no return appears in such a function, or all the returns do not contain an expression, the return type is inferred to be void. Otherwise, all the return <expr>; must return the exact same type. If there exists one or more return <expr>;s and the types are not exactly the same in the whole function definition, then it is a hard error. This is already partly described in Jens Gustedt’s "Type inference for variable definitions and function returns v6" ([n2923]); reviving this paper would be a matter of rebasing it on the current working draft and improving the wording present.

3.2.3. Data Captures are Explicit

Data captures, the way in which local data is accessible inside of the function, are explicit. The only reason captures are explicit is because it is impossible to tell if something should be captured by value (and copied into whatever implementation-defined holding space is used for the Capture Function’s complete object), or if something should be captured by name/reference (and only have its pointer/address copied into that same holding space). This detail matters for safety when assigning, copying, storing, and otherwise relocating a capture function from its original scope.

NOTE: static and _Thread_local objects, as well as typical file-scope declarations, are accessible within a capture function in the normal way. constexpr objects, without a static specifier, at local scope are also accessible.

Allowing for explicit captures also allows for better type checking (used objects must be explicitly acknowledged by the programmer before they can be used), and allows for covering both the use cases of Apple Blocks (default by-value capture) and GNU Nested Functions (default by-name capture) without breaking anything. The lack of a capture also covers all of the use cases that Function Literals would have covered, which means that Capture Functions can sufficiently cover all of the existing use cases currently in production in C ecosystems. To match the default behaviors:

Apple Blocks: _Capture(=) (capture all by-value).
GNU Nested Functions: _Capture(&) (capture all by-name/reference).
Function Literals: _Capture() (capture nothing).

Only one "capture all" is allowed. That is, _Capture(=, &) (and vice-versa) is illegal. The rest of the specific captures for accessible identifiers can be specified in any order. Note that specific captures for a given object override the default implicit "capture all" behavior. For example:

int main () {
  int x = 30;
  int y = 10;
  int fn () _Capture(&, x) {
    return x + y;
  }
  x = 50;
  y = 40;
  return fn();
}

This program returns 70 (x is captured by-value as 30, y is captured by-name and is changed to 40 before invocation). The change to x on the outside to 50 is not reflected inside of the invocation. This allows an ease-of-use for specifying the "default" implicit all-capture, while letting the user select specifically which captures should work.
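The difference between the two capture modes in that example can be spelled out in standard C. The following is a hand-written approximation, not the proposed syntax. `fn_closure`, `fn_call`, and `run_mixed_capture` are hypothetical names, and the struct layout is only one of many layouts an implementation could choose.

```c
/* Hypothetical stand-in for `int fn () _Capture(&, x)`. */
struct fn_closure {
  int x;  /* named capture: copied when the closure is created */
  int* y; /* default `&` capture: only the address is stored */
};

static int fn_call(const struct fn_closure* self) {
  return self->x + *self->y;
}

int run_mixed_capture(void) {
  int x = 30;
  int y = 10;
  struct fn_closure fn = { .x = x, .y = &y };
  x = 50; /* not seen by fn: it holds its own copy of 30 */
  y = 40; /* seen by fn: it reads y through the pointer */
  (void)x;
  return fn_call(&fn); /* 30 + 40 */
}
```

Calling `run_mixed_capture()` yields 70, matching the capture function version: the by-value member keeps 30 while the by-reference member observes the later assignment of 40.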

3.2.4. Data Captures can be Renamed

Data captures can be renamed (or computed, with an expression that does not include a , unless it is parenthesized). This is important for e.g. incrementing reference counters for copying large, important data structures into callbacks that may either be invoked multiple times or have their own long-lived lifetime. The syntax for this occurs within the _Capture clause of a capture function:

#include <tree.h>

TREE_DECLARE(int_tree_t, int_tree, int);
TREE_IMPLEMENT(int_tree_t, int_tree, int);

#include <stdcountof.h>

typedef enum queue_status {
  qs_success,
  qs_timedout,
  qs_busy,
  qs_fail,
  qs_invalid
} queue_status;

typedef int work_fn_t(void* user);

queue_status add_dispatch_work(work_fn_t* work, void* user);
queue_status work_done();
void work_shutdown();

int main () {
  int data[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  int_tree_t tree = int_tree_init_with(data, data + countof(data));
  int work () _Capture(my_tree = int_tree_copy(tree)) {
    /* do work.... */
    int elem = int_tree_remove(my_tree, int_tree_min_node(my_tree));
    /* blah blah blah */
    return 0;
  }
  int work_trampoline (void* user) _Capture() {
    return (*((typeof(work)*)user))();
  }
  if (add_dispatch_work(work_trampoline, &work) != qs_success) {
    return 1;
  }
  queue_status err;
  while ((err = work_done()) != qs_success) {
    switch (err) {
      case qs_invalid:
      case qs_timedout:
      case qs_fail:
        // some error happened
        work_shutdown();
        return 2;
      default:
        break;
    }
  }
  work_shutdown();
  return 0;
}

3.2.5. NEW: Data Captures are Accessible

An important adjustment that makes this design work better than Blocks or Nested Functions is the ability not only to copy (§ 3.2.1 Capture Functions are Complete Objects (unless only Declared)) or otherwise rename objects (§ 3.2.4 Data Captures can be Renamed), but ALSO to get at the internals of a given Capture Function. This is missing from GNU Nested Functions (which provide no real resolution for it). It could matter for Apple Blocks but does not in practice, because Blocks can turn any object into a shared one with the __block modifier on an object. In particular, this only matters in the case of a closure which is given a (copied) resource that must either be released or freed.

NOTE: Thanks to Alex Celeste, for being the first person to bring this to my attention!

The syntax looks just like normal structure access, and is based on the names placed in the _Capture clause:

#include <stdio.h>

int main () {
  int x = 30;
  double y = 5.0;
  char z = 'a';

  int cap_fn0 () _Capture(=, &renamed_x = x) {
    printf("inside cap_fn0 | renamed_x: %d, y: %f, z: %c\n",
      renamed_x, y, z);
  }
	
  int cap_fn1 () _Capture(&, renamed_y = y) {
    printf("inside cap_fn1 | x: %d, renamed_y: %f, z: %c\n",
      x, renamed_y, z);
  }
	
  x = 60;
  y = 10.0;
  z = 'z';

  cap_fn0();
  cap_fn1();
	
  printf("\n");

  printf("inside main fn | cap_fn0.renamed_x: %d, cap_fn0.y: %f, cap_fn0.z: %c\n",
    cap_fn0.renamed_x, cap_fn0.y, cap_fn0.z);
  printf("inside main fn | cap_fn1.x: %d, cap_fn1.renamed_y: %f, cap_fn1.z: %c\n",
    cap_fn1.x, cap_fn1.renamed_y, cap_fn1.z);

  return 0;
}

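This would print the following (renamed_y is a by-value copy taken when cap_fn1 is defined, so it stays 5.0 after y is assigned 10.0; %f prints six decimal places by default):

inside cap_fn0 | renamed_x: 60, y: 5.000000, z: a
inside cap_fn1 | x: 60, renamed_y: 5.000000, z: z

inside main fn | cap_fn0.renamed_x: 60, cap_fn0.y: 5.000000, cap_fn0.z: a
inside main fn | cap_fn1.x: 60, cap_fn1.renamed_y: 5.000000, cap_fn1.z: z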

How the implementation actually accesses the information is implementation-defined, and the layout of the Capture Functions object is not defined by the specification, except to say it’s implementation-defined (§ 4 Wording).

NOTE: This leaves room for an implementation to, for example, use creative ways to retrieve objects and object references. Using a pointer to the current stack frame and then computing a raw offset to get to a specific bit of data, or using entirely registers, are all possible depending on how the captures are implemented. Such improvements and optimizations -- especially in the face of potential asynchronous calls and the need to protect against false sharing -- must be left up to Quality of Implementation.

As an example for releasing resources outside of the function call itself for the purposes of a function call that gets used more than once and isn’t passed a "We’re Done" signal, we can reuse the example from § 2.2.5 GNU Nested Functions By-Name Captures Cannot Be Worked Around Normally:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

auto make_compare(int argc, char* argv[]) {
  /* LOCAL, heap-allocated variable.... */
  int* in_reverse = malloc(sizeof(int));
  *in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) _Capture(in_reverse) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (*in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

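  auto compare = make_compare(argc, argv);
  // `compare` has captures, so it cannot convert to a function pointer;
  // route the call through a capture-less trampoline instead
  int compare_trampoline(const void* untyped_left,
    const void* untyped_right,
    void* user)
  _Capture() {
    return (*(typeof(compare)*)user)(untyped_left, untyped_right);
  }
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    compare_trampoline, &compare);
  // with data field captures, we can now `free` the
  // field `in_reverse` from the capture function
  free(compare.in_reverse);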
	
  return list[0];
}

Thanks to the capture of in_reverse with the by-value _Capture(in_reverse) indication, the return of this function is safe. And, since we have access to the unique type that is generated (through the auto return type), we can access the pointer in_reverse normally and naturally. This isn’t possible with normal C++-style lambdas, as they haven’t decided to make this available (though our design for Lambdas in C will also include the named captures as accessible fields). It’s also not possible in the other solutions which rely on type-erasure as a first-class part of the design (Apple Blocks with the Blocks type, GNU Nested Functions only being accessible through a pointer or convertible to a wide function pointer in [n2661] or [n3564], Borland’s closure annotation or function literals). This is why making it possible to access the unique type first and foremost is of great benefit.
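The same ownership pattern can be imitated in standard C today if the closure object is written by hand, which makes the benefit of accessible captures concrete. This is a sketch under several assumptions: `compare_closure`, `compare_call`, `make_compare_closure`, `sort_ints`, and `first_after_sort` are hypothetical names, and a small insertion sort stands in for `qsort_r` so that the example does not depend on a platform-specific signature.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the unique type of
   `compare(...) _Capture(in_reverse)`. */
struct compare_closure {
  int* in_reverse; /* captured pointer, owned by whoever holds the closure */
};

static int compare_call(const struct compare_closure* self, int left, int right) {
  /* comparison form avoids the overflow that subtraction can cause */
  int order = (left > right) - (left < right);
  return *self->in_reverse ? -order : order;
}

/* Returns the closure by value; in_reverse is NULL on allocation failure. */
static struct compare_closure make_compare_closure(int reverse) {
  struct compare_closure c = { malloc(sizeof(int)) };
  if (c.in_reverse != NULL) {
    *c.in_reverse = reverse;
  }
  return c;
}

/* Small insertion sort that threads the closure through explicitly. */
static void sort_ints(int* list, size_t count, const struct compare_closure* cmp) {
  for (size_t i = 1; i < count; ++i) {
    int value = list[i];
    size_t j = i;
    while (j > 0 && compare_call(cmp, list[j - 1], value) > 0) {
      list[j] = list[j - 1];
      --j;
    }
    list[j] = value;
  }
}

/* Sorts a fixed list and returns its first element, or -1 on failure. */
int first_after_sort(int reverse) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  struct compare_closure compare = make_compare_closure(reverse);
  if (compare.in_reverse == NULL) {
    return -1;
  }
  sort_ints(list, sizeof(list) / sizeof(*list), &compare);
  /* the capture is an ordinary member, so the caller can release it */
  free(compare.in_reverse);
  return list[0];
}
```

The cleanup line only works because the caller can name the member. With a type-erased closure the pointer would be unreachable after creation, and the allocation would either leak or need a separate side channel.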

3.2.6. Forward Declarations Work

Capture functions can be forward declared, similar to how GNU Nested Functions can be forward-declared if one uses the auto keyword in front of the declaration of a Nested Function:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;
  // forward-declared compare
  int compare(const void* untyped_left, const void* untyped_right) _Capture(in_reverse);
  // even though it is captured by value...
	
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }

  /*
  compare.in_reverse; // CONSTRAINT VIOLATION: cannot access until definition
  */
	
  int compare_trampoline(const void* untyped_left,
    const void* untyped_right,
    void* user)
  _Capture() {
    return (*(typeof(compare)*)user)(untyped_left, untyped_right);
  }

  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    compare_trampoline,
    &compare
  );
  // define it here;
  // captures and arguments must be identical otherwise it is a violation
  // `in_reverse` is captured at the point of definition, not the point of declaration.
  int compare(const void* untyped_left, const void* untyped_right) _Capture(in_reverse) {
      // `in_reverse` is the by-value capture, so it is used directly
      const int* left = untyped_left;
      const int* right = untyped_right;
      return in_reverse ? *right - *left : *left - *right;
  };
  return list[0];
}

Some important points for Capture Functions’s forward declarations:

the values captured have their values taken at the point of definition, not at the point of declaration;
because of this, one cannot use the . or the -> operators on a capture function declaration, only a definition;
and, forward declarations can only have the function call operation applied to them, before they are defined.

NOTE: It is unclear whether the Apple Blocks-like by-value capture should occur at the point of the first declaration or the definition. Currently, the reasoning is that using the definition for the values captured is better because there can be multiple forward declarations (in perhaps sprawling manners due to #include and other code copy-paste mechanisms) but only one definition, and thus the definition should be the important part.

NOTE: The other opinion is that restricting access to . and -> by effectively labeling the "object" part of the closure type as an incomplete type is not worth it, and that capturing things by-value at the first forward declaration is good.

There is no existing practice for this due to the way GNU Nested Functions -- the only existing practice where forward-declaring the closure type is allowed -- work. That is, they only capture by-name/by-reference, and so the value is always the value at the point of execution and not at the point of evaluation. Apple Blocks and other solutions do not have this problem since they are expressions and thus do not have to engage in any sort of work with the "split between declaration vs. definition" issue at all.

3.2.7. Forward Declarations without a name are a bit useless

Forward declarations without a name are a bit silly, because they are unique types. An unnamed declarator means that there’s a unique type that has been forward-declared but serves no other purpose:

int main(int argc, char* argv[]) {
  typedef int (compare_closure1_t)(const void* untyped_left, const void* untyped_right) _Capture(=);
  typedef int (compare_closure2_t)(const void* untyped_left, const void* untyped_right) _Capture(=);
  typedef compare_closure1_t compare_closure_t;
  typedef compare_closure2_t compare_closure_t; // constraint violation: not a compatible type redeclaration
  return 0;
}

The syntax is kept for parity with the rest of the declarator syntax. It also allows a user to make a unique type within a translation unit, though there are obviously other ways to do this.

3.2.8. Capable of Recursion

Capture Functions are able to refer to themselves for the purpose of recursion. This means that __self_func ([__self_func]), unlike for expression-based/unnamed Function Literals/Lambdas/Block literals, is not required:

int main () {
  const int mult = 3;
  int tripling (int times, int start) _Capture(mult) {
    if (times >= 5) {
      return start;
    }
    return tripling(times + 1, start * mult); // normal recursion
  }
  return tripling(0, 1);
}

3.2.9. Capable of Self-Call

Capture Functions are able to refer to themselves in the argument list, which means it is possible to form a pointer to itself. The type is incomplete until after the end of the _Capture list, but it already exists, so a pointer to it can be formed in the argument list and then used after the opening {, where the type is complete. The usage is somewhat weird:

int main () {
  const int mult = 3;
  int tripling (int times, int start, typeof(tripling)* self) _Capture(mult) {
    if (times >= 5) {
      return start;
    }
    return self[0](times + 1, start * mult, self);
    // or
    // return (*self)(times + 1, start * mult, self);
  }
  return tripling(0, 1, &tripling);
}
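The "pointer to self" shape is what a hand-written closure looks like in standard C today, which may help in reading the example above. The sketch below is an approximation and not the proposed feature: `tripling_closure`, `tripling_call`, and `run_tripling` are hypothetical names, and the capture object is an ordinary struct.

```c
/* Hypothetical stand-in for the object behind
   `int tripling (...) _Capture(mult)`. */
struct tripling_closure {
  int mult; /* captured by value */
};

/* The function body receives its own closure object through `self`,
   so recursing is a matter of passing the same pointer along. */
static int tripling_call(const struct tripling_closure* self, int times, int start) {
  if (times >= 5) {
    return start;
  }
  return tripling_call(self, times + 1, start * self->mult);
}

int run_tripling(void) {
  const struct tripling_closure tripling = { .mult = 3 };
  return tripling_call(&tripling, 0, 1); /* 3 * 3 * 3 * 3 * 3 */
}
```

`run_tripling()` multiplies by 3 five times and returns 243. The capture function versions compute the same value without the programmer having to declare the struct or thread `self` through manually in the plain recursion case.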

3.2.10. Not An Expression

The one true technical downside is that Capture Functions are declarations / definitions. They cannot be used (without the Statement Expression extension) in a function call’s argument list, which means that (short) closures and anonymous functions still need the full function definition. This is annoying and, honestly, one of the reasons § 3.3 Lambdas are preferred as a shorthand syntax.

It also means that, without Statement Expressions, Capture Functions cannot be used for the implementation of many macros which are typically expected to be usable as normal expressions.

3.2.11. Footgun: By-Name Capture Exceeds Captures’s Lifetime

A brief display of the undefined behavior:

auto ub (int parameter) {
  int automatic = 7;
  int fn() _Capture(parameter, &automatic) {
    return parameter + automatic;
  }
  return fn; // well-defined copy return
  // but dangling reference to `automatic`!
}

int main () {
  auto fn = ub(2);
  return fn(); // undefined behavior:
  // `automatic` no longer exists.
}

In general, undefined behavior occurs in the same way that it occurs within existing C code: use of an object after its lifetime has ended (in this case, an automatic storage duration object has gone out-of-scope). The fix for ub in this case is to capture automatic by-value. This makes it safe to copy that function object to the heap, or the stack. Additionally, no UB is possible by conversion to a function pointer.
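The by-value fix can be illustrated with a hand-written closure in standard C. This is a sketch with hypothetical names (`sum_closure`, `sum_call`, `not_ub`, `run_not_ub`), assuming a simple struct layout; it shows why returning the object is safe once it holds copies rather than addresses.

```c
/* Hypothetical stand-in for `int fn() _Capture(parameter, automatic)`:
   both captures are held by value. */
struct sum_closure {
  int parameter;
  int automatic;
};

static int sum_call(const struct sum_closure* self) {
  return self->parameter + self->automatic;
}

static struct sum_closure not_ub(int parameter) {
  int automatic = 7;
  struct sum_closure fn = { parameter, automatic };
  return fn; /* copies the values out; nothing refers to this frame */
}

int run_not_ub(void) {
  struct sum_closure fn = not_ub(2);
  return sum_call(&fn); /* 2 + 7, no dangling reference */
}
```

Had the struct stored `int* automatic` instead, the returned object would carry the address of a local whose lifetime ended at `return`, which is the exact situation the `ub` example demonstrates.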

3.2.12. Future Footgun: Wide Function Pointers

Wide function pointers, if and when they come to C, can make for footguns with capturing lambdas given that they will (likely) allow conversions from any Nested Function / Block / Lambda to them implicitly. Using a fictional wide function pointer syntax using %:

typedef int foo_fn_t(int);

foo_fn_t% call_me (int x) {
  return [x](int y) { return x + y; }; // converts to wide function pointer type!
  // undefined behavior if the return value is ever
  // called outside of this function 
}

int use_me(foo_fn_t% fn) {
  return fn(2);
}

int main () {
  int x = 30;
  return use_me(call_me(x));
}

This is a similar problem to Nested Functions returning a regular function pointer from a function call. Unfortunately, a conversion being allowed here is necessary to allow the 75%+ use case of passing it as a parameter, such as:

typedef int foo_fn_t(int);

void pass_to_me (foo_fn_t% func);

int main () {
  int x = 30;
  pass_to_me(
    [x](int y) { return x + y; }
  ); // converts to wide function pointer type!
  return 0; 
}

Thus, in a future with a wide function pointer type, such a problem might be allowed. This is similar to § 3.2.11 Footgun: By-Name Capture Exceeds Captures’s Lifetime. A special carveout in the specification for the return value case could be developed, but this would need work to avoid precluding useful cases.
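To make the lifetime hazard concrete, a wide function pointer can be modeled in standard C as a pair of a code pointer and an environment pointer. This is only an illustrative model, not a proposed representation: `wide_fn`, `add_x_env`, `add_x`, `use_wide_fn`, and `run_wide_fn` are hypothetical names, and a real wide function pointer type could be laid out very differently.

```c
/* Hypothetical model of a wide function pointer:
   a code address plus a pointer to the captured environment. */
struct wide_fn {
  int (*call)(void* env, int y);
  void* env; /* borrowed: must outlive every call through this object */
};

/* Stands in for the captures of `[x](int y) { return x + y; }`. */
struct add_x_env {
  int x;
};

static int add_x(void* env, int y) {
  const struct add_x_env* captures = env;
  return captures->x + y;
}

static int use_wide_fn(struct wide_fn fn) {
  return fn.call(fn.env, 2);
}

int run_wide_fn(void) {
  struct add_x_env env = { .x = 30 };
  struct wide_fn fn = { add_x, &env };
  /* Safe: `env` is alive for the whole call (the parameter-passing case).
     Returning `fn` from this function instead would leave `fn.env`
     pointing at a dead object. */
  return use_wide_fn(fn); /* 30 + 2 */
}
```

In this model the conversion to the wide type erases where the environment lives, so nothing in the type distinguishes the safe pass-down case from the unsafe return case.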

3.3. Lambdas

Lambdas are simply a reskinned version of Capture Functions. They have all the same functionality, but with the benefits that they are:

expressions, and therefore can be used in-line in a function call as an argument or as part of an argument;
expressions, and therefore can be immediately invoked;
and, C++-compatible in their design.

We are deliberately leaving these as the only three benefits of lambdas over Capture Functions for the sole reason that, after Capture Functions, Lambdas will be VERY minimal effort to support. The reason for that is that they are, semantically, just a "Syntactic Reskin" of Capture Functions, save for their presence as an expression.

auto make_seven (int x) {
  int y = 7;
  return [x, y]() { return x * y; };
}

int main () {
  int x = 3;
  auto zero = [] () {
    // OK, no external variables used
    return 0;
  };
#if 0
  auto double_it = [] () {
    return x * 2; // constraint violation
  };
#endif
  auto triple_it = [x] () {
    return x * 3; // OK, x = 3 when called
  };
  auto quadruple_it = [&x] () {
    return x * 4; // OK, x = 5 when called
  };
  auto quintuple_it = [=] () {
    return x * 5; // OK, x = 3 when called
  };
  auto sextuple_it = [&] () {
    return x * 6; // OK, x = 5 when called
  };
  x = 5;
  auto seven_tuple_it = make_seven(x);
  return zero() + triple_it() + quadruple_it()
    + quintuple_it() + sextuple_it() + seven_tuple_it();
  // return 109;
  // 0 + (3 * 3) + (5 * 4)
  // + (3 * 5) + (5 * 6)
  // + (5 * 7)
}

Given this, there is nothing else to write for this section: all of the benefits of Capture Functions (§ 3.2 Capture Functions: Rehydrated Nested Function) apply to these types in full, and just copying all of that text from one to another to say exactly the same thing is not important. We will instead just talk about the differences exclusively in comparison to Capture Functions in the next few sections.

3.3.1. Lambdas are Expressions

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    // expression, fits in-line
    [](const void* untyped_left, const void* untyped_right) {
      const int* left = (const int*)untyped_left;
      const int* right = (const int*)untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

This also makes it suitable for use in macros, which is not something a regular Capture Functions can accomplish.

NOTE: This can be alleviated by using Statement Expressions, which would allow Capture Functions to work within typical macro contexts.

3.3.2. Recursion Is Impossible

Unfortunately, it is impossible to call a lambda from within itself (not without C++'s feature "deducing this", which requires templates and other things to work), and therefore that is another disadvantage. It can be fixed with the proposed __self_func feature ([__self_func]):

int main () {
  auto tripling = [](int times, int start) {
    if (times >= 5) {
      return start;
    }
    return __self_func(times + 1, start * 3); // __self_func feature
  };
  return tripling(0, 1);
}

3.3.3. Capable of Self-Call

Lambdas are able to refer to themselves in the argument list as they are object types. Unlike Capture Functions, which are incomplete during the argument list and complete after the _Capture specification, lambdas have their captures first. That means all the information to be a complete object is known by the time the argument list is opened. If __self_func is available ([__self_func]), this means one can use a "self" closure type without pointers or indirection:

int main () {
  const int mult = 3;
  auto tripling = [mult](int times, int start, typeof(__self_func) self) {
    if (times >= 5) {
      return start;
    }
    return self(times + 1, start * mult, self);
  };
  return tripling(0, 1, tripling);
}

This is, obviously, not recommended because this takes the object itself by value and so can do lots of unnecessary copies. But, it can be taken as either a full complete object or just a pointer to said object, so the choice is up to the user since Lambdas are complete object types by the function argument list.

3.3.4. Trailing Return Types / Deduced Return Type

Finally, one may need to add the concept of a "trailing return type" to C in order to allow modifying the return type of a lambda. At the moment, the way a lambda with no specified return type works is that every single return statement must have exactly the same type (there is no negotiation for some "promoted" type or similar). That is, returning a long in one branch and an int in another branch is an error: they all must be cast to int or they all must be cast to long:

int main () {
  auto okay0 = []() {
    if (1) {
      return 0;
    }
    else {
      return 0;
    }
  }(); // ok
  auto violation0 = []() {
    if (1) {
      return 0U;
    }
    else {
      return 0L;
    }
  }(); // constraint violation: different return types
  auto okay1 = []() {
    if (1) {
      return (unsigned long long)0U;
    }
    else {
      return (unsigned long long)0L;
    }
  }(); // ok: cast to identical types
  return 0;
}

This can be extremely annoying to deal with. Trailing return types fix this problem by allowing lambdas to use a trailing -> type-name to have the function return type become type-name:

int main () {
  auto okay0 = []() {
    if (1) {
      return 0;
    }
    else {
      return 0;
    }
  }(); // ok
  auto violation0 = []() -> unsigned int {
    if (1) {
      return 0U;
    }
    else {
      return 0L;
    }
  }(); // now okay: fixed return type, conversions happen normally
  auto okay1 = []() {
    if (1) {
      return (unsigned long long)0U;
    }
    else {
      return (unsigned long long)0L;
    }
  }(); // ok: cast to identical types
  return 0;
}

This fixes other problems in the C language as well, such as not being able to specify functions with proper variable-length array returns without using ugly syntax. The auto part only applies for regular function definitions, and could also be applied to Capture Functions for ease-of-use (but is not required for it to function appropriately). One could also just have auto but no -> to have regular functions achieve the lambda behavior, where all return expressions must evaluate to the exact same type. Not having a return or having a return; both imply the return type is void, and thus any other kind of return <expr>; in that function would be illegal.
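The exact-match rule for deduced return types is strict because expressions such as 0U and 0L already have different types in standard C. That can be observed today with `_Generic`. The sketch below is only an illustration of the existing type rules, not part of the proposal; `TYPE_ID` and the three helper functions are hypothetical names.

```c
/* Maps an expression to a small integer tag based on its type.
   Only the types used in the examples are listed. */
#define TYPE_ID(expr) _Generic((expr), \
  int: 1,                              \
  unsigned int: 2,                     \
  long: 3,                             \
  unsigned long long: 4,               \
  default: 0)

/* `return 0;` in both branches: the types match. */
int same_type_int_int(void) {
  return TYPE_ID(0) == TYPE_ID(0);
}

/* `return 0U;` versus `return 0L;`: unsigned int versus long. */
int same_type_uint_long(void) {
  return TYPE_ID(0U) == TYPE_ID(0L);
}

/* Both branches cast to unsigned long long: the types match again. */
int same_type_after_cast(void) {
  return TYPE_ID((unsigned long long)0U) == TYPE_ID((unsigned long long)0L);
}
```

The first and third helpers return 1 and the second returns 0, which lines up with the okay0, violation0, and okay1 cases: without a trailing return type there is no single type for the deduction to settle on.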

3.3.5. Forward-declaration is Impossible

It is impossible to forward-declare a lambda given the fact that every lambda is a definition of a unique object as an expression and does not do the usual declaration/definition split. Implementations may fold identical lambdas together but that is only observable as an optimization, and is not a guarantee of the design. This contrasts with Capture Functions, wherein they can be declared (with all of their captures) under a specific identifier in a specific scope and reserve that identifier for that type, and then later defined in that scope or a dependent scope.

That is the full set of notable technical differences between Lambdas and Capture Functions.

3.4. Measuring Solution Spaces

One of the ways we can increase the confidence we have in our data is to provide benchmarks for the things we are proposing to the C standard. We do not mind adding new and improved functionality to the benchmarks to do more measurements: it is an open set, and we would appreciate any help in benchmarking, measuring, or coming up with new ways to observe behavior.

3.4.1. Donald Knuth’s Man or Boy Test

This is a set of benchmarks using Donald Knuth’s Man or Boy program, which tests self-references and recursion in the same function / closure object. It also flexes a number of properties that can evaluate the quality of a closure implementation, and so is suitable as a microbenchmark. Both a linear version and a logarithmic version of the graphs are made available due to how the performance differences skew things wildly in one direction or another. The benchmark’s source code is available in the ztd.idk repository ([ztd-idk-closures-benchmark]).
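For readers unfamiliar with the test, the following is a minimal standard C rendition of the Man or Boy program using hand-written closures. It is a sketch for orientation only and is not taken from the benchmark repository: `thunk`, `eval`, `A`, `B`, and `man_or_boy` are names chosen here, and the benchmarked variants differ in how they pass and store the captured state.

```c
#include <stddef.h>

/* Hand-written closure: a function plus the state it refers to.
   `k` points at the counter of the A call that created the thunk,
   which is the by-reference capture the test depends on. */
typedef struct thunk {
  int (*fn)(struct thunk* self);
  int* k;
  struct thunk *x1, *x2, *x3, *x4;
} thunk;

static int eval(thunk* t) { return t->fn(t); }

static int ret_minus_one(thunk* t) { (void)t; return -1; }
static int ret_zero(thunk* t) { (void)t; return 0; }
static int ret_one(thunk* t) { (void)t; return 1; }

static int A(int k, thunk* x1, thunk* x2, thunk* x3, thunk* x4, thunk* x5);

/* B decrements the k of the A call that created it, then recurses
   with itself as the new x1. */
static int B(thunk* self) {
  *self->k -= 1;
  return A(*self->k, self, self->x1, self->x2, self->x3, self->x4);
}

static int A(int k, thunk* x1, thunk* x2, thunk* x3, thunk* x4, thunk* x5) {
  if (k <= 0) {
    /* sequence the two calls explicitly: x4 first, then x5 */
    int left = eval(x4);
    int right = eval(x5);
    return left + right;
  }
  thunk b = { B, &k, x1, x2, x3, x4 };
  return B(&b);
}

int man_or_boy(int k) {
  thunk one = { ret_one, NULL, NULL, NULL, NULL, NULL };
  thunk minus_one = { ret_minus_one, NULL, NULL, NULL, NULL, NULL };
  thunk zero = { ret_zero, NULL, NULL, NULL, NULL, NULL };
  return A(k, &one, &minus_one, &minus_one, &one, &zero);
}
```

With k = 10 this computes the well-known result of -67. The important detail is that each `thunk` holds a pointer to a `k` living in an enclosing `A` frame, so every closure mechanism under test has to keep those frames reachable while the recursion unwinds.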

3.4.1.1. Methodology

The tests were run on a 13-inch 2020 MacBook Pro M1. It has 16 GB of RAM and was on macOS 15.7.2 Sequoia at the time the tests were taken, using the stock macOS AppleClang compiler and the stock brew install gcc compiler to produce the numbers seen on December 28^th, 2025.

The experimental setup used the Man or Boy test, but with the given k value loaded by calling a function in a DLL / Shared Object. The expected k value that the Man or Boy test is supposed to yield is also loaded from a DLL / Shared Object. This prevents optimizing out all recursion and doing enough ahead-of-time computation to simply collapse the benchmarked code into a constant-time, translation-time calculation. It ensures the benchmark is actually measuring the actual performance characteristics of the technique used, as all of them are computing from the same initial k value and all of them are expected to produce the same expected_k answer.

There are 2 measures being conducted: Real ("wall clock") Time and CPU Time. The time is gathered by running a single iteration of the code within a for loop. That loop runs anywhere from a couple thousand to hundreds of thousands of times to produce confidence in that run of the benchmark, and each loop run is considered an individual iteration. The iterations are then averaged to produce the first point after there is confidence that the measurement is accurate and the benchmark is warm. The iteration process to produce a single mean was then repeated 150 times. All 150 means are used as the points for the values (shown as transparent dots) on the bar graph, and the average of all of those 150 means is then used as the height of a bar in a bar graph.
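The aggregation described above amounts to a mean of means. The sketch below shows that arithmetic in isolation, under the assumption that every run contributes the same number of iteration timings. `mean`, `bar_height`, and `demo_bar_height` are hypothetical names, and the numbers in the demo are made up for illustration rather than taken from the benchmark.

```c
#include <stddef.h>

/* Arithmetic mean of `count` values; 0.0 for an empty input. */
double mean(const double* values, size_t count) {
  if (count == 0) {
    return 0.0;
  }
  double sum = 0.0;
  for (size_t i = 0; i < count; ++i) {
    sum += values[i];
  }
  return sum / (double)count;
}

/* `samples` holds run_count consecutive groups of iterations_per_run
   timings. Each group's mean is written to run_means (one dot on the
   graph), and the mean of those means is returned (the bar height). */
double bar_height(const double* samples, size_t run_count,
                  size_t iterations_per_run, double* run_means) {
  for (size_t run = 0; run < run_count; ++run) {
    run_means[run] = mean(samples + run * iterations_per_run,
                          iterations_per_run);
  }
  return mean(run_means, run_count);
}

/* Tiny worked example: 2 runs of 3 iterations each. */
double demo_bar_height(void) {
  const double samples[] = { 1.0, 2.0, 3.0, 3.0, 4.0, 5.0 };
  double run_means[2];
  return bar_height(samples, 2, 3, run_means); /* means 2.0 and 4.0 */
}
```

In the paper's setup `run_count` would be 150 and `iterations_per_run` would vary per benchmark, so a faithful implementation would track a separate count per run rather than assume a fixed one.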

The bars are presented side-by-side as a horizontal bar chart with various categories of C or C++ code being measured. The 15 total categories of C and C++ code are:

no-op: Literally doing nothing. It’s just there to test environmental noise and make sure none of our benchmarks are so off-base that we’re measuring noise rather than computation. Helps keep us grounded in reality.
Normal Functions: regular C functions which add an extra argument to the function call in order to pass more data. Somewhat similar in representation to rewriting qsort to qsort_r/qsort_s to pass a user data pointer.
Normal Functions (Rosetta Code): regular C functions which add an extra argument to the function call in order to pass more data. Taken directly from the Rosetta Code wiki, and uses a pointer int* k to refer to an already-existing value of k during a series of recursive calls.
Normal Functions (Static): regular C function which uses a static variable to pass the specific context to the next function. Not thread safe.
Normal Functions (Thread Local): same as "Normal Functions (Static)" but using a thread_local variable instead of a static variable. Obviously thread safe.
Lambdas (No Function Helpers): a solution using C++-style lambdas. Rather than using helper functions like f0, f1, and f_1, we compute a raw lambda that stores the value meant to be returned for the Man-or-Boy test (with a body of just return i;) in the lambda itself and then pass that uniquely-typed lambda to the core of the test. The entire test is templated and uses a fake recursion template parameter to halt the translation-time recursion after a certain depth.
Lambdas: The same as above but actually using int f0(void), etc. helper functions at the start rather than lambdas. Tries to reduce optimizer pressure by using “normal” types which do not add to the generated number of lambda-typed, recursive, templated function calls.
Lambdas (std::function_ref): The same as above, but rather than using a function template to handle each uniquely-typed lambda like a precious baby bird, it instead erases the lambda behind a std::function_ref<int(void)>. This allows the recursive function to retain exactly one signature.
Lambdas (std::function): The same as above, but replaces std::function_ref<int(void)> with std::function<int(void)>. This is an allocating, C++03-style type.
Lambdas (Rosetta Code): The code straight out of the C++11 Rosetta Code Lambda section on the Man-or-Boy Rosetta Code implementation.
Apple Blocks: Uses Apple Blocks to implement the test, along with the __block specifier to refer directly to certain variables on the stack.
GNU Nested Functions (Rosetta Code): The code straight out of the C Rosetta Code section on the Man-or-Boy Rosetta Code implementation.
GNU Nested Functions: GNU Nested Functions similar to the Rosetta Code implementation, but with some slight modifications in a hope to potentially alleviate some stack pressure if possible by using regular helper functions like f0, f1, and f_1.
Custom C++ Class: A custom-written C++ class using a discriminated union to decide whether it’s doing a straight function call or attempting to engage in the Man-or-Boy recursion.
C++03 shared_ptr (Rosetta Code): A C++ class using std::enable_shared_from_this and std::shared_ptr with a virtual function call to invoke the “right” function call during recursion.
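To make the "Normal Functions" category above concrete, the following is a minimal sketch of the Man or Boy test written with plain C functions that receive their context through an explicit extra argument. It is not the benchmark's source code: the `thunk` structure, the function names, and the choice to store k by value inside the thunk are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical names throughout; this is NOT the benchmark's source. */
typedef struct thunk thunk;
struct thunk {
    int (*call)(thunk *self);  /* how to evaluate this argument */
    int k;                     /* B only: the k of the A invocation that made it */
    thunk *x1, *x2, *x3, *x4;  /* B only: that invocation's x1..x4 */
    int value;                 /* constants only: the value to yield */
};

static int a(int k, thunk *x1, thunk *x2, thunk *x3, thunk *x4, thunk *x5);

/* A constant argument such as 1, -1, or 0. */
static int constant(thunk *self) { return self->value; }

/* Knuth's inner procedure B: decrement the shared k, then re-enter A. */
static int b(thunk *self) {
    self->k -= 1;
    return a(self->k, self, self->x1, self->x2, self->x3, self->x4);
}

static int a(int k, thunk *x1, thunk *x2, thunk *x3, thunk *x4, thunk *x5) {
    if (k <= 0) {
        /* Evaluate x4 before x5 explicitly: C leaves the operand
           evaluation order of + unspecified. */
        int lhs = x4->call(x4);
        int rhs = x5->call(x5);
        return lhs + rhs;
    }
    /* The context that a closure would capture, spelled out by hand. */
    thunk self = { b, k, x1, x2, x3, x4, 0 };
    return b(&self);
}

int man_or_boy(int k) {
    assert(k >= 0);
    thunk one = { constant, 0, 0, 0, 0, 0, 1 };
    thunk minus_one = { constant, 0, 0, 0, 0, 0, -1 };
    thunk zero = { constant, 0, 0, 0, 0, 0, 0 };
    return a(k, &one, &minus_one, &minus_one, &one, &zero);
}
```

Calling `man_or_boy(10)` yields -67, the conventional expected answer for k = 10. Every call signature carries the extra `thunk *` arguments, which is exactly the signature modification this category relies on.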

Each bar graph has a black error bar at the end, representing the standard error of the measurements performed. At 150 iterations, the error bars (which are most easily understood and read in the linear graphs) are a decent visual approximation of whether or not two solutions are within a statistical threshold of one another.
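The bar height and error bar described above can be sketched as follows. This assumes the standard error is computed as the sample standard deviation of the per-run means divided by the square root of the number of runs; the benchmark harness may compute it differently. The names `summary`, `summarize`, and `sqrt_newton` are hypothetical, and the small Newton-Raphson square root exists only so the sketch does not depend on linking a math library.

```c
#include <assert.h>
#include <stddef.h>

/* Newton-Raphson square root; used only to avoid a libm dependency. */
static double sqrt_newton(double x) {
    if (x <= 0.0) return 0.0;
    double guess = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 64; ++i) {
        guess = 0.5 * (guess + x / guess);
    }
    return guess;
}

typedef struct summary {
    double mean;           /* height of the bar */
    double standard_error; /* half-length of the error bar (assumed formula) */
} summary;

/* `means` holds one mean per repetition (150 of them in the text above). */
summary summarize(const double *means, size_t count) {
    assert(means != NULL && count >= 2);
    double sum = 0.0;
    for (size_t i = 0; i < count; ++i) {
        sum += means[i];
    }
    double mean = sum / (double)count;
    double squares = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double delta = means[i] - mean;
        squares += delta * delta;
    }
    double variance = squares / (double)(count - 1); /* sample variance */
    summary result = { mean, sqrt_newton(variance / (double)count) };
    return result;
}
```

With this formula, two bars whose error ranges do not overlap are unlikely to differ by measurement noise alone, which is the visual reading the text above describes.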

The two compilers tested are Apple Clang 17 and GCC 15. There are two graph images for each kind of measurement (linear, logarithmic, and linear-but-with-outliers-removed) because one is for Apple Clang and the other is for GCC. This is particularly important because neither compiler implements the other’s closure extension (Clang does Apple Blocks but not Nested Functions, while GCC does Nested Functions in exclusively its C frontend but does not implement Apple Blocks).

MSVC was not tested because MSVC implements none of the extensions being tested, and we do not expect that its performance characteristics would be wildly different.

3.4.1.2. Results

The result graphs are as follows, presented in four images for both MacOS ARM64 with Apple Clang 17 and GCC 15, and Windows AMD64 with Clang 21 and GCC 15:

Because some of the checked categories perform so astronomically poorly, we also provide a logarithmic version of the graphs, which are the two following graphs:

This obscures some of the changes in performance because it skews towards big order-of-magnitude differences. To show the differences without the heavy skew due to the extremely poor performance of the bottom two/three categories, we also provide a "focused" graph which eliminates the poor performers and shows the linear performance of the more competitive top 12 performers.

A full writeup is available at [closures-in-c-benchmark] and [closures-in-c-benchmark-followup] which includes methodology and other information about the tests.

3.4.1.3. Conclusions, Comparisons to Other Proposals, and Inferences

Note: A wide function pointer type is necessary no matter the solution chosen.

Note: A fixed, statically-known object that can be optionally type-erased behind a Wide Function Pointer seems optimal for even complex usages of closures in C.

Note: Apple Blocks and GNU Nested Functions have intrinsic design flaws that drastically impact their performance, meaning that even transporting a context pointer through a global variable with a normal C function is better.

Note: Manual management of closure pointer-to-function pointer trampolines may produce much better code generation and quality than implementations can currently handle, while allowing the user to have as much or as little security as possible.

In addition to the key takeaways above, some other details we have understood are as follows. C++-style lambdas have the capability to be both awful (Lambdas using the Rosetta Code technique) and powerful (Lambdas with Perfect Type Information). Both of these techniques are unusable as-is in C, as the things that make them awful or great are tied not to the design of lambdas but to how they are used (e.g., std::function abstractions or translation-time recursion prevention). Normal C functions (storing the integer k by value) can achieve near-parity with C++-style Lambdas at their best in terms of performance. However, it requires the modification of the function signature, which may not be viable in all cases (such as calling already-compiled interfaces or working with FFI). The use of static and thread_local to pass information across function boundaries comes with an unshakeable and implicit cost. thread_local is -- for obvious reasons -- more expensive than just static, but both incur overhead.
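As a small illustration of passing context through a thread-local variable when a signature cannot be changed, the sketch below routes a sort direction past the fixed comparator signature of `qsort`. The names `descending_context`, `compare_with_context`, and `sort_ints` are hypothetical and are not taken from the benchmark.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context: nonzero means "sort in descending order". */
static _Thread_local int descending_context = 0;

/* qsort's comparator signature has no room for user data,
   so the comparator reads the context from the thread-local. */
static int compare_with_context(const void *lhs, const void *rhs) {
    int l = *(const int *)lhs;
    int r = *(const int *)rhs;
    int order = (l > r) - (l < r);
    return descending_context ? -order : order;
}

void sort_ints(int *values, size_t count, int descending) {
    assert(values != NULL || count == 0);
    /* Smuggle the context past the fixed callback signature. */
    descending_context = descending;
    qsort(values, count, sizeof *values, compare_with_context);
}
```

Replacing `_Thread_local` with plain `static` storage gives the "Normal Functions (Static)" flavor, at the price of thread safety. In both cases the comparator has to reload the context on each call, which is one plausible source of the implicit cost described above.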

After evaluating current industry extensions, we find them to be strangely poorly-performing despite being decades-old in some cases. Apple Blocks ([apple-blocks]) and similar heap-based solutions incur a fixed overhead due to the Blocks Runtime scheme, making them unsuitable for resource-constrained C environments. This means it is not worth pursuing this as a long-term solution in our technical opinion. GNU Nested Functions ([nested-functions]), as a quirk of their current most popular implementation, really inhibit inlining and other similar optimizations. A different implementation (e.g., the -ftrampoline-impl=heap work that is available on 5 platforms in GCC trunk at time-of-writing) could be better. But, the fact that a typical quality of implementation provides such awful performance characteristics compared to every other solution despite being the oldest solution means that they too should be looked at skeptically for direct ISO C standardization. The heap-based trampolines of GNU Nested Functions also look like they may incur a fixed overhead similar to that of the Apple Blocks implementation.

There may be a way to avoid this by simply reducing the context pointer and the actual function pointer to something more directly usable with the __builtin_call_with_static_chain intrinsic available in GCC and Clang ([builtin_call_with_static_chain_gcc]). We expect that the performance benefits here could be quite good, but this is not proven and not implemented yet, unlike Lambdas and other solutions which have plenty of implementation and perform well based on user choice of type erasure or allocation or storage.

The Function Literals and Local Functions ([n3678], [n3679]) proposals can have their performance approximated in their best-case as following the performance of the series of Normal Functions. This requires modifying the function signature. If the function signature isn’t modified, there isn’t really a point of comparison in these benchmarks other than falling back to other means of context transportation. All of these solutions vary wildly in performance, from Apple Blocks runtimes to various styles of trampolines. This makes it impossible to understand Function Literals and Local Functions as solving the problem at hand, because they provide no plausible forward path on which to judge. Thus, we consider them incomplete.

Accessing the Context of Nested Functions ([n3654]), even in its later iterations, approximates the performance of either the Normal Functions (can modify the signature, best case) or GNU Nested Functions ([nested-functions], [n2661]) (creates an invisible trampoline to transport closure context, worst case). We are unsure of how this would continue, given the deeply negative impact that access to the current invocation’s stack frame / "function environment" has on the optimizer’s ability to push for performance. While we think there is potential for this, we believe explicitly or implicitly capturing just the data -- and not the environment or stack frame abstraction itself -- would result in better performance characteristics no matter the design.

Finally, any solution is going to need a Wide Function Pointer type in the C ecosystem to make it usable and worthwhile. In particular, Lambdas with std::function_ref is a directly applicable proxy to what "Capture Functions and Wide Function Pointers" and/or "Lambdas and Wide Function Pointers" could bring in terms of worst-case performance to the C ecosystem as a whole.
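For readers unfamiliar with the shape being discussed, a wide function pointer can be approximated today with a hand-written pair of a call thunk and an erased context pointer, similar in spirit to std::function_ref. The sketch below is only an approximation under that assumption: `wide_int_fn`, `add_closure`, and the helper functions are hypothetical names and do not reflect the syntax or layout of any proposed feature.

```c
#include <assert.h>

/* Hypothetical wide function pointer: a call thunk plus an erased object. */
typedef struct wide_int_fn {
    int (*invoke)(void *context);
    void *context;
} wide_int_fn;

/* A fixed, statically-known "closure object" holding captured state. */
typedef struct add_closure {
    int base;
    int calls;
} add_closure;

static int add_closure_invoke(void *context) {
    add_closure *self = context;
    self->calls += 1;
    return self->base + self->calls;
}

/* Type erasure is optional: callers who know the concrete type can
   still use add_closure directly and skip the indirect call. */
wide_int_fn erase_add_closure(add_closure *closure) {
    assert(closure != 0);
    wide_int_fn erased = { add_closure_invoke, closure };
    return erased;
}

/* A consumer with exactly one signature, regardless of the closure's type. */
int call_twice(wide_int_fn fn) {
    int first = fn.invoke(fn.context);
    int second = fn.invoke(fn.context);
    return first + second;
}
```

The consumer `call_twice` never needs to be rewritten or templated for a new closure type, which mirrors how the std::function_ref category keeps the recursive function at a single signature.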

4. Wording

THIS SECTION IS NOT GOING TO BE OFFICIAL UNTIL THE DESIGN SHAKEDOWN IS COMPLETE.

NOTE: THIS PROPOSAL WILL ONLY INCLUDE LANGUAGE (CLAUSE 6) WORDING. LIBRARY WORDING WILL BE DONE AFTER THE LANGUAGE IS HANDLED, LIKELY IN A SEPARATE PROPOSAL.

4.1. `__self_func` Changes

4.1.1. Modify "Predefined identifiers" (6.4.3.2)

change the mention of "execution encoding" in this section to instead be "literal encoding (6.2.9)";
and, add constexpr to the list of storage class specifiers for static const char __func__[] = "function-name".

4.1.2. Add the new keyword `__self_func` to §6.4.2

Syntax

¹ keyword: one of

...

__self_func

4.1.3. Add `__self_func` to the primary-expression grammar of §6.5.2

Syntax

¹ primary-expression:

identifier

constant

string-literal

( expression )

generic-selection

__self_func

4.1.4. Add a new section §6.5.2.✨ "`__self_func`" after §6.5.2.1 "Generic selection"

6.5.2.✨ __self_func

Constraints

__self_func shall only appear in the body of an invocable, and refers to the innermost invocable scope it is inside.

Semantics

__self_func is either:

the lvalue of the closure (6.2.✨1) that it is contained within;

or, the function designator (6.3.3.1) designating and having the type of the function it is used in.

4.2. Core, Shared Changes

4.2.1. Modify §6.2.1 "Scopes of identifiers, type names, and compound literals"

6.2.1 Scopes of identifiers, type names, and compound literals

An identifier can denote:

a standard attribute, an attribute prefix, or an attribute name;

an object;

a function;

a closure;

a tag or a member of a structure, union, or enumeration;

a typedef name;

a label name;

a macro name;

or, a macro parameter.

For each different entity that an identifier designates, the identifier is visible (i.e. can be used) only within a region of program text called its scope. Different entities designated by the same identifier either have different scopes or are in different name spaces. There are four kinds of scopes: ~~function~~ invocable , file, block, and function prototype. (A function prototype is a declaration of a function.)

A label name is the only kind of identifier that has ~~function scope~~ invocable scope . It can be used (in a goto statement) anywhere in the ~~function~~ body of the invocable (6.2.✨0) in which it appears ~~, and~~ excluding the body of any nested invocables, unless otherwise specified. It is declared implicitly by its syntactic appearance (followed by a : and a statement). Each invocable body has an invocable scope that is separate from the invocable scope of any other invocable body. In particular, a label is visible in exactly one invocable scope (the innermost body in which it appears) and distinct invocable bodies may use the same identifier to designate different labels.

...

NOTE Properties of the feature to which an identifier refers are not necessarily uniformly available within its whole scope of visibility. Examples are identifiers or functions with an incomplete type that is only completed in a subscope of its visibility, labels that are only valid targets of goto statements when the jump does not cross the scope of a VLA, identifiers of objects to which the access is restricted in specific contexts such as signal handlers or closures, or library features such as setjmp where the use is restricted to a specific subset of the grammar.

4.2.2. Modify §6.2.5 "Types"

6.2.5 Types

...

...

Any number of derived types can be constructed from the object and function types, as follows:

— ...

A function type describes a function with specified return type. A function type is characterized by its return type and the number and types of its parameters. A function type is said to be derived from its return type, and if its return type is T, the function type is sometimes called "function returning T". The construction of a function type from a return type is called "function type derivation".

A closure type describes a structure or union type that is similar to a function with a specified or inferred return type (6.2.✨1). It is characterized by: its return type; the number, order, and type of its parameters; its lexical position in the program; and, the number, order, and type of its captures. The function type that has the same return type and list of parameter types as the closure type is the closure’s function type. A closure type is said to be derived from its function type’s return type and, if present, any of its captures. If its return type is T, the closure type is sometimes called "closure returning T" or "closure with captures returning T" (referring to closures with any number of captures, including zero captures).

...

These methods of constructing derived types can be applied recursively.

...

A complete type shall have a size that is less than or equal to SIZE_MAX. A type has known constant size if it is complete and is not a variable length array type.

An invocable type is either a closure type, a function type, or a pointer to function type (6.2.✨0).

A closure literal type is a closure type characterized by having no captures. If its return type is T, the closure literal type is sometimes called "closure literal returning T" or "closure with no captures returning T". Closure literal types are a proper subset of the closure types.

...

4.2.3. Modify §6.2.7 "Compatible type and composite type"

6.2.7 Compatible type and composite type

...

A composite type can be constructed from two types that are compatible. If both types are the same type, the composite type is this type. Otherwise, it is a type that is compatible with both and satisfies the following conditions:

If both types are structure types or both types are union types, the composite type is determined recursively by forming the composite types of their members.

If both types are array types, the following rules are applied:

If one type is an array of known constant length, the composite type is an array of that length.

Otherwise, if one type is a variable length array whose length is specified, the composite type is a variable length array of that length.

Otherwise, if one type is a variable length array of unspecified length, the composite type is a variable length array of unspecified length.

Otherwise, both types are arrays of unknown length, and the composite type is an array of unknown length.

The element type of the composite type is the composite type of the two element types.

If both types are function types, the type of each parameter in the composite parameter type list is the composite type of the corresponding parameters.

If both types are closure types, the composite function type of the closure types' function types and the composite of the implementation-defined structure or union type of the closure types is the composite type of the two closure types.

If one of the types has a standard attribute, the composite type also has that attribute.

If both types are enumerated types, the composite type is an enumerated type.

If one type is an enumerated type and the other is an integer type other than an enumerated type, it is implementation-defined whether or not the composite type is an enumerated type.

These rules apply recursively to the types from which the two types are derived.

4.2.4. Add a new section §6.2.✨0 "Invocable", likely §6.2.10

6.2.✨0 Invocable

Syntax

parameter-clause:

( parameter-type-list_opt )

Description

An invocable is something with zero or more parameters (and possibly zero or more captures if it is a closure) that may be invoked/called to trigger an entry into and subsequent execution of an associated series of statements and/or expressions.

Certain invocables can be declared (6.7.7) and potentially used before they are defined.

Invocables are:

functions and pointers to functions (6.3.3.1, 6.5.3.3, 6.7.7.4, 6.9.2);

and, closures (6.2.✨1).

The associated series of statements and/or expressions is called the invocable body or body of the invocable. The body of an invocable has its own scope, and that scope includes any captures it was created with and arguments it was invoked with. The scope that includes just the associated series of statements and/or expressions without the arguments or captures is called the invocable statement list. The scope in which an invocable is declared or defined is called the invocable’s surrounding scope, which is either the block scope of another invocable or file scope.

As part of the declaration or definition of its parameter list, an invocable may include an ellipsis in its list, either as the sole argument or at the end of its list. This is called a variadic parameter. Arguments supplied to an invocable whose positions match or come after the ellipsis in the parameter list are its varying arguments. Any invocable which contains a variadic parameter is a variadic invocable. Additionally, functions with a variadic parameter are sometimes specifically called variadic functions and closures with a variadic parameter are sometimes specifically called variadic closures.
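As a non-normative illustration (not part of the proposed wording), the following existing-C variadic function shows the terms used above: `count` is an ordinary parameter and every argument after it is a varying argument. The function name `sum_ints` is hypothetical. A named leading parameter is used so that the sketch also compiles with implementations that predate the C23 rule allowing an ellipsis as the sole entry.

```c
#include <assert.h>
#include <stdarg.h>

/* `count` is the fixed parameter; the ellipsis is the variadic parameter. */
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        /* Each varying argument is read back with its promoted type. */
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` supplies three varying arguments; a variadic closure would be expected to follow the same access pattern inside its body.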

Constraints

Although variable length array types of unspecified size and incomplete types can be used as part of a parameter declaration for the declaration of an invocable, they shall not be used as part of a parameter declaration in the definition of an invocable.

The only storage-class specifier that shall occur in a parameter declaration is register.

After adjustment, the parameters in a parameter type list in an invocable declarator (such as a function declarator) that is part of a definition of that invocable shall not have incomplete type.

Except for void, an invocable shall not specify a return type that is a function type, an array type, or an incomplete type.

A parameter declaration shall not specify a void type, except for the special case of a single unnamed parameter of type void with no storage-class specifier, no type qualifier, and no following ellipsis terminator. If the parameter list consists of a single parameter of type void, the parameter declarator shall not include an identifier.

Semantics

An invocable can have a sequence of arguments passed to it that comply with the constraints and requirements of the invocable’s list of parameters. An invocable’s parameters have automatic storage duration.

The identifier of the parameter, if any, is an lvalue in the invocable body. Variable length array types of unspecified size shall not be used as part of a parameter declaration in an invocable definition. The layout of the storage for parameters is unspecified. The type of each parameter is adjusted as described later in this subclause.

NOTE A parameter that has no declared name is inaccessible within the invocable body. A parameter’s identifier cannot be redeclared in the invocable body except in an enclosed block. The visibility scope of a parameter in a function definition starts when its declaration is completed, extends to following parameter declarations, to possible attributes that follow the parameter type list, and then to the entire function body. The lifetime of each instance of a parameter starts when the declaration is evaluated starting a call and ends when that call terminates.

The special case of an unnamed parameter of type void as the only item in the parameter list specifies that the function has no parameters.

A declaration of a parameter as "array of type" shall be adjusted to "qualified pointer to type", where the type qualifiers (if any) are those specified within the [ and ] of the array type derivation. If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument must provide access to the first element of an array with at least as many elements as specified by the size expression.

If, in a parameter declaration, an identifier can be treated either as a typedef name or as a parameter name, it shall be taken as a typedef name.

A declaration of a parameter as "function returning type" shall be adjusted to "pointer to function returning type", as in 6.3.3.1.
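As a non-normative illustration (not part of the proposed wording), both parameter adjustments described above can be observed in existing C. The names `sum3`, `apply`, and `twice` are hypothetical.

```c
#include <assert.h>

/* Declared as "array of at least 3 const int";
   adjusted to "pointer to const int". */
int sum3(const int values[static 3]) {
    int total = 0;
    for (int i = 0; i < 3; ++i) {
        total += *values;
        ++values; /* legal only because the parameter is a pointer, not an array */
    }
    return total;
}

/* Declared as "function returning int";
   adjusted to "pointer to function returning int". */
int apply(int operation(int), int argument) {
    assert(operation != 0); /* comparable to null because it is a pointer */
    return operation(argument);
}

int twice(int x) { return 2 * x; }
```

Both `apply(twice, 4)` and `apply(&twice, 4)` are accepted, because the argument is a pointer to function either way.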

If the invocable declarator is not part of a definition of that invocable, parameters can have incomplete type and can use the [*] notation in their sequences of declarator specifiers to specify variable length array types.

The storage-class specifier in the declaration specifiers for a parameter declaration, if present, is ignored unless the declared parameter is one of the members of the parameter type list for the definition of the invocable. The optional attribute specifier sequence in a parameter declaration appertains to the parameter.

On entry to the invocable, the size expressions of each variably modified parameter and typeof operators used in declarations of parameters are evaluated and the value of each argument expression is converted to the type of the corresponding parameter as if by assignment. (Array expressions and function designators as arguments were converted to pointers before the call.)

After all parameters have been assigned, the invocable statement list is executed.

Upon return to its invoker/caller from either finishing the execution of its body or returning, an invocable can either return nothing (indicated by the return type void) or return a value of object type (6.8.7.5). Unless otherwise specified, if the end of the invocable body is reached (such as the terminating {), and the value of the function call is used by the caller, the behavior is undefined.

4.2.5. Add a new section §6.2.✨1 "Closures", after §6.2.✨0 "Invocables", likely §6.2.11

6.2.✨1 Closures

6.2.✨1.1 General

Description

A closure is a structure or union object of invocable type which has zero or more captures and zero or more parameters.

A closure’s object is composed of at least any information necessary to invoke it with its arguments and for its implicit or explicit captures (6.2.✨1.2).

Constraints

Within the invocable body of a closure, identifiers shall be used according to the usual scoping rules. Captures and parameters can be accessed by their name in the invocable body. However, no identifier that is a variably modified type or an object of variably modified type shall be captured and used in the invocable statement list.

An identifier that is an object of automatic storage duration which has a storage-class specifier of constexpr may also be used, but its address shall not be taken in the invocable body of the closure.

An object of closure literal type shall be convertible to a function pointer.

Semantics

A closure with zero captures has closure literal type. Otherwise, it has closure type.

A closure literal is convertible to a function pointer. That function pointer is the same for every closure literal of that closure literal type. An invocation of that function pointer invokes the associated body of that closure with the provided arguments.

A closure’s size, layout, and representation are all implementation-defined unless otherwise specified. Each closure’s type is unique, and is characterized by:

any captures it can have;

the function type it has, particularly any parameters it can have and its return type;

and, the lexical position of the first declaration and/or definition.

A closure’s type is not required to be compatible with any other type, unless otherwise specified.

NOTE Two closures with identical captures and parameter types are not necessarily compatible (e.g. assignable) to one another.

Unless otherwise specified for the associated closure, each captured value can be accessed with the member access operator . on that closure, or with the member access operator -> on a pointer to that closure, using the capture’s name. The layout of the storage for the captures of a closure is implementation-defined.

NOTE While accessing captures uses the member access operators, there are no requirements or constraints on what the actual members, layout, and other details of a closure are.

6.2.✨1.2 Captures

Syntax

capture-list:

capture-default

capture-name-list

capture-default , capture-name-list

capture-name-list:

capture

capture-name-list , capture

capture-default:

=

&

capture:

&_opt identifier capture-rename_opt

capture-rename:

= assignment-expression

Description

"Capturing" an object or a "captured" object describes the process of making visible objects from a surrounding scope visible in an inner, more nested invocable scope.

A capture list allows access to objects visible in a certain scope named by the capture’s identifier or computed by the optional capture rename. A capture is an entry in the capture list.

The = capture default is a default value capture. The & capture default is a default reference capture. Either one is called a default capture. The identifier in a capture is the capture name. The optional capture rename is the capture initialization expression. A capture with no capture initialization expression is called an identifier capture. A capture with a capture initialization expression is called an expression capture.

If present, the closure which the capture partly characterizes and modifies is called the associated closure. The capture list’s associated scope is the surrounding scope of the associated closure unless otherwise specified.

A capture name with no preceding & is called a value capture. A capture name preceded by a & is called a reference capture.

Constraints

The capture name shall appear at most once across the capture list and, if present, the parameter names of an associated closure. If the capture list’s associated scope is file scope, then no captures or default captures are permitted in the list.

For a capture that is a reference capture and an identifier capture, the identifier shall be an object which is addressable. For a capture that is a reference capture and an expression capture, the capture initialization expression shall be an addressable lvalue.

For an identifier capture, the capture name shall be a visible identifier of automatic storage duration from the associated scope.

If a default value capture is specified, then subsequent captures shall only be reference captures. If a default reference capture is specified, then subsequent captures shall only be value captures.

The type of the capture shall not be a variably modified type.

NOTE Parameters of variably modified type are allowed to be captured because their type after adjustment is a pointer type.

Semantics

Capture lists are evaluated when their associated closure is defined and evaluated. The evaluations of captures are sequenced in order of declaration. The capture name is complete after its optional capture initialization expression. An earlier capture may occur within the capture initialization expression of a later capture.

Value captures provide the value at time of evaluation of the capture when used. It is either the value of the identifier it refers to if it is an identifier capture, or the value of the capture initialization expression used to compute it if it is an expression capture. Unless otherwise specified, value captures have either:

the storage duration of an associated closure (if present);

or, automatic storage duration.

NOTE A possible implementation of value captures is having a stored member in the closure object that is accessed every time it is used within the associated closure.

Reference captures are lvalues that either:

refer to an identifier which exists in the surrounding scope identified by the capture name (if it is an identifier capture);

or, to the addressable lvalue of the capture initialization expression at the time of evaluation (if it is an expression capture).

If a reference capture is used after the lifetime of what it refers to or addresses finishes, the behavior is undefined.

NOTE A possible implementation of reference captures is storing a value capture of the address of the desired addressable lvalue in the closure object, and automatically dereferencing it upon its use in the associated closure.
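As a non-normative illustration (not part of the proposed wording), the two possible implementations mentioned in the NOTEs above can be written by hand in existing C. The structure below stands in for a closure that value-captures `step` and reference-captures `total`; the names `accumulate_closure`, `make_accumulate`, and `accumulate_invoke` are hypothetical, and an implementation is free to choose a different layout.

```c
#include <assert.h>

/* Hand-written stand-in for a closure object with two captures. */
typedef struct accumulate_closure {
    const int step; /* value capture: a copy taken when the closure is created */
    int *total;     /* reference capture: a stored address, dereferenced on use */
} accumulate_closure;

/* Plays the role of evaluating the capture list. */
accumulate_closure make_accumulate(int step, int *total) {
    assert(total != 0);
    accumulate_closure closure = { step, total };
    return closure;
}

/* Plays the role of the invocable body. */
int accumulate_invoke(const accumulate_closure *self) {
    *self->total += self->step;
    return *self->total;
}
```

Changing the original `step` object after the closure is created does not affect the stored copy, while changes to `total` remain visible through the stored address. Using the closure after `total`'s lifetime ends would be undefined, matching the rule for reference captures above.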

If a default capture is present, then it behaves as if all automatic storage duration objects visible in the surrounding scope are:

value captured, for =;

or, reference captured, for &.

For a capture, let T_{capture-initial} be:

typeof(identifier), where identifier is the capture name if it is an identifier capture;

or, the type of the capture initialization expression if it is an expression capture.

Let T_capture be the type of the capture after lvalue, array-to-pointer, or function designator to pointer conversion is applied to T_{capture-initial}. The type of a capture is T_capture const, unless otherwise specified.

NOTE A capture therefore will not have array type itself, but a member of the captured type can possibly have array type.

Recommended practice

If all captures in a capture list are reference captures, implementations are encouraged to take advantage of potential layout and storage optimizations with respect to the lifetime of any associated closure.

4.2.6. Modify §6.3.3.1 "Lvalues, arrays, and function designators"

6.3.3.1 Lvalues, arrays, and function designators

...

A function designator is an expression that has ~~function~~ invocable type. Except when it is the operand of the sizeof operator, a typeof operator, or the unary & operator, a function designator with type ~~"function returning type" is converted to an expression that has type "pointer to function returning type"~~ :

"function returning R";

or, "closure literal returning R",

is converted to an expression that has type "pointer to function returning R", where "R" is the return type.

4.2.7. Modify §6.5.3.3 "Function calls"

In order,

Replace the title "Function calls" with "Invocation"
Outside the EXAMPLE and NOTE text, replace every instance of "type pointer to function" with "invocable type".
Outside the EXAMPLE and NOTE text, replace every instance of "function type" with "invocable type".
Outside the EXAMPLE and NOTE text, replace every instance of "a function" with "an invocable".
Outside the EXAMPLE and NOTE text, replace every instance of "function" with "invocable", EXCEPT:
- "... ellipses in a variadic function declarator (6.7.7.4) ..." becomes "... ellipsis notation in a variadic invocable (6.2.✨0) ..."
Add a new description section before the Constraints:

Description
¹ Invocation, sometimes referred to as a "function call", "invocable call", "closure call", "function invocation", or "closure invocation", refers to calling an invocable (e.g. a function or (closure) object) and triggering the execution of the invocable’s statement list, as described in 6.2.✨0.

4.2.8. Modify §6.5.5 "Cast operators"

6.5.5 Cast operators

Constraints

...

A pointer type shall be converted only to void, an integer type, or a pointer type. Only a pointer, integer, closure literal or nullptr_t type shall be converted to a pointer type. The type nullptr_t shall not be converted to any type other than void, bool or a pointer type. If the target type is nullptr_t, the cast expression shall be a null pointer constant or have type nullptr_t. A closure literal type shall not be converted to any pointer type other than a pointer to the closure’s function type.

Semantics

...

4.2.9. Modify §6.5.17.2 "Simple assignment"

6.5.17.2 Simple assignment

Constraints

One of the following shall hold:

the left operand has atomic, qualified, or unqualified arithmetic type, and the right operand has arithmetic type;

the left operand has an atomic, qualified, or unqualified version of a structure or union type compatible with the type of the right operand;

the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left operand has all the qualifiers of the type pointed to by the right operand;

the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left operand has all the qualifiers of the type pointed to by the right operand;

the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to a function type, and the other is a closure literal type whose function type is compatible with the left’s pointed to function type;

the left operand has an atomic, qualified, or unqualified version of the nullptr_t type and the right operand is a null pointer constant or its type is nullptr_t;

the left operand is an atomic, qualified, or unqualified pointer, and the right operand is a null pointer constant or its type is nullptr_t; or

the left operand has type atomic, qualified, or unqualified bool, and the right operand is a pointer or its type is nullptr_t.

4.2.10. Remove §6.7.7.4 "Function declarators" to rewrite it in a new version in terms of invocables

6.7.7.4 Function declarators
Description

If, in the declaration "T D1", D1 has the form

D parameter-clause attribute-specifier-sequence_opt

and the type specified for ident in the declaration "T D" is "derived-declarator-type-list T", then it is an invocable declarator (6.2.✨0) and the type specified for ident is "derived-declarator-type-list function returning the unqualified, non-atomic version of T".

Semantics

The optional attribute specifier sequence appertains to the function type.

Two function types are compatible if and only if all of the following hold:

They specify compatible return types.

The parameter type lists agree in the number of parameters and in whether the function is variadic or not.

The corresponding parameters have compatible types.

In the determination of type compatibility and of a composite type, each parameter declared with function or array type is taken as having the adjusted type and each parameter declared with qualified type is taken as having the unqualified version of its declared type.
EXAMPLE The declaration
int f(void), *fip(), (*pfi)();
declares a function f with no parameters returning an int, a function fip with no parameters returning a pointer to an int, and a pointer pfi to a function with no parameters returning an int. It is especially useful to compare the last two. The binding of *fip() is *(fip()), so that the declaration suggests, and the same construction in an expression requires, the calling of a function fip, and then using indirection through the pointer result to yield an int. In the declarator (*pfi)(), the extra parentheses are necessary to indicate that indirection through a pointer to a function yields a function designator, which is then used to call the function; it returns an int.

If the declaration occurs outside of any function, the identifiers have file scope and external linkage. If the declaration occurs inside a function, the identifiers of the functions f and fip have block scope and either internal or external linkage (depending on what file scope declarations for these identifiers are visible), and the identifier of the pointer pfi has block scope and no linkage.
EXAMPLE The declaration
int (*apfi[3])(int *x, int *y);
declares an array apfi of three pointers to functions returning int. Each of these functions has two parameters that are pointers to int. The identifiers x and y are declared for descriptive purposes only and go out of scope at the end of the declaration of apfi.
EXAMPLE The declaration
int (*fpfi(int (*)(long), int))(int, ...);
declares a function fpfi that returns a pointer to a function returning an int. The function fpfi has two parameters: a pointer to a function returning an int (with one parameter of type long int), and an int. The pointer returned by fpfi points to a function that has one int parameter and accepts zero or more additional arguments of any type.
EXAMPLE The following prototype has a variably modified parameter.
void addscalar(int n, int m, double a[n][n*m+300], double x);

int main(void)
{
  double b[4][308];
  addscalar(4, 2, b, 2.17);
  return 0;
}

void addscalar(int n, int m, double a[n][n*m+300], double x)
{
  for (int i = 0; i < n; i++)
    for (int j = 0, k = n*m+300; j < k; j++)
      // a is a pointer to a VLA with n*m+300 elements
      a[i][j] += x;
}
EXAMPLE The following are all compatible function prototype declarators.
double maximum(int n, int m, double a[n][m]);
double maximum(int n, int m, double a[*][*]);
double maximum(int n, int m, double a[ ][*]);
double maximum(int n, int m, double a[ ][m]);
as are:
void f(double (* restrict a)[5]);
void f(double a[restrict][5]);
void f(double a[restrict 3][5]);
void f(double a[restrict static 3][5]);
The last declaration also specifies that the argument corresponding to a in any call to f can be expected to be a non-null pointer to the first of at least three arrays of 5 doubles, which the others do not.

📝 IMPORTANT Editor’s Note: Undefined Behavior List J.2 references 6.7.7.4 -- change to 6.2.✨0.

4.2.11. Modify §6.8 "Statements and blocks"'s §6.8.1 "General" for blanket jump/label banning

6.8 Statements and blocks

6.8.1 General

Constraints

Unless otherwise specified, every jump and label is associated with the innermost invocable body it is contained within. Nested declarations or definitions of invocable bodies shall not refer to jump, iteration, or labeled statements in the invocable’s surrounding scope and shall only refer to a visible label or statement in the body of the associated invocable’s scope.

Semantics

A statement specifies an action to be performed. Except as indicated, statements are executed in sequence. The optional attribute specifier sequence appertains to the respective statement.

4.2.12. Modify §6.8.3 "Compound Statement"

6.8.3 Compound Statement

Semantics

A compound statement that is a ~~function body~~ invocable body together with the parameter type list and the optional attribute specifier sequence between them forms the block associated with the function definition or closure definition in which it appears. Otherwise, it is a block that is different from any other block. A label that is not followed by another label or an unlabeled statement shall be translated as if it were followed by a null statement.

4.2.13. Rewrite §6.9.2 "Function definitions" in terms of Invocables

6.9.2 Function definition
Description

A function definition defines (or declares and defines) the invocable body of a function call. Its surrounding scope is always file scope, even if a function declarator at block scope is the first encountered declaration of such a function.

Constraints

The identifier declared in a function definition (which is the name of the function) shall have a function type, as specified by the declarator portion of the function definition.

The storage-class specifier, if any, in the declaration specifiers shall be either extern or static.

Semantics

The optional attribute specifier sequence in a function definition appertains to the function.

The declarator in a function definition specifies the name of the function being defined and the types (and optionally the names) of all the parameters; the declarator also serves as a function prototype for later calls to the same function in the same translation unit. The type of each parameter is adjusted as described in 6.2.✨0.
NOTE In a function definition, the return type of the function and its prototype cannot be inherited from a typedef:
typedef int F(void);
// type F is "function with no parameters returning int"

F f, g;                      // f and g both have type compatible with F
F f { /* ... */ }            // WRONG: syntax/constraint error
F g() { /* ... */ }          // WRONG: declares that g returns a function
int f(void) { /* ... */ }    // RIGHT: f has type compatible with F
int g() { /* ... */ }        // RIGHT: g has type compatible with F
F *e(void) { /* ... */ }     // e returns a pointer to a function
F *((e))(void) { /* ... */ } // same: parentheses irrelevant
int (*fp)(void);             // fp points to a function that has type F
F *Fp;                       // Fp points to a function that has type F
EXAMPLE In the following:
extern int max(int a, int b)
{
  return a > b ? a: b;
}
extern is the storage-class specifier and int is the type specifier; max(int a, int b) is the function declarator; and { return a > b ? a: b; } is the function body.
EXAMPLE To pass one function to another, one can say
int f(void);
/* ... */
g(f);
Then the definition of g can read
void g(int (*funcp)(void))
{
  /* ... */
  (*funcp)(); /* or funcp(); */
}
or, equivalently,
void g(int func(void))
{
  /* ... */
  func(); /* or (*func)(); */
}

4.3. Lambda Expression Changes

4.3.1. Add lambda-expression to the postfix-expression grammar of §6.5.3.1

Syntax

¹ postfix-expression:

primary-expression

postfix-expression [ expression ]

postfix-expression ( argument-expression-list_opt )

postfix-expression . identifier

postfix-expression -> identifier

postfix-expression ++

postfix-expression --

compound-literal

lambda-expression

4.3.2. Add a new section §6.5.3.✨ "Lambda expressions" somewhere after §6.5.3.4 "Structure and union members", likely §6.5.3.5

6.5.3.✨ Lambda expressions
Syntax

lambda-expression:

terse-capture-clause attribute-specifier-sequence_opt parameter-clause_opt attribute-specifier-sequence_opt trailing-return-clause_opt function-body

terse-capture-clause:

[ capture-list_opt ]

trailing-return-clause:

-> type-name

Description

A lambda expression (or just "lambda") creates a closure definition that is immediately usable as an invocable. It is introduced and partly characterized by the terse capture clause, has parameters listed in the optional parameter clause, and has a surrounding scope of either:

the block scope at the lexical position within an invocable;

or, file scope.

Constraints

If a lambda or pointer to lambda is the first operand of the . operator or -> operator, respectively, the second operand shall only specify a capture name from that lambda’s capture clause as the identifier to designate a member.

If the trailing return clause is specified, the return type of the lambda is the specified type name. Otherwise, the return type is inferred from the body of the lambda expression. If the return type is inferred, then:

all return statements shall have an expression whose type is exactly the same as the type of every other return statement’s expression;

or, void if there is no expression for all the return statements or there are no return statements.

Semantics

The optional attribute specifier sequence in a lambda expression appertains to the resulting lambda type and to its function type. If the parameter clause is omitted, a parameter clause of the form () is assumed.
EXAMPLE In the following lambda object initialization:
int main () {
  [[deprecated]] auto f = []() [[unsequenced]] {
    return 2;
  };
  return f();
}
[[unsequenced]] describes the properties of the invocable body through its function type, while [[deprecated]] applies to the closure object f and a diagnostic is encouraged at the invocation of f in return f().
Similar to a function definition, a lambda expression forms a single block that comprises all of its parts. Each capture and parameter has a scope of visibility that starts immediately after its definition is completed and extends to the end of the lambda body. Captures and parameters are visible throughout the body of a lambda unless they are redeclared in an inner block within that lambda’s body.

Value captures have the same storage duration as the lambda. Value captures are initialized and formed during the evaluation of the lambda expression, and are tied to that specific lambda expression’s closure. Each invocation to the formed lambda creates a new instance of each parameter, similar to a function call. The layout of the storage for parameters is unspecified.

The behavior is undefined if a reference capture is used after the lifetime of the lvalue it refers to has ended.
EXAMPLE Non-capturing lambdas can be immediately converted to function pointers, which makes them usable in functions such as qsort:
#include <stdlib.h>

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, _Countof(list), sizeof(*list),
    [](const void* untyped_left, const void* untyped_right) {
      const int* left = untyped_left;
      const int* right = untyped_right;
      return *left - *right;
    }
  );
	
  return list[0]; // return 2;
}
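
For comparison, the same sort can be written in standard C today with a named, file-scope comparison function standing in for the lambda. This is a minimal sketch; the names `compare_ints` and `sorted_element` are ours, and the comparison deliberately avoids the subtraction used above, which can overflow for operands that are far apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Named stand-in for the non-capturing lambda passed to qsort. */
static int compare_ints(const void *untyped_left, const void *untyped_right) {
  const int *left = untyped_left;
  const int *right = untyped_right;
  /* Yields -1, 0, or 1 without risking signed overflow. */
  return (*left > *right) - (*left < *right);
}

/* Sorts the same list as the example and returns one element of it. */
int sorted_element(size_t index) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  const size_t count = sizeof list / sizeof *list;
  assert(index < count);
  qsort(list, count, sizeof *list, compare_ints);
  return list[index];
}
```

The lambda form keeps the comparison next to its only use; the named form needs a separate file-scope definition but works with any conforming implementation.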

4.4. Capture Function Changes

4.4.1. Add capture-function-declarator to the direct-declarator grammars of §6.7.7.1 "General" of "Declarators"

6.7.7.1 General

Syntax

...

direct-declarator:

identifier attribute-specifier-sequence_opt

( declarator )

array-declarator attribute-specifier-sequence_opt

function-declarator attribute-specifier-sequence_opt

capture-function-declarator attribute-specifier-sequence_opt

...

...

4.4.2. Add a new section §6.7.7.✨ "Capture function declarators" somewhere after §6.7.7.4 "Function declarators", likely becoming §6.7.7.5

6.7.7.✨ Capture function declarators
Syntax

capture-function-declarator:

direct-declarator parameter-clause function-capture-clause

function-capture-clause:

_Capture ( capture-list_opt )

Description

A capture function declarator declares a closure object. It is introduced similarly to a function declarator but is characterized by the function capture clause as well as its lexical position, and has a surrounding scope of either:

the block scope at the lexical position within an invocable;

or, file scope.

Constraints

The identifier declared in a capture function declarator (which is the name of the capture function) shall have closure type, as specified by the declarator portion of the capture function declarator.

A capture function object that is declared but not yet defined (6.9.✨) shall not have the member access operator . applied to the object, nor the member access operator -> applied to a pointer to that object.

Semantics

If, in the declaration "T D1", D1 has the form

D parameter-clause function-capture-clause attribute-specifier-sequence_opt

and the type specified for the identifier ident in the declaration "T D" is "derived-declarator-type-list T", then the type specified for ident is "derived-declarator-type-list closure with function-capture-clause captures returning the unqualified, non-atomic version of T". The type is complete after the end of the function capture clause. The optional attribute specifier sequence appertains to the closure’s function type.

The second optional attribute specifier sequence after the parameter type list appertains to the closure object.
EXAMPLE In the following capture function declaration and definition:
int main () {
  [[unsequenced]] int f () _Capture() [[deprecated]] {
    return 2;
  }
  return f();
}
[[unsequenced]] appertains to and describes the properties of the invocable body through its function type, while [[deprecated]] appertains to and applies to the closure object f and a diagnostic is encouraged at the invocation of f in return f().

4.4.3. Add capture-function-abstract-declarator to the direct-abstract-declarator grammars of §6.7.8 "Type names"

6.7.8 Type names
Syntax

...

direct-abstract-declarator:

( abstract-declarator )

array-abstract-declarator attribute-specifier-sequence_opt

function-abstract-declarator attribute-specifier-sequence_opt

capture-function-abstract-declarator attribute-specifier-sequence_opt

...

capture-function-abstract-declarator:

direct-abstract-declarator_opt parameter-clause function-capture-clause

Semantics

In several contexts, it is necessary to specify a type. This is accomplished using a type name, which is syntactically a declaration for a function or an object of that type that omits the identifier. The optional attribute specifier sequence in a direct abstract declarator appertains to the preceding array or function type. The attribute specifier sequence affects the type only for the declaration it appears in, not other declarations involving the same type.
EXAMPLE The constructions
(a)  int
(b)  int *
(c)  int *[3]
(d)  int (*)[3]
(e)  int (*)[*]
(f)  int *()
(g)  int (*)(void)
(h)  int (*const [])(unsigned int, ...)
(i)  int (*)() _Capture(&)
(j)  int (*const [])(unsigned int, ...) _Capture(=)
name respectively the types

(a) int,

(b) pointer to int,

(c) array of three pointers to int,

(d) pointer to an array of three ints,

(e) pointer to a variable length array of an unspecified number of ints,

(f) function with no parameters returning a pointer to int,

(g) pointer to function with no parameters returning an int,

(h) array of an unspecified number of constant pointers to functions, each with one parameter that has type unsigned int and an unspecified number of other parameters, returning an int,

(i) pointer to closure which takes no parameters and default captures the current scope by reference, returning an int, and

(j) array of an unspecified number of constant pointers to closures, which take an unsigned int and an unspecified number of other parameters, and default capture the current scope by value, returning an int.
The constructions (i) and (j), while valid types, are nonsensical as the lexical position of the abstract declarator for such a type makes it unique, regardless of the similarity of captures and function types of the closure type. Such a type may not be meaningfully useful in the program’s text for interacting with closures and closure types.
NOTE As indicated by the syntax, empty parentheses in a type name are interpreted as "function with no parameters", rather than redundant parentheses around the omitted identifier.

4.4.4. Modify §6.8.3 "Compound statements" and add a new grammar production for "block-item"

6.8.3 Compound statements

Syntax

...

block-item:

declaration

function-definition
capture-function-definition

unlabeled-statement

label

4.4.5. Modify the title of §6.9 to feature more than external definitions to "Definitions" and add a new grammar production for "external-declaration"

6.9 Definitions~~External definitions~~

6.9.1 General

Syntax

translation-unit:

external-declaration

translation-unit external-declaration

external-declaration:

function-definition

capture-function-definition

declaration

4.4.6. Add a new section §6.9.✨ "Capture function definitions" somewhere after §6.9.2 "Function definitions", likely 6.9.3

6.9.✨ Capture function definitions
Syntax

capture-function-definition:

direct-declarator parameter-clause function-capture-clause attribute-specifier-sequence_opt function-body

Description

A capture function definition declares and defines an invocable of closure type. All of the description, constraints, and semantic requirements in 6.7.7.✨ apply to a capture function definition, with a few additional constraints and semantics as follows.

Constraints

The storage-class specifier, if any, in the declaration specifiers shall not be typedef.

If a capture function or pointer to capture functions is the first operand of the . operator or -> operator, respectively, the second operand shall only specify a capture name from its function capture clause as the identifier to designate a member.

Semantics

At block scope, a function definition is interpreted as a capture function definition with the empty function capture clause _Capture().
EXAMPLE In the following capture function declaration:
int main () {
  [[unsequenced]] int f () _Capture() [[deprecated]] {
    return 2;
  }
  return f();
}
[[unsequenced]] describes the properties of the invocable body through its function type, while [[deprecated]] applies to the closure object f and a diagnostic is encouraged at the invocation of f in return f().
Similar to a function definition, a capture function forms a single block that comprises all of its parts. Each capture and parameter has a scope of visibility that starts immediately after its definition is completed and extends to the end of the capture function’s body. Captures and parameters are visible throughout the body of a capture function unless they are redeclared in an inner block within that capture function’s body.

Value captures have the same storage duration as the capture function. Value captures are initialized and formed during the evaluation of the capture function, and are tied to that specific capture function’s closure. Each invocation of the capture function creates a new instance of each parameter, similar to a function call. The layout of the storage for parameters is implementation-defined.
#include <stdlib.h>

typedef int seven_fn_trampoline_t(void*);
typedef struct seven_fn_data {
  seven_fn_trampoline_t* f;
  void* p;
} seven_fn_data;

seven_fn_data make_seven (int x) {
  int y = 7;
  int seven_fn() _Capture(x, y) {
    return x * y;
  }
  typedef typeof(seven_fn) seven_fn_t;
  int seven_fn_trampoline(void* p) {
    seven_fn_t* seven_fn = p;
    return seven_fn();
  }
  seven_fn_data d = {
    .f = seven_fn_trampoline, // OK: closure literal
    .p = malloc(sizeof(seven_fn))
  };
  // simple assignment of closure into allocated storage
  *((seven_fn_t*)d.p) = seven_fn;
  return d; 
}

typedef int eight_fn_t();

eight_fn_t* make_eight () {
  int eight_fn () _Capture() {
    return 8;
  }
  return eight_fn; // OK: empty capture means closure literal
}

typedef int nine_fn_t();

nine_fn_t* make_nine () {
  int val = 30;
  int nine_fn () _Capture(val) {
    return val;
  }
  return nine_fn; // constraint violation: cannot convert
  // closure to function pointer
}

int main () {
  int x = 10;
  int zero () {
    // OK, no external variables used
    return 0;
  }
  int also_zero () _Capture() {
    // same as above, just explicit
    return 0;
  }
  int double_it () {
    return x * 2; // constraint violation
  }
  int also_wrong () _Capture() {
    return x * 2; // constraint violation
  }
  int triple_it () _Capture(x) {
    return x * 3; // OK, x == 10 when called (captured by value)
  }
  int quadruple_it () _Capture(&x) {
    return x * 4; // OK, x == 1000 when called (captured by reference)
  }
  int quintuple_it () _Capture(=) {
    return x * 5; // OK, x == 10 when called (captured by value)
  }
  int sextuple_it () _Capture(&) {
    return x * 6; // OK, x == 1000 when called (captured by reference)
  }
  x = 1000;
	
  void* trampoline_data = nullptr;
  auto seven_tuple_it = make_seven(x);
	
  eight_fn_t* eight = make_eight();
  int result = zero()
    + triple_it() + quadruple_it()
    + quintuple_it() + sextuple_it()
    + seven_tuple_it.f(seven_tuple_it.p)
    + eight();
  // same as
  // int result = 17088;
  // 0
  // + (10 * 3) + (1000 * 4)
  // + (10 * 5) + (1000 * 6)
  //            + (1000 * 7)
  // + 8
  free(seven_tuple_it.p);
  return result;
}
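
To make the storage behavior of make_seven concrete, the following sketch emulates it in standard C with an explicit structure holding the two value captures. The structure `seven_fn_captures`, its layout, and the helper `call_seven` are assumptions made for illustration; an implementation is free to lay out a closure object differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int seven_fn_trampoline_t(void *);
typedef struct seven_fn_data {
  seven_fn_trampoline_t *f;
  void *p;
} seven_fn_data;

/* Hypothetical layout for the closure object of _Capture(x, y). */
struct seven_fn_captures {
  int x;
  int y;
};

/* Stand-in for seven_fn_trampoline: recovers the captures and runs the body. */
static int seven_fn_trampoline(void *p) {
  assert(p != NULL);
  const struct seven_fn_captures *self = p;
  return self->x * self->y;
}

/* Copies the captures into allocated storage so they outlive this call.
   On allocation failure, the returned `p` member is a null pointer. */
seven_fn_data make_seven(int x) {
  int y = 7;
  seven_fn_data d = {
    .f = seven_fn_trampoline,
    .p = malloc(sizeof(struct seven_fn_captures))
  };
  if (d.p != NULL) {
    *(struct seven_fn_captures *)d.p = (struct seven_fn_captures){ .x = x, .y = y };
  }
  return d;
}

/* Creates, invokes, and releases the emulated closure; -1 on failure. */
int call_seven(int x) {
  seven_fn_data d = make_seven(x);
  if (d.p == NULL) {
    return -1;
  }
  int result = d.f(d.p);
  free(d.p);
  return result;
}
```

With `x` equal to 1000 this yields the `1000 * 7` term of the sum above. The Capture Function version differs mainly in that the compiler generates the structure and the copy, so the capture list is the only place the user states what is stored.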

5. Appendix

5.1. Accessing Context in Nested Functions

A newer paper by Dr. Martin Uecker discusses the various ways to access GNU Nested Functions and their potential future standardization ([n3654]). It addresses the executable stack / general-trampoline problem of GNU Nested Functions (by providing a wide function pointer type to get around it) before discussing various ways forward and various improvements around GNU Nested Functions, but takes a dissimilar approach to the one outlined in our proposal. We will go through some of the sections in the paper and talk about how it differs from the approach this paper is going to take, and the criticisms it levies at the various aspects of other solutions such as Apple Blocks, Lambdas, GNU Nested Functions, and more.

5.1.1. §1 & §2

These are the sections we agree with the most: the introduction of a wide function pointer type is necessary (§ 5.2 Wide Function Pointer Type), no matter which solution is picked. Wide Function Pointers are a unifying part to make C more of the appropriate "lingua franca" between languages. This proposal even agrees that naked, unadorned GNU Nested Functions can be introduced as part of C: however, the caveat would be that, insofar as the design in § 3.2 Capture Functions: Rehydrated Nested Function is concerned, it would be a constraint violation to use objects from the outside local scope without appropriately capturing them. Secondly, the use of such a function in [n3654]'s api_old would not be "implementation-defined", but rather a constraint violation that GNU (and other compilers) could turn into well-defined behavior. From §2:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;

void api_old_simple(cb_t cb);
void api_old(cb_t cb, void *data);
void api_new(cb_wide_t cb);

void example4()
{
  int d = 4;
  int bar(int x) {
    return x + d; // constraint violation: `d` not captured
  }
  int bar_fixed(int x) _Capture(&) {
    return x + d; // ok
  }
  api_old(bar, nullptr); // GNU extension, constraint violation in ISO C
  api_new(bar); // ok
}

NOTE: [n3654] does not seem to use its own API correctly, so the code above is not identical to what is in [n3654]: e.g. api_old is called in [n3654] with just bar and nothing else, leaving off the second required parameter.

Our hope is to fix that with § 5.3 Make Trampoline and Singular Function Pointers:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;
// or: typedef int (%cb_wide_t)(int);

void api_old_simple(cb_t cb);
void api_old(cb_t cb, void *data);
void api_new(cb_wide_t cb);

void example4()
{
  int d = 4;
  int bar(int x) { // GNU Extension, Nested Functions
    return x + d; 
  }
  int bar_fixed(int x) _Capture(&) { // (Proposed) ISO C, Capture Functions
    return x + d;
  }
  cb_t bar_fn_ptr = stdc_make_trampoline(bar); // Extension, but works
  cb_t bar_fixed_fn_ptr = stdc_make_trampoline(bar_fixed);
  // all ok now
  api_old_simple(bar_fn_ptr); 
  api_old(bar_fn_ptr, nullptr);
  api_new(bar);
  api_old_simple(bar_fixed_fn_ptr); 
  api_old(bar_fixed_fn_ptr, nullptr);
  api_new(bar_fixed);
  // trampolines must be freed
  stdc_destroy_trampoline(bar_fn_ptr);
  stdc_destroy_trampoline(bar_fixed_fn_ptr);
}

Individuals can rely on the GNU Nested Functions, but would have an explicit way to opt-in to get ISO Standard C behavior. We think this is a better path forward for harmonizing things, and would let the user be explicit about where and when trampolines (and their effects) are created/used.

5.1.2. §3

We agree with the premise of section 3, including the way that any capture can be used with the "old" style of API, so long as it passes a userdata parameter:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;

void api_old(cb_t cb, void *data);

void example5()
{
  int d = 4;
  int bar(int x) { return x + d; }
  // static (capture-less) nested function
  static int trampoline(int x, void *ptr)
  {
    return (*(cb_wide_t)ptr)(x);
  }
  api_old(trampoline, &(cb_wide_t){ bar });
}
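
Since neither nested functions nor `_Wide` are available in standard C, the shape of example5 can be approximated by modeling the wide function pointer as an explicit pair of code pointer and context pointer. This is a sketch under stated assumptions: `cb_wide`, `cb_userdata_t`, `api_old_userdata`, `bar_body`, and `example5_emulated` are hypothetical names, and the old-style API here takes a callback that receives the userdata pointer so that the types line up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulation of a wide function pointer: code plus context. */
typedef struct cb_wide {
  int (*fn)(void *ctx, int x);
  void *ctx;
} cb_wide;

/* Hypothetical old-style API: narrow function pointer plus userdata. */
typedef int (*cb_userdata_t)(int x, void *data);

static int api_old_userdata(cb_userdata_t cb, void *data) {
  return cb(38, data); /* the argument 38 is arbitrary */
}

/* Stand-in for the nested function `bar`; `d` is reached through ctx. */
static int bar_body(void *ctx, int x) {
  const int *d = ctx;
  return x + *d;
}

/* Capture-less trampoline: unpacks the wide pointer from the userdata. */
static int trampoline(int x, void *ptr) {
  const cb_wide *wide = ptr;
  assert(wide != NULL && wide->fn != NULL);
  return wide->fn(wide->ctx, x);
}

int example5_emulated(void) {
  int d = 4;
  cb_wide bar = { .fn = bar_body, .ctx = &d };
  return api_old_userdata(trampoline, &bar);
}
```

The context pointer refers to `d` directly, so this emulation has the same lifetime constraint as a reference capture: `bar` cannot be used once `example5_emulated` has returned.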

[n3654] then introduces a potential new keyword to capture what is, effectively, the current function frame and reuse it in the same place:

typedef int (*cb_t)(int);

void api_old(cb_t cb, void *data);

void example6()
{
  const int d = 4;
  // static chain passed via specified argument
  int bar(int x, void *data) _Closure(data)
  {
    return x + d;
  }
  api_old(bar, &_Closure(bar));
}

_Closure(bar) can effectively be seen as a signal to the implementation for the invocation of __builtin_frame_address, while _Closure(data) attached to the function definition is a directive to the compiler to use __builtin_call_with_static_chain. Semantically, the use of _Closure(data) on the definition ties the contents of the nested function to the surrounding scope from the perspective of whatever function definition _Closure is attached to. It’s a way of saying the surrounding scope is being provided by the void* argument (data in this case). This is mildly more type-safe than just a regular void* cast to a structure type inside. It is impossible to cast to the wrong type since it’s some (unnamed) type related to the current scope, and therefore the location provides the safety. It also offers a way to have two different closures use the same void* data, meaning that one could theoretically optimize a function taking two or three callbacks to have only one void* userdata-style parameter.

The problem with that is that assuming two or three callbacks all have the same environment or use the same userdata is, oftentimes, not a good idea. An example is the proposed thrd_create_attrs_err function (Thread Attributes): if the API were to assume that all three void* provided to this function can or should be the same, there could be many possible issues (thread of invocation does not match expectations, race conditions, having access to the wrong data, and more). So it’s unclear whether or not that would be good in general-purpose, widely-adopted, or prolific library interfaces.

NOTE: Folding together multiple nested functions to have similar closure data would certainly be useful for internal APIs where the caller of a specific API can make assumptions of how things work; but it does not hold up for external or uncontrollably-available APIs.
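
As a small illustration of why external APIs tend to keep one userdata pointer per callback, consider the sketch below. The API `process` and every other name in it are hypothetical; the point is only that each callback receives state of a different type, which a single shared `void*` could not express without an extra wrapper agreed upon by both callbacks.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*event_cb)(int value, void *data);

/* Hypothetical API taking one userdata pointer per callback. */
static void process(int value,
                    event_cb on_start, void *start_data,
                    event_cb on_finish, void *finish_data) {
  on_start(value, start_data);
  on_finish(value * 2, finish_data);
}

/* Two unrelated pieces of state, owned by different parts of a program. */
struct start_log { int last_started; };
struct finish_total { long total; };

static void record_start(int value, void *data) {
  struct start_log *log = data;
  log->last_started = value;
}

static void add_finish(int value, void *data) {
  struct finish_total *sum = data;
  sum->total += value;
}

long independent_userdata_demo(void) {
  struct start_log log = { 0 };
  struct finish_total sum = { 0 };
  process(21, record_start, &log, add_finish, &sum);
  assert(log.last_started == 21);
  return sum.total;
}
```

If `process` accepted only one `void*`, both callbacks would have to agree on a common structure, which is reasonable for an internal API but is an assumption an external library cannot make about its callers.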

The final problem with this section is that it still assumes that the only kind of closure one would want is one that refers to variables in the current scope. This results in all the same problems documented in § 2.2.4 The Nature of Captures: undefined behavior, lifetime failures, and more. This could especially be the case for thrd_create_attrs_err, thread/worker pools, thread queues, and other asynchronous scheduling initiatives.

5.1.3. §4

This section introduces the concept of modifying how Nested Functions capture variables, recommending that some variables are captured by value inside of the data stored for a closure. The recommendation is that values that are const should be captured, while other mutable values are not. The examples do not seem to explain why capturing only const variables is helpful, as the primary reason to capture by value (particularly as Apple Blocks has explained (§ 2.3.2 Runtime Required)) is for safety in copying the closure to another location. The following example, using an old-style, void* based API for the purposes of copying, is given:

typedef int (*cb_t)(int);
void api_old_copy(cb_t cb, void *data, size_t data_size);

void example7()
{
  // const-qualified variables can be copied
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // sizeof can be used to obtain the required size
  api_old_copy(bar, &_Closure(bar), sizeof(_Closure(bar)));
}

We do not see how the const qualification helps in this scenario, and also note that this isn’t helpful for the vast majority of declarations and types that are non-const qualified. For example, if this pointer was not const qualified, a copy would have to be created solely for the purpose of capture:

typedef int (*cb_t)(int);
void api_old_copy(cb_t cb, void *data, size_t data_size);

void example7_modified()
{
  // non const-qualified variable is by-name
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  // change to `const` to enable capture
  int (*const cap_p)[10] = p;
  int bar(int x, void *data) _Closure(data)
  {
#if 0
    // dangerous -- may not exist
    return (*p)[x];
#else
    // not dangerous -- captured by value
    return (*cap_p)[x];
#endif
  }
  // sizeof can be used to obtain the required size
  api_old_copy(bar, &_Closure(bar), sizeof(_Closure(bar)));
}

One would need to form a copy of any mutable variable into a const one in order to ensure that it gets its whole value placed inside of whatever the implementation decides to place inside of _Closure. This is, in many ways, a by-proxy form of doing C++ Lambda captures or using __block in Apple Blocks. At the very least with C++ Lambdas and Apple Blocks, their design is once again explicit; it allows the user to decide if something should be transported by-value, and then can be moved into a by-name state by the user. For 1980 direction, this requires duplicated variables, and given how _Closure works, rather than the user stating their intent directly ("put this variable inside of this thing so I can carry it around in the manner of my choosing"), they have to instead contort their declarations to be const as a means of perhaps making access to these variables safe ("this variable is now const so it should be copied in, but anything else is implementation-defined or something"). This is a roundabout way of being clear about what is coming and going and what the properties of that thing are; we believe this to be infinitely less clear than erroring on something that is not captured and making the user specify explicitly.

Capture-by-const is not a useful or reliable scheme and would require users to contort their declarations for the sole purpose of making it work better with this new solution: we do not believe it to be a viable path forward.

5.1.4. §5 and §6

This is where the paper starts departing more strongly from what we believe to be the right direction. This section opens with a use of api_old_copy_del:

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)(void*));

void example8()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // static nested functions acting as destructor
  static void del(void *_data)
  {
    // the structure type is visible at this point
    typeof(_Closure(bar)) *data = _data;
    free(data->p);
  }
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

The problem is that there seems to be a limitation in how _Closure can be used; the assertion is that _Closure is meant to strongly mimic "call with static chain" and map entirely towards that. This is fine for that architecture, but it raises the question in this example: why is _Closure(_data) in del not allowed like so?

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)(void*));

void example8_modified()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // static nested functions acting as destructor
  static void del(void *_data) _Closure(_data)
  {
    free(p);
  }
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

One should be able to simply say that local variables can be looked up through whatever is given to _Closure. For example, if someone were to make a global void* variable and set it to the value, it would also be a viable way of saying "this is the function’s current frame / environment" without necessarily requiring that the function be explicitly used with a static chain:

typedef int (*cb_t)(int);
// `del` takes no void* now
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)());

static void* my_env;
	
void example8_modified()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
	
  static void del() _Closure(my_env)
  {
    free(p); // `p` is found because we have statically asserted
    // that the stack frame of `example8_modified`
    // comes from the variable `my_env`
  }

  my_env = &_Closure(bar); // get closure data pointer
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

This is hinted at in §6 of the paper, but the chosen syntax and explanation uses a plain naked nested function that implicitly (?) knows the static chain without a void *data or void *_data argument:

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)());

void example9()
{
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // nested function acting as destructor
  void del() { free(p); p = NULL; } // missing... `void*` and `_Closure`?
  // wrong function name, as well
  api_data_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

It’s completely unclear how del receives the environment for bar here: is it simply assumed that nested functions contained in the same scope all implicitly receive the environment? If so, how? And, importantly, how does an API compiled separately (e.g., as a DLL in a library) know to make the association between bar and del here? Is there something that needs to be done internally in api_data_copy_del (meant to be api_old_copy_del?) for this to happen?

Adjusting this to allow for a callback that takes a void*, AND making it static so that the environment can be shared while del can be used as a normal function pointer with a shared environment, would likely look more like this:

typedef int (*cb_t)(int);
// signature adjusted to allow for `void*` into `del` callback
void api_old_copy_del(cb_t, void *data, size_t size, void (*del)(void*));

void example9_modified()
{
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
	
  static void del(void *data) _Closure(data)
  {
    free(p);
    p = NULL;
  }
  api_old_copy_del(bar,
    &_Closure(bar), // "environment" of `bar`
    sizeof(_Closure(bar)), // size for closure data to be copied in and survive
    del // `del` is now, appropriately, just a regular function pointer
  );
}

Whether the void* data is passed to the del callback or comes from some other (_Thread_local or static) object, there’s some amount of potential in the idea of "take a pointer and, using the surrounding scope, assert that it is some implementation-defined environment containing values for use". There’s nothing wrong with using the location of the nested function as a way of saying:

"any _Closure(some-void-ptr) represents the surrounding scope and should be used to look up identifiers if possible" (in the static nested function case);
OR, "any _Closure(some-void-ptr) represents a __builtin_call_with_static_chain (or Function Descriptor, or etc. etc. Implementation Decision Here) to appropriately call the function and set the necessary captures and data" (in the non-static nested function case).

But [n3654] has a hard time communicating that effectively, if that is indeed what it is trying to communicate at all.

NOTE: We are assuming this is what it means. This is why many of the code samples taken from the paper have been changed with the addition of _modified in the example function’s name.
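For comparison, the shape that example9_modified appears to be reaching for can be written by hand in today's C with an explicit environment structure. This is only a sketch under that reading of [n3654]: bar_env stands in for whatever the implementation would generate for _Closure(bar), and api_copy_call_del is a hypothetical stand-in for api_old_copy_del (it additionally takes the argument to call with and an output parameter so the example can be run).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef int cb_t(int x, void *data);
typedef void del_t(void *data);

/* Hand-written stand-in for the environment that _Closure(bar) would name. */
struct bar_env {
  int (*p)[10]; /* copy of the pointer; no const qualification needed */
};

static int bar(int x, void *data) {
  struct bar_env *env = data;
  return (*env->p)[x];
}

/* The destructor finds `p` through the same environment pointer. */
static void del(void *data) {
  struct bar_env *env = data;
  free(env->p);
  env->p = NULL;
}

/* Hypothetical API: copies the environment, invokes the callback on the
   copy, then hands the copy to the destructor. Returns 0 on success and
   stores the callback's result in *result. */
int api_copy_call_del(cb_t *cb, const void *data, size_t size,
                      del_t *del_cb, int x, int *result) {
  assert(cb != NULL && data != NULL && del_cb != NULL && result != NULL);
  void *copy = malloc(size);
  if (copy == NULL) return -1;
  memcpy(copy, data, size);
  *result = cb(x, copy);
  del_cb(copy); /* releases what the environment owns */
  free(copy);   /* releases the copied environment itself */
  return 0;
}

int demo_env(void) {
  int (*p)[10] = malloc(sizeof *p);
  if (p == NULL) return -1;
  for (int i = 0; i < 10; ++i) (*p)[i] = 10 + i;

  struct bar_env env = { p };
  int result = -1;
  if (api_copy_call_del(bar, &env, sizeof env, del, 2, &result) != 0) {
    free(p); /* the API never took ownership */
  }
  return result;
}
```

Because both bar and del receive the environment as an ordinary void* argument, a separately compiled library needs no special knowledge to associate the two.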

We do not critique much of the rest of the paper because it is simply building on top of this API, but using partially related orchestrations for Polymorphic Types. We are not interested in what polymorphic types will or will not do for this, and it is outside the scope of what we care about for this.

Appendix B: C++ Lambda Quiz

The final problem of this proposal is in the appendices. We will start with Appendix B, which has a quiz formulated using C++ Lambdas and asking "what will it print?":

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = [=](){ printf("%d\n", i); };
  auto bar = [=](){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer is "3" and then "4" (https://godbolt.org/z/KW4j1zG93). Before we talk about the answer, we are going to compare this answer to what the answer would be with GNU Nested Functions and Apple Blocks. To start, let’s try this quiz with GNU Nested Functions:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  void foo(){ printf("%d\n", i); };
  void bar (){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer here is "4" and then "4" (https://godbolt.org/z/voWG3Gjo3). The file-scope variable still gives the same answer (because of course it does): the change here is in the local variable. As explained in the above introduction to GNU Nested Functions (§ 2.2.4 The Nature of Captures), it captures by-name, so the value is updated. This matches the expectation. Apple Blocks behave differently:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = ^(){ printf("%d\n", i); };
  auto bar = ^(){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer is now back to "3" and then "4" (https://godbolt.org/z/a9c79cjYb). That is because, as explained above, the default for Apple Blocks is capturing by-value (§ 2.3.3 Captures).

The implication of this quiz is that Apple Blocks -- the thing that has worked for the entirety of the Apple ecosystem -- is wrong and unexpected, and the GNU Nested Function behavior is correct and expected. It wouldn’t be a "Quiz", after all, if the answer was anticipated to be entirely normal. In the rush to make a point about captures doing certain things, [n3654] effectively called the entirety of the Apple Blocks ecosystem fraudulent in its expectations. That’s certainly a choice that can be made, but a more important point that overshadows this is that C++ Lambdas can have the same behavior as GNU Nested Functions:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = [&](){ printf("%d\n", i); };
  auto bar = [&](){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

This changes the answer to "4" and then "4" (https://godbolt.org/z/EW8PETdxz). What this means is that this Quiz -- when properly displayed next to its counterparts -- shows that C++ Lambdas can be naturally configured to work like Apple Blocks OR like GNU Nested Functions, at the cost of one (1) character change in its capture clause. We can imagine that a person writing from the perspective of Apple Blocks could present the preceding C++ Lambda, which doesn’t have the behavior they expect, as a "Quiz" that contains a big "Gotcha". The reality is that C++'s design can handle both defaults without compromising the ergonomics in any serious manner. The syntax for lambdas is, of course, "sinfully ugly" -- even C++ enthusiasts acknowledge this readily -- but the acknowledgement that there are engineering tradeoffs to be had -- and not things to poke fun at or make "gotcha"s out of -- is why the design is mature and useful.

In contrast, [n3654] proposes capturing and copying based on things such as whether or not a variable is const, which does not approximate how it works in any existing practice.
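The two capture styles in the quiz can also be spelled out by hand in ISO C, which makes the source of the "3" versus "4" difference visible. This is a minimal sketch with hypothetical names; the structures stand in for the capture storage a compiler would generate.

```c
#include <assert.h>
#include <stddef.h>

int j = 3; /* file scope: never captured, always read at call time */

/* By-value capture: the closure data holds a copy of `i`
   (like [=] in C++ or the default for Apple Blocks). */
struct by_value_capture { int i; };

/* By-name capture: the closure data holds the address of `i`
   (like [&] in C++ or a GNU Nested Function). */
struct by_name_capture { int *i; };

static int call_by_value(const struct by_value_capture *cap) {
  return cap->i;
}

static int call_by_name(const struct by_name_capture *cap) {
  assert(cap->i != NULL);
  return *cap->i;
}

static int call_global(void) {
  return j;
}

int quiz_by_value(void) {
  int i = 3;
  struct by_value_capture foo = { i }; /* copy taken here */
  i = 4;                               /* not seen by the copy */
  (void)i;
  return call_by_value(&foo);
}

int quiz_by_name(void) {
  int i = 3;
  struct by_name_capture foo = { &i }; /* refers to the live variable */
  i = 4;                               /* seen through the pointer */
  return call_by_name(&foo);
}

int quiz_global(void) {
  j = 3;
  j = 4;
  return call_global(); /* same answer for every capture style */
}
```

Here quiz_by_value returns 3 while quiz_by_name and quiz_global return 4. The difference comes entirely from what is stored in the capture structure, and not from any qualifier on the declaration of i.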

Appendix A: List of Issues with C++ Lambdas

Appendix A is a laundry list of issues with C++ Lambdas, in no particular order. Some of them are already addressed in the introduction of Lambdas (§ 2.4 C++-Style Lambdas), but in general the list of issues has many flaws in its reasoning.

"Passing of a lambda as a regular parameter needs either trampolines or a new type."

Every single solution requires trampolines or a new type to be useful, GNU Nested Functions and Apple Blocks included. C++ did not have this problem because they have a much stronger base language that can do this as a library type: C needs a fundamental "wide function pointer" type no matter what (§ 5.2 Wide Function Pointer Type) and it needs trampoline-making functionality (§ 5.3 Make Trampoline and Singular Function Pointers).

"Lambdas with lvalue capture suffer from the same lifetime limitations as nested functions. auto foo(int i) { return [&](){ printf("%d\n", i); }; }"

This is discussed earlier in the introduction, but GNU Nested Functions have an identical problem. At the very least, Lambdas have a mechanism to stop this from being a problem: there is no built-in solution with GNU Nested Functions. Apple Blocks use an entire heap runtime and thus avoid this problem completely.

"Not having destructors and smart pointers in C requires workarounds not needed in C++."

"Not having explicit access to the structure holding the captured values requires unsafe byte copies, causes issues with alignment and makes deep copying impossible."

Everything in C is a byte copy, unless an explicit function is inserted to do something just before that byte copy happens. An example of this comes from Apple Blocks, where Block_copy and Block_release are required to make usage of stored blocks safer (§ 2.3.2.1 More Complications: Generally Unsafe to Return). To uphold this as a problem is to lambast the entirety of C and its object model as unsafe; which, honestly, is not an unfair assessment. The fix to that is to restore access to the objects captured inside of an object, as shown in § 3.2.5 NEW: Data Captures are Accessible, which makes any capturing entity -- whether it’s a _Closure like in [n3654], Capture Functions as in this paper, or Lambdas -- accessible once more.

"Making the lambda itself have a unique anonymous object type in C means it can only be invoked immediately which seems useful only in macros. In C++ it can be returned from and passed to template functions."

This criticism is partly untrue: complete objects in C can have their type retrieved with typeof. That means it can be cast/assigned into heap storage, copied around, and called just fine in certain contexts (c.f. the "static trampoline" technique mentioned both in the paper and used extensively in the example code above). We also already have the auto type-specifier: as regular complete objects, such types can be created and then stored in what already exists as a feature in C23. Macros are a foundational and important use case: it means that expressions being passed into function-like macros can be evaluated once, and only once, by being passed to an immediately-invoked lambda expression.

Storage outside of those contexts has to be powered by a wide function pointer type, and this proposal acknowledges that it will be necessary to solve that problem (but not in this proposal). GNU Nested Functions also need such a type, as do Apple Blocks (though they do come with their own Block type as well using the ^ syntax). This is also partly discussed in § 3.3.4 Trailing Return Types / Deduced Return Type. Actual returns were handled by [n2923] but only the auto for variable definitions was handled: functions were left for later, and there was initially consensus for something of this nature for the purposes of lambdas.

"To address the various use cases, there are many different ways to capture variables [&], [=], [], [a], [&b], [a = b], mutable, etc. adding a lot of complexity that does not seem necessary."

The complexity is necessary, as demonstrated by the Quiz example in Appendix B: C++ Lambda Quiz: the fact that captures can be changed and are not dependent on unrelated properties like const-ness (as in [n3654]) is how it successfully blends into e.g. the GNU ecosystem or the Apple ecosystem without breaking either of them. The complexity is inherent to the problem domain: glossing over it like GNU Nested Functions does limits whether or not this can successfully be deployed to replace Apple Blocks as an ISO C Standard solution.

"Value captures can be confusing as they shadow the original variable under same name."

It is unclear how captures in which the user makes an explicit choice can be more confusing than one where the user has no choice but the behavior changes. We have already established this in the Apple Ecosystem perspective with Blocks versus the GNU Nested Functions perspective in Appendix B: C++ Lambda Quiz; switching from one to another can result in bugs when people do not expect the default capturing style to change. Being explicit means nobody is surprised, and having renames prevents shadowing confusion: but it all has to be the user’s choice.

"Other features from C++ may need to be pulled in from C++ to make them fully useful, such as trailing return types, return type deduction, and generic arguments."

Trailing return types enhance what is possible, but are not strictly required (§ 3.3.4 Trailing Return Types / Deduced Return Type). Generic arguments are the one part that was opposed for inclusion in C23 and did not have consensus; it was, in fact, generic arguments that served as one of the primary reasons Gustedt’s Lambdas were completely and utterly tanked. This proposal does not use them and the design below does not require them to be useful, especially as anything relating to an immediately-invoked lambda in a macro can be covered by the necessary typeof(...) from C23.

"Adopting lambdas from C++ would limit [our] design freedom relative to C++. We can not easily change specific aspects when it might be better for C, because it would then be a divergence from C++ that should be avoided and will be opposed by implementers."

As a sole solution, certainly. But this proposal provides Capture Functions (§ 3.2 Capture Functions: Rehydrated Nested Function) as the flagship proposal for C, and maintains 1:1 identical capabilities with the proposed secondary Lambda part (§ 3.3 Lambdas). There are also other reasons as to why having lambdas is good (particularly, for macros and for use as an expression). But, the general improvement here is that one can have Capture Functions that are rooted in C history and C syntax and C needs, while maintaining just enough of Lambdas that serve as a compatibility layer.

The fact that this paper is already proposing a departure from C++ for C lambdas and Capture Functions by having accessible data captures already shows we have the power to improve on things in C’s favor, if we’re willing to hold onto that (§ 3.2.5 NEW: Data Captures are Accessible).

5.1.5. Insufficient

Given the state of [n3654], it seems like it has not sufficiently explored the consequences or implications of its proposed design, nor grounded it in sufficient existing practice for us to consider yielding to its principles. That does not mean all of the ideas are bad. In the above sections, after we repair some of the broken examples, there is clearly some potential in _Closure and the idea of an "environment" pointer. There is also perhaps merit in having a pointer that ties a specific function frame to a specific function call so that variable lookup that does not find a local variable can look in the "environment"/"_Closure" first before checking further surrounding variables (e.g., file-scope or static objects). But that is a separable problem -- and a lower-level problem -- than the tying of "function and its associated data".

While certain tweaks that give more direct access to the nested function and its code can improve benchmark metrics (§ 3.4 Measuring Solution Spaces), it is unclear at this time if this is achievable for all implementations and not just a select few powerful optimizers. It also does not solve the capture issue, and does not allow access to the variables stored inside the type (especially if the Nested Function gets type-erased behind a normal function pointer or a wide function pointer). These problems do not apply to types that are mandated to be different and unique by the standard for Capture Functions and Lambdas, which allows us to use normal . notation to access stored captures by their name.

A wide function pointer type (§ 5.2 Wide Function Pointer Type) would be a far better pursuit, separately, even if none of the solutions here or in other proposals are achieved.

The paper also seems to be driven, largely, by three things:

animus and disdain for C++ Lambdas and their design;
a desire to formalize the (potentially advanced) uses of __builtin_call_with_static_chain;
and, a strong preference for all of the design decisions of GNU Nested Functions.

It offers a "we can do these small things first" and then presents a wider narrative around how to handle the "data" part of "Functions with Data". Having a standardized solution that is less powerful than all of C++ Lambdas, GNU Nested Functions, Apple Blocks, and Jens Gustedt’s proposed C Lambdas does not seem like a good or useful starting point. As much as WG14 as a Committee has many members that continue to extol the virtues of being slow, we believe that there has been significant existing practice and useful explanations of designs to move forward with something much more comprehensive and robust. Giving in to the temptation of "simplified GNU Nested Functions" with a somewhat incomplete and incoherent design plan based around the idea of _Closure after 30+ years of design work in this space from directly-related and applicable languages is not something we consider a good use of time.

We do not comment on the Polymorphic Types API because that is beyond what we consider the useful scope of what can or should be addressed in our current proposal.

5.2. Wide Function Pointer Type

[n2862], by Dr. Martin Uecker, is already looking into standardizing a wide function pointer type. A wide function pointer type is necessary in the general-purpose ecosystem, but isn’t directly required to be tied to this proposal. Because it is a smaller entity, it can be put directly into the standard separately. We hope that, rather than using the _Closure(function-type) or function-type _Wide syntax, function-type% is explored as a usable syntax instead. This would simplify its use and its introduction:

typedef int foo_fn_t(int);

foo_fn_t% call_me (int* x) {
  return [x](int y) { return *x + y; };
}

int use_me(foo_fn_t% fn) {
  return fn(2);
}

int main () {
  int x = 30;
  return use_me(call_me(&x));
}

In the above example, foo_fn_t% can be replaced with _Closure(foo_fn_t) or foo_fn_t _Wide; we prefer the former over the latter two for obvious grammatical and ease-of-use reasons. Most importantly, there is a canonical and viably implementable conversion path for not only whatever is standardized in ISO C, but all of the existing extensions such as Blocks, Nested Functions, C++ Lambdas, and language-external closure types.
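Until such a type exists, the same example can be approximated by hand with a two-member structure holding a code pointer and an environment pointer. The sketch below is an illustration of the general idea only; struct foo_fn_wide and add_through_pointer are hypothetical names, and a real wide function pointer type would be provided by the implementation rather than written by the user.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-rolled "wide function pointer": code pointer plus environment. */
struct foo_fn_wide {
  int (*call)(void *env, int y);
  void *env;
};

/* Body of the lambda `[x](int y) { return *x + y; }`; the captured
   pointer `x` arrives through the environment argument. */
static int add_through_pointer(void *env, int y) {
  const int *x = env;
  return *x + y;
}

/* Mirrors call_me: packages the function together with its data. */
struct foo_fn_wide call_me(int *x) {
  struct foo_fn_wide fn = { add_through_pointer, x };
  return fn;
}

/* Mirrors use_me: the call site must pass the environment explicitly. */
int use_me(struct foo_fn_wide fn) {
  assert(fn.call != NULL);
  return fn.call(fn.env, 2);
}

int demo_wide(void) {
  int x = 30;
  return use_me(call_me(&x)); /* 30 + 2 */
}
```

The explicit fn.call(fn.env, 2) at the call site, and the untyped void* environment, are the two pieces of boilerplate a built-in wide function pointer type would be expected to absorb.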

NOTE: The caret (^) cannot be used for this purpose thanks to Apple and Objective-C/Objective-C++ taking that design space.

NOTE: The percent sign (%) does not conflict with Managed C++/CLI ref declarations that use % because naked % can only be applied to "value types" -- that is struct types. There is no callback type that fits this description in the garbage-collected .NET imperative language universe (C# or Managed C++/CLI); all callback types are declared as either raw function pointer types or class-based, "reference type" delegates in Managed C++/CLI.

5.3. Make Trampoline and Singular Function Pointers

In the later examples in § 2.2.3 Alternative Nested Function Implementations, a magic compiler builtin named __gnu_make_trampoline, with a secondary follow-on builtin named __gnu_destroy_trampoline, is used. This section talks about what that would look like, if it was to be implemented. In particular, an ideal solution that makes a trampoline needs to be an explicit request from the user because:

you want to opt-in to any dynamic allocations;
you want to provide a way to override the default allocation if possible;
and, you want explicit control on when those resources (the allocation, the protected memory, and similar) are released.

While this section was spawned from GNU Nested Functions, this same technique can be used to make possible single function pointer trampolines for Blocks with or without captures (§ 2.3 Apple Blocks) as well as C++-style Lambdas (§ 2.4 C++-Style Lambdas).

Therefore, the best design to do this would be -- using the [_Any_func]* paper and its new type -- the following:

typedef void* allocate_function_t(size_t alignment, size_t size);
typedef void deallocate_function_t(void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION-WITH-DATA-IDENTIFIER func);
_Any_func* stdc_make_trampoline_with(
  FUNCTION-WITH-DATA-IDENTIFIER func,
  allocate_function_t* alloc
);

void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func* func, deallocate_function_t* dealloc);

stdc_make_trampoline(f) would use some implementation-defined memory (including something pre-allocated, such as in Apple Blocks (§ 2.3.5 (Explicit) Trampolines: Page-based Non-Executable Implementation)). The recommended default would be that it just calls stdc_make_trampoline_with(f, aligned_alloc). stdc_destroy_trampoline(f) would undo, exactly, what stdc_make_trampoline did. The recommended default would be that it is identical to stdc_destroy_trampoline_with(f, free_aligned_sized). Providing an allocation and a deallocation function means that while the implementation controls what is done to the memory and how it gets set up, the user controls where that memory is surfaced from. This would prevent the problem of the Heap Alternative Nested Function implementation: rather than creating a special stack or having to rely on memory allocation functions, the compiler can instead source the memory from a user. This also makes such an allocation explicit, and means that its lifetime could be controlled by the user.

Though, given our memory primitives, a slightly better design that would allow the implementation to take care of (potentially) extra space handed down by alignment and whatnot would be:

struct allocation { void* data; size_t size; };
typedef struct allocation allocate_function_t (size_t alignment, size_t size);
typedef void deallocate_function_t (void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION_TYPE func);
_Any_func* stdc_make_trampoline_with(FUNCTION_TYPE func, allocate_function_t* alloc);

void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func*, deallocate_function_t* dealloc);

Regardless of the form that the make/destroy functions take, this sort of intrinsic would be capable of lifting not just typical GNU Nested Functions but all types of functions to be a single, independent function pointer with some kind of backing storage. Some desire may still exist to make the allocation and deallocation process automatic, but that should be left to compiler vendors to decide for ease-of-use tradeoffs versus e.g. security, like in § 2.2.2 Early Design Flaw: Nested Functions turn the stack Executable!.
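As an illustration of what a user might pass for the allocation-returning form, the following sketch implements a callback pair with the same shapes as allocate_function_t and deallocate_function_t. The names example_allocate and example_deallocate are hypothetical, and the sketch assumes a hosted implementation that provides C11 aligned_alloc. It uses plain free for release because C23's free_aligned_sized is not yet available everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct allocation { void *data; size_t size; };

/* Allocation callback: reports how much space was really obtained, so the
   caller can make use of any extra room introduced by rounding. */
struct allocation example_allocate(size_t alignment, size_t size) {
  struct allocation result = { NULL, 0 };
  /* The alignment must be a nonzero power of two. */
  if (alignment == 0 || (alignment & (alignment - 1)) != 0) return result;
  /* Reject empty requests and requests whose rounding would overflow. */
  if (size == 0 || size > SIZE_MAX - (alignment - 1)) return result;
  /* aligned_alloc historically requires a size that is a multiple of the
     alignment, so round the request up. */
  size_t rounded = (size + alignment - 1) / alignment * alignment;
  void *p = aligned_alloc(alignment, rounded);
  if (p != NULL) {
    result.data = p;
    result.size = rounded;
  }
  return result;
}

/* Deallocation callback: alignment and size are accepted so that a sized
   deallocation function could be substituted later. */
void example_deallocate(void *p, size_t alignment, size_t size) {
  (void)alignment;
  (void)size;
  free(p);
}

/* Requests 100 bytes at 64-byte alignment and reports the usable size. */
size_t demo_allocation_size(void) {
  struct allocation a = example_allocate(64, 100);
  if (a.data == NULL) return 0;
  assert((uintptr_t)a.data % 64 == 0);
  size_t usable = a.size;
  example_deallocate(a.data, 64, a.size);
  return usable;
}
```

Returning the rounded size is the point of the struct allocation form: the implementation learns that it received 128 bytes rather than the 100 it asked for.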

It should be noted that Apple itself already has a version of this with its Objective-C Blocks implementation ([objective-c-block-trampoline]), albeit with limitations discussed in § 2.3.5 (Explicit) Trampolines: Page-based Non-Executable Implementation. GCC does not expose an intrinsic for this per se, but does provide __builtin_call_with_static_chain (GCC Documentation: Builtin Call with Static Chain). One can build a trampoline mechanism on top of that, provided they had the properly-created function plus the right stack frame / "environment" chain pointer to go with the function callable. Since C++ Lambdas -- and the proposed Capture Functions and C-Style Lambdas here -- are by themselves Complete Objects, one can always create a "thunk" or "trampoline" for them manually, using a wide variety of allowable techniques from heap allocation to pre-stored arrays to _Thread_local/static data or otherwise. C++ could implement stdc_make_trampoline entirely as a library function, but C cannot; so, this is something vendors will have to figure out on their own.
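One of the manual techniques mentioned above, a pre-stored array of static slots, can be sketched in portable C. This is not the proposed stdc_make_trampoline: the names make_static_thunk and destroy_static_thunk are hypothetical, the pool is fixed at two slots, and the sketch is not thread-safe. It only illustrates how a function with data can be handed out as a single plain function pointer without executable stack or heap memory.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int plain_fn(int);          /* what a legacy API accepts */
typedef int env_fn(void *env, int); /* a function that needs its data */

struct slot { env_fn *call; void *env; };

#define SLOT_COUNT 2
static struct slot slots[SLOT_COUNT];

/* One precompiled thunk per slot; each forwards to its slot's contents. */
static int thunk0(int x) {
  assert(slots[0].call != NULL);
  return slots[0].call(slots[0].env, x);
}
static int thunk1(int x) {
  assert(slots[1].call != NULL);
  return slots[1].call(slots[1].env, x);
}
static plain_fn *const thunks[SLOT_COUNT] = { thunk0, thunk1 };

/* Binds (call, env) to a free slot and returns that slot's thunk.
   On failure returns NULL and sets errno to EINVAL or ENOMEM. */
plain_fn *make_static_thunk(env_fn *call, void *env) {
  if (call == NULL) { errno = EINVAL; return NULL; }
  for (size_t i = 0; i < SLOT_COUNT; ++i) {
    if (slots[i].call == NULL) {
      slots[i].call = call;
      slots[i].env = env;
      return thunks[i];
    }
  }
  errno = ENOMEM; /* every slot is in use */
  return NULL;
}

/* Releases the slot behind a thunk returned by make_static_thunk. */
void destroy_static_thunk(plain_fn *fn) {
  for (size_t i = 0; i < SLOT_COUNT; ++i) {
    if (thunks[i] == fn) {
      slots[i].call = NULL;
      slots[i].env = NULL;
    }
  }
}

static int add_env(void *env, int y) { return *(int *)env + y; }

int demo_thunk(void) {
  int x = 30;
  plain_fn *fn = make_static_thunk(add_env, &x);
  if (fn == NULL) return -1;
  int result = fn(2); /* called with no visible data argument */
  destroy_static_thunk(fn);
  return result;
}

/* Returns 1 if a third request fails with ENOMEM once the pool is full. */
int demo_exhaustion(void) {
  int x = 0;
  plain_fn *a = make_static_thunk(add_env, &x);
  plain_fn *b = make_static_thunk(add_env, &x);
  errno = 0;
  plain_fn *c = make_static_thunk(add_env, &x);
  int exhausted = (a != NULL && b != NULL && c == NULL && errno == ENOMEM);
  destroy_static_thunk(a);
  destroy_static_thunk(b);
  return exhausted;
}
```

The fixed slot count is the obvious limitation of this approach, which is why a general facility needs either implementation support or a user-supplied source of memory.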

The only part that needs to be user-configurable is the source of memory. Of course, if an implementation does not want to honor a user’s request, it can simply return (_Any_func*)nullptr all the time. This would be hostile, of course, so a vendor would have to choose wisely about whether or not they should do this. The paper proposing this functionality would also need to discuss setting errno to an appropriate indicator after use of the intrinsic, if only to appropriately indicate what went wrong. For example, errno could be set to:

ENOMEM: the allocation function call failed (that is, alloc returned nullptr).
EADDRNOTAVAIL: the address cannot be used for function calls (e.g., somehow being given invalid memory such as an address in .bss).
EINVAL: func is a null function pointer or a null object.
EACCES: the address could be used for function calls but cannot be given adequate permissions (e.g., the call to mprotect or VirtualProtect on it does not succeed).

to indicate a problem. Albeit, there are always complaints about errno, so it may also be possible to take an int* p_errcode parameter in the make_trampoline functions, and use that as a means of solving the problem (or swap the return type and the error code parameter to return the error code and output into an _Any_func*). The API design possibilities are, really, endless.

5.4. Executable Stack CVEs

THIS SECTION IS INCOMPLETE.

The following CVEs are related to executable stack issues.

N37XA: Functions with Data - Closures in C (A Comprehensive Proposal Overviewing Blocks, Nested Functions, and Lambdas)

Published Proposal, 2026-01-07

Abstract