N3657
Functions with Data - Closures in C (A Comprehensive Proposal Overviewing Blocks, Nested Functions, and Lambdas)

Published Proposal,

Previous Revisions:
None
Authors:
Paper Source:
GitHub
Issue Tracking:
GitHub
Project:
ISO/IEC 9899 Programming Languages — C, ISO/IEC JTC1/SC22/WG14
Proposal Category:
Change Request, Feature Request
Target:
C2y

Abstract

Nested Functions (GCC), Blocks (Clang & Apple-derived compilers), Wide Function Pointers (Borland and existing C library functions), and Lambdas (C++) provide a series of takes on how to, effectively, bundle functions with data in different ways and transport that information to the caller. This proposal goes through the existing practice and enumerates their tradeoffs so as to propose the best possible solution to the problem space in C, prioritizing C-like syntax and avoiding pitfalls from the existing solutions.

1. Changelog

2. Revision 0 - July 24th, 2025

3. Introduction and Motivation

A colloquial overview (with a bit less technical depth) of these options is available as a writeup here ([lambdas-nested-functions-block-expressions-oh-my]), though it was written in 2021, before Heap-based trampolines and some of the other proposals discussed here existed. We keep it as a gentler, milder introduction to the problem space, written in much less serious prose.

C has had an extremely long and somewhat complicated history of wanting to pair a set of data with a given function call. Early problems first started with the standardization of C89’s qsort, which took only a single function pointer argument and offered no way to pass data through for additional constraints:

void qsort(
  void* ptr, size_t count, size_t size,
  int (*comp)(const void* left, const void* right)
);

This worked, until -- occasionally -- people wanted to provide modifications to certain behaviors based on local data rather than static data. For example, to modify the sort order of a call to qsort, one would originally have to program it like this in standard C:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int compare(const void* untyped_left, const void* untyped_right) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (in_reverse) ? *right - *left : *left - *right;
}

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

There were multiple limiting factors in getting data into the function outside of using static. Accessing data in the local block was impossible for a compare function that, necessarily, had to be defined outside of main. This necessitated a way of transporting data to the compare function to do the work in a way that would work for qsort. Since it offered no way to transmit a user data parameter, other forms of data transfer became commonplace:

Still, for all of their benefits and ease-of-use, these techniques are not perfect. Concerns and problems around (potentially false) sharing of data only grew in time as each program had to manage larger and larger swaths of 1980’s-style global variable soups, something that has famously ended up being used as a sign of negative code quality in legal matters. This particular problem was combatted by "reentrant" functions or "reentrancy" requirements, which is where the family of qsort_s and qsort_r-style functions (from Annex K implementations or BSD libraries) came from:

void qsort_s(
  void* ptr, size_t count, size_t size,
  int (*comp)(const void* left, const void* right, void* user),
  void* user
);

void qsort_r(
  void* ptr, size_t count, size_t size,
  int (*comp)(const void* left, const void* right, void* user),
  void* user
);

And they are used like so:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int compare(const void* untyped_left, const void* untyped_right, void* user) {
  const int* in_reverse = (const int*)user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  return (*in_reverse) ? *right - *left : *left - *right;
}

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare, &in_reverse);
	
  return list[0];
}

While this example just has a single int as the type, other instances of using callbacks in this manner have resulted in Type Confusion bugs, where the void* pointer is cast to the wrong type or the wrong callback is used in conjunction with that void* callback type. The lack of type safety occasionally bites people, and given that it’s ferreted through a void* it’s hard to tackle this problem when dozens of little "helper" functions have to litter source code at file-scope. Finally, we add this frivolous example of ISO C (that does not make much sense upon first read):

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* three parameters, so that it matches compare and the callback qsort_r expects */
typedef int compare_fn_t(const void* left, const void* right, void* user);

int compare(const void* untyped_left, const void* untyped_right, void* user) {
  const int* in_reverse = (const int*)user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  /* dereference: testing the pointer itself would always be true here */
  return (*in_reverse) ? *right - *left : *left - *right;
}

compare_fn_t* make_compare(int argc, char* argv[], int* in_reverse) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  return compare;
}

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv, &in_reverse);
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare, &in_reverse);
	
  return list[0];
}

The point of the above code is to show that it’s legal to return a pointer to a normal function, because the function exists for the lifetime of the program (static storage duration, basically). This serves as a good proxy for the various Closure types we will be looking at: can they be returned safely from a function? Is there a way to make it return safely from the function? If we didn’t pass &in_reverse in as an argument, but created a local variable inside of make_compare, can it survive the return? These are all important questions to answer, especially as global variables became more discouraged towards C95 and C99 to prevent unintended data clobbering or sharing, and as reentrancy started to become more important. Because of void*’s lack of type safety (which enabled unfortunate type confusion bugs), sharing issues, and the lack of locality in the writing of functions, a new extension was cooked up by GCC to handle this problem. Similar to Ada and Algol features but spun directly for C at the time, it was a feature lovingly dubbed Nested Functions.
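As a point of comparison for the rest of this section, the user-data convention behind qsort_r and qsort_s can be sketched in portable ISO C. This is a minimal sketch rather than any library's implementation: sort_options, compare_with_options, and sort_ints_with_context are hypothetical names, and a small insertion sort stands in for the library routine because the parameter order of qsort_r has historically differed between GNU and BSD C libraries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of user data; not part of any standard API. */
struct sort_options {
  int in_reverse;
};

/* Callback type that receives the untyped user pointer on every call. */
typedef int compare_ctx_fn_t(const int* left, const int* right, void* user);

/* Recovers its data from `user`; nothing verifies that the cast is correct. */
int compare_with_options(const int* left, const int* right, void* user) {
  const struct sort_options* options = user;
  /* comparison without subtraction, so it cannot overflow */
  int ordered = (*left > *right) - (*left < *right);
  return options->in_reverse ? -ordered : ordered;
}

/* Minimal insertion sort that forwards `user` to every comparison. */
void sort_ints_with_context(int* list, size_t count,
                            compare_ctx_fn_t* comp, void* user) {
  assert(comp != NULL);
  for (size_t i = 1; i < count; ++i) {
    int value = list[i];
    size_t j = i;
    /* shift larger-ordered elements right until `value` fits */
    while (j > 0 && comp(&list[j - 1], &value, user) > 0) {
      list[j] = list[j - 1];
      --j;
    }
    list[j] = value;
  }
}
```

Calling sort_ints_with_context(list, count, compare_with_options, &options) with options.in_reverse set to 1 sorts in descending order. Nothing checks that the void* handed to the sort matches the struct the callback expects, which is the type confusion hazard described above.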

3.1. Preamble

Before we talk about GNU Nested Functions, Apple Blocks, C++ Lambdas, Literal Functions, or anything else, there are a few examples, similar to the qsort ones used above, that will be used throughout. The purpose of these examples will be to talk about:

The one returning a function pointer seems utterly frivolous in the above for make_compare, but is meant as a proxy to determine how to handle dynamic lifetime of functions, where a closure needs to outlive the scope in which it was created or returned. This serves as a stand-in for code similar to returning things "up" the stack, or where the closure is meant to be invoked at some later point like in asynchronous code. That is how the current landscape is going to be evaluated.

To aid in this evaluation, in the final section of this introduction, we will be filling out this table of features to see what each ecosystem brings to the table in terms of features. There are explanations below for each one in the first column:

| Feature | GNU Nested Functions | Apple Blocks | C++-Style Lambdas in C | Literal Functions |
| --- | --- | --- | --- | --- |
| Capture By-Name | | | | |
| Capture By-Value | | | | |
| Selective Capture | | | | |
| Safe to Return Closure | | | | |
| Relocatable to Heap (Lifetime Management) | | | | |
| Usable Directly as Expression | | | | |
| Immediately Invokable | | | | |
| Convertible to Function Pointer | | | | |
| Convertible to "Wide" Function Type | | | | |
| Access to Non-Erased Object/Type | | | | |
| Recursion Possible | | | | |

3.2. GNU Nested Functions

Nested Functions ([nested-functions]) were the logical extension of function definition syntax pulled down into the local level. Despite being the oldest functions-with-data attempt (30+ years), it only has one proposal by Dr. Martin Uecker done recently ([n2661]) and is not widely adopted as an extension across C compilers. The goal was very simple, and during the C89 timeframe eminently doable due to the absence of a deep understanding of security implications for machines not yet being actively exploited in both civilian and military contexts. The syntax of a function definition within a block scope created a function that could be called in the obvious way but still reference surrounding block scope objects by name:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The notable benefits of this approach were:

Unfortunately, the early and enduring design of this feature -- in order to enable some of the benefits listed above -- very quickly ran afoul of early security concerns, and soon earned itself a big CVE due to the way it worked. In particular, the implementation strategy for a nested function is a brilliant piece of engineering that runs afoul of one of the only enduring security mitigations that have not fallen by the wayside: Non-Executable Stacks. To understand why this matters, we need a brief description of what (Non-)Executable Stacks are, and why they are important.

3.2.1. Non-Executable Stacks

In C -- using common practice to leverage lots of hot-patching, assembly hot-fixing, direct opcode injection, and more -- programs frequently made use of stack-based data buffers where they also dumped their programs. This meant that binaries would read from their stack with the instruction pointer, allowing programs to dynamically inject behavior into a currently running program either in parts or -- if that data came from outside sources -- as a fully dynamic sort of live-machine coding. The problem with this approach was that if a normal user could use the stack to make the program do a set of behaviors, so could a malicious actor: this was the easiest and widest branch of attacks upon C programs, and it was an endemic issue given the extremely large number of stack-based, fixed-sized buffers or -- after the advent of C99 -- variable-length arrays. Commandeering programs was as easy as finding a place where naïve programmers and hackers employed the now-removed gets, or where input was loaded in a too-relaxed way into a stack buffer. Overrunning the buffer and gaining control of the program, by getting a jmp instruction to jump not back to some expected place in the stack but instead to a piece of a new program written by malicious inputs, made it easy to exploit C programs day in and day out, over and over again.

After quite a few attempts and retries on mitigations and several years of evolution, one of the lowest-hanging and easiest mitigations for a large class of direct attacks using stack buffers was to simply make the stack non-executable. This meant directly writing shellcode and jumping to it was no longer a valid way to attack many programs, and as a simple mitigation it has endured as one of many different mitigation techniques that prevent exploits. Executable stacks are, to this day, still one of the easiest-to-exploit properties of programs on a large share of modern computing platforms.

NOTE: This does not mitigate ALL stack-based exploits completely. It just eliminated the bottom-of-the-barrel, 40% ones; instead, folks now had to use what was already on the stack in terms of data to try and trick the program’s own logic into entering an unexpected state and, summarily, put it into a vulnerable position. The kickoff from that vulnerable position, often by abusing another vulnerability (e.g. an improperly checked heap pointer), can then be elevated to the point of illicit code execution ([solar-non-executable-stack-exploits]). Both formally and informally, this is known as Return-Oriented Programming (ROP).

Most programs, whether using Variable or Fixed-Length Arrays, loved to keep small-ish buffers on the stack that they constantly wrote data into in response to user input or read data (such as configuration files or network input). Preventing a poorly written program that did not guard against all forms of malicious input from turning into an easy Remote Code Execution issue was a gigantic security win. The overwhelming majority of exploits running at the time were effectively halted, even if they contained the perfect shell code.

NOTE: Similar exploit-prevention techniques such as inserting security cookies on the stack, using Address Space Layout Randomization (ASLR), Control-Flow Integrity (CFI) checks, and more have also been developed since then to shore up C code against other vulnerabilities, after ROP, Return to Libc, Heap-based Exploits (Write-What-Where primitives), and Type-Confusion became the new dominating exploit techniques.

This means that as a "table stakes" or entry-level bit on security, avoiding an Executable Stack is important for most C features.

3.2.2. Early Design Flaw: Nested Functions turn the stack Executable!

The original design of Nested Functions suffered for its compact and brilliant design, unfortunately. Let’s remind ourselves of the previous example relating to qsort: somehow, without marking the variable as static or _Thread_local, a nested function is able to access all of the objects from its enclosing scope by name.

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    /* HOW is `in_reverse` usable here?! */ 
    return (in_reverse) ? *right - *left : *left - *right;
  }
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The secret of this is in the brilliance of the original design: since it was so commonplace at the time, Nested Functions decided that the best way to be able to find the variables in the enclosing scope was to take a location associated with the enclosing block scope -- the Stack -- and turn it into an executable piece of code. This executable piece of code had an address. Because the executable code and its address came from the program’s stack, it meant that there was no need to provide a pointer to do reference-based / "by-name" object capturing. The function pointer itself served as both the jump to the code to execute, and a location that could then have a number of pre-determined, fixed offsets applied to it to reach all of the variables in that piece of code. This meant that rather than needing an object with static storage or _Thread_local storage, the block-scope variable could be accessed directly!

Unfortunately, this required the stack, still, to be executable in order to function.
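To make the mechanism concrete, the following is a hand-written approximation in ISO C of what a trampoline has to accomplish. It is a conceptual sketch, not GCC's actual code generation: enclosing_frame, compare_lowered, compare_trampoline, and sort_with_frame are hypothetical names, and the file-scope bound_chain pointer stands in for the frame address that a real trampoline carries inside instructions written onto the stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the block-scope objects of the enclosing function. */
struct enclosing_frame {
  int in_reverse;
};

/* The nested function, rewritten with an explicit pointer to the enclosing frame. */
int compare_lowered(const void* untyped_left, const void* untyped_right,
                    const struct enclosing_frame* chain) {
  const int* left = untyped_left;
  const int* right = untyped_right;
  int ordered = (*left > *right) - (*left < *right);
  return chain->in_reverse ? -ordered : ordered;
}

/* ASSUMPTION: a real trampoline embeds this address in generated code.
   ISO C cannot generate code at run time, so a file-scope pointer stands in. */
static const struct enclosing_frame* bound_chain = NULL;

/* Has the two-parameter signature qsort expects and supplies the frame itself. */
int compare_trampoline(const void* untyped_left, const void* untyped_right) {
  return compare_lowered(untyped_left, untyped_right, bound_chain);
}

/* Binds the frame, sorts, then unbinds: the binding is only valid while `frame` is alive. */
void sort_with_frame(int* list, size_t count, const struct enclosing_frame* frame) {
  assert(frame != NULL);
  bound_chain = frame;
  qsort(list, count, sizeof(*list), compare_trampoline);
  bound_chain = NULL;
}
```

Unlike this sketch, the original GCC implementation has no file-scope pointer to lean on: the equivalent of compare_trampoline is placed, as machine code, in the enclosing function's stack frame.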

This is the critical issue that has resulted in most other vendors not picking up the feature. Compilers want to work flawlessly with GCC-compiled code: this means that compiling a nested function and passing it to a function pointer has to result in a (somewhat) similar Application Binary Interface (ABI) that can be used as applications expect. This means the decision to make GCC nested functions trampoline from the stack is a non-ignorable detail. This is the primary reason Clang and other vendors have refused to implement Nested Functions, chiefly citing security concerns and the inability to develop a worthwhile ABI that is compatible. Just how prevalent are the security issues that Clang and other vendors avoided by refusing to implement the GNU Nested Functions extension? Well...

3.2.2.1. The Prevalence of Executable Stack, and GNU Nested Functions

A ton of users relied on or otherwise kept using Executable Stacks, even into today. While the non-executable stack fix was deployed to great success in the early 2000s, some platforms -- particularly, Linux and similar POSIX-based platforms -- continued to permit executable stacks. Tightening that default clashed with users who had become far too comfortable with the issues related to Executable Stack:

Year of Linux desktop will never happen as long as glibc is de-facto standard libc. As far as I know, this regression is wontfix, so many games just won’t work anymore.

Imagine being a game dev from 5 years ago and making a native Linux version of your game in a good faith, that’s 100% a net-negative money-wise, only for it to stop working in a couple years. Backward compatibility on Linux does not exist as far as regular users concerned, and the only way to make software that works in years to come is to make it for Windows, and hopefully let Valve and Wine teams handle the rest.

-- February 19, 2025, Valentin Ignatev

Briefly ignoring the extreme hyperbole: note the dates and the mentioned time of this tweet. 5 years ago; e.g. developed in 2020, with the complaint happening in 2025. The tweet included a screenshot from the article titled "The glibc 2.41 update has been causing problems for Linux gaming" ([gamingonlinux-dawe]). Windows and macOS have been disabling executable stacks globally, by default, for years and refusing to load such applications. This complaint happened when glibc stopped forcefully setting executable stack, even if the tweet that garnered the public push back did not mention WHY these games were breaking. This speaks to the severe need for a solution in this space that is not security breaking, and even telling folks to simply "turn on executable stack" for themselves is not wonderful:

... Don’t just take my advice on it though if you’re a developer or gamer reading this, always look up what you’re doing fully. Run at your own risk. ...

-- February 13, 2025, Article Author Liam Dawe

Unfortunately, this lax approach to security -- especially for video games -- has resurfaced quite a few truly unfortunate bugs. Popular Windows games such as Dark Souls: PREPARE TO DIE Edition and Call of Duty: WWII have exploitable Remote Code Execution (RCE) bugs in their code. One of them seems to be getting patched, but the other -- in Dark Souls -- is not going to be patched out. The idea that glibc -- or platforms in general -- can take a lax position to security on devices, even for something as "frivolous" as the (multi-billion dollar revenue) video game industry is simply not tenable. From a highly skilled Rendering Engineer commenting on the controversy caused by the glibc 2.41 update:

glibc is in the right here. iirc windows and mac DEP policy disabled executable stack by default for the past ~20 years or so. shocked this was not already the case in linux userland

-- February 21, 2025 A. W. R.

The engineer in charge of pushing the change and looking over the situation commented on the article and general attitude represented by Mr. Ignatev:

It is interesting that the headline did not get into details why I made this change: https://sourceware.org/pipermail/libc-alpha/2024-December/163146.html

In a short: the old behavior was used in a know RCE described in CVE-2023-38408.

-- February 20, 2025, Adhemerval Zanella

It is important to know that many other developers do not share this perception. Even as far back as 2018, when Microsoft kept up its Security Posture in its Windows Subsystem for Linux, people railed against the high-security default of refusing to work with programs that allowed executable stack ([WSL-no-executable-stack]):

The accepted trade-off is to have a non-executable stack be default but have an executable stack for programs which need them. Not supporting this is just a deficiency in WSL.

NOTE: This does not necessarily mean that Dr. Uecker wanted an executable stack in all programs. It’s just that for backwards compatibility purposes, it should be allowed to happen rather than fully banned, which is the opposite opinion of how WSL1, SELinux, several BSDs, and many other operating system loaders work.

Whose perspective is correct?

3.2.2.2. The Standard Is To Blame

This proposal agrees with both Mr. Zanella and Ms. A.W.R., and disagrees with Dr. Uecker and Mr. Ignatev; it is impossible to pretend like this is not a problem with more and more exploits taking advantage of not only executable stack, but directly targeting "harmless" software that makes use of it (such as video games). But, even more importantly, the real culprit here is ISO C. There were many alternatives to executable stack-based GNU Nested Functions, and other such entrapments. However, because of how convenient GNU Nested Functions are and how accepted they are in the GNU ecosystem, it has led to multiple security vulnerabilities that should have never existed in the first place (§ 6.4 Executable Stack CVEs).

Unlike Address Space Layout Randomization (ASLR) and several other run-time mitigations, non-executable stacks have been one of the cheapest and most enduring security wins in the last 50 years of computing. Marking sections of memory as unable to be run as part of the program permanently shifted the landscape of targeted exploits to be focused almost exclusively on buffer overrun-into-heap-exploitation bugs or on ROP gadgets, as well as on trying to find sequences of logic to put programs in a state of disrepair that ultimately grant attackers either a Denial of Service (DoS) attack or full control via Remote Code Execution (RCE). Exploits became far harder and required orchestration of multiple carefully-crafted scenarios to hit the "Weird Machine" state so coveted by exploiters. None of this holds when the stack is executable.

3.2.3. Alternative Nested Function Implementations

We have already discussed the first and most popular attempt which leaves the stack executable. Because of the negative security properties of this, there were two more attempts, one still on-going (§ 3.2.3.2 Attempt 3) and one with limited success that requires the entire world to be recompiled or face potential ABI breaks (§ 3.2.3.1 Attempt 2).

3.2.3.1. Attempt 2

An Ada-style Function Descriptors implementation was attempted. The problem with this change for GCC is that it uses a bit (the lowest bit, which traditionally is always 0) in the function pointer itself to mark the function pointer as one that is relying on the Function Descriptor technology. Setting the lowest bit on the function pointer means that it is unsafe to call directly, and therefore every function call must first be masked with func_ptr & ~0b1ull before being called. This is a runtime cost and a general-purpose pessimization that applies to ALL function pointers, making the Function Descriptor approach unsuitable for solving the ABI problem both internally with existing GCC code and to the satisfaction of other developers.

3.2.3.2. Attempt 3

A third attempted implementation of Nested Functions attempts to use a separately-allocated trampoline. It can come from either: a stack that is set up at program load time in coordination with the compiler and whose exclusive purpose is to be a memory region for both trampolines and a slot for a void* environment/context; OR, a dynamic allocation that serves as the function pointer plus a void* environment/context pointer. These approaches simply do not work in the general case because it is unclear when, if ever, the function pointer will stop being used. However, one part of Nested Functions -- the fact that they refer to everything by name / "capture by reference" all of the things they use -- means that this can be sufficiently approximated by simply stating that GNU Nested Functions will deallocate such a trampoline (or shrink the stack it was cut from) when the enclosing block that was used for the nested function is gone. (Of course, this means that the function pointer will exhibit the exact same lifetime issues as with the current stack, so it solves some problems but leaves others on the table.)

There is also the slight issue that using a separated trampoline that is on a separate heap or a separate stack might need (but does not necessarily require) a secondary level of indirection. The original, executable stack implementation of GNU Nested Functions avoids this because both the code and the variables are in-line: having a separated trampoline may require such a trampoline to first load the right function call with the __builtin_call_with_static_chain compiler intrinsic within the code of the trampoline to have the code work properly.

NOTE: As of early 2025 in GCC 14, GCC provided a heap-based implementation that got rid of the executable stack. This requires some amount of dynamic allocation in cases where it cannot prove that the function is only passed down, not have its address taken in a meaningful way, or if it is not used immediately (as determined by the optimizer). It can be turned on with -ftrampoline-impl=heap.

3.2.3.3. A Possible 4th Attempt: Explicit User Control

An experimental technique for allocating a trampoline can be built on an _Any_func* pointer, which is being standardized by an in-progress proposal [_Any_func]. Then, rather than needing to implicitly create a trampoline on usage, a user can instead request it and control the allocation explicitly, while passing it back. A fictional example of such an intrinsic -- called __gnu_make_trampoline -- is seen here:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  typedef int compare_fn_t(const void* left, const void* right);

  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  // explicitly make a single-function-pointer trampoline, without an executable stack
  // __gnu_make_trampoline takes a function identifier, returns a _Any_func*
  compare_fn_t* compare_fn_ptr = __gnu_make_trampoline(compare);

  // use it
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_fn_ptr);

  // explicitly free a single-function-pointer trampoline, without an executable stack
  __gnu_destroy_trampoline(compare_fn_ptr);

  return list[0];
}

This is discussed further in § 6.3 Make Trampoline and Singular Function Pointers.

3.2.4. The Nature of Captures

There’s a final issue with nested functions: they are not suitable for use with asynchronous code or code that returns "up". Consider the same bit of code as before, but slightly modified:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t* make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv);
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

In this example, we have simply moved the in_reverse and compare generation into a function, for ease-of-use. One can imagine that we need to create this sort of function multiple times, from perhaps different sources of data. GNU Nested Functions allow us to do this and to return the function "up" the call stack. The problem, of course, is that a Nested Function (in either a heap-based implementation or the current executable stack implementation) points to the "function frame" that it was created in. That is, while make_compare -- once it has returned -- is no longer alive and all of its objects with automatic storage duration have died, the compare function pointer is still there and passed up the stack. This means that all accesses to in_reverse are accessing memory that is no longer alive, which is Undefined Behavior.

The actual manifestation of the undefined behavior in this program is very clear: adding the -r argument to make in_reverse turn to 1 does not have any effect on the program anymore:
https://godbolt.org/z/81d7Tqn1E

This is a critical failure of Nested Functions: they only ever "capture" values by-name / by-reference. There is no option to capture by-value, and therefore the transportation and use of these function pointers to asynchronous code or "up" the call stack is fundamentally dangerous. This was not a huge problem in the early days of C, where programs were very flat and it was easy to always "move" function calls up. Unfortunately, we now have asynchronous programming, coroutine libraries / green threading models, callbacks that are saved and invoked much, much later in a program, and all sorts of models for shared code. This is the part of Nested Functions that cannot be saved; it is an intrinsic part of the design that unfortunately will always lead to Undefined Behavior because there is no way to get around those limitations in the GNU Nested Functions design. This is another serious problem that ultimately makes it impossible to consider Nested Functions as THE solution for all of the C ecosystem.
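For contrast, a by-value capture can be written by hand in ISO C today. The sketch below is illustrative only and is not the syntax or design of any of the features surveyed here; compare_closure, compare_closure_call, and make_compare_closure are hypothetical names. The captured int is copied into the returned object, so nothing refers back to the frame of the function that created it.

```c
#include <assert.h>
#include <stddef.h>

/* A hand-rolled closure: a function pointer plus its captured data. */
struct compare_closure {
  int (*call)(const struct compare_closure* self, int left, int right);
  int in_reverse; /* captured by value at creation time */
};

/* The closure body: reads the capture through `self`, never through a dead frame. */
int compare_closure_call(const struct compare_closure* self, int left, int right) {
  assert(self != NULL);
  int ordered = (left > right) - (left < right);
  return self->in_reverse ? -ordered : ordered;
}

/* Returns the closure by value, so it stays valid after this function returns. */
struct compare_closure make_compare_closure(int in_reverse) {
  struct compare_closure closure = { compare_closure_call, in_reverse };
  return closure;
}
```

The cost is visible in the call syntax: closure.call(&closure, left, right) needs the object as well as the function, so it cannot be handed to an interface such as qsort that accepts only a plain function pointer.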

NOTE: As a general rule of thumb, if the entity being designed has the ability to be transferred out of its current scope (Blocks, Nested Functions, Lambdas, ...) then it must as a rule allow for determining how it interacts with the scope it is nested in. The answer for function calls in most languages (without a garbage collector or other memory-preserving solution) is "cannot interact with its surrounding scope", which simplifies the problem. But, the whole point of these features IS to interact with the surrounding scope, and so care must be taken to make it work better.

3.2.5. GNU Nested Functions By-Name Captures Cannot Be Worked Around Normally

The deeply unfortunate part of GNU Nested Functions is that even if someone realizes the issue with by-name capture of the surrounding scope and tries to escape it, there is no successful way to actually provide that escape. Consider, briefly, the make_compare-style example from before but with a slight modification to "heapify" the in_reverse variable for safety reasons:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t* make_compare(int argc, char* argv[]) {
  /* LOCAL, heap-allocated variable.... */
  int* in_reverse = malloc(sizeof(int));
  *in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (*in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t* compare = make_compare(argc, argv);
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

Ignoring for a moment that free is never called in this scenario, the bigger and more pressing problem here is that this code does not even work. Despite the memory now properly living on the heap, a by-name reference to the in_reverse pointer means that once the make_compare function exits, that pointer object is no longer alive. Doing *in_reverse is just dereferencing a pointer whose value may or may not have changed, and is effectively a form of stack-based use-after-free. This means that even if you try to make local variable references "safe" by relocating local objects to the heap, it is still not enough. You would still need to deploy static or _Thread_local data in order to solve the problem, which brings us back to the original problems at the beginning of this introduction (§ 3 Introduction and Motivation).
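
For contrast, here is a minimal sketch in standard C (no extensions) of what the "heapify" attempt was reaching for: the pointer value itself is copied out of the dying frame into a hand-written context object, so nothing refers to the frame of the creating function after it returns. The names compare_ctx, make_compare_ctx, compare_with_ctx, sort_ints_with_ctx, and first_after_sort are hypothetical and exist only for this illustration; sort_ints_with_ctx is a small insertion sort standing in for a context-taking sort such as qsort_s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hand-written "closure": the pointer VALUE is copied here,
   so nothing refers to make_compare_ctx's dead stack frame. */
struct compare_ctx {
  int* in_reverse; /* points at heap storage that outlives its creator */
};

struct compare_ctx make_compare_ctx(int want_reverse) {
  int* in_reverse = malloc(sizeof(int));
  assert(in_reverse != NULL); /* sketch only: real code should report failure */
  *in_reverse = want_reverse;
  return (struct compare_ctx){ .in_reverse = in_reverse }; /* by-value copy */
}

int compare_with_ctx(const void* untyped_left, const void* untyped_right, void* user) {
  const struct compare_ctx* ctx = user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  /* (a > b) - (a < b) avoids the overflow risk of plain subtraction */
  int order = (*left > *right) - (*left < *right);
  return *ctx->in_reverse ? -order : order;
}

/* Hypothetical context-taking insertion sort, standing in for qsort_s. */
void sort_ints_with_ctx(int* list, size_t count,
    int (*comp)(const void*, const void*, void*), void* user) {
  for (size_t i = 1; i < count; ++i) {
    int value = list[i];
    size_t j = i;
    while (j > 0 && comp(&list[j - 1], &value, user) > 0) {
      list[j] = list[j - 1];
      --j;
    }
    list[j] = value;
  }
}

int first_after_sort(int want_reverse) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  struct compare_ctx ctx = make_compare_ctx(want_reverse);
  sort_ints_with_ctx(list, sizeof(list) / sizeof(*list), compare_with_ctx, &ctx);
  free(ctx.in_reverse); /* the caller now owns the heap object */
  return list[0];
}
```

The cost of this approach is exactly what the rest of this section is about: the context has to be threaded through by hand, and the result is no longer a single function pointer that plain qsort accepts.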

3.2.6. Additional Modifications for Nested Functions

The paper [n3654] discusses various potential modifications and directions for GNU Nested Functions. It contains many assertions and future directions, which are discussed in the Appendix (§ 6.1 Accessing Context in Nested Functions).

3.3. Apple Blocks

Blocks are an approach to having functions and data that originates from Objective-C ([apple-blocks]) and is associated with Apple’s Clang. They were proposed a long time ago in an overview of Apple Extensions for C by Garst in 2009 ([n1370]), and refined into a specific proposal in 2010 ([n1451] [n1457]). That proposal was later further refined into a proper "Closures" proposal, and the name changed ([n2030]). However, none of these papers made it into the standard, and despite a brief moment where GCC maintained a Blocks runtime, the extension has not been adopted outside of the C / Objective-C ecosystem (and the Blocks extension is no longer allowed or used at the moment).

Because there is a wealth of proposals and literature talking about Blocks, their implementation, their runtime, and more (see an answer discussing the intrinsics and implementation bits for Blocks), this proposal will not re-explain to readers how they work in full detail. Instead, it will focus exclusively on why they are not usable for the whole C ecosystem.

3.3.1. Expression

Unlike GNU Nested Functions, Block definitions are an expression. That means that -- given appropriate typing -- one can use blocks directly within a function call:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  qsort_b(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    ^(const void* untyped_left, const void* untyped_right) {
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

The special function_type^ and return_type (^identifier)(argument0, argumentN...) forms are Block types, which are special types that act as a handle to a Block. Blocks are not just a simplistic combination of a function and a context, however: much more effort is put into making them safe at execution time, and that is done by putting everything related to Blocks behind a hefty runtime.

3.3.2. Runtime Required

Apple Blocks are, at their core, dynamic objects that engage in type-erasure at the top-most level. Where C++ lambdas are completely non-type-erased and each contains a unique type, and where Nested Functions are completely type-erased behind normal function pointers with executable stacks + trampolines, Blocks are callbacks that are unique but have all of their type information erased and carted around in a new function pointer-like construct: the block. Block types are denoted by the ^ in their function type name, and typically are cheaply copiable handles to a heap-stored callable.

NOTE: The layout of the heap-stored callable is dictated by Apple, and has been reverse-engineered and deconstructed many times. Some of this work has been done by the Clang Team, and the most up-to-date, thorough specification for it is stored with the Clang documentation ([clang-blocks-spec]).

Of course, all of this is just to illustrate the problem: while Microsoft as a platform may not need to care, GCC and Clang both tend to occupy the same hardware and software spaces. Even if one compiler or another figured out how to be clever, the base layout -- and the premise of blocks being a handle to a heap, and not a compile-time sized object -- means that some form of dynamic allocation or heap is required. This is a net-negative for memory-constrained environments, and implementations that attempt to be ABI-compatible with the original Apple Blocks implementation will be forced to lay their blocks out in the way that Apple has already specified.
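
To make the "handle to a heap-stored callable" idea concrete, the following is a deliberately simplified sketch in standard C. It is NOT the Apple Blocks ABI (see [clang-blocks-spec] for that); the names erased_callable, reversible_compare, make_reversible_compare, and call_compare are hypothetical. It only shows why a type-erased handle whose captured data varies in size ends up needing some form of dynamic storage.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified layout; NOT the Apple Blocks ABI. */
struct erased_callable {
  int (*invoke)(struct erased_callable* self, const void* left, const void* right);
  /* the captured data lives in the concrete type that embeds this header */
};

struct reversible_compare {
  struct erased_callable header; /* first member, so the handle can be cast back */
  int in_reverse;                /* the "captured" variable */
};

static int reversible_compare_invoke(struct erased_callable* self,
    const void* untyped_left, const void* untyped_right) {
  const struct reversible_compare* data = (const struct reversible_compare*)self;
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return data->in_reverse ? -order : order;
}

/* The handle is only a pointer: cheap to copy, but the callee cannot know the
   size of what it points to, so the storage is allocated dynamically. */
struct erased_callable* make_reversible_compare(int in_reverse) {
  struct reversible_compare* data = malloc(sizeof(*data));
  if (data == NULL) {
    return NULL; /* allocation failure is now part of creating a callable */
  }
  data->header.invoke = reversible_compare_invoke;
  data->in_reverse = in_reverse;
  return &data->header;
}

int call_compare(struct erased_callable* handle, int left, int right) {
  assert(handle != NULL);
  return handle->invoke(handle, &left, &right);
}
```

Every user of the handle sees only struct erased_callable*, which is what makes it convenient to pass around and also what forces the allocation (and a matching free) into the picture.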

3.3.2.1. More Complications: Generally Unsafe to Return

Block literals are not initially placed on the heap or allocated through the run-time; as a blanket optimization applied to all block literals, they start out on the stack! This means that while the above code using a literal works just fine, the following code is actually Undefined Behavior, in a way that’s just as bad as GNU Nested Functions:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return compare;
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
	
  return list[0];
}

This code does not work. Despite having a runtime, it will NOT perform the lift to the heap automatically. The return from make_compare into compare_fn_t^ compare is a dangling reference, and is explicitly discouraged by reference materials and documentation from Apple. It must be modified to use Block_copy on the return:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);

  return list[0];
}

Annoyingly, every call to Block_copy must be paired with a call to Block_release. This means that there’s now an invisible (from the perspective of main) block copy that needs to be managed with a specific call to Block_release. One can imagine that every function that returns a Block type should just be assumed to need releasing, but this isn’t always the case: there is invisible lifetime tracking here that even the runtime and the heap do not solve for us! Truly unfortunate.

There is also a small gotcha in this example, that only shows up based on where the compare block is created. This has to do with how Captures work under the Blocks feature.

3.3.3. Captures

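Captures in Apple Blocks work in one of two ways: by default, a variable used inside of a Block is copied by value into the Block at the point where the Block is created; alternatively, a variable declared with __block is shared between the Block and its surrounding scope.

The reason Apple used copying as the default technique, as talked about before, is because it’s safer than Nested Functions in the particular regard of using variables and carting them around.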

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

The first thing to note is that, because this cannot be translated to a single function pointer, we cannot use qsort. This means that using such a function without creating some kind of special trampoline is off-limits to us. Again, this is something that could be solved by the introduction of an explicit heap-based trampoline creator (§ 6.3 Make Trampoline and Singular Function Pointers). Or, one would need to introduce a "wide function pointer" type -- which is exactly what function_type^ is -- and change qsort’s signature to use it for the callback.

NOTE: Simply upgrading qsort with a Block type is an ABI break that would cause old, already-compiled libraries and programs mixed with new programs to combust in painful, hard-to-detect ways. To solve this problem one would need to rely on existing C extensions like Assembly Labels, or work towards Transparent Aliases ([transparent-aliases]).

But, if someone uses qsort_s -- which takes a void* user data parameter -- one can pass the Block along without altering the signature of qsort, and use a small trampoline function that recovers the Block from the void* parameter. However, there is a bit of an interesting conundrum. Take, for example, moving the creation of the block function further upwards inside of make_compare:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

This code will sort the list in ascending order even if in_reverse is later set to 1. That’s because Blocks capture the variables they use at the point-of-creation. The value at the point-of-creation is 0; therefore, in_reverse is 0 when the function is called later. Because the in_reverse variable is copied into the block, the block is now sensitive to where it is created, without any indication that it behaves that way. This is safe, but the behavior would completely throw off someone who uses Nested Functions religiously.

This is, of course, easily fixable by just... moving it down, so it’s hardly a problem:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* LOCAL variable.... */
  int in_reverse = 0;
	
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }

  // Compare function, with block copy: copies the right value
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);
	
  return list[0];
}

If you don’t want to move the creation of the Block for some particular reason, it is not the only way to capture a variable for Blocks! The second way it can capture variables is by using __block, which works like so:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>
#include <Block.h> /* Block_copy, Block_release */

typedef int compare_fn_t(const void* left, const void* right);

compare_fn_t^ make_compare(int argc, char* argv[]) {
  /* BLOCK-QUALIFIED variable.... */
  __block int in_reverse = 0;
	
  compare_fn_t^ compare = ^(const void* untyped_left, const void* untyped_right) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return Block_copy(compare);
}

int compare_trampoline(const void* left, const void* right, void* user) {
  compare_fn_t^* p_compare = (compare_fn_t^*)user;
  return (*p_compare)(left, right);
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  compare_fn_t^ compare = make_compare(argc, argv);
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
  Block_release(compare);

  return list[0];
}

This will work as expected, even though the creation of the block comes after in_reverse is declared and before modification of the variable. __block lifts the variable being declared up into a heap (or heap-like space) that is managed by either a garbage collector or an automatic reference-counted memory implementation. This both reverses the onus of capturing / handling the variable onto the surrounding scope and makes it safe-by-design. This does leave an unfortunate gap in that there’s no way to do the dangerous thing or opt into a direct reference without making an explicit declaration of the pointer and then using the pointer by-copy instead in that block, which can leave some memory footprint and program speed on the table without aggressive optimization.

NOTE: The address of in_reverse might actually change, depending on if the Apple intrinsic Block_copy is used to copy the block itself before being run. This code does not depend on it, but a hash map that stores the address of variables might experience the address of any __block-annotated variables changing between creation and the invocation of Block_copy. This was changed later on to always set the variable in a location so that a steady address exists, but how conforming it is to keep the old behavior is likely an implementation choice. While there was previously a Blocks runtime for GCC, it has fallen off. It may make a comeback in order to be more compatible with the C and C++ code on Apple platforms, but whether GCC would choose the same implementation technique is not known as of the writing of this paper.

All in all, however, these two things are safe: either a copy is happening and is stored along with the creation of the block on the heap, or the variable itself is having its lifetime prolonged by a sort of automatic-tracking. The second of these is very against the typical properties of C, but that matters little in the face of the obvious safety it brings to the table. Unfortunately, because all of this happens magically and in a mostly-unspecified manner, it’s very detrimental to the proliferation of the C ecosystem and having several loosely-connected implementations working towards the same improved implementations.
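
As a rough mental model of the second, lifetime-prolonging form of capture, the following standard C sketch places the variable in a reference-counted heap cell that both the creating scope and the closure refer to. It is an assumption-laden illustration and not the actual Blocks runtime (which, as noted above, is free to be lazier or to use a garbage collector); shared_int, shared_compare, and the functions around them are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted cell, loosely modeling a __block variable. */
struct shared_int {
  int references;
  int value;
};

struct shared_int* shared_int_new(int value) {
  struct shared_int* cell = malloc(sizeof(*cell));
  if (cell != NULL) {
    cell->references = 1; /* owned by the creating scope */
    cell->value = value;
  }
  return cell;
}

struct shared_int* shared_int_retain(struct shared_int* cell) {
  ++cell->references;
  return cell;
}

void shared_int_release(struct shared_int* cell) {
  if (--cell->references == 0) {
    free(cell);
  }
}

/* Hypothetical closure that shares, rather than copies, in_reverse. */
struct shared_compare {
  struct shared_int* in_reverse;
};

struct shared_compare make_shared_compare(int want_reverse) {
  struct shared_int* in_reverse = shared_int_new(0);
  assert(in_reverse != NULL); /* sketch only: real code should report failure */
  struct shared_compare compare = { shared_int_retain(in_reverse) };
  if (want_reverse) {
    in_reverse->value = 1; /* modified AFTER the closure was created */
  }
  shared_int_release(in_reverse); /* the creating scope ends here */
  return compare;                 /* the closure keeps the cell alive */
}

int shared_compare_call(const struct shared_compare* self, int left, int right) {
  int order = (left > right) - (left < right);
  return self->in_reverse->value ? -order : order;
}

void shared_compare_destroy(struct shared_compare* self) {
  shared_int_release(self->in_reverse);
  self->in_reverse = NULL;
}
```

Every retain has to be matched by a release, which is the same bookkeeping burden that Block_copy and Block_release place on the user; the difference is only that the Blocks runtime hides most of it.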

3.3.4. Optimization: Folding Escapes

As a matter of optimization, Blocks do not necessarily have to pollute the heap. And indeed, most immediate invocations of a block, or pass-down (rather than pass-up) invocations that are visible to the compiler, will be optimized into a direct call. Unfortunately, this is not something that is encouraged by the general design of Apple Blocks and Objective-C or Objective-C++. Because taking the address or passing the function along leaves it open to how far the handle-to-some-heap Block-typed object might be passed, compilers have to correctly (and pessimistically) generate the full, indirect-function-call representation as a matter of course. Block_copy also needs to be used, explicitly, in many cases, making it not much better than regular C code that uses malloc.

As a brief aside in Programming Language design, this sort of optimization problem is mitigated by changing how the defaults are applied and giving the user explicit control. For example, the Swift Programming Language solves this problem while still being compatible with Objective-C and C++ by making it so every "Block" or "Closure" type must be annotated if it "escapes" beyond the compiler’s knowledge, otherwise the program just refuses to compile ([swift-escapes]). This allows aggressive optimization to be applied by-default, with weaker static analysis-based optimizations or escape analysis optimization only acting as a fallback in the @escaping-annotated case. The annotation also makes it so Block_copy does not need to be invoked and rather than having to copy it to a heap version of itself, it is simply put in the right format and place, ready to interoperate cleanly with C, C++, Objective-C or Objective-C++.

NOTE: Swift’s native function type and ABI is not identical to the Blocks type at all. This is just an example of how designing with no-escape as a default, and then taking it off to allow for taking its address or returning it from a function, can make the aggressively optimizable case the common one.

In the opposite direction, there is a C-style attribute that can be used to say "the closure never escapes", which enables optimizations for crunching the object down and assuming that it is never invoked outside of the function. This is available through the NS_NOESCAPE macro, which expands to __attribute__((noescape)) and can be used to gain better binary size and reduce indirection for code execution speed where possible.

3.3.5. (Explicit) Trampolines: Page-based Non-Executable Implementation

Objective-C has the ability to create a C-style function pointer from a Block ([objective-c-block-trampoline]), and this implementation can be wielded from regular C code on Apple platforms too. It is an entirely different approach that does not use a heap but instead a separate "stack" (a single page). That page is a writable (but NOT executable) slab of memory with trampoline and Block pointer data in it. The actual trampoline function pulls one of the data pointers from this single page and then sets it up to call; because the trampoline is separate from the actual data, there is much less security risk from writing and reading a pointer that otherwise might be local to executable memory.

However, because of the implementation, occasionally Objective-C can run out of trampolines: if enough are created before any are deallocated, the entire page can be filled up with trampolines. This will trigger errors and failures to create the block. Therefore, even compared to the heap-based GNU Nested Function implementation (§ 3.2.3.2 Attempt 3), there exists a potential tradeoff in the designs. This is why trampolines need to be explicit and should be handled in a separate paper, with an interface that can be generalized and can report potential errors (§ 6.3 Make Trampoline and Singular Function Pointers).
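
The exhaustion problem can be illustrated with a toy pool in standard C. This is not the Objective-C implementation and not the interface proposed in § 6.3; it is a hypothetical sketch (make_trampoline, destroy_trampoline, TRAMPOLINE_COUNT, and the rest are invented names) in which a fixed number of pre-written trampoline functions each read their target and user data from a separate, plain-data table. It is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int compare_fn_t(const void* left, const void* right);
typedef int compare_ctx_fn_t(const void* left, const void* right, void* user);

enum { TRAMPOLINE_COUNT = 2 }; /* deliberately tiny so exhaustion is easy to see */

/* Data table: writable, never executed. */
struct trampoline_slot {
  compare_ctx_fn_t* target;
  void* user;
};
static struct trampoline_slot slots[TRAMPOLINE_COUNT];

/* Code: fixed at translation time, one function per slot. */
static int trampoline0(const void* l, const void* r) {
  return slots[0].target(l, r, slots[0].user);
}
static int trampoline1(const void* l, const void* r) {
  return slots[1].target(l, r, slots[1].user);
}
static compare_fn_t* const trampolines[TRAMPOLINE_COUNT] = { trampoline0, trampoline1 };

/* Returns a plain function pointer, or NULL when every slot is taken. */
compare_fn_t* make_trampoline(compare_ctx_fn_t* target, void* user) {
  assert(target != NULL);
  for (size_t i = 0; i < TRAMPOLINE_COUNT; ++i) {
    if (slots[i].target == NULL) {
      slots[i] = (struct trampoline_slot){ target, user };
      return trampolines[i];
    }
  }
  return NULL; /* the pool is exhausted: the caller must handle this */
}

void destroy_trampoline(compare_fn_t* trampoline) {
  for (size_t i = 0; i < TRAMPOLINE_COUNT; ++i) {
    if (trampolines[i] == trampoline) {
      slots[i] = (struct trampoline_slot){ NULL, NULL };
    }
  }
}

int compare_ints_with_flag(const void* untyped_left, const void* untyped_right, void* user) {
  const int* in_reverse = user;
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return *in_reverse ? -order : order;
}

/* Uses the unmodified, standard qsort through a trampoline. */
int first_after_qsort(int* in_reverse) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  compare_fn_t* compare = make_trampoline(compare_ints_with_flag, in_reverse);
  if (compare == NULL) {
    return -1;
  }
  qsort(list, sizeof(list) / sizeof(*list), sizeof(*list), compare);
  destroy_trampoline(compare);
  return list[0];
}
```

The important design point is the NULL return: because creation can fail, an interface that makes trampolines has to be explicit and has to be able to report that failure.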

3.4. C++-Style Lambdas

C++ lambdas, despite coming from C++ initially, are the only solution that does not apply an executable stack, a separate stack, Function Descriptors, or other dynamic/runtime data to the problem. They were detailed in a large collection of proposals from Jens Gustedt and almost made C23, but ambition in trying to allow for type-generic programming through lambdas with auto parameters stalled progress and ultimately halted everything ([n2923] [n2924] [n2892] [n2893]). A lambda expression makes a unique type for each and every lambda object that gets created using the syntax [ captures ... ] ( args ... ) { /* code ... */ }, and that object has an implementation-defined but compile-time known size.

NOTE: While type inference for function returns is not yet in, a section of [n2923] was broken off for just variable definitions, which ultimately succeeded. Type-inferred variable declarations work properly in C23 and are an important part of Lambdas being able to work in the manner envisioned by Gustedt’s proposals and by C++.

If a lambda has no captures, it can reduce to a function pointer like so:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  auto compare = [](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  // "compare" below becomes a function pointer
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare);
	
  return list[0];
}

The one thing that makes it different from GNU Nested Functions -- and somewhat similar to Blocks -- is that it is also an expression. That means that it can also be used inside of a function call expression as an argument, or as part of any other complicated chain of additions:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    [](const void* untyped_left, const void* untyped_right) {
      const int* left = (const int*)untyped_left;
      const int* right = (const int*)untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

Again, this only works because there are no captures. When captures get involved, the above code will simply stop compiling because the conversion to a function pointer stops working. This can be solved in the ways discussed previously.

This gives Lambdas a heavy weakness similar to all of the other solutions: there must be either a trampoline, a wide function pointer type, a void* user data/context pointer, or something else to accommodate the lack of transformation into a singular function pointer.

NOTE: Because C++ as a language has a more robust set of core primitives, it does not have to worry about this problem. std::function<FUNCTION_TYPE> is a type-erased way to transport a whole function object, as a copy, through API boundaries, and it is defined entirely in library mechanisms. For a view into any old function, there is std::function_ref<FUNCTION_TYPE>, which is like a function pointer but allows pointing into many different kinds of functions that exist in C++. This makes the integration of lambdas into C++ easy; in C, it is much more difficult to have this all participate in the system automatically. There is no std::function-alike in C, and there’s no std::function_ref-alike in C either for new function types like GNU Nested Functions, Apple Blocks, or otherwise. This problem has to be solved separately, with a Wide Function Pointer type (§ 6.2 Wide Function Pointer Type) or with explicit user trampoline-making capability such as § 6.3 Make Trampoline and Singular Function Pointers.
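
For readers unfamiliar with the idea, a non-owning "view" of a callable can be hand-rolled in today’s C as a function pointer paired with a context pointer. The sketch below is not the Wide Function Pointer type of § 6.2; wide_compare, reversible, reversible_as_wide, and is_sorted_by are hypothetical names chosen for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-rolled "wide function pointer": code plus context. */
struct wide_compare {
  int (*function)(const void* left, const void* right, void* context);
  void* context; /* non-owning: the referenced object must outlive the view */
};

/* Some callable object with state, standing in for a capturing lambda. */
struct reversible {
  int in_reverse;
};

static int reversible_invoke(const void* untyped_left, const void* untyped_right,
    void* context) {
  const struct reversible* self = context;
  const int* left = untyped_left;
  const int* right = untyped_right;
  int order = (*left > *right) - (*left < *right);
  return self->in_reverse ? -order : order;
}

struct wide_compare reversible_as_wide(struct reversible* object) {
  assert(object != NULL);
  return (struct wide_compare){ .function = reversible_invoke, .context = object };
}

/* An API that accepts any callable through the view, without knowing its type.
   Returns 1 when no adjacent pair is out of order according to compare. */
int is_sorted_by(const int* list, size_t count, struct wide_compare compare) {
  for (size_t i = 1; i < count; ++i) {
    if (compare.function(&list[i - 1], &list[i], compare.context) > 0) {
      return 0;
    }
  }
  return 1;
}
```

The adapter function reversible_as_wide has to be written by hand for every callable type, which is the part that a language-level feature could automate.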

3.4.1. Captures and Data

Lambdas do not capture any context by default, unlike both GNU Nested Functions and Apple Blocks. Instead, every capture must be manually annotated as either being taken by value or by reference, or a "universal capture" must be used to set a default method of capturing all visible, in-scope, block-scope objects.

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred"
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  // explicitly capture in the "[ ]" of the lambda
  auto compare = [in_reverse](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

This, unfortunately, also makes them susceptible to location just like Blocks; when using by-value captures (an unadorned identifier such as in_reverse, or the = default), the state they capture is the state at the moment of creation during execution. The following code would capture in_reverse before any important modification happens.

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred" (`auto`)
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  // "capture all variables" annotation `=` -- same as writing the flat name of
  // every object currently in-scope.
  auto compare = [=](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };
  // lambdas and captures do not reflect any changes beyond this point,
  // including the `in_reverse`

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

Like Apple Blocks, a lambda with captures is safe to return with the by-value capture (if one briefly ignores the need for Block_copy to reseat the memory of a Block). Additionally, it is better here because there is no usage of the heap needed to do this.

ASSERTION: This is PRIMARILY due to C++-style Lambdas just being normal objects. They have a compile-time sizeof(...), a compile-time alignof(...), can have their unique type inspected with typeof(...) (decltype(...) in C++), and are generally autonomous. There’s no erasure happening here as is with Blocks (the function_type^ type) or Nested Functions (the function_type* type); each Lambda has a unique type, similar to the unique type gained by pairing a function with a hand-made struct.

The only problem is that this requires a feature that was proposed for C23 but didn’t make it (along with Lambdas not making it): deduced return types. There was consensus to have the feature, but the feature was bundled with the set of Lambda proposals, and thus fell through during the final stretch of C23. Therefore, a proposal for inferring the type of a function return should be separated from the previous proposals (such as [n2923]) in order to accommodate such behavior in C.

One of the things that’s better about Lambdas over Apple Blocks is that they also allow for by-name capture, just like Nested Functions do. So, this code -- despite having the lambda defined in main and before in_reverse is changed -- will work as expected:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  // & is by-reference capture
  auto compare = [&](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
	
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);
	
  return list[0];
}

Moving this by-name capturing lambda into a function and then returning it, however, invokes undefined behavior. Capturing a name with &some_identifier (or using the "default capture" of & by itself) always captures a pointer to the variable.

Even if it is more explicit than in the Nested Functions case, the danger is still present and so care must still be exercised. That is, the following is undefined behavior because of the explicit by-reference & capture:

#define __STDC_WANT_LIB_EXT1__ 1

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

// a new kind of return type: "inferred" (`auto`)
auto make_compare(int argc, char* argv[]) {
  int in_reverse = 0;
	
  // capture just one variable, and capture it "by-name" / "by-reference"
  auto compare = [&in_reverse](const void* untyped_left, const void* untyped_right) {
    const int* left = (const int*)untyped_left;
    const int* right = (const int*)untyped_right;
    return (in_reverse) ? *right - *left : *left - *right;
  };

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        // lambda will reflect this change
        in_reverse = 1;
      } 
    }
  }

  // uh oh...
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  auto compare_trampoline = [](const void* left, const void* right, void* user) {
    typeof(compare)* p_compare = user;
    return (*p_compare)(left, right);
  };
  qsort_s(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare_trampoline, &compare);

  return list[0];
}

This means that lambdas can be made to be unsafe, by capturing things whose lifetime dies even as the lambda itself is passed around or returned. This allows for a sleeker representation and no runtime-heap, but with the OBVIOUS drawback that no automated reference-counted variable also means no implicit lifetime safety like with the __block variables. Thankfully, since captures can be done both ways, the user can either choose to capture by reference, choose to capture by value, or -- if needed -- choose to allocate and then capture the new allocated pointer by value themselves. Of course, any explicit allocation will need to be freed, just as it would in a Blocks scenario. This usually implies waiting for a signaling callback from the API that it is done, or elevating the lifetime to a higher level to be deleted at a later time.

3.4.2. What About Lifetime / Destructors?

A common criticism of Lambdas and their unique type, whole-object approach is that such an approach with captures requires C++-style destructors to work well. We are completely unsure why this is the case or why this criticism keeps being leveled specifically at Lambdas. In the previous section on Apple Blocks (§ 3.3.2.1 More Complications: Generally Unsafe to Return), coordinated function calls and documentation are the only way to communicate that a user has Block_copy’d an object and therefore requires Block_release. Similarly with GNU Nested Functions, returning them up the stack at all is pure undefined behavior that has tangible effects on the program (§ 3.2.4 The Nature of Captures): these are problems endemic to C. Using a complex data structure like a binary tree or allocating memory requires that it is documented and communicated to the user: capturing such complex types and having it called over a longer period of time simply means the user has the responsibility to clean up or free the resources.

C APIs will always provide users who supply functions with a way to know when something must be cleaned up. For example, the Lua C API has an allocation function; when it is called with a "new size" parameter of 0, the memory passed in must be freed: that’s how it communicates what the current action is. Similarly, ev/libev -- with ev_set_allocator -- provides a hook to allow a user to manage the memory of the library, while also providing several statuses in callbacks for watchers (initialized/pending/running/stopped/etc.). Even for standard C, thrd_create passes a void* user data to the func that gets run on the new thread: a user must allocate and then pass the user data to the thread, and it becomes the thread’s responsibility to manage the lifetime of that data in a manner that is thread safe.
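To make the "new size of 0 means free" convention concrete, here is a minimal sketch of an allocator with the same parameter shape as Lua's lua_Alloc, written without including any Lua headers. The alloc_stats structure and the name tracking_alloc are hypothetical; the void* user-data parameter is how the bookkeeping travels with the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping passed through the `void*` user-data parameter. */
struct alloc_stats {
  size_t live_bytes;
};

/* Same parameter shape as Lua's `lua_Alloc`:
   a new size of 0 means "free this block". */
static void* tracking_alloc(void* user, void* ptr, size_t old_size, size_t new_size) {
  struct alloc_stats* stats = user;
  assert(stats != NULL);
  if (ptr == NULL) {
    /* for a fresh allocation the old size carries no byte count */
    old_size = 0;
  }
  if (new_size == 0) {
    free(ptr);
    stats->live_bytes -= old_size;
    return NULL;
  }
  void* moved = realloc(ptr, new_size);
  if (moved != NULL) {
    /* only update the bookkeeping when the resize succeeded */
    stats->live_bytes -= old_size;
    stats->live_bytes += new_size;
  }
  return moved;
}
```

The API never has to say "clean up now" through a separate channel: the callback's own parameters carry that signal, and the user data pointer carries the state that a closure would otherwise capture.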

Any C API worth its salt, when dealing with convoluted lifetimes, provides to a given callback (through its parameters) a notification that the memory is not usable anymore, OR a separate callback (for more full-fledged APIs) that signals that the API is finished or done with a specific operation, and that it is therefore safe to close things out. The secondary alternative is, of course, statically sourcing lifetime until the user themselves guarantees all resources can be safely freed using some outside knowledge (e.g., an explicit set of calls after the start of the library to cleanup/close/stop the library). This is not just a point with lambdas: every attempt at solving this problem has to engage with this, whether it’s using Block_copy to ensure the lifetime of an Apple Block, or malloc to make sure a struct type being pointed to by a void* is accessible after a dispatch to an asynchronous function call. This is simply not a problem unique to lambdas: lifetime tracking and safety will always be a problem in C because C has no extended concept of lifetime duration beyond Effective Types.

Any problem with lifetime is going to be present in every single iteration of the solution to this problem, and is going to manifest in different ways:

and so on, and so forth. That’s a C-intrinsic problem, and the only thing any design in this space can do is offer better tools or better control to manage or avoid such problems where possible. Lifetime management is not solvable in C as it stands, and no amount of features or tinkering or attributes will really change the intrinsic language design flaw that is pointers and references that do not have any compile-time trackable properties aside from what can be inferred with (potentially strenuous) static analysis.

One way to alleviate this -- which would be beyond what is currently within C++ and what has been proposed previously -- is to allow for lambdas (beyond their initialization/creation) to have "accessor" syntax for any of its captures using the lambda.identifier syntax. This is explored in the later design for the solutions, within § 4.2.5 NEW: Data Captures are Accessible.

3.5. Literal Functions

There is not much to say about Literal Functions ([n3645]) as they deliberately do not engage with the problem of trying to capture and use data. The syntax is based on Compound Literals, wherein the (type-name){ ... } syntax is repurposed. Currently, type-name being a function type is just a constraint violation in C, so it is safe to repurpose this syntax.

By not engaging with the closure/capture issue, Literal Functions seek to just be a prettier form of ISO C regular functions. This provides the benefits of not needing to have a wide function pointer type immediately (albeit one is still needed for the general ecosystem), and it allows code to be read in a much more friendly format by localizing the function pointer and, potentially, any user data structures that go with a void*. As a compound literal, it also immediately works since it can be created/used as an expression, meaning it can be passed to function arguments:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    (int(const void* untyped_left, const void* untyped_right)) {
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

Unfortunately, not having any solution or future direction for captures and repurposing the compound literal syntax for it means that it seems more like a dead end. In the above example, we still have to transfer the in_reverse with a static for the qsort call. It gets slightly better if the API has a void* user data carveout, like for qsort_r:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

int main(int argc, char* argv[]) {
  int in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
	
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    (int(const void* untyped_left, const void* untyped_right, void* user)) {
      const int* in_reverse = (const int*)user;
      const int* left = untyped_left;
      const int* right = untyped_right;
      return (*in_reverse) ? *right - *left : *left - *right;
    },
    &in_reverse
  );
	
  return list[0];
}

A more pronounced example that uses more than an int shows that one can write both the struct and callback locally:

#include <stdlib.h>

typedef void async_callback_t(int result, void* data);
void async(async_callback_t* callback, void* data);

int main() {
	
  // struct and callback are next to each other
  struct { int value; }* capture = calloc(1, sizeof(*capture));  
  auto function  = (void (int result, void * data)) {
    // anonymous struct only identified by `typeof`,
    // keeps the exact type and helps reduce Type Confusion errors
    typeof(capture) captured = data;
    free(captured);
  };
  async(function, capture); // used immediately: hard to lose track
  return 0;
}

4. Design

Given the following properties from all of the extensions and proposals for this in the wild:

Feature GNU Nested Functions Apple Blocks C++-Style Lambdas in C Literal Functions
Capture By-Name
(default, use-based)

(__block ident;)

([&], [&ident])
Capture By-Value
(default, use-based)

([=], [ident])
Selective Capture
(use-based, by-name only)

(for by-name)

(for by-value, use-based)
Safe to Return Closure ⚠️
(requires Block_copy)

(never unsafe)
Relocatable to Heap
(Lifetime Management)

(Block_copy/Block_release)

(malloc/memcpy/free)

(not needed)
Usable Directly as Expression
Immediately Invokable
Convertible to Function Pointer ⚠️
(only capture-less)
Convertible to "Wide" Function Type
Access to Non-Erased Object/Type
((wide) function pointer only)

(Block type/wide function pointer only)

(unique type/size)

(no object to access)
Recursion Possible
(use the identifier of the nested function)

(__self_func required)

(__self_func required)

(__self_func required)

This proposal puts forward two distinct options for standardization, with the recommendation to do both. It is critical to do both for the approval of the C ecosystem in general (with § 4.2 Capture Functions: Rehydrated Nested Function), and for the maximum amount of external language compatibility (C++ in particular, with § 4.3 Lambdas). The necessary and core goals of this proposal are focused on:

Lambdas already have usage experience with well-known properties that can be directly translated to C and are easy enough to understand, despite the unfortunate syntax. Capture Functions are a simple modification of Nested Functions that produce a sized object (similar to Lambdas) and make their captures explicit, allowing for a degree of control and additional safety that was not present in the original Nested Functions design.

We are not focused on interoperability with singular function pointers. We believe that should be left to a separate, explicit mechanism in the language, capable of allowing the user to choose where the memory comes from and setting it up appropriately. This way, a user can make the decision on their own if they want to use e.g. executable stack (with the consequences that it brings) or just have a part of (heap) memory they set with e.g. Linux mprotect(...) or Win32 VirtualProtect to be readable, writable, and executable. Such a trampoline-maker (as briefly talked about in § 6.3 Make Trampoline and Singular Function Pointers) can also be applied across implementations in a way that the secret sauce powering Nested Functions cannot be: this is much more appealing as an approach.

We DO NOT take any of the design from Blocks because the Blocks design is, as a whole, unsuitable for C. While its deployment of a blocks "type" to fulfill the necessary notion of a "wide function pointer" type is superior to what Nested Functions have produced, the implementation details it imposes for __block variables and the excessive reliance on an (underspecified) runtime/heap are detrimental to a shared & unified approach to C.

NOTE: The Blocks runtime/heap layout has changed (at least) once in its history: the only reason this worked is because Apple owned every part of the Blocks ecosystem. Apple can do whatever they want with it, however they want, whenever they want: this does not work in a language like C, which has diverse, loosely coordinated implementations, unlike Objective-C, Objective-C++, or Swift.

As the heap is (typically) repulsive to some freestanding implementations, we do not want to standardize something that will have similar technological drawbacks like VLAs, where -- even if no syntactical or language-design issues exist from the way blocks are written -- the presence of an unspecified source of memory (stack or heap) produces uncertainty in the final code generation of a program.

The feature table for these two looks like this:

Feature C Lambdas Capture Functions
Capture By-Name ✅ ([&], [&ident]) ✅ (_Capture(&), _Capture(&ident))
Capture By-Value ✅ ([=], [ident]) ✅ (_Capture(=), _Capture(ident))
Selective Capture
Safe to Return Closure
Relocatable to Heap
(Lifetime Management)
✅ (malloc/memcpy/free) ✅ (malloc/memcpy/free)
Usable Directly as Expression
Immediately Invokable
Convertible to Function Pointer ⚠️
(only capture-less)
⚠️
(only capture-less)
Convertible to "Wide" Function Type
Access to Non-Erased Object/Type
(unique type/size)

(unique type/size)
Recursion Possible
(__self_func required)

4.1. What is NOT Being Proposed!

While we would like to standardize them in the future, this proposal is NOT looking to standardize statement expressions, a "make trampoline" compiler intrinsic, or a wide function pointer type.

4.1.1. Statement Expressions?

Statement Expressions should be standardized. While they are related to these efforts, they are entirely separate and have a full, robust set of constraints and concerns in standardizing. They have more existing implementation experience, deployment experience, and implementer practice than Blocks and Nested Functions combined. Therefore, they will be pursued in a different proposal. This was briefly noted in a proposal collecting existing extensions in 2007 by Stoughton ([n1229]); while there was enthusiastic support at the time, nothing materialized from the mention or the in-meeting enthusiasm. Some attempts are being made at standardizing the feature, but it is notably difficult to standardize due to the large number of corner cases that arise from needing to clarify the semantics of constructs that normally cannot appear in certain places suddenly being able to appear there, like a break; being placed in the initializer expression of a for loop.

Another advantage of Statement Expressions is that, unlike any of Apple Blocks / C++ Lambdas / GNU Nested Functions, there is no separating function body. This is critical for writing macros that coordinate with one another, AND is critical in writing reusable macros that have no additional cost and do not set up extra individual entry points. For example, there are hundreds of permutations of the functions in C2y’s <stdmchar.h> that could be written to make them easier to use, to make them not require double-pointers, to infer the size from a C-style string, and so on, and so forth. The choice of having a bunch of macros which simply repeat the same code means not having to add hundreds of permutations of the <stdmchar.h> functions (5 different character types across 5 different encoding types with 6 forms of "pointer and length, just pointer" for input/output, and typical skip/ignore/replace-character error handling strategies, pairwise with one another where order matters).

Another place that statement expressions come in handy is with RESULT/TRY/etc. macros, primarily used for low-level code where handling (and possibly enforcing error handling) is desirable through error codes and return types, as demonstrated by jade and lak here: https://gist.github.com/LAK132/0d264549745e8196df1e632d5b518c37. Being able to error and jump out or error and stop if things do not work is a very common (and powerful) idiom for writing straightforward code, and is employed heavily in many different ways across the C ecosystem in various forms.
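As a rough illustration of the TRY idiom (not the code from the linked gist), the sketch below uses the GNU statement expression extension, so it is accepted by GCC and Clang but is not ISO C. The int_result type, the TRY macro, and both functions are hypothetical names. The key property is that the return inside the macro leaves the enclosing function, which is only possible because no separate function body is introduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: an error code paired with a value. */
struct int_result {
  int error; /* 0 on success */
  int value;
};

/* GNU statement expression (an extension, not ISO C): yields the value on
   success, or returns the failed result from the *enclosing* function. */
#define TRY(expr)                             \
  ({                                          \
    struct int_result try_result_ = (expr);   \
    if (try_result_.error != 0) {             \
      return try_result_;                     \
    }                                         \
    try_result_.value;                        \
  })

static struct int_result parse_digit(char c) {
  if (c < '0' || c > '9') {
    return (struct int_result){ .error = 1, .value = 0 };
  }
  return (struct int_result){ .error = 0, .value = c - '0' };
}

/* Each TRY either produces an int or exits this function early. */
static struct int_result sum_two_digits(const char* text) {
  assert(text != NULL);
  int first = TRY(parse_digit(text[0]));
  int second = TRY(parse_digit(text[1]));
  return (struct int_result){ .error = 0, .value = first + second };
}
```

A lambda, Block, or nested function in the same position could not perform that early return, since a return inside any of them only leaves the closure's own body.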

This paper does not standardize Statement Expressions, and leaves that to a future paper similar to n3643 ([n3643]).

4.1.2. Wide Function Pointer Type?

We do hope that another paper creates a new "Wide Function Pointer" type of some kind. Some suggestions can be found in § 6.2 Wide Function Pointer Type.

4.2. Capture Functions: Rehydrated Nested Function

Capture Functions are a slight modification of the design of Nested Functions. We start from the base of Nested Functions with three goals in mind.

A brief demonstration of all of the well-defined behavior:

auto make_seven (int x) {
  int y = 7;
  int seven_fn() _Capture(x, y) {
    return x * y;
  }
  return seven_fn; // OK: unique type which
  // is a complete object
}

typedef int eight_fn_t();

eight_fn_t* make_eight () {
  int eight_fn () _Capture() {
    return 8;
  }
  return eight_fn; // OK: empty capture converts to function pointer
}

#if 0
typedef int nine_fn_t();

nine_fn_t* make_nine () {
  int val = 30;
  int nine_fn () _Capture(val) {
    return val;
  }
  return nine_fn; // constraint violation: cannot convert
  // captures to function pointer
}
#endif

int main () {
  int x = 3;
  int zero () {
    // OK, no external variables used
    return 0;
  }
  int also_zero () _Capture() {
    // same as above, just explicit
    return 0;
  }
#if 0
  int double_it () {
    return x * 2; // constraint violation
  }
#endif
  int triple_it () _Capture(x) {
    return x * 3; // OK, x = 3 when called
  }
  int quadruple_it () _Capture(&x) {
    return x * 4; // OK, x = 5 when called
  }
  int quintuple_it () _Capture(=) {
    return x * 5; // OK, x = 3 when called
  }
  int sextuple_it () _Capture(&) {
    return x * 6; // OK, x = 5 when called
  }
  x = 5;
  auto seven_tuple_it = make_seven(x);
  eight_fn_t* eight = make_eight();
  return zero() + triple_it() + quadruple_it()
    + quintuple_it() + sextuple_it() + seven_tuple_it()
    + eight();
  // same as
  // return 117;
  // 0 + (3 * 3) + (5 * 4)
  // + (3 * 5) + (5 * 6) + (5 * 7)
  // + 8
}

The following sections go over the purpose of this design and the reasons behind it.

4.2.1. Capture Functions are Complete Objects

The most important change from typical GNU Nested Functions, mirroring the behavior of C++ Lambdas, is that a Capture Function -- the identifier itself introduced by the definition of the function -- is a regular, normal, complete C object. This enables it to be:

These are important qualities to allow these functions with data to be used with asynchronous code, as (stored) callbacks, and in other scenarios. The size and alignment of the object is implementation-defined, and its layout is also entirely implementation-defined, much like the properties of a regular struct or union type. This allows implementations to not have to figure out how to squash everything into a single erased type, and instead enforce the Single Responsibility Principle; they already know how to create unique types, they already know how to create and fill structure types, and now separately a wide function pointer type or a "make trampoline" compiler feature (§ 6.3 Make Trampoline and Singular Function Pointers) can be developed.

Given an extremely simple example:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef void work_fn_t(void* user);
void add_work(work_fn_t* work, void* user);
bool work_done();

void kickoff(int start, int limit) {
  void work() _Capture(start, limit) {
    printf("doing work for %d to %d\n", start, limit);
    for (int i = start; i < limit; ++i) {
      printf("sooo much work - %d\n", i);
    }
  }
  void work_trampoline(void* user) {
    (*((typeof(work)*)user))();
    // free the copied capture function object after work is done
    free(user);
  }
  // elevate to higher lifetime to survive async function call time
  void* work_ptr = malloc(sizeof(work));
  memcpy(work_ptr, &work, sizeof(work));
  add_work(work_trampoline, work_ptr);
}

int main (int argc, char* argv[]) {
  int start = 0;
  int limit = 30;
  if (argc > 1)
    start = atoi(argv[1]);
  if (argc > 2)
    limit = atoi(argv[2]);

  kickoff(start, limit);

  while (!work_done());
  // no memory leaks at the end of the program
  return 0;
}
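For comparison, this is roughly what the kickoff example amounts to when the captures are written out by hand in current ISO C. It is a sketch under stated assumptions, not a description of how an implementation must lower Capture Functions: the one-slot queue (add_work, run_pending_work), the work_captures struct, and the total accumulator are all hypothetical stand-ins so the code is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void work_fn_t(void* user);

/* Hypothetical one-slot "queue" standing in for a real work scheduler. */
static work_fn_t* pending_work = NULL;
static void* pending_user = NULL;
static int total = 0; /* observable side effect of the work */

static void add_work(work_fn_t* work, void* user) {
  assert(pending_work == NULL);
  pending_work = work;
  pending_user = user;
}

static void run_pending_work(void) {
  if (pending_work != NULL) {
    work_fn_t* work = pending_work;
    pending_work = NULL;
    work(pending_user);
  }
}

/* Hand-written equivalent of `_Capture(start, limit)`: a plain struct... */
struct work_captures {
  int start;
  int limit;
};

/* ...and a function that receives those captures explicitly. */
static void work_call(const struct work_captures* self) {
  for (int i = self->start; i < self->limit; ++i) {
    total += i;
  }
}

/* Capture-less adapter matching the queue's function pointer type. */
static void work_trampoline(void* user) {
  work_call(user);
  free(user); /* the heap copy is released once the work has run */
}

/* Returns 1 if the work was queued, 0 if allocation failed. */
static int kickoff(int start, int limit) {
  struct work_captures work = { start, limit };
  /* elevate to heap lifetime so the captures outlive this stack frame */
  void* work_ptr = malloc(sizeof(work));
  if (work_ptr == NULL) {
    return 0;
  }
  memcpy(work_ptr, &work, sizeof(work));
  add_work(work_trampoline, work_ptr);
  return 1;
}
```

The malloc/memcpy/free steps are identical to the Capture Function version; because the object has a known size, the same relocation technique applies without any runtime support.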

4.2.2. Deduced Return Types, Unique Types

Reusing an example from the above code, the make_seven function needs to have a special, inferred/deduced return type. This is because the type of a Capture Function is not known until it is defined:

auto make_seven (int x) {
  int y = 7;
  int seven_fn() _Capture(x, y) {
    return x * y;
  }
  return seven_fn; // OK: unique type which
  // is a complete object
}

The auto return type here just means "the first return expression is the return type of the function". This only works with in-line function definitions, and does not allow for a separated function declaration/definition (as the separated declaration would not have a material, real type until the definition could be read). This only applies to functions with inferred return types like this, where the first declaration of such a function must also be its definition.

If no return appears in such a function, or all the returns do not contain an expression, the return type is inferred to be void. Otherwise, all the return <expr>; must return the exact same type. If there exists one or more return <expr>;s and the types are not exactly the same in the whole function definition, then it is a hard error. This is already partly described in Jens Gustedt’s "Type inference for variable definitions and function returns v6" ([n2923]); reviving this paper would be a matter of rebasing it on the current working draft and improving the wording present.

4.2.3. Data Captures are Explicit

Data captures, the way in which local data is accessible inside of the function, are explicit. The only reason captures are explicit is because it is impossible to tell if something should be captured by value (and copied into whatever implementation-defined holding space is used for the Capture Function’s complete object), or if something should be captured by name/reference (and only have its pointer/address copied into whatever implementation-defined holding space is used for the Capture Function’s complete object). This detail matters for safety reasons when assigning, copying, storing, and otherwise relocating a capture function from its original scope.

NOTE: static and _Thread_local objects, as well as typical file-scope declarations, are accessible within a capture function in the normal way. constexpr objects, without a static specifier, at local scope are also accessible.

Allowing for explicit captures also allows for better type checking (used objects must be explicitly acknowledged by the programmer), and allows for covering both the use cases of Apple Blocks (default by-value capture) and GNU Nested Functions (default by-name capture) without breaking anything. The lack of a capture also covers all of the use cases that Literal Functions would have covered, which means that Capture Functions can sufficiently cover all of the existing use cases currently in production in C ecosystems. To match the default behaviors:

Only one "capture all" is allowed. That is, _Capture(=, &) (and vice-versa) is illegal. The rest of the specific captures for accessible identifiers can be specified in any order. Note that specific captures for a given object override the default implicit "capture all" behavior. For example:

int main () {
  int x = 30;
  int y = 10;
  int fn () _Capture(&, x) {
    return x + y;
  }
  x = 50;
  y = 40;
  return fn();
}

This program returns 70 (x is captured by-value as 30, y is captured by-name and is changed to 40 before invocation). The change to x on the outside to 50 is not reflected inside of the invocation. This allows an ease-of-use for specifying the "default" implicit all-capture, while letting the user select specifically which captures should work.
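The difference between the two capture kinds in this example can be checked with a hand-written equivalent in current ISO C. This is a sketch only: fn_captures, fn_call, and run_example are hypothetical names, with the by-value capture modeled as a copied member and the by-name capture as a stored pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of `_Capture(&, x)` for a body using x and y. */
struct fn_captures {
  int x;  /* by-value: a copy taken when the closure object is created */
  int* y; /* by-name: the address of the original object */
};

static int fn_call(const struct fn_captures* self) {
  assert(self != NULL && self->y != NULL);
  return self->x + *self->y;
}

static int run_example(void) {
  int x = 30;
  int y = 10;
  struct fn_captures fn = { .x = x, .y = &y };
  x = 50; /* not observed: the closure holds its own copy of x */
  y = 40; /* observed: the closure reads y through the pointer */
  (void)x;
  return fn_call(&fn);
}
```

Calling run_example() yields 30 + 40, matching the value described for the Capture Function version.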

4.2.4. Data Captures can be Renamed

Data captures can be renamed (or computed, with an expression that does not include a , unless it is parenthesized). This is important for e.g. incrementing reference counters for copying large, important data structures into callbacks that may either be invoked multiple times or have their own long-lived lifetime. The syntax for this occurs within the _Capture clause of a Capture Function:

#include <tree.h>

TREE_DECLARE(int_tree_t, int_tree, int);
TREE_IMPLEMENT(int_tree_t, int_tree, int);

#include <stdcountof.h>

typedef enum queue_status {
  qs_success,
  qs_timedout,
  qs_busy,
  qs_fail,
  qs_invalid
} queue_status;

typedef int work_fn_t(void* user);

queue_status add_dispatch_work(work_fn_t* work, void* user);
queue_status is_work_done();
void work_shutdown();

int main () {
  int data[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  int_tree_t tree = int_tree_init_with(data, data + countof(data));
  int work () _Capture(my_tree = int_tree_copy(tree)) {
    /* do work.... */
    int elem = int_tree_remove(my_tree, int_tree_min_node(my_tree));
    /* blah blah blah */
    return 0;
  }
  int work_trampoline (void* user) _Capture() {
    return (*((typeof(work)*)user))();
  }
  if (add_dispatch_work(work_trampoline, &work) != qs_success) {
    return 1;
  }
  queue_status err;
  while ((err = is_work_done()) != qs_success) {
    switch (err) {
      case qs_invalid:
      case qs_timedout:
      case qs_fail:
        // some error happened
        work_shutdown();
        return 2;
      default:
        break;
    }
  }
  work_shutdown();
  return 0;
}
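The reference-counting use case mentioned above can be sketched by hand in current ISO C. Everything here is hypothetical (shared_ints and its functions, work_captures, make_work, work_call); the point is only that a computed capture such as `_Capture(mine = shared_ints_retain(data))` runs its initializer once, when the closure object is created, and stores the result alongside the function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted resource. */
struct shared_ints {
  int refcount;
  int values[4];
};

static struct shared_ints* shared_ints_new(void) {
  struct shared_ints* s = calloc(1, sizeof(*s));
  if (s != NULL) {
    s->refcount = 1;
  }
  return s;
}

static struct shared_ints* shared_ints_retain(struct shared_ints* s) {
  assert(s != NULL);
  ++s->refcount;
  return s;
}

/* Returns the remaining count so callers can observe the release. */
static int shared_ints_release(struct shared_ints* s) {
  assert(s != NULL && s->refcount > 0);
  int remaining = --s->refcount;
  if (remaining == 0) {
    free(s);
  }
  return remaining;
}

/* Hand-written equivalent of a renamed, computed capture. */
struct work_captures {
  struct shared_ints* mine;
};

static struct work_captures make_work(struct shared_ints* data) {
  /* the capture initializer is evaluated here, exactly once */
  return (struct work_captures){ .mine = shared_ints_retain(data) };
}

static int work_call(const struct work_captures* self) {
  return self->mine->values[0] + self->mine->refcount;
}
```

The closure keeps the resource alive after the creator drops its own reference, and whoever holds the closure object is responsible for the final release.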

4.2.5. NEW: Data Captures are Accessible

An important adjustment to make sure this code works better than the way it did for Blocks or Nested Functions is the ability not only to copy (§ 4.2.1 Capture Functions are Complete Objects) or otherwise rename objects (§ 4.2.4 Data Captures can be Renamed), but ALSO to get at the internals of a given Capture Function. This is something missing from GNU Nested Functions (which provides no real resolution for it) as well, and something that could matter for Apple Blocks but does not in practice because they can turn any object into a shared one with the __block modifier on an object. In particular, this only matters in the case of a closure which is given a (copied) resource that must either be released or freed.

NOTE: Thanks to Alex Celeste, for being the first person to bring this to my attention!

The syntax looks just like normal structure access, and is based on the names placed in the _Capture clause:

#include <stdio.h>

int main () {
  int x = 30;
  double y = 5.0;
  char z = 'a';

  int cap_fn0 () _Capture(=, &renamed_x = x) {
    printf("inside cap_fn0 | renamed_x: %d, y: %f, z: %c\n",
      renamed_x, y, z);
  }
	
  int cap_fn1 () _Capture(&, renamed_y = y) {
    printf("inside cap_fn1 | x: %d, renamed_y: %f, z: %c\n",
      x, renamed_y, z);
  }
	
  x = 60;
  y = 10.0;
  z = 'z';

  cap_fn0();
  cap_fn1();
	
  printf("\n");

  printf("inside main fn | cap_fn0.renamed_x: %d, cap_fn0.y: %f, cap_fn0.z: %c\n",
    cap_fn0.renamed_x, cap_fn0.y, cap_fn0.z);
  printf("inside main fn | cap_fn1.x: %d, cap_fn1.renamed_y: %f, cap_fn1.z: %c\n",
    cap_fn1.x, cap_fn1.renamed_y, cap_fn1.z);

  return 0;
}

This would print:

inside cap_fn0 | renamed_x: 60, y: 5.000000, z: a
inside cap_fn1 | x: 60, renamed_y: 5.000000, z: z

inside main fn | cap_fn0.renamed_x: 60, cap_fn0.y: 5.000000, cap_fn0.z: a
inside main fn | cap_fn1.x: 60, cap_fn1.renamed_y: 5.000000, cap_fn1.z: z

How the implementation actually accesses the information is implementation-defined, and the layout of the Capture Function object is not defined by the specification, except to say it’s implementation-defined (§ 5 Wording).

NOTE: This leaves room for an implementation to, for example, use creative ways to retrieve objects and object references. Using a pointer to the current stack frame and then computing a raw offset to get to a specific bit of data, or using entirely registers, are all possible depending on how the captures are implemented. Such improvements and optimizations -- especially in the face of potential asynchronous calls and the need to protect against false sharing -- must be left up to Quality of Implementation.

As an example for releasing resources outside of the function call itself for the purposes of a function call that gets used more than once and isn’t passed a "We’re Done" signal, we can reuse the example from § 3.2.5 GNU Nested Functions By-Name Captures Cannot Be Worked Around Normally:

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

typedef int compare_fn_t(const void* left, const void* right);

auto make_compare(int argc, char* argv[]) {
  /* LOCAL, heap-allocated variable.... */
  int* in_reverse = malloc(sizeof(int));
  *in_reverse = 0;

  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        *in_reverse = 1;
      } 
    }
  }
	
  int compare(const void* untyped_left, const void* untyped_right) _Capture(in_reverse) {
    const int* left = untyped_left;
    const int* right = untyped_right;
    return (*in_reverse) ? *right - *left : *left - *right;
  }
  return compare;
}

int main(int argc, char* argv[]) {
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };  

  auto compare = make_compare(argc, argv);
  qsort_r(list, (sizeof(list)/sizeof(*list)), sizeof(*list), compare, &compare);
  // with data field captures, we can now `free` the
  // field `in_reverse` from the Capture Function
  free(compare.in_reverse);
	
  return list[0];
}

Thanks to the capture of in_reverse with the by-value _Capture(in_reverse) indication, the return of this function is safe. And, since we have access to the unique type that is generated (through the auto return type), we can access the pointer in_reverse normally and naturally. This isn’t possible with normal C++-style lambdas, as they haven’t decided to make this available (though our design for Lambdas in C will also include the named captures as accessible fields). It’s also not possible in the other solutions which rely on type-erasure as a first-class part of the design (Apple Blocks with the Blocks type, GNU Nested Functions only being accessible through a pointer or convertible to a wide function pointer in [n2661] or [n3564], Borland’s closure annotation or function literals). This is why making it possible to access the unique type first and foremost is of great benefit.
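For comparison, the same ownership pattern can be approximated in today's C by returning a structure by value. This is a hand-written sketch rather than what the proposal would generate; `compare_closure`, `make_compare_sketch`, and `compare_closure_call` are hypothetical names, and the command-line parsing is reduced to a single flag for brevity.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the unique type of a Capture Function. */
struct compare_closure {
  int *in_reverse; /* by-value capture of the pointer itself */
};

/* Returns the closure by value; the pointer inside stays valid after return. */
struct compare_closure make_compare_sketch(int reverse) {
  struct compare_closure c = { .in_reverse = malloc(sizeof(int)) };
  if (c.in_reverse != NULL) {
    *c.in_reverse = reverse;
  }
  return c;
}

/* The comparison body, reading its capture through `self`. */
int compare_closure_call(const struct compare_closure *self,
                         const void *untyped_left, const void *untyped_right) {
  assert(self != NULL && self->in_reverse != NULL);
  const int *left = untyped_left;
  const int *right = untyped_right;
  return (*self->in_reverse) ? *right - *left : *left - *right;
}
```

Because the structure type is visible to the caller, the caller can release `in_reverse` once it is done with the closure, which is the same step as `free(compare.in_reverse)` in the example above.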

4.2.6. Capable of Recursion

Capture functions are able to refer to themselves for the purpose of recursion. This means that __self_func ([__self_func]), unlike for expression-based/unnamed Literal Functions/Lambdas/Block literals, is not required:

int main () {
  int tripling (int times, int start) {
    if (times >= 5) {
      return start;
    }
    return tripling(times + 1, start * 3); // normal recursion
  }
  return tripling(0, 1);
}

4.2.7. Not An Expression

The one true technical downside is that Capture Functions are declarations / definitions. They cannot be used (without the Statement Expression extension) in a function call’s argument list, which means that (short) closures and anonymous functions still need the full function definition. This is annoying and, honestly, one of the reasons § 4.3 Lambdas are preferred as a shorthand syntax.

It also means that, without Statement Expressions, Capture Functions cannot be used for the implementation of many macros which are typically expected to be usable as normal expressions.

4.2.8. Footgun: By-Name Capture Exceeds Captures’s Lifetime

A brief display of the undefined behavior:

auto ub (int parameter) {
  int automatic = 7;
  int fn() _Capture(parameter, &automatic) {
    return parameter + automatic;
  }
  return fn; // well-defined copy return
  // but dangling reference to `automatic`!
}

int main () {
  auto fn = ub(2);
  return fn(); // undefined behavior:
  // `automatic` no longer exists.
}

In general, undefined behavior occurs in the same way that it occurs within existing C code: use of an object after its lifetime has ended (in this case, an automatic storage duration object has gone out-of-scope). The fix for ub in this case is to capture automatic by-value. This makes it safe to copy that function object to the heap, or the stack. Additionally, no UB is possible by conversion to a function pointer.

4.2.9. Future Footgun: Wide Function Pointers

Wide function pointers, if and when they come to C, can make for footguns with capturing lambdas given that they will (likely) allow conversions from any Nested Function / Block / Lambda to them implicitly. Using a fictional wide function pointer syntax using %:

typedef int foo_fn_t(int);

foo_fn_t% call_me (int x) {
  return [x](int y) { return x + y; }; // converts to wide function pointer type!
  // undefined behavior if the return value is ever
  // called outside of this function 
}

int use_me(foo_fn_t% fn) {
  return fn(2);
}

int main () {
  int x = 30;
  return use_me(call_me(x));
}

This is a similar problem to Nested Functions returning a regular function pointer from a function call. Unfortunately, a conversion being allowed here is necessary to allow the 75%+ use case of passing it as a parameter, such as:

typedef int foo_fn_t(int);

void pass_to_me (foo_fn_t% func);

int main () {
  int x = 30;
  pass_to_me(
    [x](int y) { return x + y; }
  ); // converts to wide function pointer type!
  return 0; 
}

Thusly, in a future with a wide function pointer type, such a problem might be allowed. This is similar to the § 4.2.8 Footgun: By-Name Capture Exceeds Captures’s Lifetime. A special carveout in the specification for the return value case could be developed, but this would need work to avoid precluding useful cases.
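Until such a type exists, a wide function pointer is usually spelled by hand as a function pointer paired with a context pointer. The sketch below uses hypothetical names (`foo_wide_fn`, `add_env`, `add_call`, `use_me_sketch`, `pass_example`) and shows only the safe parameter-passing case, where the captured data outlives the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-written "wide function pointer": code plus context. */
typedef struct foo_wide_fn {
  int (*call)(void *env, int y);
  void *env;
} foo_wide_fn;

/* Environment standing in for the by-value capture `[x]`. */
struct add_env {
  int x;
};

static int add_call(void *env, int y) {
  const struct add_env *captures = env;
  return captures->x + y;
}

/* Mirrors `use_me`: invokes the callable with a fixed argument. */
int use_me_sketch(foo_wide_fn fn) {
  assert(fn.call != NULL);
  return fn.call(fn.env, 2);
}

/* Mirrors the `pass_to_me` case: the environment lives in this frame
   for the whole duration of the call, so nothing dangles. */
int pass_example(int x) {
  struct add_env env = { .x = x };
  foo_wide_fn fn = { .call = add_call, .env = &env };
  return use_me_sketch(fn);
}
```

Returning `fn` from `pass_example` instead of calling it would leave `fn.env` pointing at a dead frame, which is the hand-written equivalent of the hazard described for `call_me`.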

4.3. Lambdas

Lambdas are simply a reskinned version of Capture Functions. They have all the same functionality, but with the benefits that they are:

We are deliberately leaving these as the only three benefits of lambdas over Capture Functions for the sole reason that, after Capture Functions, Lambdas will be VERY minimal effort to support. The reason for that is that they are, semantically, just a "Syntactic Reskin" of Capture Functions, save for their presence as an expression.

auto make_seven (int x) {
  int y = 7;
  return [x, y]() { return x * y; };
}

int main () {
  int x = 3;
  auto zero = [] () {
    // OK, no external variables used
    return 0;
  };
#if 0
  auto double_it = [] () {
    return x * 2; // constraint violation
  };
#endif
  auto triple_it = [x] () {
    return x * 3; // OK, x = 3 when called
  };
  auto quadruple_it = [&x] () {
    return x * 4; // OK, x = 5 when called
  };
  auto quintuple_it = [=] () {
    return x * 5; // OK, x = 3 when called
  };
  auto sextuple_it = [&] () {
    return x * 6; // OK, x = 5 when called
  };
  x = 5;
  auto seven_tuple_it = make_seven(x);
  return zero() + triple_it() + quadruple_it()
    + quintuple_it() + sextuple_it() + seven_tuple_it();
  // return 109;
  // 0 + (3 * 3) + (5 * 4)
  // + (3 * 5) + (5 * 6)
  // + (5 * 7)
}

Given this, there is nothing else to write for this section: all of the benefits of Capture Functions (§ 4.2 Capture Functions: Rehydrated Nested Function) apply to these types in full, and just copying all of that text from one to another to say exactly the same thing is not important. We will instead just talk about the differences exclusively in comparison to Capture Functions in the next few sections.

4.3.1. Lambdas are Expressions

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static int in_reverse = 0;

int main(int argc, char* argv[]) {
  if (argc > 1) {
    char* r_loc = strchr(argv[1], 'r');
    if (r_loc != NULL) {
      ptrdiff_t r_from_start = (r_loc - argv[1]);
      if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
        in_reverse = 1;
      } 
    }
  }
  int list[] = { 2, 11, 32, 49, 57, 20, 110, 203 };
  qsort(list, (sizeof(list)/sizeof(*list)), sizeof(*list),
    // expression, fits in-line
    [](const void* untyped_left, const void* untyped_right) {
      const int* left = (const int*)untyped_left;
      const int* right = (const int*)untyped_right;
      return (in_reverse) ? *right - *left : *left - *right;
    }
  );
	
  return list[0];
}

This also makes it suitable for use in macros, which is not something a regular Capture Function can accomplish.

NOTE: This can be alleviated by using Statement Expressions, which would allow Capture Functions to work within typical macro contexts.

4.3.2. Recursion Is Impossible

Unfortunately, it is impossible to call a lambda from within itself (not without C++'s feature "deducing this", which requires templates and other things to work), and therefore that is another disadvantage. It can be fixed with the proposed __self_func feature ([__self_func]):

int main () {
  int tripling (int times, int start) {
    if (times >= 5) {
      return start;
    }
    return __self_func(times + 1, start * 3); // __self_func feature
  }
  return tripling(0, 1);
}

4.3.3. Trailing Return Types / Deduced Return Type

Finally, one may need to add the concept of a "trailing return type" to C in order to allow modifying the return type of a lambda. At the moment, the way a lambda with no specified return type works is that every single return statement must have exactly the same type (there is no negotiation for some "promoted" type or similar). That is, returning a long in one branch and an int in another branch is an error: they all must be cast to int or they all must be cast to long:

int main () {
  auto okay0 = []() {
    if (1) {
      return 0;
    }
    else {
      return 0;
    }
  }(); // ok
  auto violation0 = []() {
    if (1) {
      return 0U;
    }
    else {
      return 0L;
    }
  }(); // constraint violation: different return types
  auto okay1 = []() {
    if (1) {
      return (unsigned long long)0U;
    }
    else {
      return (unsigned long long)0L;
    }
  }(); // ok: cast to identical types
  return 0;
}

This can be extremely annoying to deal with. Trailing return types fix this problem by allowing lambdas to use a trailing -> type-name to have the function return type become type-name:

int main () {
  auto okay0 = []() {
    if (1) {
      return 0;
    }
    else {
      return 0;
    }
  }(); // ok
  auto violation0 = []() -> unsigned int {
    if (1) {
      return 0U;
    }
    else {
      return 0L;
    }
  }(); // now okay: fixed return type, conversions happen normally
  auto okay1 = []() {
    if (1) {
      return (unsigned long long)0U;
    }
    else {
      return (unsigned long long)0L;
    }
  }(); // ok: cast to identical types
  return 0;
}

This fixes other problems in the C language as well, such as not being able to specify functions with proper variable-length array returns without using ugly syntax. The auto part only applies for regular function definitions, and could also be applied to Capture Functions for ease-of-use (but is not required for it to function appropriately). One could also just have auto but no -> to have regular functions achieve the lambda behavior, where all return expressions must evaluate to the exact same type. Not having a return or having a return; both imply the return type is void, and thus any other kind of return <expr>; in that function would be illegal.
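The reason `0U` and `0L` clash in the deduced-return-type example earlier in this section can be checked in today's C with a C11 `_Generic` selection. `TYPE_NAME` and `same_type_name` are hypothetical helpers written for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps an expression to the name of its type (C11 _Generic). */
#define TYPE_NAME(expr) \
  _Generic((expr), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long long: "unsigned long long", \
    default: "other")

/* Returns 1 when two type names are identical, 0 otherwise. */
int same_type_name(const char *left, const char *right) {
  assert(left != NULL && right != NULL);
  return strcmp(left, right) == 0;
}
```

With these helpers, `TYPE_NAME(0U)` and `TYPE_NAME(0L)` name different types, while both casts to `unsigned long long` name the same one. That is the distinction the "exactly the same type" rule enforces when no trailing return type is given.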

That is the full set of notable technical differences between Lambdas and Capture Functions.

4.4. Solution

This proposal is going to work to standardize Capture Functions as a way of working with data that is familiar from C extensions and based on existing practice. It is also going to standardize Lambdas for the technical differences between them and Capture Functions, in particular their ability to be used in macros (small but important) and their ability to be C++-compatible (unifying more header and in-line code).

A different proposal is going to work on the "Make Trampoline" aspect, to allow interoperation with old code. Another different proposal is going to work on providing a "wide function pointer" type. As used in the examples here, we hope to see % as a pointer-like modifier for a "wide function pointer" type, and if not that perhaps a _Closure(function-type) spelling to make it directly accessible by most.

5. Wording

THIS SECTION IS NOT GOING TO BE STARTED UNTIL THE DESIGN SHAKEDOWN IS COMPLETE.

6. Appendix

6.1. Accessing Context in Nested Functions

A newer paper by Dr. Martin Uecker discusses the various ways to access GNU Nested Functions and their potential future standardization ([n3654]). It addresses the executable stack / general-trampoline problem of GNU Nested Functions (by providing a wide function pointer type to get around it) before discussing various ways forward and various improvements around GNU Nested Functions, but takes a dissimilar approach to the one outlined in our proposal. We will go through some of the sections in the paper and talk about how it differs from the approach this paper is going to take, and the criticisms it levies at the various aspects of other solutions such as Apple Blocks, Lambdas, GNU Nested Functions, and more.

6.1.1. §1 & §2

These are sections we agree with the most: the introduction of a wide function pointer type is necessary (§ 6.2 Wide Function Pointer Type), no matter which solution is picked. Wide Function Pointers are a unifying part to make C more of the appropriate "lingua franca" between languages. This proposal even agrees that naked, unadorned GNU Nested Functions can be introduced as part of C: however, the caveat would be that, insofar as the design in § 4.2 Capture Functions: Rehydrated Nested Function is concerned, it would produce a constraint violation to not appropriately capture any objects from the outside local scope that are used inside. Secondly, the use of it in [n3654]'s api_old would not be "implementation-defined", but rather a constraint violation that GNU (and other compilers) could turn into well-defined behavior. From §2:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;

void api_old_simple(cb_t cb);
void api_old(cb_t cb, void *data);
void api_new(cb_wide_t cb);

void example4()
{
  int d = 4;
  int bar(int x) {
    return x + d; // constraint violation: `d` not captured
  }
  int bar_fixed(int x) _Capture(&) {
    return x + d; // ok
  }
  api_old(bar, nullptr); // GNU extension, constraint violation in ISO C
  api_new(bar); // ok
}

NOTE: [n3654] does not seem to use its own API correctly, so the code above does not look identical to what is in [n3654]: e.g. api_old is called in [n3654] with just bar and nothing else, leaving off the second required parameter.

Our hope is to fix that with § 6.3 Make Trampoline and Singular Function Pointers:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;
// or: typedef int (%cb_wide_t)(int);

void api_old_simple(cb_t cb);
void api_old(cb_t cb, void *data);
void api_new(cb_wide_t cb);

void example4()
{
  int d = 4;
  int bar(int x) { // GNU Extension, Nested Functions
    return x + d; 
  }
  int bar_fixed(int x) _Capture(&) { // (Proposed) ISO C, Capture Function
    return x + d;
  }
  cb_t bar_fn_ptr = stdc_make_trampoline(bar); // Extension, but works
  cb_t bar_fixed_fn_ptr = stdc_make_trampoline(bar_fixed);
  // all ok now
  api_old_simple(bar_fn_ptr); 
  api_old(bar_fn_ptr, nullptr);
  api_new(bar);
  api_old_simple(bar_fixed_fn_ptr); 
  api_old(bar_fixed_fn_ptr, nullptr);
  api_new(bar_fixed);
  // trampolines must be freed
  stdc_destroy_trampoline(bar_fn_ptr);
  stdc_destroy_trampoline(bar_fixed_fn_ptr);
}

Individuals can rely on the GNU Nested Functions, but would have an explicit way to opt-in to get ISO Standard C behavior. We think this is a better path forward for harmonizing things, and would let the user be explicit about where and when trampolines (and their effects) are created/used.
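As a rough mental model of what a trampoline provides, the following sketch in portable C binds a code-plus-environment pair to a plain function pointer by using a small fixed pool of static thunks. This is not how `stdc_make_trampoline` is specified (that belongs to a separate proposal, and real implementations would typically generate code at run time); every name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cb_t)(int);

/* Hypothetical pairing of a function with its environment. */
typedef struct wide_cb {
  int (*fn)(int x, void *env);
  void *env;
} wide_cb;

#define SLOT_COUNT 2
static wide_cb slots[SLOT_COUNT];

/* One plain function per slot; each forwards to whatever its slot holds. */
static int thunk0(int x) { return slots[0].fn(x, slots[0].env); }
static int thunk1(int x) { return slots[1].fn(x, slots[1].env); }
static const cb_t thunks[SLOT_COUNT] = { thunk0, thunk1 };

/* Claims a free slot and returns its thunk, or NULL if the pool is full. */
cb_t make_trampoline_sketch(wide_cb target) {
  assert(target.fn != NULL);
  for (size_t i = 0; i < SLOT_COUNT; ++i) {
    if (slots[i].fn == NULL) {
      slots[i] = target;
      return thunks[i];
    }
  }
  return NULL;
}

/* Releases the slot so that it can be reused. */
void destroy_trampoline_sketch(cb_t trampoline) {
  for (size_t i = 0; i < SLOT_COUNT; ++i) {
    if (thunks[i] == trampoline) {
      slots[i] = (wide_cb){ 0 };
    }
  }
}

/* Example target: adds the int found in the environment. */
int add_from_env(int x, void *env) {
  return x + *(int *)env;
}
```

The pool makes the cost visible: each live trampoline occupies a resource until it is released, which is the same obligation the example above expresses with `stdc_destroy_trampoline`.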

6.1.2. §3

We agree with the premise of section 3, including the way that any capture can be used with the "old" style of API, so long as it passes a userdata parameter:

typedef int (*cb_t)(int);
typedef int (*cb_wide_t)(int) _Wide;

void api_old(cb_t cb, void *data);

void example5()
{
  int d = 4;
  int bar(int x) { return x + d; }
  // static (capture-less) nested function
  static int trampoline(int x, void *ptr)
  {
    return (*(cb_wide_t)ptr)(x);
  }
  api_old(trampoline, &(cb_wide_t){ bar });
}

[n3654] then introduces a potential new keyword to capture what is, effectively, the current function frame and reuse it in the same place:

typedef int (*cb_t)(int);

void api_old(cb_t cb, void *data);

void example6()
{
  const int d = 4;
  // static chain passed via specified argument
  int bar(int x, void *data) _Closure(data)
  {
    return x + d;
  }
  api_old(bar, &_Closure(bar));
}

_Closure(bar) can effectively be seen as a signal to the implementation for the invocation of __builtin_frame_address, while _Closure(data) attached to the function definition is a directive to the compiler to use __builtin_call_with_static_chain. Semantically, the use of _Closure(data) on the definition ties the contents of the nested function to the surrounding scope from the perspective of whatever function definition _Closure is attached to. It’s a way of saying the surrounding scope is being provided by the void* argument (data in this case). This is mildly more type-safe than just a regular void* cast to a structure type inside. It is impossible to cast to the wrong type since it’s some (unnamed) type related to the current scope, and therefore the location provides the safety. It also offers a way to have two different closures use the same void* data, meaning that one could theoretically optimize a function taking two or three callbacks to have only one void* userdata-style parameter.

The problem with that is that assuming two or three callbacks all have the same environment or use the same userdata is, oftentimes, not a good idea. An example is the proposed thrd_create_attrs_err function (Thread Attributes); if the API were to assume that all three void* provided to this function can or should be the same, there could be many possible issues (thread of invocation does not match expectations, race conditions, having access to the wrong data, and more). So it’s unclear whether or not that would be good in general-purpose, widely-adopted, or prolific library interfaces.

NOTE: Folding together multiple nested functions to have similar closure data would certainly be useful for internal APIs where the caller of a specific API can make assumptions of how things work; but it does not hold up for external or uncontrollably-available APIs.

The final problem with this section is that it still assumes that the only kind of closure one would want is one that refers to variables in the current scope. This results in all the same problems documented in § 3.2.4 The Nature of Captures; undefined behavior, lifetime failures, and more. This could especially be the case for thrd_create_attrs_err, thread/worker pools, thread queues, and other asynchronous scheduling initiatives.

6.1.3. §4

This section introduces the concept of modifying how Nested Functions capture variables, recommending that some variables are captured by value inside of the data stored for a closure. The recommendation is that values that are const should be captured, while other mutable values are not. The examples do not seem to explain why capturing only const variables is helpful, as the primary reason to capture by value (particularly as Apple Blocks has explained (§ 3.3.2 Runtime Required)) is for safety in copying the closure to another location. The following example, using an old-style, void* based API for the purposes of copying, is given:

typedef int (*cb_t)(int);
void api_old_copy(cb_t cb, void *data, size_t data_size);

void example7()
{
  // const-qualified variables can be copied
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // sizeof can be used to obtain the required size
  api_old_copy(bar, &_Closure(bar), sizeof(_Closure(bar)));
}

We do not see how the const qualification helps in this scenario, and also note that this isn’t helpful for the vast majority of declarations and types that are non-const qualified. For example, if this pointer was not const qualified, a copy would have to be created solely for the purpose of capture:

typedef int (*cb_t)(int);
void api_old_copy(cb_t cb, void *data, size_t data_size);

void example7_modified()
{
  // non const-qualified variable is by-name
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  // change to `const` to enable capture
  int (*const cap_p)[10] = p;
  int bar(int x, void *data) _Closure(data)
  {
#if 0
    // dangerous -- may not exist
    return (*p)[x];
#else
    // not dangerous -- captured by value
    return (*cap_p)[x];
#endif
  }
  // sizeof can be used to obtain the required size
  api_old_copy(bar, &_Closure(bar), sizeof(_Closure(bar)));
}

One would need to form a copy of any mutable variable into a const one in order to ensure that it gets its whole value placed inside of whatever the implementation decides to place inside of _Closure. This is, in many ways, a by-proxy form of doing C++ Lambda captures or using __block in Apple Blocks. At the very least with C++ Lambdas and Apple Blocks, their design is once again explicit; it allows the user to decide if something should be transported by-value, and then can be moved into a by-name state by the user. For the direction in [n3654], this requires duplicated variables, and given how _Closure works, rather than the user stating their intent directly ("put this variable inside of this thing so I can carry it around in the manner of my choosing"), they have to instead contort their declarations to be const as a means of perhaps making safe access to these variables ("this variable is now const so it should be copied in, but anything else is implementation-defined or something"). This is a roundabout way of just being clear about what is coming and going and what the properties of that thing are; we believe this to be infinitely less clear than erroring on something that is not captured and making the user specify explicitly.

Capture-by-const is not a useful or reliable scheme and would require users to contort their declarations for the sole purpose of making it work better with this new solution: we do not believe it to be a viable path forward.

6.1.4. §5 and §6

This is where the paper starts departing more strongly from what we believe to be the right direction. This section opens with a use of api_old_copy_del:

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)(void*));

void example8()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // static nested functions acting as destructor
  static void del(void *_data)
  {
    // the structure type is visible at this point
    typeof(_Closure(bar)) *data = _data;
    free(data->p);
  }
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

The problem is that there seems to be a limitation in how _Closure can be used; the assertion is that _Closure is meant to strongly mimic "call with static chain" and map entirely towards that. This is fine for that architecture, but it raises the question in this example: why is _Closure(_data) in del not allowed like so?

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)(void*));

void example8_modified()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // static nested functions acting as destructor
  static void del(void *_data) _Closure(_data)
  {
    free(p);
  }
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

One should be able to simply say that local variables can be looked up through whatever is given to _Closure. For example, if someone were to make a global void* variable and set it to the value, it would also be a viable way of saying "this is the function’s current frame / environment" without necessarily requiring that the function be explicitly used with a static chain:

typedef int (*cb_t)(int);
// `del` takes no void* now
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)());

static void* my_env;
	
void example8_modified()
{
  int (*const p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
	
  static void del() _Closure(my_env)
  {
    free(p); // `p` is found because we have statically asserted
    // that the stack frame of `example8_modified`
    // comes from the variable `my_env`
  }

  my_env = &_Closure(bar); // get closure data pointer
  api_old_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

This is hinted at in §6 of the paper, but the chosen syntax and explanation uses a plain naked nested function that implicitly (?) knows the static chain without a void *data or void *_data argument:

typedef int (*cb_t)(int);
void api_old_copy_del(cb_t cb, void *data, size_t size, void (*del)());

void example9()
{
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
  // nested function acting as destructor
  void del() { free(p); p = NULL; } // missing... `void*` and `_Closure`?
  // wrong function name, as well
  api_data_copy_del(bar, &_Closure(bar), sizeof(_Closure(bar)), del);
}

It’s completely unclear how del receives the environment for bar here: is it simply assumed that nested functions contained in the same scope all implicitly receive the environment? If so, how? And, importantly, how does an API compiled separately (e.g., as a DLL in a library) know to make the association between bar and del here? Is there something that needs to be done internally in api_data_copy_del (meant to be api_old_copy_del?) for this to happen?

Adjusting this to allow for a callback that takes a void*, AND making it static so that the environment can be shared while del can be used as a normal function pointer with a shared environment, would likely look more like this:

typedef int (*cb_t)(int);
// signature adjusted to allow for `void*` into `del` callback
void api_old_copy_del(cb_t, void *data, size_t size, void (*del)(void*));

void example9_modified()
{
  int (*p)[10] = malloc(sizeof *p);
  if (!p) return;
  int bar(int x, void *data) _Closure(data)
  {
    return (*p)[x];
  }
	
  static void del(void *data) _Closure(data)
  {
    free(p);
    p = NULL;
  }
  api_old_copy_del(bar,
    &_Closure(bar), // "environment" of `bar`
    sizeof(_Closure(bar)), // size for closure data to be copied in and survive
    del // `del` is now, appropriately, just a regular function pointer
  );
}

Whether the void* data is passed to the del callback or it comes from some other (_Thread_local or static) object, there’s some amount of potential for "can take a pointer and using the surrounding scope assert that it is some implementation-defined environment containing values for use". There’s nothing wrong with using the location of the nested function as a way of saying:

But [n3654] has a hard time communicating that effectively, if that is indeed what it is trying to communicate at all.

NOTE: We are assuming this is what it means. This is why many of the code samples taken from the paper have been changed with the addition of _modified in the example function’s name.
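Read that way, the shared-environment idea is the familiar userdata pattern. The sketch below spells it out in today's C with an explicit structure in place of `_Closure`; `closure_env`, `bar_sketch`, `del_sketch`, and the body of `api_copy_del_sketch` are all hypothetical and stand in for whatever an implementation and a library would actually do. Note that the callback type here takes the `void*` as well, unlike `cb_t` in the examples above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical explicit environment: what `_Closure(bar)` might contain. */
struct closure_env {
  int (*p)[10];
};

/* Both callbacks find `p` through the same environment pointer. */
int bar_sketch(int x, void *data) {
  struct closure_env *env = data;
  return (*env->p)[x];
}

void del_sketch(void *data) {
  struct closure_env *env = data;
  free(env->p);
  env->p = NULL;
}

/* Hypothetical library side: copies the environment so that it outlives the
   caller's frame, uses the callback, then runs the destructor on the copy.
   Returns the callback result for index 3, or -1 on allocation failure. */
int api_copy_del_sketch(int (*cb)(int, void *), const void *data, size_t size,
                        void (*del)(void *)) {
  assert(cb != NULL && data != NULL && del != NULL);
  void *copy = malloc(size);
  if (copy == NULL) {
    return -1;
  }
  memcpy(copy, data, size);
  int result = cb(3, copy);
  del(copy);
  free(copy);
  return result;
}
```

One subtlety this makes visible: the destructor resets `p` only inside the copy, so the caller's own pointer is left dangling after the call. A separately compiled library can make this work only because the environment and both callbacks are handed to it explicitly.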

We do not critique much of the rest of the paper because it is simply building on top of this API, but using partially related orchestrations for Polymorphic Types. We are not interested in what polymorphic types will or will not do for this, and it is outside the scope of what we care about for this.

Appendix B: C++ Lambda Quiz

The final problem with [n3654] is in its appendices. We will start with Appendix B, which has a quiz formulated using C++ Lambdas and asking "what will it print?":

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = [=](){ printf("%d\n", i); };
  auto bar = [=](){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer is "3" and then "4" (https://godbolt.org/z/KW4j1zG93). Before we talk about the answer, we are going to compare this answer to what the answer would be with GNU Nested Functions and Apple Blocks. To start, let’s try this quiz with GNU Nested Functions:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  void foo(){ printf("%d\n", i); };
  void bar (){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer here is "4" and then "4" (https://godbolt.org/z/voWG3Gjo3). The file-scope variable still gives the same answer (because of course it does): the change here is in the local variable. As explained in the above introduction to GNU Nested Functions (§ 3.2.4 The Nature of Captures), it captures by-name, so the value is updated. This matches the expectation. Apple Blocks behave differently:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = ^(){ printf("%d\n", i); };
  auto bar = ^(){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

The answer is now back to "3" and then "4" (https://godbolt.org/z/a9c79cjYb). That is because, as explained above, the default for Apple Blocks is capturing by-value (§ 3.3.3 Captures).

The implication of this quiz is that Apple Blocks -- the thing that has worked for the entirety of the Apple ecosystem -- is wrong and unexpected, and the GNU Nested Function behavior is correct and expected. It wouldn’t be a "Quiz", after all, if the answer was anticipated to be entirely normal. In the rush to make a point about captures doing certain things, [n3654] effectively called the entirety of the Apple Blocks ecosystem fraudulent in its expectations. That’s certainly a choice that can be made, but a more important point that overshadows this is that C++ Lambdas can have the same behavior as GNU Nested Functions:

#include <stdio.h>

int j = 3;

int main()
{
  int i = 3;
  auto foo = [&](){ printf("%d\n", i); };
  auto bar = [&](){ printf("%d\n", j); };
  i = j = 4;
  foo();
  bar();
}

This changes the answer to "4" and then "4" (https://godbolt.org/z/EW8PETdxz). What this means is that this Quiz -- when properly displayed next to its counterparts -- shows that C++ Lambdas can be naturally configured to work like Apple Blocks OR like GNU Nested Functions, at the cost of one (1) character change in its capture clause. We can imagine that a person writing from the perspective of Apple Blocks could present the preceding C++ Lambda that doesn’t have the same behavior they expect to be a "Quiz" that contains a big "Gotcha". The reality is that C++'s design can handle both defaults without compromising the ergonomics in any serious manner. The syntax for lambdas is, of course, "sinfully ugly" -- even C++ enthusiasts acknowledge this readily -- but the acknowledgement that there are engineering tradeoffs to be had -- and not things to poke fun at or make "gotcha"s out of -- is why the design is mature and useful.

In contrast, [n3654] proposes capturing and copying based on things such as whether or not a variable is const, which does not approximate how it works in any existing practice.

Appendix A: List of Issues with C++ Lambdas

Appendix A is a laundry list of issues with C++ Lambdas, in no particular order. Some of them are already addressed in the introduction of Lambdas (§ 3.4 C++-Style Lambdas), but in general the list of issues has many flaws in its reasoning.

Every single solution requires trampolines or a new type to be useful, GNU Nested Functions and Apple Blocks included. C++ did not have this problem because they have a much stronger base language that can do this as a library type: C needs a fundamental "wide function pointer" type no matter what (§ 6.2 Wide Function Pointer Type) and it needs trampoline-making functionality (§ 6.3 Make Trampoline and Singular Function Pointers).

This is discussed earlier in the introduction, but GNU Nested Functions have an identical problem. At the very least, Lambdas have a mechanism to stop this from being a problem: there is no built-in solution with GNU Nested Functions. Apple Blocks use an entire heap runtime and thus avoid this problem completely.

Everything in C is a byte copy, unless an explicit function is inserted to do something just before that byte copy happens. An example of this comes from Apple Blocks, with Block_release and Block_copy required to make usage of stored blocks safer (§ 3.3.2.1 More Complications: Generally Unsafe to Return). To uphold this as a problem is to lambast the entirety of C and its object model as unsafe; which, honestly, is not an unfair assessment. The fix to that is to restore access to the objects captured inside of an object, as shown in § 4.2.5 NEW: Data Captures are Accessible, which makes any capturing entity -- whether it’s a _Closure like in [n3654], Capture Functions as in this paper, or Lambdas -- accessible once more.

This criticism is partly untrue: complete objects in C can have their type retrieved with typeof. That means it can be cast/assigned into heap storage, copied around, and called just fine in certain contexts (c.f. the "static trampoline" technique mentioned both in the paper and used extensively in the example code above). We also already have the auto type-specifier: as regular complete objects, such types can be created and then stored in what already exists as a feature in C23. Macros are a foundationally important use case: it means that expressions being passed into function-like macros can be evaluated once, and only once, by being passed to an immediately-invoked lambda expression.

Storage outside of those contexts has to be powered by a wide function pointer type, and this proposal acknowledges that it will be necessary to solve that problem (but not in this proposal). GNU Nested Functions also need such a type, as do Apple Blocks (though they do come with their own Block type as well using the ^ syntax). This is also partly discussed in § 4.3.3 Trailing Return Types / Deduced Return Type. Deduced return types were covered by [n2923], but only the auto for variable definitions was adopted: function returns were left for later, and there was initially consensus for something of this nature for the purposes of lambdas.

The complexity is necessary, as demonstrated by the Quiz example in Appendix B: C++ Lambda Quiz: the fact that captures can be changed and are not dependent on unrelated properties like const-ness (as in [n3654]) is how it successfully blends into e.g. the GNU ecosystem or the Apple ecosystem without breaking either of them. The complexity is inherent to the problem domain: glossing over it like GNU Nested Functions does limits whether or not this can successfully be deployed to replace Apple Blocks as an ISO C Standard solution.

It is unclear how captures in which the user makes an explicit choice can be more confusing than one where the user has no choice but the behavior changes. We have already established this in the Apple Ecosystem perspective with Blocks versus the GNU Nested Functions perspective in Appendix B: C++ Lambda Quiz; switching from one to another can result in bugs when people do not expect the default capturing style to change. Being explicit means nobody is surprised, and having renames prevents shadowing confusion: but it all has to be the user’s choice.
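The difference between the two defaults can be imitated by hand in today's C, with a small struct standing in for the capture set. The following is a minimal sketch under that assumption; the names `by_copy_t`, `by_ref_t`, and `capture_styles_differ` are hypothetical and are not part of this or any other proposal.

```c
#include <assert.h>

/* Hypothetical hand-written "capture sets"; not part of any proposal. */
typedef struct { int x; } by_copy_t;  /* copy capture: Apple Blocks default, C++ [=] */
typedef struct { int* x; } by_ref_t;  /* reference capture: GNU Nested Functions, C++ [&] */

static int call_by_copy(const by_copy_t* self) { return self->x; }
static int call_by_ref(const by_ref_t* self) { return *self->x; }

/* Returns 1 when the two styles observe different values after the
   original variable is modified, which is the surprise discussed above. */
int capture_styles_differ(void) {
  int x = 3;
  by_copy_t copy = { x };  /* snapshot of x taken at this point */
  by_ref_t ref = { &x };   /* refers to the live x */
  x = 4;                   /* modify after "capturing" */
  assert(call_by_copy(&copy) == 3);  /* still sees the snapshot */
  assert(call_by_ref(&ref) == 4);    /* sees the update */
  return call_by_copy(&copy) != call_by_ref(&ref);
}
```

Nothing in either call site says which behavior is in effect; only the capture set's definition does. That is the argument for making the choice explicit and user-controlled.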

Trailing return types enhance what is possible, but are not strictly required (§ 4.3.3 Trailing Return Types / Deduced Return Type). Generic arguments are the one part that was opposed for inclusion in C23 and did not have consensus; it was, in fact, generic arguments that served as one of the primary reasons Gustedt’s Lambdas were completely and utterly tanked. This proposal does not use them and the design below does not require them to be useful, especially as anything relating to an immediately-invoked lambda in a macro can be covered by the necessary typeof(...) from C23.

As a sole solution, certainly. But this proposal provides Capture Functions (§ 4.2 Capture Functions: Rehydrated Nested Function) as the flagship proposal for C, and maintains 1:1 identical capabilities with the proposed secondary Lambda part (§ 4.3 Lambdas). There are also other reasons as to why having lambdas is good (particularly, for macros and for use as an expression). But, the general improvement here is that one can have Capture Functions that are rooted in C history and C syntax and C needs, while maintaining just enough of Lambdas that serve as a compatibility layer.

The fact that this paper is already proposing a departure from C++ for C lambdas and Capture Functions by having accessible data captures already shows we have the power to improve on things in C’s favor, if we’re willing to hold onto that (§ 4.2.5 NEW: Data Captures are Accessible).

6.1.5. Insufficient

Given the state of [n3654], it seems like it has not sufficiently explored the consequences or implications of its proposed design, nor grounded it in sufficient existing practice for us to consider yielding to its principles. That does not mean all of the ideas are bad. In the above sections, after we repair some of the broken examples, there is clearly some potential in _Closure and the idea of an "environment" pointer. There is also perhaps merit in having a pointer that ties a specific function frame to a specific function call so that variable lookup that does not find a local variable can look in the "environment"/"_Closure" first before checking further surrounding variables (e.g., file-scope or static objects). But that is a separable problem -- and a lower-level problem -- than the tying of "function and its associated data".

A wide function pointer type (§ 6.2 Wide Function Pointer Type) would be a far better pursuit, separately, even if none of the solutions here or in other proposals are achieved.

The paper also seems to be driven, largely, by three things:

It offers a "we can do these small things first" and then presents a wider narrative around how to handle the "data" part of "Functions with Data". Having a standardized solution that is less powerful than all of C++ Lambdas, GNU Nested Functions, Apple Blocks, and Jens Gustedt’s proposed C Lambdas does not seem like a good or useful starting point. As much as WG14 as a Committee has many members that continue to extol the virtues of being slow, we believe that there has been significant existing practice and useful explanations of designs to move forward with something much more comprehensive and robust. Giving in to the temptation of "simplified GNU Nested Functions" with a somewhat incomplete and incoherent design plan based around the idea of _Closure after 30+ years of design work in this space from directly-related and applicable languages is not something we consider a good use of time.

We do not comment on the Polymorphic Types API because that is beyond what we consider the useful scope of what can or should be addressed in our current proposal.

6.2. Wide Function Pointer Type

[n2862], by Dr. Martin Uecker, is already looking into standardizing a wide function pointer type. A wide function pointer type is necessary in the general-purpose ecosystem, but isn’t directly required to be tied to this proposal. Because it is a smaller entity, it can be put directly into the standard separately. We hope that, rather than the _Closure(function-type) or function-type _Wide syntax, function-type% is explored as the usable syntax instead. This would simplify its use and its introduction:

typedef int foo_fn_t(int);

foo_fn_t% call_me (int* x) {
  return [x](int y) { return *x + y; };
}

int use_me(foo_fn_t% fn) {
  return fn(2);
}

int main () {
  int x = 30;
  return use_me(call_me(&x));
}

In the above example, foo_fn_t% can be replaced with _Closure(foo_fn_t) or foo_fn_t _Wide; we prefer the former over the latter two for obvious grammatical and ease-of-use reasons. Most importantly, there is a canonical and viably implementable conversion path for not only whatever is standardized in ISO C, but all of the existing extensions such as Blocks, Nested Functions, C++ Lambdas, and language-external closure types.

NOTE: The caret (^) cannot be used for this purpose thanks to Apple and Objective-C/Objective-C++ taking that design space.

NOTE: The percent sign (%) does not conflict with Managed C++/CLI ref declarations that use % because naked % can only be applied to "value types" -- that is struct types. There is no callback type that fits this description in the garbage-collected .NET imperative language universe (C# or Managed C++/CLI); all callback types are declared as either raw function pointer types or class-based, "reference type" delegates in Managed C++/CLI.
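What `foo_fn_t%` bundles together in the example of this section can be approximated in standard C today with a struct pairing a code pointer and a data pointer. The sketch below assumes that shape purely for illustration: `foo_wide_t`, `add_captured`, and `wide_pointer_demo` are hypothetical names, and the actual representation of a wide function pointer would be up to the implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for `foo_fn_t%`: a code pointer plus a data pointer.
   The real layout of a wide function pointer would be implementation-defined. */
typedef struct foo_wide {
  int (*invoke)(void* env, int y);
  void* env;
} foo_wide_t;

/* Hand-written body of the lambda `[x](int y) { return *x + y; }`. */
static int add_captured(void* env, int y) {
  int* x = env;
  return *x + y;
}

/* Counterpart of call_me: bundles the function with its data. */
foo_wide_t call_me(int* x) {
  return (foo_wide_t){ add_captured, x };
}

/* Counterpart of use_me: the caller threads env through by hand. */
int use_me(foo_wide_t fn) {
  assert(fn.invoke);
  return fn.invoke(fn.env, 2);
}

/* Mirrors the body of main in the example of this section. */
int wide_pointer_demo(void) {
  int x = 30;
  return use_me(call_me(&x));
}
```

The manual `env` parameter and the explicit `fn.invoke(fn.env, ...)` call are the boilerplate a built-in wide function pointer type would remove.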

6.3. Make Trampoline and Singular Function Pointers

In the later examples in § 3.2.3 Alternative Nested Function Implementations, a magic compiler builtin named __gnu_make_trampoline, with a secondary follow-on builtin named __gnu_destroy_trampoline, is used. This section talks about what that would look like, if it was to be implemented. In particular, an ideal solution that makes a trampoline needs to be an explicit request from the user because:

While this section was spawned from GNU Nested Functions, this same technique can be used to make possible single function pointer trampolines for Blocks with or without captures (§ 3.3 Apple Blocks) as well as C++-style Lambdas (§ 3.4 C++-Style Lambdas).

Therefore, the best design to do this would be -- using the [_Any_func]* paper and its new type -- the following:

typedef void* allocate_function_t(size_t alignment, size_t size);
typedef void deallocate_function_t(void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION-WITH-DATA-IDENTIFIER func);
_Any_func* stdc_make_trampoline_with(
  FUNCTION-WITH-DATA-IDENTIFIER func,
  allocate_function_t* alloc
);

void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func* func, deallocate_function_t* dealloc);

stdc_make_trampoline(f) would use some implementation-defined memory (including something pre-allocated, such as in Apple blocks (§ 3.3.5 (Explicit) Trampolines: Page-based Non-Executable Implementation)). The recommended default would be that it just calls stdc_make_trampoline_with(f, aligned_alloc). stdc_destroy_trampoline(f) would undo, exactly, what stdc_make_trampoline would give. The recommended default would be that it is identical to stdc_destroy_trampoline_with(f, free_aligned_sized). Providing an allocation and a deallocation function means that while the implementation controls what is done to the memory and how it gets set up, the user controls where that memory is surfaced from. This would prevent the problem of the Heap Alternative Nested Function implementation: rather than creating a special stack or having to rely on memory allocation functions, the compiler can instead source the memory from a user. This also makes such an allocation explicit, and means that its lifetime could be

Though, given our memory primitives, a slightly better implementation that would allow the implementation to take care of (potentially) extra space handed down by alignment and what not would be:

struct allocation { void* data; size_t size; };
typedef struct allocation allocate_function_t (size_t alignment, size_t size);
typedef void deallocate_function_t (void* p, size_t alignment, size_t size);

_Any_func* stdc_make_trampoline(FUNCTION_TYPE func);
_Any_func* stdc_make_trampoline_with(FUNCTION_TYPE func, allocate_function_t* alloc);

void stdc_destroy_trampoline(_Any_func* func);
void stdc_destroy_trampoline_with(_Any_func* func, deallocate_function_t* dealloc);
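To illustrate the user's side of this contract, the following is a minimal sketch of an allocate/deallocate pair matching the second set of signatures, backed by a fixed pool. Everything here apart from the two function types and struct allocation is an assumption made for illustration: the names `pool`, `pool_allocate`, `pool_deallocate`, and `pool_demo` are hypothetical, the stdc_make_trampoline_with function itself is not implemented, and memory intended for real trampolines would also have to be memory the platform permits to be executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct allocation { void* data; size_t size; };
typedef struct allocation allocate_function_t(size_t alignment, size_t size);
typedef void deallocate_function_t(void* p, size_t alignment, size_t size);

/* Hypothetical user-owned pool that the trampoline memory is carved from. */
static _Alignas(64) unsigned char pool[256];
static size_t pool_used = 0;

/* Bump allocator. `alignment` must be a power of two. The size is rounded up
   to a multiple of the alignment and the granted size is reported back, so
   the caller can see the extra space. Returns { NULL, 0 } on exhaustion. */
struct allocation pool_allocate(size_t alignment, size_t size) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (size > sizeof(pool)) {
    return (struct allocation){ NULL, 0 };
  }
  uintptr_t base = (uintptr_t)pool;
  uintptr_t start = (base + pool_used + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
  size_t offset = (size_t)(start - base);
  size_t granted = (size + (alignment - 1)) & ~(alignment - 1);
  if (offset > sizeof(pool) || granted > sizeof(pool) - offset) {
    return (struct allocation){ NULL, 0 };
  }
  pool_used = offset + granted;
  return (struct allocation){ pool + offset, granted };
}

/* Only the most recent allocation can be returned to a bump allocator;
   anything else is ignored until the pool is reused as a whole. */
void pool_deallocate(void* p, size_t alignment, size_t size) {
  (void)alignment;
  unsigned char* end = (unsigned char*)p + size;
  if (end == pool + pool_used) {
    pool_used = (size_t)((unsigned char*)p - pool);
  }
}

/* Exercises the pair through the function types, the way an implementation
   of the "_with" functions might. Returns 1 on success. */
int pool_demo(void) {
  allocate_function_t* alloc = pool_allocate;
  deallocate_function_t* dealloc = pool_deallocate;
  struct allocation a = alloc(16, 40);
  if (a.data == NULL) return 0;
  int aligned = ((uintptr_t)a.data % 16) == 0;
  int big_enough = a.size >= 40;
  dealloc(a.data, 16, a.size);
  return aligned && big_enough && pool_used == 0;
}
```

In this sketch the deallocation function is handed the granted size rather than the requested one; which of the two an implementation would pass is a detail the eventual specification would have to pin down.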

Regardless of the form that the make/destroy functions take, this sort of intrinsic would be capable of lifting not just typical GNU nested functions but all types of functions to be a single, independent function pointer with some kind of backing storage. Some desire may still exist to make the allocation and deallocation process automatic, but that should be left to compiler vendors to decide for ease-of-use tradeoffs versus e.g. security, like in § 3.2.2 Early Design Flaw: Nested Functions turn the stack Executable!.

It should be noted that Apple itself already has a version of this with its Objective-C Blocks implementation ([objective-c-block-trampoline]), albeit with limitations discussed in § 3.3.5 (Explicit) Trampolines: Page-based Non-Executable Implementation. GCC does not expose an intrinsic for this per se, but does provide __builtin_call_with_static_chain (GCC Documentation: Builtin Call with Static Chain). One can build a trampoline mechanism on top of that, provided they have the properly-created function plus the right stack frame / "environment" chain pointer to go with the function callable. Since C++ Lambdas -- and the proposed Capture Functions and C-Style Lambdas here -- are by themselves Complete Objects, one can always create a "thunk" or "trampoline" for them manually, using a wide variety of allowable techniques from heap allocation to pre-stored arrays to _Thread_local/static data or otherwise. C++ could implement stdc_make_trampoline entirely as a library function, but C cannot; so, this is something vendors will have to figure out on their own.
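As one instance of the manual techniques listed above, the sketch below builds a thunk out of _Thread_local data so that a function needing extra data can be handed to qsort, which only accepts a plain function pointer. It is an illustration under stated assumptions rather than a recommended pattern: `compare_ctx`, `compare_with`, `compare_thunk`, and `sort_ints` are hypothetical names, and a plain struct stands in for the capture set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hand-written "capture set" for a comparison callback. */
struct compare_ctx { int descending; };

/* The slot the thunk reads its data from. _Thread_local keeps sorts running
   on different threads from interfering with one another. */
static _Thread_local const struct compare_ctx* current_ctx;

/* The "real" function: takes its data explicitly. */
static int compare_with(const struct compare_ctx* ctx, const void* l, const void* r) {
  int a = *(const int*)l, b = *(const int*)r;
  int order = (a > b) - (a < b);
  return ctx->descending ? -order : order;
}

/* The thunk: has the plain signature qsort wants and forwards the data. */
static int compare_thunk(const void* l, const void* r) {
  assert(current_ctx);
  return compare_with(current_ctx, l, r);
}

/* Installs the data, calls qsort with the thunk, then restores the slot. */
void sort_ints(int* data, size_t count, int descending) {
  struct compare_ctx ctx = { descending };
  const struct compare_ctx* saved = current_ctx;  /* allow nested use */
  current_ctx = &ctx;
  qsort(data, count, sizeof *data, compare_thunk);
  current_ctx = saved;
}
```

The thunk is only valid while `sort_ints` is running, and every distinct callback signature needs its own slot and forwarding function. That manual bookkeeping is what an intrinsic in the style of stdc_make_trampoline would take over.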

The only part that needs to be user-configurable is the source of memory. Of course, if an implementation does not want to honor a user’s request, they can simply return (_Any_func*)nullptr all the time. This would be hostile, of course, so a vendor would have to choose wisely about whether or not they should do this. The paper proposing this functionality would also need to discuss setting errno to an appropriate indicator after use of the intrinsic, if only to appropriately indicate what went wrong. For example, errno could be set to:

to indicate a problem. Albeit, there are always complaints about errno, so it may also be possible to take an int* p_errcode parameter in the make_trampoline functions, and use that as a means of solving the problem (or swap the return type and the error code parameter to return the error code and output into an _Any_func*). The API design possibilities are, really, endless.

6.4. Executable Stack CVEs

THIS SECTION IS INCOMPLETE.

The following CVEs are related to executable stack issues.

References

Informative References

[__SELF_FUNC]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis, LLC). __self_func. February 11th, 2025. URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20__self_func.html
[_Any_func]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis). _Any_func - A Universal Function Pointer Storage Type. July 6th, 2025. URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20_Any_func.html
[APPLE-BLOCKS]
Apple & Contributors. Documentation Archive: Declaring and Creating Blocks. May 3rd, 2025. URL: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Blocks/Articles/bxDeclaringCreating.html#//apple_ref/doc/uid/TP40007502-CH4-SW1
[BUILTIN_CALL_WITH_STATIC_CHAIN_GCC]
GNU Compiler Collection Contributors; Free Software Foundation. GCC Online Documentation: Constructing Calls. May 3rd, 2025. URL: https://gcc.gnu.org/onlinedocs/gcc/Constructing-Calls.html#index-_005f_005fbuiltin_005fcall_005fwith_005fstatic_005fchain
[CLANG-BLOCKS-SPEC]
The Clang Team; LLVM and Contributors; Apple. Clang + LLVM (Latest): Block Implementation Specification. July 8th, 2025. URL: https://clang.llvm.org/docs/Block-ABI-Apple.html
[GAMINGONLINUX-DAWE]
Liam Dawe. The glibc 2.41 update has been causing problems for Linux gaming. February 13th, 2025. URL: https://www.gamingonlinux.com/2025/02/the-glibc-2-41-update-has-been-causing-problems-for-linux-gaming/
[LAMBDAS-NESTED-FUNCTIONS-BLOCK-EXPRESSIONS-OH-MY]
JeanHeyd Meneide. Lambdas, Nested Functions, and Blocks, oh my!. July 16th, 2021. URL: https://thephd.dev/lambdas-nested-functions-block-expressions-oh-my
[N1229]
Nick Stoughton. Potential Extensions For Inclusion In a Revision of ISO/IEC 9899. March 26th, 2007. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1229.pdf
[N1370]
Blaine Garst; Apple, Inc. n1370: Apple Extensions to C. March 10th, 2009. URL: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1370.pdf
[N1451]
Blaine Garst; Apple. n1451: Blocks Proposal. April 13th, 2010. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1451.pdf
[N1457]
Blaine Garst; Apple. n1457: Blocks. April 20th, 2010. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1457.pdf
[N2030]
Blaine Garst. n2030: A Closure for C. March 11th, 2016. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2030.pdf
[N2661]
Martin Uecker. n2661: Nested Functions. February 13th, 2021. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2661.pdf
[N2862]
Martin Uecker; Jens Gustedt. n2862: Wide Function Pointer Types for Pairing Code and Data. November 30th, 2021. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2862.pdf
[N2892]
Jens Gustedt. Basic lambdas for C. December 25th, 2021. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2892.pdf
[N2893]
Jens Gustedt. Options for Lambdas. December 25th, 2021. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2893.htm
[N2923]
Jens Gustedt. Type inference for variable definitions and function returns. January 30th, 2022. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2923.pdf
[N2924]
Jens Gustedt. Type-generic Lambdas. January 30th, 2022. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n2924.pdf
[N3564]
N. Gustafsson, D. Brewis, H. Sutter, S. Mithani. Resumable Functions. 15 March 2013. URL: https://wg21.link/n3564
[N3643]
Jakub Łukasiewicz. n3643: Statement Expressions (draft). July 10th, 2025. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3643.htm
[N3645]
Thiago R. Adams. n3645: Literal Functions. July 11th, 2025. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3645.pdf
[N3654]
Martin Uecker. n3654: Accessing the Context of Nested Functions. July 20th, 2025. URL: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n3654.pdf
[NESTED-FUNCTIONS]
GNU Compiler Collection Contributors. Nested Functions (Using the GNU Compiler Collection (GCC)). May 3rd, 2025. URL: https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html
[OBJECTIVE-C-BLOCK-TRAMPOLINE]
Objective-C Development Team and Contributors; Apple. Objective-C Runtime / imp_implementationWithBlock. July 17th, 2025. URL: https://developer.apple.com/documentation/objectivec/imp_implementationwithblock(_:)?language=objc
[SOLAR-NON-EXECUTABLE-STACK-EXPLOITS]
solar FALSE COM (Solar Designer). Getting around non-executable stack (and fix). August 10th, 1997. URL: https://seclists.org/bugtraq/1997/Aug/63
[SWIFT-ESCAPES]
Swift Development Team and Contributors; Apple. The Swift Programming Language: Closures. July 6th, 2025. URL: https://docs.swift.org/swift-book/documentation/the-swift-programming-language/closures/#Escaping-Closures
[THREAD-ATTRIBUTES]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis, LLC). Thread Attributes - Implementation Extensible and ABI-Resistant. July 6th, 2025. URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Thread%20Attributes%20-%20Implementation%20Extensible%20and%20ABI-Resistant.html
[TRANSPARENT-ALIASES]
JeanHeyd Meneide; Shepherd (Shepherd's Oasis, LLC). Transparent Aliases. February 20th, 2025. URL: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20Transparent%20Aliases.html
[WSL-no-executable-stack]
Microsoft; WSL Authors and Contributors; Martin Uecker. fis-gtm does not run due to missing support for executable stack. August 7th, 2018. URL: https://github.com/Microsoft/WSL/issues/286