N3469: The Big Array Size Survey

1. Changelog

1.1. Revision 1 - February 3rd, 2025

Added statistical analysis, which strongly backs up countof as the overwhelmingly positive choice.

1.2. Revision 0 - January 17th, 2025

Initial release. ✨

2. Introduction and Motivation

This is a short paper detailing the surface-level information from the C Community Survey conducted through https://thephd.dev about the new _Lengthof feature, as described here.

3. Methodology

The survey was conducted through https://allcounted.com from November 6th, 2024, through December 20th, 2024. Periodic reminders were posted to social media (Twitter, Mastodon, Bluesky, Telegram) after the initial release of the blog post (https://thephd.dev/the-big-array-size-survey-for-c) about the survey and filling it out. A total of 1,049 unique visitors responded, either partially or fully. The respondents come from all over the globe, though there were high concentrations in the United States and Europe. Some participants were from Russia, India, Japan, and China. There was significant South American participation, as well as Australian representation. There was a very small amount of African representation.

A map showing the geographic distribution of respondents to the survey. The transparent dots are most densely gathered in Western Europe and both coasts of the United States, with a smaller selection in Russia, India, China, and Brazil.

Nineteen unique respondents only partially filled out the survey:

17 filled out the survey in full but simply declined to provide an e-mail address; their entries were counted as part of the 1,049 used to generate analysis and create graphics. As stated in the original survey, the e-mail address was only for following up on incomplete or strange entries.
1 filled out the survey but missed one mandatory question. They were contacted via e-mail but never responded; the missed answer for that required question was given a neutral score that would not affect the outcomes of the survey.
1 refused to answer any of the questions but left a comment that simply read "arraycount()". We assume they simply hated all of the options and had a single-track mind. They did not provide an e-mail address, so we could not reach out and ask what they were thinking.

The data was processed from the AllCounted text format and cleaned into a CSV released to the public domain, with the last entry in a single list being a quote-escaped, newline-escaped version of any comment left in the text. No e-mails, IP addresses, or location information were left in the publicly released sources. This means it will be impossible to reproduce certain graphics showing city, country, and general location information. This is intentional.

A non-default seed was applied to a random number generator and used to skew each dot placement on the Respondent Geographic Distribution map such that all location dots in the generated Mercator Projection-style map cannot be used to reverse engineer the latitude/longitude coordinates and thereby uniquely identify any specific individual who filled out the survey.
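The skewing step described above can be sketched as follows. This is only an illustration of the idea, not the script used for the paper: the `map_point` type, the `skew_points` function, the use of the C library's `rand`, and the maximum offset are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical map point in degrees; not a type from the survey scripts. */
typedef struct {
    double lat;
    double lon;
} map_point;

/* Returns a pseudo-random offset in [-max_offset, +max_offset]. */
double jitter(double max_offset) {
    double unit = (double)rand() / (double)RAND_MAX; /* in [0, 1] */
    return (unit * 2.0 - 1.0) * max_offset;
}

/* Skews every point in place. If the seed and the maximum offset are kept
   private, the published dots cannot simply be shifted back to the
   original coordinates. */
void skew_points(map_point *points, size_t count,
                 unsigned seed, double max_offset_deg) {
    assert(points != NULL || count == 0);
    assert(max_offset_deg >= 0.0);
    srand(seed); /* non-default seed chosen by the caller */
    for (size_t i = 0; i < count; ++i) {
        points[i].lat += jitter(max_offset_deg);
        points[i].lon += jitter(max_offset_deg);
    }
}
```

A caller would pass the array of dot coordinates once, before rendering the map, and discard the unskewed values afterwards.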

No complex analysis (e.g. cross-referencing preference of spelling against provided skill level, or similar) was performed: this is an evaluation of raw vote counts. More sophisticated analysis of the data may be possible. Here are the skill level and usage experience distributions of the 1,049 respondents:

No trap questions or self-contradiction questions were included to cull potentially illegitimate or self-inconsistent responses. They were not needed, as this survey was a matter of preferences anyway.

The survey was originally protected with Captcha-like slide-and-rotate-to-match technology. It was later disabled after legitimate respondents complained about difficulty getting past the security measure. The first 240 responses (the last 240 in the CSV) were submitted with the security measure in place.

We did read all the comments. This paper does not comment on those comments. You can instead find that here - https://thephd.dev//the-big-array-size-survey-for-c-results.

4. Results

We present the data visually, but the CSV and the script to generate this data can be found at this Git Repository.

A stacked horizontal bar chart showing the Extreme Like, Strong Like, Mild Like, No Preference, Mild Dislike, Strong Dislike, and Extreme Dislike ratios for each of the presented options for the exposure and delivery mechanism for this array size operator.

The data clearly shows the following preferences.

Lowercase keywords without a macro are preferred over a _Keyword with a macro, whereas a _Keyword with no macro is exceedingly despised.
countof and _Countof are preferred over lengthof and _Lengthof, while extentof and _Extentof are exceedingly despised.
The specific spellings of a keyword countof with no macro and of _Countof with a macro both beat out _Lengthof with a macro and a keyword lengthof. All other options had two to three times as many dislikes for all of their exact spelling forms.

Using weighted graphs with actual preference considerations and statistical analysis makes the direction and sentiment of the polled questions much stronger:
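The exact weighting behind these graphs is not spelled out in this paper. One plausible way to collapse the seven response categories into a single score is a symmetric weighted mean, sketched below. The weights (+3 for Extreme Like down to -3 for Extreme Dislike) and the `weighted_preference` function are assumptions made for this illustration, not the analysis actually performed.

```c
#include <assert.h>
#include <stddef.h>

/* The seven response categories, from Extreme Like to Extreme Dislike. */
enum { CATEGORY_COUNT = 7 };

/* Assumed symmetric weights; illustrative only, not the paper's values. */
static const int category_weight[CATEGORY_COUNT] = { 3, 2, 1, 0, -1, -2, -3 };

/* Mean weighted preference in [-3, +3] for one option's vote counts.
   Returns 0.0 (neutral) when there are no votes at all. */
double weighted_preference(const unsigned long votes[CATEGORY_COUNT]) {
    assert(votes != NULL);
    long long weighted_sum = 0;
    unsigned long long total = 0;
    for (size_t i = 0; i < CATEGORY_COUNT; ++i) {
        weighted_sum += (long long)category_weight[i] * (long long)votes[i];
        total += votes[i];
    }
    return total == 0 ? 0.0 : (double)weighted_sum / (double)total;
}
```

Under this scheme a No Preference answer carries a weight of zero, so a neutral score such as the one assigned to the single missed mandatory answer adds nothing to the weighted sum.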

Based on these results, we propose one of the two following changes:

Perform a global find-and-replace for _Lengthof to become countof as a lowercase keyword; OR,
Perform a global find-and-replace for _Lengthof to become _Countof and add a new header <stdcountof.h> to Clause 7.
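To show how either spelling would read in user code, here is a minimal sketch. Because compilers may not yet implement the operator, the `countof` macro below is a stand-in built on the classic `sizeof` idiom; it is an assumption made for this example and not the proposed definition. Under the second option, the header would instead supply `countof` as a macro for `_Countof`. The function `sum_of_primes` is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the proposed operator, for true arrays only. Unlike a real
   operator, this idiom silently accepts pointers and gives a wrong answer. */
#ifndef countof
#define countof(array) (sizeof(array) / sizeof((array)[0]))
#endif

/* Iterates over a local array without repeating its length by hand. */
int sum_of_primes(void) {
    static const int primes[] = { 2, 3, 5, 7, 11 };
    int total = 0;
    for (size_t i = 0; i < countof(primes); ++i) {
        total += primes[i];
    }
    assert(total > 0);
    return total;
}
```

For a multidimensional array such as `double grid[3][5]`, `countof(grid)` gives the outermost extent and `countof(grid[0])` the next one.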

We expect that another vote given this new information may be beneficial for WG14.

Similarly, a paper should be written for a _Rankof/rankof operator. There was a significant amount of commentary and media chatter asking for this; it should be provided in a separate paper regardless of whether the changes in this paper go through or not.

5. Wording

The following wording is relative to the latest standard. If applicable, a vote should be taken by WG14 to choose the spelling for ${VOTE-CHOSEN TOKEN} and, if necessary, its header-macro version of ${LOWERCASE VOTE-CHOSEN TOKEN}.

5.1. Perform a global find-and-replace for `_Lengthof` to become `${VOTE-CHOSEN TOKEN}`

5.2. (If Applicable) Add a new header `<std${LOWERCASE VOTE-CHOSEN TOKEN}.h>`

7.✨ ${VOTE-CHOSEN TOKEN} <std${LOWERCASE VOTE-CHOSEN TOKEN}.h>

¹ The <std${LOWERCASE VOTE-CHOSEN TOKEN}.h> header provides the following macro:
#define ${LOWERCASE VOTE-CHOSEN TOKEN} ${VOTE-CHOSEN TOKEN}

Published Proposal, 2025-02-03

