1. Revision History
1.1. Revision 0 - November 26th, 2018
- Initial release.
2. Motivation
This sort of feature has seen many previous iterations, although their motivations were only tangentially related to this one. [p0424] and [p0732] both attempted to create a form of this for different reasons, from better UDL handling to a proper string type with fixed backing storage for non-type template parameters. Early forms of this proposal, such as [p0259], also focused on making string-like behaviors available for
programming, but were superseded by the previous proposal. Ongoing work has been dedicated to making
fully constexpr to allow for its usage in more complicated syntaxes.
It is clear that
and
are going to be and already are fully
, respectively. This still does not solve a problem that has come up recently: knowing that a provided argument is indeed a string literal. In particular, [p1040] was discussed at the San Diego 2018 ISO C++ Standards Meeting. The primary problem identified was tooling: because a resource identifier could be computed at constexpr time and passed as the argument to
, establishing a proper association between a
resource and the resulting object file is hard to communicate to the user or build system without performing full semantic analysis. Many suggestions were given, but it became apparent that all of the solutions hinged on one common realization: the string used to embed a file can be computed at
time, which makes it impossible to extract resource identifier dependency information before Phase 7 of compilation.
Since full semantic analysis must be run before the compiler can be sure of the value passed to
, it becomes impossible to list the file as an explicit dependency in the graphs exported by compiler options such as
, which only perform preprocessing and a basic amount of lexical analysis. It would be better if simple dependencies could participate in the build system without being listed explicitly, which we can achieve by ensuring that the value passed to
is a string literal.
More broadly, the goal of this type is to solve two key use cases. The first is that some functions are increasingly interested in the provenance of their string data. Functions that take
inherently lose information from the user: using string literals as arrays is a lossy transformation that deprives interested parties of necessary source information. For example, is the array backed by compiler-created storage, or did the user create one themselves? Is it null-terminated, or not? We have no guarantees right now, which is frustrating for programmers who care about these properties.
Second, we care whether our tools can know the value without having to perform full semantic analysis. A
or
do not buy us this guarantee: they can be fully composed at semantic analysis time when the
-evaluator runs, producing values that the compiler cannot track until during or after Phase 7 of compilation. This is too late for tools to know the value without significantly slowing down dependency graph generation and other compiler services useful to the build system.
We propose a type that can only be constructed with a non-empty value by the compiler for all the string literal types (
,
,
,
, and
). Objects of this type can be captured by application and library developers as the type
.
3. Design
This type can implicitly decay to a
lvalue reference to an array of
. It can be default-constructed and will represent an empty null-terminated (byte) string (NT(B)S), which will be a
. As an NTS, the underlying array is guaranteed to have a size of 1 or more, and
will always be valid and equal to
as it is today. The result of a
or similar string literal will always be a
:
#include <type_traits>
#include <string_literal>

int main () {
    auto x = "woof";
    const auto& x_ref = "purr";
    static_assert(std::is_same_v<std::string_literal<5>, decltype(x)>);
    static_assert(std::is_same_v<std::basic_string_literal<char, 5>, decltype(x)>);
    static_assert(std::is_same_v<const std::basic_string_literal<char, 5>&, decltype(x_ref)>);
    return 0;
}
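For contrast, here is what the same two declarations deduce under today's rules (C++17), where a string literal is a plain array. This small sketch uses only <type_traits>. It shows that copying with auto already discards the length, which is part of the information this proposal aims to keep.

```cpp
#include <type_traits>

// Today's deduction rules for the declarations in the example above.
auto x = "woof";            // array-to-pointer decay: the extent is lost
const auto& x_ref = "purr"; // reference binding: the extent survives

static_assert(std::is_same_v<decltype(x), const char*>,
              "auto copies of a string literal decay to a pointer");
static_assert(std::is_same_v<decltype(x_ref), const char (&)[5]>,
              "a reference keeps the array type, terminator included");
```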
3.1. Safe to convert to NTS
This type avoids the problem of not knowing whether a character array or a
is a real NTS. This means that rather than having to run
on the input, functions can use the size and immediately know that the string has a null terminator at the end. Misinterpreting embedded nulls, or running off the end of a user-declared array that was never null-terminated, need not be a concern. APIs can transition from accepting a blunt type that preserves no source information (e.g.
) to a type that preserves source information and provides guarantees (e.g.
). We can also declare APIs that properly handle string literals without decaying to pointers. This can also provide code size and performance benefits, as demonstrated in Jason Turner’s C++Now 2018 talk on initializer_list with various string types and their interaction with types like
vs.
vs.
vs.
.
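The sketch below illustrates, using only today's types, the two failure modes described above. `assumed_length` and `scanned_length` are hypothetical helpers. An array reference accepts any array, terminated or not, and a pointer scan stops at the first embedded null.

```cpp
#include <cstddef>

// Hypothetical helper: the best a "literal-only" API can do today is ask for
// an array reference and *assume* the last element is the null terminator.
template <std::size_t N>
constexpr std::size_t assumed_length(const char (&)[N]) noexcept {
    return N - 1;
}

// Hypothetical helper: what a pointer-taking API must do instead. It scans
// for the first '\0', as std::strlen does at run time.
constexpr std::size_t scanned_length(const char* str) noexcept {
    std::size_t n = 0;
    while (str[n] != '\0') {
        ++n;
    }
    return n;
}

// A literal with an embedded null: {'a', '\0', 'b', '\0'}.
constexpr const char embedded[] = "a\0b";
// A user-declared array that is not null-terminated at all.
constexpr char unterminated[3] = {'a', 'b', 'c'};

static_assert(assumed_length(embedded) == 3);     // correct: a, \0, b
static_assert(scanned_length(embedded) == 1);     // stops at the embedded null
static_assert(assumed_length(unterminated) == 2); // wrong: silently drops 'c'
```

Calling `scanned_length(unterminated)` would read past the end of the array, which is exactly the guarantee a dedicated literal type would rule out.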
Another big problem with using purely
and/or
is the lack of a guarantee about what memory is being referred to by the time it is received in, e.g., a function call. Is it truly read-only memory? Did the user initialize an array on the stack that backs this type? There are many such questions and no reliable way to answer them. With this type, we know for certain that the string literal is stored in constant memory that cannot be modified, and that it is null-terminated.
3.2. Backwards Compatibility
It is imperative that this type does not break the assumptions of existing code. Because this type implicitly converts to an array, operations such as indexing, pointer arithmetic after decay, and other properties of the original array type are preserved:
// used to be const char[5],
// now is std::basic_string_literal<char, 5>
// NOTE: CTAD (with p1021, approved in San Diego)
// allows us to leave off char/N specifiers in the type name
std::string_literal x = "bark";
// okay: conversion, decay
const char* first = x;
// okay: conversion, decay, then addition
const char* last = x + 5;
// okay: conversion, then indexing operation
char letter_a = x[1];
// okay: conversion, then indexing operation
auto letter_a_auto = x[1];
(To see how the CTAD might work in a pre-p1021 (C++17) world, see the stub example working on Coliru and Compiler Explorer)
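No compiler produces this type yet. The following sketch therefore models the conversion with a hypothetical `sketch::basic_string_literal` that has only a constructor and the conversion to `const CharType (&)[N]`. The public constructor is an assumption; the proposal reserves construction to the compiler. Decay, pointer arithmetic, and indexing below all go through that single user-defined conversion to the built-in array, followed by the usual built-in operations.

```cpp
#include <cstddef>

namespace sketch { // hypothetical stand-in, not the proposed std type
template <class CharType, std::size_t N>
class basic_string_literal {
    using storage_type = CharType[N];

public:
    // Assumption: a public constructor replaces compiler-only construction.
    constexpr basic_string_literal(const storage_type& arr) noexcept : arr_(&arr) {}

    // The only operation offered: conversion to a reference to the array.
    constexpr operator const storage_type&() const noexcept { return *arr_; }

private:
    const storage_type* arr_;
};

// C++17 deduction guide, mirroring the one in the proposed synopsis.
template <class CharType, std::size_t N>
basic_string_literal(const CharType (&)[N]) -> basic_string_literal<CharType, N>;
} // namespace sketch

constexpr sketch::basic_string_literal x = "bark"; // deduces <char, 5>

constexpr const char* first = x;    // conversion, then decay
constexpr const char* last = x + 5; // conversion, decay, then built-in addition
constexpr char letter_a = x[1];     // conversion, decay, then built-in indexing
```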
The one place where this might introduce a breaking change is for users who take the address of a string literal using
. Because the type has changed, this operation may not yield what existing code that relies on it expects. Currently, the wording synopsis for
below does not provide an
; the chief reason is that taking the address of a literal directly is an exceedingly rare use case, even in generic code. The author encourages individuals to voice any concerns they have while this potential breakage is considered. If this is deemed a significant use case for strings, then this paper will add the overloaded
to assuage those compatibility concerns.
There is also little concern about ABI. The
type is meant to be binary-compatible with a regular built-in array, the same way
is. Because the type never existed before, name mangling is only a problem for individuals who took built-in arrays as parameters or returned them as values that they expected to be strings with
. It seems incredibly unlikely that an interface which returns an array by reference through use of
exists and is in prevalent use across ABI boundaries.
3.3. Conversion Rankings
One of the biggest problems is that the moment someone looks at an array even the tiniest bit funny, it converts to a pointer to its first element. This has long caused overload resolution problems (and, more broadly, confusion from people using pointers to represent arrays, though this proposal does not solve that unfortunate association). By making the type of all string literals
, this proposal ensures that users can catch the string literal type before the resulting conversion to
and the subsequent conversions to
. Note that an implicit conversion sequence may contain at most one user-defined conversion, but that user-defined conversion may be followed by a standard conversion sequence (for example, an array-to-pointer conversion followed by a qualification conversion). This means that converting to a built-in array allows the regular pointer conversions to happen naturally afterwards, while providing unambiguous overload resolution:
#include <cstddef>
#include <string_literal>

template <std::size_t N>
void f(const std::string_literal<N>& lit) {
    // 1
}

template <std::size_t N>
void f(const char (&arr)[N]) {
    // 2
}

void f(const char* ptr) {
    // 3
}

int main () {
    const char arr[1]{};
    const char* ptr = arr;
    f("");  // picks 1, unambiguously
    f(arr); // picks 3: array-to-pointer is an Exact Match, and the non-template wins the tie with 2
    f(ptr); // picks 3, unambiguously
    return 0;
}
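The following is a compilable approximation of the overload set above. It assumes a hypothetical `sketch::basic_string_literal` stand-in. Because string literals do not yet have this type, `lit` is constructed explicitly. Each `f` returns its overload number, so the choices can be checked with `static_assert`. Under today's rules, a plain array argument resolves to the pointer overload. Array-to-pointer is an Exact Match lvalue transformation, so the tie with the function template is broken in favor of the non-template.

```cpp
#include <cstddef>

namespace sketch { // hypothetical stand-in, not the proposed std type
template <class CharType, std::size_t N>
class basic_string_literal {
    using storage_type = CharType[N];

public:
    constexpr basic_string_literal(const storage_type& arr) noexcept : arr_(&arr) {}
    constexpr operator const storage_type&() const noexcept { return *arr_; }

private:
    const storage_type* arr_;
};

template <class CharType, std::size_t N>
basic_string_literal(const CharType (&)[N]) -> basic_string_literal<CharType, N>;

template <std::size_t N>
using string_literal = basic_string_literal<char, N>;
} // namespace sketch

template <std::size_t N>
constexpr int f(const sketch::string_literal<N>&) { return 1; } // exact match on the literal type
template <std::size_t N>
constexpr int f(const char (&)[N]) { return 2; }                // built-in array reference
constexpr int f(const char*) { return 3; }                      // pointer

constexpr sketch::basic_string_literal lit = ""; // stands in for the literal ""
constexpr char arr[1] = {};
constexpr const char* ptr = arr;

static_assert(f(lit) == 1); // #2 cannot deduce N from a class; #3 needs a user-defined conversion
static_assert(f(arr) == 3); // Exact Match tie with #2; the non-template is preferred
static_assert(f(ptr) == 3);
```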
4. Proposed Wording
Help with wording (especially core wording) would be appreciated! All wording is relative to roughly [n4762] (for example, this anticipates
changes being applied that were approved in San Diego).
4.1. Feature Test Macro
The desired feature test macro for the language change is
. The desired feature test macro for the library change is
.
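The sketch below shows how portable code might test for the library macro. It assumes the proposed name `__cpp_lib_string_literals`, the placeholder value 201902L, and the proposed `<string_literal>` header. None of these exist in shipping implementations, so today only the fallback branch compiles. `as_view` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string_view>
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version> // where library feature-test macros are published
#  endif
#endif

#if defined(__cpp_lib_string_literals) && __cpp_lib_string_literals >= 201902L
#  include <string_literal> // proposed header (assumption)
// Proposed type available: the null terminator is guaranteed by construction.
template <std::size_t N>
constexpr std::string_view as_view(const std::string_literal<N>& lit) noexcept {
    return std::string_view(lit.data(), lit.size());
}
#else
// Fallback: accept an array reference and *assume* it is a null-terminated
// string literal, the guarantee the proposed type would make explicit.
template <std::size_t N>
constexpr std::string_view as_view(const char (&arr)[N]) noexcept {
    return std::string_view(arr, N - 1);
}
#endif
```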
4.2. Intent
The intent of this wording is to supply the following:
- Create a new type std::basic_string_literal<CharType, N>.
- The type shall implicitly convert to an array of const CharType.
- The type shall work with range-based for loops.
- The type will perform a shallow copy of the data rather than copying the contents (it does not provide modification operations).
- The type’s interface will be entirely read-only / "morally const".
- The type shall be generated only by the core language (except when default constructed).
  - String literals in C++ are now of this type, and they implicitly convert to const CharType (&)[N].
  - They can still be used to initialize character arrays: §9.3.2 (dcl.init.string) remains unchanged.
- It shall not prohibit implementations from storing the data in constant memory, as implementations have always done.
- Supply a feature test macro that indicates the core language will generate such a type, __cpp_impl_string_literals.
- Supply a feature test macro for the library that provides the type itself, __cpp_lib_string_literals.
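The shallow-copy and range-based for requirements can be illustrated with a hypothetical model, `sketch::basic_string_literal`. It has a public constructor that the real type would not expose. Its only data member is a pointer to the array, so a copy refers to the same characters. Because `end()` is `begin() + size()`, iteration visits the characters but not the terminator. `count_chars` is likewise a hypothetical helper.

```cpp
#include <cstddef>

namespace sketch { // hypothetical model, not the proposed std type
template <class CharType, std::size_t N>
class basic_string_literal {
    using storage_type = CharType[N];

public:
    constexpr basic_string_literal(const storage_type& arr) noexcept : arr_(&arr) {}
    constexpr const CharType* data() const noexcept { return *arr_; }
    constexpr std::size_t size() const noexcept { return N - 1; }
    constexpr const CharType* begin() const noexcept { return *arr_; }
    constexpr const CharType* end() const noexcept { return begin() + size(); }

private:
    const storage_type* arr_; // the only data member, so copies are shallow
};

template <class CharType, std::size_t N>
basic_string_literal(const CharType (&)[N]) -> basic_string_literal<CharType, N>;
} // namespace sketch

// Counts characters with a range-based for loop; the terminator is excluded
// because end() is begin() + size().
template <class CharType, std::size_t N>
constexpr std::size_t count_chars(sketch::basic_string_literal<CharType, N> lit) noexcept {
    std::size_t count = 0;
    for (CharType c : lit) {
        (void)c;
        ++count;
    }
    return count;
}

constexpr sketch::basic_string_literal meow = "meow";
constexpr auto meow_copy = meow; // copies one pointer, not five characters
```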
4.3. Proposed Core Wording
Modify §5.13.5 [lex.string], paragraphs 6, 7, 10, 11, and 12, to change the type:
6 After translation phase 6, a string-literal that does not begin with an encoding-prefix is an ordinary string literal. An ordinary string literal has type std::string_literal<n> (16.� [support.stringlit]) [was: "array of n const char"], where n is the size of the string as defined below, has static storage duration (6.6.4), and is initialized with the given characters.
7 A string-literal that begins with u8, such as u8"asdf", is a UTF-8 string literal, also referred to as a char8_t string literal. A char8_t string literal has type std::u8string_literal<n> (16.� [support.stringlit]) [was: "array of n const char8_t"], where n is the size of the string as defined below; each successive element of the object representation (6.7) has the value of the corresponding code unit of the UTF-8 encoding of the string.
10 A string-literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type std::u16string_literal<n> (16.� [support.stringlit]) [was: "array of n const char16_t"], where n is the size of the string as defined below; it is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
11 A string-literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type std::u32string_literal<n> (16.� [support.stringlit]) [was: "array of n const char32_t"], where n is the size of the string as defined below; it is initialized with the given characters.
12 A string-literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type std::wstring_literal<n> (16.� [support.stringlit]) [was: "array of n const wchar_t"], where n is the size of the string as defined below; it is initialized with the given characters.
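To make the n in these paragraphs concrete: under the current wording, each literal is an array whose extent counts code units plus the terminating null. The proposed aliases would carry that same n. The sketch checks this with today's array types. `literal_extent` is a hypothetical helper.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: returns the array extent n of any string literal.
template <class T, std::size_t N>
constexpr std::size_t literal_extent(const T (&)[N]) noexcept {
    return N;
}

// Today: an ordinary string literal is an lvalue of type "array of 5 const char".
static_assert(std::is_same_v<decltype("asdf"), const char (&)[5]>);
static_assert(literal_extent(L"asdf") == 5);       // wide: 4 code units + null
static_assert(literal_extent(u8"\u00E9") == 3);    // UTF-8: 2 code units + null
static_assert(literal_extent(u"\U0001F600") == 3); // UTF-16 surrogate pair + null
static_assert(literal_extent(U"\U0001F600") == 2); // one UTF-32 code unit + null
```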
Append one additional entry to Table 16 in §14.8.1 Predefined macro names [cpp.predefined]:
Macro name                    Value
__cpp_impl_string_literals    201902L
4.4. Proposed Library Wording
Append one additional entry to Table 35 in §16.3.1 General [support.limits.general]:
Macro name                    Value
__cpp_lib_string_literals     201902L
Add an entry to §16.1 General [support.general] as follows:
Subclause                Header(s)
16.� String Literals     <string_literal>
Add a new section §16.� [support.stringlit]:
16.� String Literals [support.stringlit]
The header <string_literal> defines a class template and several support functions related to string literals (5.13.5 [lex.string]). All functions specified in this subclause are signal-safe (16.12.4 [support.signal]).
16.�.1 Header <string_literal> synopsis [stringlit.syn]
namespace std {
    template <class CharType, std::size_t N>
    class basic_string_literal {
    private:
        using storage_type = CharType[N]; // exposition-only
    public:
        using value_type = CharType;
        using reference = const CharType&;
        using const_reference = const CharType&;
        using size_type = size_t;
        using iterator = const CharType*;
        using const_iterator = const CharType*;

        // 16.�.2, String literal conversions
        constexpr operator const storage_type&() const noexcept;

        // 16.�.3, String literal access
        constexpr const CharType* data() const noexcept;
        constexpr const CharType* c_str() const noexcept;
        constexpr size_type size() const noexcept;
        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;
        constexpr const_iterator cbegin() const noexcept;
        constexpr const_iterator cend() const noexcept;

    private:
        const storage_type* arr; // exposition-only
    };

    // 16.�.4, String literal range access
    template <class CharType, size_t N>
    constexpr const CharType* begin(const basic_string_literal<CharType, N>&) noexcept;
    template <class CharType, size_t N>
    constexpr const CharType* end(const basic_string_literal<CharType, N>&) noexcept;
    template <class CharType, size_t N>
    constexpr const CharType* cbegin(const basic_string_literal<CharType, N>&) noexcept;
    template <class CharType, size_t N>
    constexpr const CharType* cend(const basic_string_literal<CharType, N>&) noexcept;

    template <class CharType, size_t N>
    basic_string_literal(const CharType (&)[N]) -> basic_string_literal<CharType, N>;

    template <size_t N>
    using string_literal = basic_string_literal<char, N>;
    template <size_t N>
    using wstring_literal = basic_string_literal<wchar_t, N>;
    template <size_t N>
    using u8string_literal = basic_string_literal<char8_t, N>;
    template <size_t N>
    using u16string_literal = basic_string_literal<char16_t, N>;
    template <size_t N>
    using u32string_literal = basic_string_literal<char32_t, N>;
}
An object of type basic_string_literal<CharType, N> provides access to an array of objects of type const CharType.
If an explicit specialization or partial specialization of basic_string_literal is declared, the program is ill-formed.
16.�.2 String literal conversions [stringlit.conv]
constexpr operator const storage_type&() const noexcept;
Effects: returns *arr.
16.�.3 String literal access [stringlit.access]
constexpr const CharType* data() const noexcept;
Effects: returns *arr.
constexpr const CharType* c_str() const noexcept;
Effects: returns *arr.
constexpr size_type size() const noexcept;
Effects: returns N - 1.
constexpr const CharType* begin() const noexcept;
Effects: returns *arr.
constexpr const CharType* end() const noexcept;
Effects: returns begin() + size().
constexpr const CharType* cbegin() const noexcept;
Effects: returns *arr.
constexpr const CharType* cend() const noexcept;
Effects: returns cbegin() + size().
16.�.4 String literal range access [stringlit.range]
template <class CharType, size_t N> constexpr const CharType* begin(const basic_string_literal<CharType, N>& lit) noexcept;
Effects: returns lit.begin().
template <class CharType, size_t N> constexpr const CharType* end(const basic_string_literal<CharType, N>& lit) noexcept;
Effects: returns lit.end().
template <class CharType, size_t N> constexpr const CharType* cbegin(const basic_string_literal<CharType, N>& lit) noexcept;
Effects: returns lit.cbegin().
template <class CharType, size_t N> constexpr const CharType* cend(const basic_string_literal<CharType, N>& lit) noexcept;
Effects: returns lit.cend().
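The following is a non-normative model of the synopsis and the Effects clauses above. It assumes that a public constructor from an array reference stands in for compiler-only construction, and it omits the default constructor mentioned in the design section. It lives in a hypothetical `sketch` namespace rather than `std`.

```cpp
#include <cstddef>

namespace sketch { // hypothetical model of the proposed synopsis
template <class CharType, std::size_t N>
class basic_string_literal {
    using storage_type = CharType[N];

public:
    using value_type = CharType;
    using size_type = std::size_t;
    using iterator = const CharType*;
    using const_iterator = const CharType*;

    // Assumption: stands in for construction by the core language.
    constexpr basic_string_literal(const storage_type& arr) noexcept : arr_(&arr) {}

    constexpr operator const storage_type&() const noexcept { return *arr_; }
    constexpr const CharType* data() const noexcept { return *arr_; }
    constexpr const CharType* c_str() const noexcept { return *arr_; }
    constexpr size_type size() const noexcept { return N - 1; } // excludes the terminator
    constexpr iterator begin() const noexcept { return *arr_; }
    constexpr iterator end() const noexcept { return begin() + size(); }
    constexpr const_iterator cbegin() const noexcept { return *arr_; }
    constexpr const_iterator cend() const noexcept { return cbegin() + size(); }

private:
    const storage_type* arr_;
};

template <class CharType, std::size_t N>
basic_string_literal(const CharType (&)[N]) -> basic_string_literal<CharType, N>;

// Non-member range access, each forwarding to the member of the same name.
template <class CharType, std::size_t N>
constexpr const CharType* begin(const basic_string_literal<CharType, N>& lit) noexcept { return lit.begin(); }
template <class CharType, std::size_t N>
constexpr const CharType* end(const basic_string_literal<CharType, N>& lit) noexcept { return lit.end(); }
template <class CharType, std::size_t N>
constexpr const CharType* cbegin(const basic_string_literal<CharType, N>& lit) noexcept { return lit.cbegin(); }
template <class CharType, std::size_t N>
constexpr const CharType* cend(const basic_string_literal<CharType, N>& lit) noexcept { return lit.cend(); }
} // namespace sketch

// Usage: the literal is copied into the model by reference to its array.
constexpr sketch::basic_string_literal hiss = "hiss";
```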
4.4.1. Non-Compatible Wording Changes
The changes below might alter how strings behave when developers have inserted premature null terminators into string literals: the constructor for
would behave differently than anticipated, because it takes the whole string (including any embedded nulls) rather than stopping at the first null.
These changes should be considered carefully in the first discussion of this paper.
Modify §20.3.2 [basic.string] to add the following constructor:
template <size_t N> basic_string(const basic_string_literal<charT, N>&);
Modify §20.3.2.3 [string.cons] to add the following constructor:
template <size_t N> basic_string(const basic_string_literal<charT, N>& lit);
Effects: behaves as if invoking basic_string(lit.data(), lit.size()).
Modify §20.4.2 [string.view.template] to add the following constructor:
template <size_t N> basic_string_view(const basic_string_literal<charT, N>&);
Modify §20.4.2.1 [string.view.cons] to add the following constructor:
template <size_t N> basic_string_view(const basic_string_literal<charT, N>& lit);
Effects: behaves as if invoking basic_string_view(lit.data(), lit.size()).
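The incompatibility described above comes down to which constructor runs. This sketch uses today's std::string_view and std::string. `view_from_literal` and `string_from_literal` are hypothetical stand-ins for the proposed constructors: they take the full array minus its terminator, just as `basic_string_view(lit.data(), lit.size())` would.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the proposed basic_string_view constructor.
template <std::size_t N>
constexpr std::string_view view_from_literal(const char (&lit)[N]) noexcept {
    return std::string_view(lit, N - 1); // whole literal, embedded nulls included
}

// Hypothetical stand-in for the proposed basic_string constructor.
template <std::size_t N>
std::string string_from_literal(const char (&lit)[N]) {
    return std::string(lit, N - 1);
}

// Today's const char* constructor stops at the first null terminator...
constexpr std::string_view today_view("ab\0cd");
// ...whereas taking the literal's size keeps all five characters.
constexpr std::string_view proposed_view = view_from_literal("ab\0cd");
```

`string_from_literal("ab\0cd").size()` would likewise be 5, while `std::string("ab\0cd").size()` is 2 today.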
5. Acknowledgements
Thanks to Colby Pike (vector-of-bool) for helping to incubate and brew this idea. Thanks to Jason Turner for explaining in considerable detail the pitfalls of string initialization and the need for a string literal type.