Character types in C++

Built-in types (C++)

Built-in types (also called fundamental types) are specified by the C++ language standard and are built into the compiler. Built-in types aren’t defined in any header file. Built-in types are divided into three main categories: integral, floating-point, and void. Integral types represent whole numbers. Floating-point types can specify values that may have fractional parts. Most built-in types are treated as distinct types by the compiler. However, some types are synonyms, or treated as equivalent types by the compiler.

Void type

The void type describes an empty set of values. No variable of type void can be specified. The void type is used primarily to declare functions that return no values or to declare generic pointers to untyped or arbitrarily typed data. Any expression can be explicitly converted or cast to type void. However, such expressions are restricted to the following uses:

  • An expression statement. (For more information, see Expressions.)
  • The left operand of the comma operator. (For more information, see Comma Operator.)
  • The second or third operand of the conditional operator ( ? : ). (For more information, see Expressions with the Conditional Operator.)
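The three permitted uses listed above can be sketched as follows. This is a minimal illustration only; `compute`, `g_calls`, and `demo_void_uses` are hypothetical names introduced for the example and do not come from the original text.

```cpp
// Hypothetical helper: counts its calls so the effect of each use is observable.
int g_calls = 0;

int compute(int x) {
    ++g_calls;
    return x * 2;
}

int demo_void_uses(bool flag) {
    // 1. Expression statement: the result is explicitly discarded.
    (void)compute(1);

    // 2. Left operand of the comma operator: evaluated, then discarded.
    int y = ((void)compute(2), 10);

    // 3. Second and third operands of the conditional operator (both void).
    flag ? (void)compute(3) : (void)0;

    return y;
}
```

Calling `demo_void_uses(true)` evaluates `compute` three times and returns the right operand of the comma expression; with `false`, the third call is skipped.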

std::nullptr_t

The keyword nullptr is a null-pointer constant of type std::nullptr_t, which is convertible to any raw pointer type. For more information, see nullptr.
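One practical consequence is overload resolution: nullptr selects a pointer overload, whereas the literal 0 selects an integer overload. The sketch below assumes a hypothetical overload set named `is_pointer_overload`; the names are illustrative only.

```cpp
#include <cstddef>
#include <type_traits>

// nullptr has its own type, std::nullptr_t.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is of type std::nullptr_t");

// Hypothetical overload set: one integer overload, one pointer overload.
constexpr bool is_pointer_overload(int) { return false; }
constexpr bool is_pointer_overload(const char*) { return true; }

// nullptr converts to const char* but not to int, so the pointer overload wins.
constexpr bool picks_pointer_overload_for_nullptr() {
    return is_pointer_overload(nullptr);
}

// The literal 0 is an exact match for int, so the integer overload wins.
constexpr bool picks_pointer_overload_for_zero() {
    return is_pointer_overload(0);
}
```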

Boolean type

The bool type can have values true and false. The size of the bool type is implementation-specific. See Sizes of built-in types for Microsoft-specific implementation details.

Character types

The char type is a character representation type that efficiently encodes members of the basic execution character set. The C++ compiler treats char, signed char, and unsigned char as three distinct types.
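Because the three types are distinct, they can be told apart by type traits and by overload resolution. A minimal sketch, with `which_char` as a hypothetical function name:

```cpp
#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// even though it has the same size as each of them.
static_assert(!std::is_same<char, signed char>::value, "char != signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char != unsigned char");
static_assert(sizeof(char) == sizeof(signed char), "same size");
static_assert(sizeof(char) == sizeof(unsigned char), "same size");

// Hypothetical overload set: three overloads can coexist because the
// parameter types really are different types.
constexpr int which_char(char) { return 0; }
constexpr int which_char(signed char) { return 1; }
constexpr int which_char(unsigned char) { return 2; }
```

An unprefixed character literal such as 'a' has type char, so `which_char('a')` selects the first overload.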

Microsoft-specific: Variables of type char are promoted to int as if from type signed char by default, unless the /J compilation option is used. In this case, they’re treated as type unsigned char and are promoted to int without sign extension.

A variable of type wchar_t is a wide-character or multibyte character type. Use the L prefix before a character or string literal to specify the wide-character type.

Microsoft-specific: By default, wchar_t is a native type, but you can use /Zc:wchar_t- to make wchar_t a typedef for unsigned short. The __wchar_t type is a Microsoft-specific synonym for the native wchar_t type.

The char8_t type is used for UTF-8 character representation. It has the same representation as unsigned char, but is treated as a distinct type by the compiler. The char8_t type is new in C++20. Microsoft-specific: use of char8_t requires the /std:c++20 compiler option or later (such as /std:c++latest).

The char16_t type is used for UTF-16 character representation. It must be large enough to represent any UTF-16 code unit. It’s treated as a distinct type by the compiler.

The char32_t type is used for UTF-32 character representation. It must be large enough to represent any UTF-32 code unit. It’s treated as a distinct type by the compiler.
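The literal prefixes and the "distinct type" rule for the wider character types can be checked at compile time. The sketch below is illustrative: `bits_in_char16` and `bits_in_char32` are hypothetical helpers, and the char8_t checks are guarded by the `__cpp_char8_t` feature-test macro so the code still compiles before C++20.

```cpp
#include <climits>
#include <type_traits>

// Each literal prefix selects a different character type.
static_assert(std::is_same<decltype('a'), char>::value, "no prefix: char");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "L prefix: wchar_t");
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u prefix: char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U prefix: char32_t");

// char16_t and char32_t are distinct types, not aliases of integer types.
static_assert(!std::is_same<char16_t, unsigned short>::value, "distinct type");
static_assert(!std::is_same<char32_t, unsigned int>::value, "distinct type");

#ifdef __cpp_char8_t
// C++20 only: char8_t matches unsigned char in size but is its own type.
static_assert(std::is_same<decltype(u8'a'), char8_t>::value, "u8 prefix: char8_t");
static_assert(!std::is_same<char8_t, unsigned char>::value, "distinct type");
static_assert(sizeof(char8_t) == sizeof(unsigned char), "same size");
#endif

// Hypothetical helpers reporting the width of each type in bits.
constexpr int bits_in_char16() { return static_cast<int>(sizeof(char16_t) * CHAR_BIT); }
constexpr int bits_in_char32() { return static_cast<int>(sizeof(char32_t) * CHAR_BIT); }
```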

Floating-point types

Floating-point types use an IEEE-754 representation to provide an approximation of fractional values over a wide range of magnitudes. The following table lists the floating-point types in C++ and the comparative restrictions on floating-point type sizes. These restrictions are mandated by the C++ standard and are independent of the Microsoft implementation. The absolute size of built-in floating-point types isn’t specified in the standard.

Type          Contents
float         Type float is the smallest floating-point type in C++.
double        Type double is a floating-point type that is larger than or equal to type float, but smaller than or equal to type long double.
long double   Type long double is a floating-point type that is larger than or equal to type double.

Microsoft-specific: The representation of long double and double is identical. However, long double and double are treated as distinct types by the compiler. The Microsoft C++ compiler uses the 4- and 8-byte IEEE-754 floating-point representations. For more information, see IEEE floating-point representation.
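The relative size restrictions, and the fact that double and long double remain distinct types even where they share a representation, can be expressed as compile-time checks. This is a sketch that should hold on mainstream implementations; the helper names are hypothetical.

```cpp
#include <limits>
#include <type_traits>

// Relative sizes as described in the table above.
static_assert(sizeof(float) <= sizeof(double), "float is no larger than double");
static_assert(sizeof(double) <= sizeof(long double), "double is no larger than long double");

// Distinct types, even on implementations where the representation is identical.
static_assert(!std::is_same<double, long double>::value, "distinct types");

// Hypothetical helper: reports whether float and double claim IEEE-754 conformance.
constexpr bool uses_ieee754() {
    return std::numeric_limits<float>::is_iec559 &&
           std::numeric_limits<double>::is_iec559;
}

// Hypothetical helper: double carries at least as many mantissa digits as float.
constexpr bool double_at_least_as_precise_as_float() {
    return std::numeric_limits<double>::digits >= std::numeric_limits<float>::digits;
}
```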

Integer types

The int type is the default basic integer type. It can represent all of the whole numbers over an implementation-specific range.

A signed integer representation is one that can hold both positive and negative values. It’s used by default, or when the signed modifier keyword is present. The unsigned modifier keyword specifies an unsigned representation that can only hold non-negative values.

A size modifier specifies the width in bits of the integer representation used. The language supports short, long, and long long modifiers. A short type must be at least 16 bits wide. A long type must be at least 32 bits wide. A long long type must be at least 64 bits wide. The standard specifies a size relationship between the integral types:

1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

An implementation must maintain both the minimum size requirements and the size relationship for each type. However, the actual sizes can and do vary between implementations. See Sizes of built-in types for Microsoft-specific implementation details.
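Both kinds of requirement (minimum widths and relative sizes) can be verified for a given compiler with static assertions. A minimal sketch; `bit_width_of` is a hypothetical helper name.

```cpp
#include <climits>

// Hypothetical helper: width of T in bits on the current implementation.
template <typename T>
constexpr int bit_width_of() {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Size relationship between the integral types.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(char) <= sizeof(short), "char <= short");
static_assert(sizeof(short) <= sizeof(int), "short <= int");
static_assert(sizeof(int) <= sizeof(long), "int <= long");
static_assert(sizeof(long) <= sizeof(long long), "long <= long long");

// Minimum widths.
static_assert(bit_width_of<short>() >= 16, "short is at least 16 bits");
static_assert(bit_width_of<long>() >= 32, "long is at least 32 bits");
static_assert(bit_width_of<long long>() >= 64, "long long is at least 64 bits");
```

The actual value of, say, `bit_width_of<long>()` differs between implementations, which is why only the lower bounds are asserted.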

The int keyword may be omitted when signed, unsigned, or size modifiers are specified. The modifiers and int type, if present, may appear in any order. For example, short unsigned and unsigned int short refer to the same type.
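The equivalence of different spellings can be confirmed with std::is_same. The following is a small sketch; `order_is_irrelevant` is a hypothetical helper name.

```cpp
#include <type_traits>

// The order of the modifiers and the optional int keyword does not matter.
static_assert(std::is_same<short unsigned, unsigned int short>::value,
              "same type, different spelling");
static_assert(std::is_same<unsigned, unsigned int>::value, "int may be omitted");
static_assert(std::is_same<long, signed long int>::value, "signed is the default");
static_assert(std::is_same<long long, signed long long int>::value,
              "signed is the default");

// Different size modifiers still name different types, even when the
// sizes happen to coincide on a given implementation.
static_assert(!std::is_same<int, long>::value, "int and long are distinct types");

// Hypothetical helper exposing the first check as a value.
constexpr bool order_is_irrelevant() {
    return std::is_same<short unsigned, unsigned int short>::value;
}
```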

Integer type synonyms

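The following groups of types are considered synonyms by the compiler:

  • short, short int, signed short, signed short int
  • unsigned short, unsigned short int
  • int, signed, signed int
  • unsigned, unsigned int
  • long, long int, signed long, signed long int
  • unsigned long, unsigned long int
  • long long, long long int, signed long long, signed long long int
  • unsigned long long, unsigned long long int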

Microsoft-specific integer types include the specific-width __int8, __int16, __int32, and __int64 types. These types may use the signed and unsigned modifiers. The __int8 data type is synonymous with type char, __int16 is synonymous with type short, __int32 is synonymous with type int, and __int64 is synonymous with type long long.

Sizes of built-in types

Most built-in types have implementation-defined sizes. The following table lists the amount of storage required for built-in types in Microsoft C++. In particular, long is 4 bytes even on 64-bit operating systems.

Type                                                                   Size
bool, char, char8_t, unsigned char, signed char, __int8                1 byte
char16_t, __int16, short, unsigned short, wchar_t, __wchar_t           2 bytes
char32_t, float, __int32, int, unsigned int, long, unsigned long       4 bytes
double, __int64, long double, long long, unsigned long long            8 bytes

See Data type ranges for a summary of the range of values of each type.

For more information about type conversion, see Standard conversions.

char, wchar_t, char8_t, char16_t, char32_t

The types char, wchar_t, char8_t, char16_t, and char32_t are built-in types that represent alphanumeric characters, non-alphanumeric glyphs, and non-printing characters.

Syntax

char     ch1{ 'a' };  // or { u8'a' }
wchar_t  ch2{ L'a' };
char16_t ch3{ u'a' };
char32_t ch4{ U'a' };

Remarks

The char type was the original character type in C and C++. The char type can be used to store characters from the ASCII character set or any of the ISO-8859 character sets, and individual bytes of multi-byte characters such as Shift-JIS or the UTF-8 encoding of the Unicode character set. In the Microsoft compiler, char is an 8-bit type. It’s a distinct type from both signed char and unsigned char. By default, variables of type char get promoted to int as if from type signed char unless the /J compiler option is used. Under /J, they’re treated as type unsigned char and get promoted to int without sign extension.
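The difference between promotion with and without sign extension can be seen by widening the same bit pattern through each character type. In this sketch, `widen` and `plain_char_is_signed` are hypothetical names; the result for plain char depends on the compiler and its options (for example /J on the Microsoft compiler), so it is derived from std::is_signed rather than assumed.

```cpp
#include <type_traits>

// Hypothetical overloads: each returns the int that its argument promotes to.
constexpr int widen(signed char c) { return c; }    // sign-extended
constexpr int widen(unsigned char c) { return c; }  // zero-extended
constexpr int widen(char c) { return c; }           // depends on the signedness of char

// Reports how plain char behaves on the current implementation.
constexpr bool plain_char_is_signed() {
    return std::is_signed<char>::value;
}

// signed char holding -1 promotes with sign extension.
static_assert(widen(static_cast<signed char>(-1)) == -1, "sign extension");

// unsigned char holding 255 promotes without sign extension.
static_assert(widen(static_cast<unsigned char>(255)) == 255, "no sign extension");
```

For plain char, `widen(static_cast<char>(-1))` yields -1 where char is signed and 255 where it is unsigned.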

The type unsigned char is often used to represent a byte, which isn’t a built-in type in C++.

The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems. The wide character versions of the Universal C Runtime (UCRT) library functions use wchar_t and its pointer and array types as parameters and return values, as do the wide character versions of the native Windows API.

The char8_t, char16_t, and char32_t types represent 8-bit, 16-bit, and 32-bit wide characters, respectively. (char8_t is new in C++20 and requires the /std:c++20 or /std:c++latest compiler option.) Unicode encoded as UTF-8 can be stored in the char8_t type. Strings of char8_t and char type are referred to as narrow strings, even when used to encode Unicode or multi-byte characters. Unicode encoded as UTF-16 can be stored in the char16_t type, and Unicode encoded as UTF-32 can be stored in the char32_t type. Strings of these types and wchar_t are all referred to as wide strings, though the term often refers specifically to strings of wchar_t type.

In the C++ standard library, the basic_string type is specialized for both narrow and wide strings. Use std::string when the characters are of type char, std::u8string when the characters are of type char8_t, std::u16string when the characters are of type char16_t, std::u32string when the characters are of type char32_t, and std::wstring when the characters are of type wchar_t. Other types that represent text, including std::stringstream and std::cout, have specializations for narrow and wide strings.
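The relationship between the string aliases and basic_string can be checked directly, and a single template can then serve every character type. A minimal sketch follows; `code_unit_count` is a hypothetical helper, and the std::u8string check is guarded by the `__cpp_lib_char8_t` feature-test macro so the code also compiles before C++20.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Each string alias is basic_string instantiated for one character type.
static_assert(std::is_same<std::string, std::basic_string<char>>::value, "narrow");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value, "wide");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value, "UTF-16");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value, "UTF-32");

#ifdef __cpp_lib_char8_t
static_assert(std::is_same<std::u8string, std::basic_string<char8_t>>::value, "UTF-8");
#endif

// Hypothetical helper: size() counts code units, not user-perceived characters.
template <typename CharT>
std::size_t code_unit_count(const std::basic_string<CharT>& text) {
    return text.size();
}
```

For example, the code point U+1F600 occupies two code units in a std::u16string (a surrogate pair) but one in a std::u32string, so `code_unit_count` returns different values for the same character.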
