JavaScript JSON UTF-8 encoding code example
Parse JSON as UTF-8
Please make sure that every single step of your process is in UTF-8:
- The database connection
- The database tables
- Your PHP file
- The content-type headers that you output: header('Content-Type: application/json; charset=utf-8');
Refer to: PHP json_encode json_decode UTF-8
EDIT: If you already have weird characters like \u00a3 in your JSON feed instead of the £ symbol, your problem has already been answered: https://stackoverflow.com/questions/1423846/convert-unicode-from-json-string-with-php
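As a rough illustration (in Python, chosen here only for brevity; the feed contents and variable names are made up for the example), a \u00a3 escape in a JSON feed is not corruption. A conforming parser turns it back into £, and the literal character only needs correct UTF-8 handling on the wire:

```python
import json

# A JSON document that uses an escape instead of the literal pound sign.
escaped_feed = '{"price": "\\u00a3 10"}'
# The same document with the literal character.
literal_feed = '{"price": "£ 10"}'

parsed_escaped = json.loads(escaped_feed)
parsed_literal = json.loads(literal_feed)

# Both spellings decode to the same value.
same_value = parsed_escaped == parsed_literal

# Bytes that would be sent when the response is served as UTF-8.
wire_bytes = json.dumps(parsed_literal, ensure_ascii=False).encode("utf-8")
```

If the two parsed values differ in a real pipeline, one of the steps listed above is probably not using UTF-8.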
Python: Decode UTF-8 encoding in JSON string. The JSON you are reading was written incorrectly, and the Unicode strings decoded from it will have to be re-encoded with the wrong encoding that was used, then decoded with the correct encoding. Here's an example:
    #!python3
    import json
    # The bad JSON you have
    bad_json = r'…
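A minimal sketch of that repair, using a made-up input in which the two UTF-8 bytes of "é" were each written as a separate escaped code point. Latin-1 as the wrong codec is an assumption; the right choice depends on how the data was damaged:

```python
import json

# Hypothetical damaged input: the UTF-8 bytes C3 A9 of "é" were written
# as two separate code points, U+00C3 and U+00A9.
bad_json = r'{"name": "Caf\u00c3\u00a9"}'

decoded = json.loads(bad_json)  # the name comes out as 'CafÃ©'

# Re-encode with the wrong codec that was used (Latin-1, assumed here),
# then decode the recovered bytes as UTF-8.
repaired = decoded["name"].encode("latin-1").decode("utf-8")
```

This only works when the wrong codec maps every damaged character back to a single byte, which Latin-1 does for U+0000 through U+00FF.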
Standard way of Serializing utf-8 characters in a JSON String
If JSON is used to exchange data, it must use UTF-8 encoding (see RFC 8259). UTF-16 and UTF-32 encodings are no longer allowed. So it is not necessary to escape the degree character, and I strongly recommend against escaping unnecessarily.
Of course, you must apply a proper UTF-8 encoding.
If JSON is used in a closed ecosystem, you can use other text encodings (though I would recommend against it unless you have a very good reason). If you need to escape the degree character in your non-UTF-8 encoding, the correct escape sequence is \u00b0.
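A short sketch of both options in Python (the sample payload is made up): the degree sign is written unescaped and encoded as UTF-8, and the \u00b0 spelling parses to the same value.

```python
import json

reading = {"temp": "21°C"}

# ensure_ascii=False keeps the degree sign as a literal character.
text = json.dumps(reading, ensure_ascii=False)
payload = text.encode("utf-8")  # the bytes to send or store

# The escaped spelling of the same document.
escaped = '{"temp": "21\\u00b0C"}'

round_trip = json.loads(payload.decode("utf-8"))
from_escaped = json.loads(escaped)
```

In the UTF-8 payload the degree sign occupies the two bytes C2 B0.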
Possible but not recommended
Your second approach is incorrect under all circumstances.
It is also incorrect to use something like "\xc2\xb0". This is the escaping used in C/C++ source code. It is also used by debuggers to display strings. In JSON, it is always invalid.
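This can be checked against a strict parser. The sketch below uses Python's standard json module, which rejects the \x form; the sample string is made up:

```python
import json

# C-style byte escapes inside a JSON string: not part of the JSON grammar.
invalid = r'{"temp": "21\xc2\xb0C"}'

try:
    json.loads(invalid)
    rejected = False
except json.JSONDecodeError:
    rejected = True
```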
JSON text is Unicode, but the specification lets you use \uXXXX escape codes to represent characters that do not map into your computer's native environment, so it is perfectly valid to include such escape sequences and transfer JSON-serialized data using only plain ASCII.
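For instance, Python's json.dumps escapes every non-ASCII character by default, so the output can be carried as pure ASCII and still round-trips. This is a sketch with made-up data:

```python
import json

data = {"city": "Zürich", "temp": "21°C"}

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX.
ascii_text = json.dumps(data)

# Encoding as ASCII succeeds because no non-ASCII characters remain.
ascii_bytes = ascii_text.encode("ascii")

restored = json.loads(ascii_bytes.decode("ascii"))
```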
Python json encoding="utf-8" code example:
    import json

    data = {"name": "foo", "age": 27}
    # Open the file with an explicit UTF-8 encoding before dumping.
    with open("test.json", "w", encoding="utf8") as outfile:
        json.dump(data, outfile, indent=2)
Encode Json Object Data with UTF-8 in java
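Java strings are UTF-16 internally. If the JSON text was decoded with the wrong charset (ISO-8859-1 in this example), convert it back to a byte array using that charset, then decode the bytes as a UTF-8 string.
    import static java.nio.charset.StandardCharsets.*;

    // Recover the original bytes, then decode them as UTF-8.
    byte[] bytes = "YOUR JSON".getBytes(ISO_8859_1);
    String jsonStr = new String(bytes, UTF_8);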
UTF-8 encode JavaScript code example:
    // install using 'npm install utf8'
    const utf8 = require('utf8');
    utf8.encode(string);
Decode utf8 entities from json into utf8 C++
First off, your use of std::wstring_convert is backwards. You have a UTF-8 encoded std::string that you want to convert to a wide Unicode string. You are getting the compiler error because to_bytes() does not take a std::string as input. It takes a std::wstring_convert::wide_string as input (which is std::u16string in your case, due to your use of char16_t in the specialization), so you need to use from_bytes() instead of to_bytes():
    // The facet argument is missing in the source; codecvt_utf8_utf16<char16_t> is assumed.
    std::string src = "\u0418\u043d\u0434\u0435\u043a\u0441";
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
    std::u16string dest = convert.from_bytes(src);
Now, that being said, Section 9 of the JSON specification states:
9 String
A string is a sequence of Unicode code points wrapped with quotation marks (U+0022). All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F. There are two-character escape sequence representations of some characters.
\" represents the quotation mark character (U+0022).
\\ represents the reverse solidus character (U+005C).
\/ represents the solidus character (U+002F).
\b represents the backspace character (U+0008).
\f represents the form feed character (U+000C).
\n represents the line feed character (U+000A).
\r represents the carriage return character (U+000D).
\t represents the character tabulation character (U+0009).
So, for example, a string containing only a single reverse solidus character may be represented as "\\".
Any code point may be represented as a hexadecimal number. The meaning of such a number is determined by ISO/IEC 10646. If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point. Hexadecimal digits can be digits (U+0030 through U+0039) or the hexadecimal letters A through F in uppercase (U+0041 through U+0046) or lowercase (U+0061 through U+0066). So, for example, a string containing only a single reverse solidus character may be represented as "\u005C".
The following four cases all produce the same result:
"\u002F"
"\u002f"
"\/"
"/"
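The equivalence of these four spellings can be checked with any conforming parser. A small sketch using Python's json module:

```python
import json

# The four spellings of a string containing only a solidus.
spellings = [r'"\u002F"', r'"\u002f"', r'"\/"', '"/"']

decoded = [json.loads(s) for s in spellings]
all_equal = all(value == "/" for value in decoded)
```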
To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
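The surrogate pair follows from the standard UTF-16 arithmetic. The sketch below (the helper name to_surrogate_pair is made up for this example) computes the pair for the G clef and feeds the resulting escape to a JSON parser:

```python
import json
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

high, low = to_surrogate_pair(0x1D11E)

# Build the twelve-character escape and parse it as a JSON string.
escaped = f'"\\u{high:04X}\\u{low:04X}"'
clef = json.loads(escaped)
```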
The raw JSON data itself may be encoded in UTF-8 (the most common encoding), UTF-16, etc. But regardless of the encoding used, the character sequence "\u0418\u043d\u0434\u0435\u043a\u0441" represents the UTF-16 code unit sequence U+0418 U+043d U+0434 U+0435 U+043a U+0441, which is the Unicode character string "Индекс".
If you use an actual JSON parser (such as JSON for Modern C++, jsoncpp, RapidJSON, etc), it will parse the UTF-16 code unit values for you and return readable Unicode strings.
But, if you are processing the JSON data manually, then you will have to manually decode any escape sequences, including \uXXXX. std::wstring_convert cannot do that for you. It can only convert the JSON from std::string to std::wstring / std::u16string, if that makes it easier for you to parse the data. However, you still have to parse the content of the JSON separately.
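To give a sense of what manual decoding involves, here is a minimal sketch in Python rather than C++. The function name unescape_json_string is hypothetical, and it handles only the text between the quotes of one JSON string, not a whole document:

```python
import re

# The two-character escapes defined by the JSON grammar.
_SIMPLE = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
           'n': '\n', 'r': '\r', 't': '\t'}
# A backslash followed by either uXXXX or any single character.
_ESCAPE = re.compile(r'\\(u[0-9A-Fa-f]{4}|.)', re.DOTALL)

def unescape_json_string(body: str) -> str:
    """Decode the escapes in the text between a JSON string's quotes."""
    def replace(match):
        token = match.group(1)
        if token[0] == 'u' and len(token) == 5:
            return chr(int(token[1:], 16))  # one UTF-16 code unit
        if token in _SIMPLE:
            return _SIMPLE[token]
        raise ValueError(f"invalid JSON escape: \\{token}")

    text = _ESCAPE.sub(replace, body)
    # Join surrogate pairs (\uD8xx\uDCxx) into single code points.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')

word = unescape_json_string(r'\u0418\u043d\u0434\u0435\u043a\u0441')
```

A real parser also has to reject raw control characters and unpaired surrogates, which is one more reason to prefer an existing library.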
Afterwards, if so desired, you can use std::wstring_convert to convert any extracted std::wstring / std::u16string strings back to UTF-8 to save memory.
What you see are not entities but code points. You are defining characters via Unicode escape sequences and the compiler automatically converts them to UTF-8. A typical way to convert that to UTF-16 and vice versa is this:
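    #include <codecvt>
    #include <locale>
    #include <string>

    // Assumed facet: codecvt_utf8_utf16<wchar_t> (the template arguments are missing in the source).
    static std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;

    // Wide (UTF-16) to narrow (UTF-8).
    std::string ws2s(const std::wstring &wstr) {
        std::string narrow = converter.to_bytes(wstr);
        return narrow;
    }

    // Narrow (UTF-8) to wide (UTF-16).
    // The body is cut off in the source; this mirror of ws2s is an assumption.
    std::wstring s2ws(const std::string &str) {
        std::wstring wide = converter.from_bytes(str);
        return wide;
    }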
Of course you cannot convert the original string into another string of the same type (std::string) as it cannot hold such characters. This is why the UTF-16 code was converted to UTF-8 by your compiler in the first place.
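The same round trip between UTF-8 and UTF-16 can be sketched in Python, where the byte-level difference between the two encodings is easy to inspect. The sample word is taken from the question; the variable names are illustrative:

```python
text = "Индекс"

# The same six characters in two encodings.
utf8_bytes = text.encode("utf-8")       # two bytes per Cyrillic letter
utf16_bytes = text.encode("utf-16-le")  # one 16-bit code unit per letter

# Converting UTF-8 bytes to UTF-16 bytes goes through the decoded text.
converted = utf8_bytes.decode("utf-8").encode("utf-16-le")
back = converted.decode("utf-16-le").encode("utf-8")
```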
How to properly encode UTF-8 for JavaScript and JSON? Basically it works like this: If PHP connects to another server, then everything works. It also works through JavaScript (I have a custom sha1() function there as well): var validationHash = sha1(JSON.stringify({'name':'John Doe','city':'New York'}) + key); My problem comes when the string contains UTF-8 …
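One plausible cause of such a mismatch, sketched in Python with made-up data and a hypothetical key: two serializers can produce different text for the same value (escaped versus literal non-ASCII characters), and a hash is computed over bytes, so the digests differ even though both documents parse to the same object.

```python
import hashlib
import json

payload = {"name": "José", "city": "New York"}
key = "secret"  # hypothetical shared key

escaped = json.dumps(payload)                      # writes Jos\u00e9
literal = json.dumps(payload, ensure_ascii=False)  # writes José

def sha1_hex(text: str) -> str:
    # Hash functions operate on bytes, so the encoding must be explicit.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

hash_escaped = sha1_hex(escaped + key)
hash_literal = sha1_hex(literal + key)
digests_match = hash_escaped == hash_literal
```

Both sides need to agree on the exact serialized text and on the byte encoding before hashing.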