Script type text javascript encoding utf 8

Javascript to csv export encoding issue

I need to export a JavaScript array to an Excel file and download it. I'm doing it with the code below; data is an array of JavaScript arrays.

var csvContent = "data:text/csv;charset=utf-8,";
data.forEach(function (dataMember, index) {
  var dataString = dataMember.join(",");
  // Add a line break after every row except the last one.
  csvContent += index < data.length - 1 ? dataString + "\n" : dataString;
});
var encodedUri = encodeURI(csvContent);
var link = document.createElement("a");
link.setAttribute("href", encodedUri);
link.setAttribute("download", "upload_data" + (new Date()).getTime() + ".csv");
link.click();

All of this works fine until I have string properties that contain non-English characters, such as Spanish, Arabic or Hebrew. How can I export all of these non-ASCII values?

The first line states utf-8, which is ASCII-compatible. Maybe changing it to UTF-16 would help?

@Boltosaurus, I created a demo here: jsfiddle.net/8qPur. It looks OK to me : the downloaded file has the special characters encoded correctly.

11 Answers

You should add the UTF-8 BOM at the start of the text, like:

var csvContent = "data:text/csv;charset=utf-8,%EF%BB%BF" + encodeURI(csvContent); 

It worked for me with Excel 2013.
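The %EF%BB%BF prefix is the percent-encoded UTF-8 form of U+FEFF, the byte order mark, which Excel appears to treat as a hint that the file is UTF-8. A minimal sketch of the whole step, assuming the rows are arrays of plain strings that need no CSV quoting; buildCsvDataUri is a hypothetical helper name, and encodeURIComponent is used here instead of encodeURI so that reserved characters in the data are escaped as well:

```javascript
// Hypothetical helper: turns an array of string arrays into a CSV data URI
// that starts with the UTF-8 byte order mark (BOM).
function buildCsvDataUri(rows) {
  // Assumption: cells contain no commas, quotes, or line breaks,
  // so no CSV quoting is applied here.
  const csvText = rows.map((row) => row.join(',')).join('\n');
  // '\uFEFF' percent-encodes to %EF%BB%BF, the three UTF-8 BOM bytes.
  return 'data:text/csv;charset=utf-8,' + encodeURIComponent('\uFEFF' + csvText);
}

const uri = buildCsvDataUri([['name', 'city'], ['José', 'תל אביב']]);
console.log(uri.slice(0, 40));
```

The resulting string can be assigned to the href of the download link in the same way as encodedUri in the question.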

This worked for me, using FileSaver.js. Instead of URL-encoding, I did this: var blob = new Blob(['\ufeff' + csvString]);

This fails and writes nothing to the CSV if the data contains special symbols, e.g. "!@#$%^&1234567890.pdf". I have not been able to find a solution for this corner case.
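One likely cause, assuming the value ends up inside a data URI built with encodeURI: encodeURI leaves reserved characters such as # and & unescaped, and a # starts the URL fragment, so everything after it is dropped from the data. encodeURIComponent escapes them. A small sketch comparing the two on the string from the comment:

```javascript
const tricky = '!@#$%^&1234567890.pdf';

// encodeURI keeps URI-reserved characters (#, &, $, @) as they are.
const viaEncodeURI = encodeURI(tricky);
// encodeURIComponent percent-encodes them, so they survive inside a data URI.
const viaComponent = encodeURIComponent(tricky);

console.log(viaEncodeURI);
console.log(viaComponent);
```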

You can add the BOM at the start. Try this code:

var BOM = "\uFEFF";
var csvContent = BOM + csvContent;

and then create the file headers with the data: "text/csv;charset=utf-8"

This worked for me when converting my data to a Blob and then using the anchor-tag click hack to trigger the download:

var downloadLink = document.createElement("a");
downloadLink.download = fileNameToSaveAs;
downloadLink.href = window.URL.createObjectURL(textFileAsBlob);
downloadLink.onclick = function (e) { document.body.removeChild(e.target); };
downloadLink.style.display = "none";
document.body.appendChild(downloadLink);
downloadLink.click();

Excel is really bad at detecting encoding, especially Excel on OSX.

The best solution would be to encode your CSV in the encoding Excel typically assumes on Western-locale systems: windows-1252 (also called ANSI, which is basically a superset of ISO-8859-1).

I put a complete example of how to do that at: https://github.com/b4stien/js-csv-encoding.

The 2 main parts are stringencoding (to encode the content of your CSV in windows-1252) and FileSaver.js (to download the generated Blob).

// TextEncoder here comes from the stringencoding library; the native TextEncoder only produces UTF-8.
var csvContent = 'éà; ça; 12\nà@€; çï; 13',
    textEncoder = new TextEncoder('windows-1252');
var csvContentEncoded = textEncoder.encode([csvContent]);
var blob = new Blob([csvContentEncoded]);
saveAs(blob, 'some-data.csv');
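For readers who cannot pull in the stringencoding library, the mapping itself is small enough to sketch by hand. This is only an illustration under stated assumptions: encodeWindows1252 is a hypothetical function, the table for the 0x80-0x9F range is deliberately partial, and characters outside the table are replaced with a question mark.

```javascript
// Partial table: a few characters that windows-1252 places in 0x80-0x9F.
// The real code page defines more entries in this range.
const CP1252_EXTRAS = new Map([
  ['\u20AC', 0x80], // euro sign
  ['\u2026', 0x85], // horizontal ellipsis
  ['\u2018', 0x91], // left single quotation mark
  ['\u2019', 0x92], // right single quotation mark
  ['\u201C', 0x93], // left double quotation mark
  ['\u201D', 0x94], // right double quotation mark
  ['\u2122', 0x99], // trade mark sign
]);

function encodeWindows1252(text) {
  const bytes = [];
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (code <= 0x7F || (code >= 0xA0 && code <= 0xFF)) {
      bytes.push(code);                    // same byte value as in ISO-8859-1
    } else if (CP1252_EXTRAS.has(ch)) {
      bytes.push(CP1252_EXTRAS.get(ch));   // characters specific to windows-1252
    } else {
      bytes.push(0x3F);                    // '?' for anything this sketch cannot represent
    }
  }
  return Uint8Array.from(bytes);
}

const encoded = encodeWindows1252('éà; ça; 12\nà@€; çï; 13');
console.log(encoded.length);
```

The returned Uint8Array can be passed to new Blob([encoded]) and saved in the same way as csvContentEncoded above.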

Amazing, thanks! Went through reams of SO pages and docs looking for something to solve an issue with an excel destroying CSVs after opening and saving them back out on OSX. This was the only thing that worked.

It won't work with the current version of the library, so I had to download the version from this link. Thanks for saving it.

I ran into a similar issue — InDesign’s DataMerge obstinately refused to show my special characters, regardless of whether I attempted UTF-8, UTF-16, UTF-16LE, tabs, commas, anything. Using the files in b4stien’s repo above, and adapting his example, it worked perfectly! Worth noting that in my case, I only needed to target Chrome on Windows.

Thank you, @b4stien I am looking to find out an encoding for the uploaded CSV file. The users might upload in different languages. How can I find out that? I tried many solutions but nothing seems to work for me. Any help please?

Option 1: Use the iconv-lite library and encode your output in a legacy Windows code page (windows-1255 in this example) before sending it back to the user. Example:

var iconv = require('iconv-lite');
var buf = iconv.encode(str, 'win1255'); // returns a Buffer in windows-1255 encoding

Option 2: Write the UTF-8 BOM at the head of the file. Example:

res.header('Content-type', 'text/csv; charset=utf-8');
res.header('Content-disposition', 'attachment; filename=excel.csv');
res.write(Buffer.from('EFBBBF', 'hex')); // BOM header
// rest of your code

Option 3: Use the base64 URL format, like data:text/csv;base64,77u/Zm9vLGJhcg0KYWFhLGJiYg== . This method also works on the client side (IE10+, FF, Chrome, Opera, Safari).

window.location = "data:text/csv;base64,77u/" + btoa("foo,bar\r\naaa,bbb");
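Here 77u/ is the base64 form of the three BOM bytes EF BB BF. Because btoa only accepts characters up to U+00FF, the one-liner above throws on text such as Hebrew or Arabic. A minimal sketch that first converts the text to UTF-8 bytes; toBase64Utf8 and buildBase64CsvUri are hypothetical helper names:

```javascript
// Hypothetical helper: base64-encodes the UTF-8 bytes of a string.
function toBase64Utf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b);           // one character per byte, as btoa expects
  }
  return btoa(binary);
}

// Hypothetical helper: builds a base64 data URI whose payload starts with the UTF-8 BOM.
function buildBase64CsvUri(csvText) {
  return 'data:text/csv;base64,' + toBase64Utf8('\uFEFF' + csvText);
}

console.log(buildBase64CsvUri('foo,bar\r\naaa,bbb'));
console.log(buildBase64CsvUri('שלום,español'));
```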

Hey, thanks for your response. Can you please give a full example of option 2? What exactly is .header() method? What exactly is res object?
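On the question above: in the answer's snippet, res looks like the response object of an Express route handler, where res.header() sets an HTTP response header. A minimal sketch of the same idea using only the methods of Node's built-in http response (setHeader, write, end); sendCsvWithBom and the stand-in response object used for the dry run are hypothetical names introduced here:

```javascript
// Hypothetical helper: writes CSV text to an HTTP response, preceded by the UTF-8 BOM.
function sendCsvWithBom(res, csvText, filename) {
  res.setHeader('Content-Type', 'text/csv; charset=utf-8');
  res.setHeader('Content-Disposition', 'attachment; filename="' + filename + '"');
  res.write(Buffer.from('EFBBBF', 'hex')); // the three BOM bytes
  res.end(Buffer.from(csvText, 'utf8'));   // the CSV itself, encoded as UTF-8
}

// Dry run against a stand-in object that records what a real response would receive.
const fakeRes = {
  headers: {},
  chunks: [],
  setHeader(name, value) { this.headers[name] = value; },
  write(chunk) { this.chunks.push(chunk); },
  end(chunk) { this.chunks.push(chunk); },
};
sendCsvWithBom(fakeRes, 'שלום,español\n', 'excel.csv');
const body = Buffer.concat(fakeRes.chunks);
console.log(body.subarray(0, 3));
```

With a real server, the same function would be called from the request handler, for example inside the callback passed to http.createServer.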

Thanks @MosheSimantov almost 9 years later, you saved my day. The ,77u/ after the base64 solved my Python Databricks streaming export. This was even not mentioned on the BOM WikiPedia pages!

It is not necessary to use the encodeURIComponent method and glue the data string snippets. Just glue the BOM character in front of the string.

const data = 'öäüÖÄÜ';
const BOM = '\uFEFF';
const blob = new Blob([BOM + data], { type: 'text/csv;charset=utf-8' });
const url = window.URL.createObjectURL(blob);
const linkElem = document.createElement('a');
linkElem.href = url;
linkElem.click();
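The same approach can be wrapped into two small helpers that also set a file name and release the object URL afterwards. This is a sketch under assumptions: makeCsvBlob and downloadBlob are hypothetical names, and downloadBlob only works in a browser because it needs document.

```javascript
// Hypothetical helper: wraps CSV text in a Blob that starts with the UTF-8 BOM.
function makeCsvBlob(csvText) {
  // Strings passed to the Blob constructor are stored as UTF-8.
  return new Blob(['\uFEFF' + csvText], { type: 'text/csv;charset=utf-8' });
}

// Browser-only part: needs document and URL.createObjectURL.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;          // suggested file name for the download
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  // Deferred so the download can start before the URL is released.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

const csvBlob = makeCsvBlob('öäüÖÄÜ');
console.log(csvBlob.size);
```

In a page, downloadBlob(csvBlob, 'export.csv') would then trigger the download; export.csv is just a placeholder name.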

I found that tab-separated CSV in UTF-16LE encoding with a BOM works in Excel on both Windows and Mac.

I followed b4stien's answer, with a small change to achieve this:

// TextEncoder here comes from the stringencoding library; the native TextEncoder only produces UTF-8.
var csvContent = 'éà; ça; 12\nà@€; çï; 13',
    textEncoder = new TextEncoder('utf-16le');
var csvContentEncoded = textEncoder.encode([csvContent]);
var bom = new Uint8Array([0xFF, 0xFE]); // UTF-16LE byte order mark
var out = new Uint8Array(bom.byteLength + csvContentEncoded.byteLength);
out.set(bom, 0);
out.set(csvContentEncoded, bom.byteLength);
var blob = new Blob([out]);
saveAs(blob, 'some-data.csv');

with Linux /usr/bin/file tests:

Little-endian UTF-16 Unicode text, with very long lines, with CRLF line terminators 

Unfortunately this won't work anymore: "Note: Prior to Firefox 48 and Chrome 53, an encoding type label was accepted as a parameter to the TextEncoder object; since then both browsers have removed support for any encoder type other than utf-8, to match the spec. Any type label passed into the TextEncoder constructor will now be ignored and a utf-8 TextEncoder will be created." developer.mozilla.org/en-US/docs/Web/API/TextEncoder

I've had success with const blob = new Blob([new Uint8Array(iconv_lite.encode(csvContent, "utf16-le"))]); and then saveAs (from file-saver).
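Since the native TextEncoder now only produces UTF-8, the UTF-16LE bytes can also be built by hand without any library, because JavaScript strings are already sequences of UTF-16 code units. A minimal sketch; encodeUtf16leWithBom is a hypothetical function name and the sample rows are made up:

```javascript
// Encodes a string as UTF-16LE bytes with a leading BOM (0xFF 0xFE).
function encodeUtf16leWithBom(text) {
  const out = new Uint8Array(2 + text.length * 2);
  out[0] = 0xFF;
  out[1] = 0xFE;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);   // one UTF-16 code unit
    out[2 + i * 2] = unit & 0xFF;      // low byte first (little-endian)
    out[3 + i * 2] = unit >> 8;        // then the high byte
  }
  return out;
}

// Tab-separated rows with CRLF line endings (hypothetical sample data).
const tsv = 'name\tcity\r\nJosé\tחיפה\r\n';
const utf16Bytes = encodeUtf16leWithBom(tsv);
console.log(utf16Bytes.length);
```

The result can be passed to new Blob([utf16Bytes]) and saved in the same way as out in the answer above.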

var data = `"red","मुकेश"`;
var processdata = "data:text/csv;charset=utf-8,%EF%BB%BF" + encodeURIComponent(data);

Please feel free to expand on your answer. Specifically, try to explain why it solves the question (better than the already massively up-voted answers above).

I’ve been able to solve my issue with the help of https://stackoverflow.com/a/27975629/5228251 answer

const json2csv = require('json2csv');

const csvExport = (req, res) => {
  var csvContent = json2csv({ data, fields });
  res.setHeader('Content-Type', 'text/csv');
  // just prepend the '\ufeff' to your csv string value
  return res.status(200).send('\ufeff' + csvContent);
};

B4stien, thank you for your answer! After testing several solutions based on the "utf8" charset, encoding in windows-1252 is the only one that let me keep my accents in Excel 365!

Manetsus, b4stien's answer and his link were very useful in my case: I have to export French and German data to a CSV file, and no solution based on "utf8" worked. Only his solution, which uses an "ANSI" (windows-1252) encoder, did.

His code sample renders a "Click me to download a valid CSV!" link; the encoding-indexes.js, encoding.js and FileSaver.js files it depends on can be downloaded from the link.

Nevertheless, since Excel is fairly flexible in the languages and formats it supports, I cannot rule out that UTF-8 fails in my development environment only because of the way Excel is installed there.

Note: I tested it with Firefox, Chrome and IE 11 on Windows 7, with Excel 365.


International Language Support in JavaScript

JavaScript is built to support a wide variety of world languages and their characters – from the old US ASCII up to the rapidly spreading UTF-8. This page clears up some of the difficulties encountered when dealing with multiple languages and their related characters.

JavaScript and Character Sets

When working with non-European character sets ("charsets"), you may need to change the way your page references external JavaScript (.js) files. Ideally, your .js files should be saved in the UTF-8 character set to maximize multilingual support, though you can use a different charset that supports your language, at the potential expense of users whose systems can't handle it. Once your files are saved as UTF-8, they must be served in the UTF-8 charset in order to display correctly. There are a few ways to ensure this:

Serve the Web Page as UTF-8

If your page is already served as UTF-8 (i.e. Content-Type: text/html; charset=UTF-8), you don't need to make any changes: external scripts referenced from an HTML document are interpreted in the same charset as the document unless you explicitly specify otherwise. You can serve the page as UTF-8 in any of these ways:

  • Use the Content-Type meta tag, <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">, placed at the top of your page's <head> section.
  • Edit your webserver configuration to serve all documents as UTF-8
  • Send the Content-type header via your server-side scripts (i.e. PHP, ASP, JSP)
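The third option can be sketched in JavaScript as well. The list names PHP, ASP and JSP; Node.js is swapped in here purely for illustration, and contentTypeFor is a hypothetical helper that picks the header value a server-side script would send:

```javascript
// Hypothetical helper: picks a Content-Type header value with an explicit UTF-8 charset.
function contentTypeFor(path) {
  if (path.endsWith('.js')) return 'text/javascript; charset=utf-8';
  if (path.endsWith('.html')) return 'text/html; charset=utf-8';
  return 'application/octet-stream'; // fallback for file types this sketch does not cover
}

// Inside a Node http handler this value would be passed to
// res.setHeader('Content-Type', ...), where res is the response object.
console.log(contentTypeFor('/scripts/app.js'));
console.log(contentTypeFor('/index.html'));
```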

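Use the charset attribute of the <script> tag

The easiest way to ensure your script is read as UTF-8 is to add a charset attribute (charset="utf-8") to the <script> tags in the parent page, for example <script src="myscript.js" charset="utf-8"></script> (myscript.js is a placeholder file name).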

Modify your .htaccess files (Apache Only)

You can also configure your webserver to serve all .js files in the UTF-8 charset, or only .js files in a single directory. You can do the latter (in Apache) by adding a directive such as AddCharset UTF-8 .js to the .htaccess file in the directory where your scripts are stored.


The charset attribute

The charset attribute/parameter (from the English "charset", meaning character encoding) specifies the encoding of an external (loaded) script.

Conditions of use

This attribute is specified only when the src attribute is present.

Browser support

Specification

Version      Section
HTML 2.0     -
HTML 3.2     STYLE and SCRIPT
HTML 4.01    12.2 The A element: charset = charset [CI]. DTD: Transitional, Strict, Frameset
HTML 5.0     4.11.1 The script element: The charset attribute.
HTML 5.1     4.12.1. The script element: The charset attribute.
XHTML 1.0    4.8. Script and Style elements. DTD: Transitional, Strict, Frameset
XHTML 1.1    Extensible HyperText Markup Language

Values

The value is the encoding of the external resource. Example encodings:

ISO-8859-1: Encoding used by most Western European languages (also known as "Latin-1").
ISO-8859-5: Encoding that supports Cyrillic.
SHIFT_JIS, EUC-JP: Japanese encodings.
UTF-8: One of the widely accepted, standardized text encodings, supporting many different writing systems.
windows-1251: Encoding with Cyrillic support.

Character case: not significant.
