HTML: the charset attribute is missing

Is the "charset" attribute required with HTML5?

What does "required" mean? Obviously, a browser will still render HTML5 without the charset meta attribute. If no encoding is specified, which encoding will a browser use? Basically, I want to know whether it is actually necessary to include <meta charset="utf-8">, or whether browsers will use the correct encoding 99% of the time anyway.

If anyone is interested, I also happened to come across a page that explains how excluding the encoding can result in an XSS vulnerability: openmya.hacker.jp/hasegawa/security/utf7cs.html

4 Answers

It is not necessary to include a <meta charset> declaration. As the specification says, the character set may also be specified by the server using the HTTP Content-Type header or by including a Unicode BOM at the beginning of the downloaded file.

Most web servers today will send back a character set in the Content-Type header for HTML text data, even if you have not configured one explicitly. If the web server doesn’t send back a character set with the Content-Type header, the file does not include a BOM, and the page does not include a <meta> declaration, the browser falls back to a default encoding that is usually based on the language settings of the host computer. If this does not match the actual character encoding of the file, some characters will be displayed improperly.

Will browsers use the proper encoding 99% of the time? If your page is UTF-8, probably. If not, probably not.
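To make "displayed improperly" concrete, here is a minimal Python sketch. It assumes the file was saved as UTF-8 and that the browser falls back to Windows-1252; both choices are assumptions picked for illustration, not something the answer specifies.

```python
# Bytes of a page saved as UTF-8 (assumed for illustration).
page_bytes = "café naïve".encode("utf-8")

# Correct decoding: the encoding used to read matches the one used to save.
as_utf8 = page_bytes.decode("utf-8")

# Wrong decoding: a browser falling back to Windows-1252 (assumed default).
as_cp1252 = page_bytes.decode("cp1252")

print(as_utf8)    # the text as the author wrote it
print(as_cp1252)  # each accented letter turns into two unrelated characters
```

Pure-ASCII text would survive this mismatch unchanged, which is why the problem often goes unnoticed on English-only pages.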


The W3C provides a document outlining the precedence rules for the three methods. It says the order is HTTP header, then BOM, then in-document specification (meta tag).

HTTP header, BOM, followed by meta tag. I’ll update the answer with a link I found from W3C answering this very question.

That’s really interesting. I would have thought that the purpose of the meta tag would be to override everything else. It seems like it would actually be rather difficult to have a situation where the meta tag would be necessary. Am I missing something?

@twiz, it is necessary to use a meta tag to declare the encoding when the server sends a Content-Type header without a charset parameter, you cannot change this, and you are not using UTF-8. This is not an uncommon scenario. Moreover, the meta tag is relevant if a page is saved locally by a user. (When opened later, there will be no HTTP headers.)

@JukkaK.Korpela I don’t know a lot about encoding, so just wondering, what would be an example of a common scenario where the charset might be left out?

According to the Google PageSpeed browser extension, declaring a charset in a meta element "disables IE8’s lookahead feature", which apparently forces it to download everything serially.

My understanding was that <meta charset> was required for valid HTML5, but that is why I started browsing here.

That draft of the spec seems pretty clear to me, and since I add the HTTP header via .htaccess, I am going to start leaving the meta tag out, even though I’m tempted to keep it just to make IE8 users suffer a bit more.

@Jules Mazur do you have any references for those points? Most of what I do is SEO, and accessibility is important to me, so if that is the case I am more than receptive to leaving the meta declaration in.

It’s important to specify the character set of the document as early as possible (either through the Content-Type header or the META tag). Otherwise the browser is left to determine the encoding itself before parsing the document, and this may negatively impact page load time.
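One concrete reading of "as early as possible": the HTML standard requires the element containing the encoding declaration to be serialized completely within the first 1024 bytes of the document. The Python sketch below checks for that. The function name and the simplified regular expression are assumptions made for illustration; a real browser uses a more elaborate prescan.

```python
import re

# The HTML standard's limit for where the encoding declaration must appear.
PRESCAN_LIMIT = 1024

# Simplified pattern (an assumption for this sketch): a complete <meta ...> tag
# containing charset=..., which covers both the short and the http-equiv form.
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)[^>]*>""",
    re.IGNORECASE,
)

def early_meta_charset(html_bytes: bytes):
    """Return the charset label declared within the first 1024 bytes, or None."""
    match = META_CHARSET.search(html_bytes[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None
```

A page whose head is padded with long comments or inline scripts before the meta tag would return None here, even though the tag is present further down.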

The short answer is NO, the charset tag is not required, but recommended.

UTF-8 is the encoding the HTML5 standard expects authors to use, and it is backward compatible with ASCII: the first 128 code points are stored as the same single bytes, so pure-ASCII pages decode identically either way. This does not extend to Latin-1 or Windows-1252, whose bytes above 127 are not valid UTF-8 on their own. UTF-8 was designed to address backward compatibility issues like this, which is one reason HTML5 favors it. Many HTTP servers also deliver the correct charset for HTML5 pages anyway, which is usually UTF-8. If you leave the declaration off your pages, you should mostly see issues with non-ASCII characters (including exotic upper-plane Unicode characters) when the encoding the browser guesses or falls back to does not match the bytes actually in the file. Note that browsers do not unconditionally assume UTF-8 for undeclared pages; as described above, they may fall back to a locale-dependent legacy encoding or guess from the content.
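The single-byte compatibility mentioned above can be checked directly. The Python sketch below (helper names are hypothetical) shows that it holds for ASCII text, while Latin-1 bytes above 127 are a different matter.

```python
def encodes_identically(text: str) -> bool:
    """True if text yields the same bytes in UTF-8 and in Latin-1."""
    try:
        return text.encode("utf-8") == text.encode("latin-1")
    except UnicodeEncodeError:
        # The text contains a character that Latin-1 cannot represent at all.
        return False

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8 without error."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(encodes_identically("Hello, world!"))   # ASCII only
print(encodes_identically("é"))               # one byte in Latin-1, two in UTF-8
print(is_valid_utf8("é".encode("latin-1")))   # a lone 0xE9 byte
```

So a legacy Latin-1 page is only safe to read as UTF-8 if it happens to contain nothing but ASCII.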

Since 1998, when most of these W3C HTML and encoding specifications we use today came out, the standards bodies have pushed vendors (makers of servers and browsers and document applications) to follow encoding rules and use meta tags to help determine intent.

But due to greed, poor browser design, and other factors very few have followed the specifications consistently over the years. As a result, we have a fractured system. Some vendors, like Mozilla, have followed the standards since 2001 for meta tags while others, like Microsoft and Google, have not.

For that reason, if you want your web pages to be viewable in 99.9% of the user agents still around, use contingency design: include meta tags and other standard markup that declare the character encoding the page was built with, despite inconsistent support for such tags. In other words, use both meta tag types. Why? The short "charset" version works well in modern HTML5 browsers, while the longer "http-equiv" version is needed in many browsers from before 2010 that defaulted to older encodings, like Latin-1 and ASCII, but started to support UTF-8 after 2000. Example:

<meta charset="utf-8">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

In reality, though, markup like the above will rarely decide how modern web pages are decoded or interpreted by web browsers, past and present.

What encoding the browser uses when interpreting the page will often depend on the software used to create the page (as someone above mentioned), which increasingly saves files as UTF-8 but is often still an ASCII text editor. UTF-8 is just a standard Unicode encoding scheme that is currently popular for building HTML5 websites. The user’s browser will then likely skip over meta tags and inspect the page to guess the author’s intended encoding.

You will also notice that, in a typical HTML5 page, when you provide <link> or <script> tags pointing to external files, you can suggest an encoding using the charset attribute on those tags. But like the meta tag, those are just "hints" to the browser. They do not fully control what encoding the browser decides the files are really in, or what the server headers say they are in.

The main driver of the encoding used is the web server, whose HTTP response header will often tell the browser the encoding, which for HTML5 pages is usually UTF-8. Because old ASCII (the first 128 characters) used in older web pages decodes identically as UTF-8, users in the West reading English text rarely have issues between new and old page encodings. Because of all these fallback designs, meta tags are often not needed today and frequently play no role in how modern browsers decode a page, for the reasons outlined above.

JavaScript using UTF-16 is a different story.

ADDITIONAL OLD BROWSER HISTORY

Some more history of meta tags. In 2000 this whole meta tag debate was much worse than it is today. Use of HTML 4 with embedded Unicode characters often meant pages that were neither encoded correctly nor rendered correctly, despite server HTTP headers, character entities, and meta tags, simply because the browsers of the day did not follow the standards and didn’t look at meta tags, page encoding, or encoded character entities. Even today, old web pages encoded in old Windows ANSI code pages still cannot be decoded as UTF-8 or UTF-16. That is why, to cope with all the combinations of support and systems left by failed standards adoption, it’s best to use every optional HTML tag available to increase the likelihood of your web pages being rendered correctly.

We learned a valuable lesson back then: web standards would never be consistently followed by companies. When standards are not adopted consistently by private industry, it’s best to use every form and version of tagging, all the time, to maximize the chance that your pages are viewed correctly across many different devices implementing various forms of those standards, even if today the tags don’t matter (as browsers now parse pages and determine encoding themselves).

This is why I say yes, you should use the charset meta tags, even if they are ignored by many browsers today. It can only help with cross-browser issues and maximize the chance that user agents created over the past 20 years can read your valuable web content.

That should be the strategy for all web page design until we somehow enforce universal adoption of web standards, which is increasingly unlikely now with mobile user agents and HTML5, which have forced us to abandon, yet again, many of the XML standards that would have enforced better markup design.


Does HTML5 specify a default character encoding for HTML documents if no character encoding is supplied?

With regard to HTML5, is a default, for example UTF-8, assumed as the character encoding? Or is it entirely up to the application reading the HTML document to choose a default?

1 Answer

The charset is determined using these rules:

  1. User override.
  2. An HTTP "charset" parameter in a "Content-Type" field.
  3. A Byte Order Mark before any other data in the HTML document itself.
  4. A META declaration with a "charset" attribute.
  5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset".
  6. Unspecified heuristic analysis.

Once a character encoding string has been found, it is processed further:

  1. Normalize the given character encoding string according to the Charset Alias Matching rules defined in Unicode Technical Standard #22.
  2. Override some problematic encodings, i.e. intentionally treat some encodings as if they were different encodings. The most common override is treating US-ASCII and ISO-8859-1 as Windows-1252, but there are several other encoding overrides listed in this table. As the specification notes, "The requirement to treat certain encodings as other encodings according to the table above is a willful violation of the W3C Character Model specification."
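The rules above can be sketched in Python as follows. This is a simplified illustration, not a browser's actual algorithm: all function names are hypothetical, the meta scan is a single regular expression that does not distinguish rule 4 from rule 5, normalization is reduced to trimming and lowercasing rather than the full UTS #22 matching, only the two overrides named in the text are included, and the "unspecified heuristic analysis" is replaced by a fixed fallback that is purely an assumption.

```python
import codecs
import re
from email.message import Message

# Overrides named in the text: US-ASCII and ISO-8859-1 are treated as Windows-1252.
OVERRIDES = {"us-ascii": "windows-1252", "ascii": "windows-1252",
             "iso-8859-1": "windows-1252"}

BOMS = ((codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16le"),
        (codecs.BOM_UTF16_BE, "utf-16be"))

# Simplified: matches charset=... inside any <meta> tag (rules 4 and 5 together).
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE)

def charset_from_header(content_type):
    """Rule 2: charset parameter of the HTTP Content-Type value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def charset_from_bom(body):
    """Rule 3: a byte order mark before any other data."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    return None

def charset_from_meta(body):
    """Rules 4 and 5: a META declaration near the start of the document."""
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii") if match else None

def resolve_encoding(content_type, body, user_override=None,
                     fallback="windows-1252"):
    """Walk the rules in order, then normalize and apply the overrides."""
    label = (user_override
             or charset_from_header(content_type)
             or charset_from_bom(body)
             or charset_from_meta(body)
             or fallback)  # stand-in for rule 6 (assumed, not a real heuristic)
    label = label.strip().lower()
    return OVERRIDES.get(label, label)
```

For example, `resolve_encoding("text/html", b"<meta charset='koi8-r'>")` falls through to the meta declaration because the header carries no charset parameter and the body has no BOM.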

But the most important thing is:

You should always specify a character encoding on every HTML document, or bad things will happen. You can do it the hard way (HTTP Content-Type header), the easy way (<meta http-equiv> declaration), or the new way (<meta charset> attribute), but please do it. The web thanks you.


The Content-Type HTTP header is missing charset attribute

During a security check, it was reported that "The Content-Type HTTP header is missing charset attribute" for JS and CSS files. I then added the charset to my HTML, but the charset is still not added to the response header. What should I do so that my Content-Type header includes the charset?

2 Answers

Let me try to answer. You can do that by editing apache2.conf (on Debian-like systems) or httpd.conf (on CentOS-like systems) and adding the following lines:

    # Set the correct charset so you don't need to set it per page.
    AddDefaultCharset utf-8
    # For CSS, JS, etc.
    AddCharset utf-8 .htm .html .js .css
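To check a response header value the way the security scan presumably does, a small Python sketch like the following could be used. The function name and the list of media types treated as text-like are assumptions for illustration; the scanner's actual rules are not given in the question.

```python
from email.message import Message

# Media types for which a charset parameter is expected (assumed list).
TEXT_LIKE = ("text/", "application/javascript", "application/json")

def missing_charset(content_type_value: str) -> bool:
    """True if a text-like Content-Type value carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    media_type = msg.get_content_type()  # lowercased "type/subtype"
    is_text_like = media_type.startswith(TEXT_LIKE)
    return is_text_like and msg.get_content_charset() is None

print(missing_charset("text/css"))                 # header as reported by the scan
print(missing_charset("text/css; charset=utf-8"))  # header after the server fix
```

Feeding it the Content-Type value from the browser's network panel before and after the configuration change shows whether the server is now adding the parameter.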

The reason adding it to the HTML doesn’t work is that a <meta> tag doesn’t tell the server anything. It only tells the browser later on, when it parses the file after downloading it, and only if it decides to parse meta tags (which the major ones do).

But if you want to send data as part of the HTTP header and you’re using PHP to serve the page, site, or file, you can use the header() function; what you put in there will be added to the HTTP response headers. You can call this function more than once to add new header lines, but you must put all calls to it before anything that outputs data. Once PHP has to output something, it sends the headers along with that output, so no further changes to them can be made.

So try calling header() with the desired Content-Type value, including the charset parameter, at the top of the file.
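The same idea applies to any server-side code that builds the response headers itself: pick the media type for the file, then append the charset parameter before the headers are sent. A minimal Python sketch, with a hypothetical helper name and an assumed rule for which types get a charset:

```python
import mimetypes

def content_type_for(path: str, charset: str = "utf-8") -> str:
    """Build a Content-Type value, adding a charset for text-like files."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None:
        # Unknown extension: fall back to a generic binary type.
        return "application/octet-stream"
    # Assumed rule: text/* plus JavaScript and JSON get a charset parameter.
    if media_type.startswith("text/") or media_type.endswith(("javascript", "json")):
        return f"{media_type}; charset={charset}"
    return media_type

print(content_type_for("style.css"))
print(content_type_for("logo.png"))
```

The exact media type guessed for .js files depends on the Python version and system MIME tables, so the sketch only relies on it ending in "javascript".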

I think that’ll do what you want.
