JavaScript HTML entities decode

Contents

html-entities
level
mode
numeric
decode(text, options)
level
scope
decodeEntity(text, options)
level
Performance
License
Security contact information
html-entities for enterprise
Keywords
How to Decode HTML Entities Using JavaScript
Decoding HTML Entities
Decoding HTML Entities with the DOM Element
Decoding HTML Entities with the DOMParser.parseFromString() method
Conclusion
About the author
Shehroz Azam
What’s the right way to decode a string that has special HTML entities in it? [duplicate]
7 Answers

html-entities

Encodes text, replacing HTML special characters ( <>&"' ) and/or other character ranges depending on the mode option value.

    import {encode} from 'html-entities';

    encode('< > " \' & © ∆');
    // -> '&lt; &gt; &quot; &apos; &amp; © ∆'

    encode('< ©', {mode: 'nonAsciiPrintable'});
    // -> '&lt; &copy;'

    encode('< ©', {mode: 'nonAsciiPrintable', level: 'xml'});
    // -> '&lt; &#169;'

    encode('< > " \' & ©', {mode: 'nonAsciiPrintableOnly', level: 'xml'});
    // -> '< > " \' & &#169;'

level

all alias to html5 (default).
html5 uses HTML5 named references.
html4 uses HTML4 named references.
xml uses XML named references.

mode

specialChars encodes only HTML special characters (default).
nonAscii encodes HTML special characters and everything outside the ASCII character range.
nonAsciiPrintable encodes HTML special characters and everything outside of the ASCII printable characters.
nonAsciiPrintableOnly encodes everything outside of the ASCII printable characters, keeping HTML special characters intact.
extensive encodes all non-printable characters, non-ASCII characters and all characters with named references.
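The difference between the modes can be sketched as a predicate that says whether a single character would be encoded. This is an illustration only, not the library's implementation: the name wouldEncode is hypothetical, "printable ASCII" is assumed to mean U+0020 to U+007E, and the extensive mode is left out because it needs the full named-reference table.

```javascript
// Hypothetical helper, not part of html-entities.
// Assumption: "printable ASCII" means code points 0x20..0x7E.
const SPECIAL_CHARS = new Set(["<", ">", "&", '"', "'"]);

function wouldEncode(char, mode = "specialChars") {
  const codePoint = char.codePointAt(0);
  const isSpecial = SPECIAL_CHARS.has(char);
  const isAscii = codePoint <= 0x7f;
  const isPrintableAscii = codePoint >= 0x20 && codePoint <= 0x7e;

  switch (mode) {
    case "specialChars":
      return isSpecial;
    case "nonAscii":
      return isSpecial || !isAscii;
    case "nonAsciiPrintable":
      return isSpecial || !isPrintableAscii;
    case "nonAsciiPrintableOnly":
      return !isPrintableAscii;
    default:
      throw new Error(`mode not covered by this sketch: ${mode}`);
  }
}

console.log(wouldEncode("©"));                          // false
console.log(wouldEncode("©", "nonAscii"));              // true
console.log(wouldEncode("<", "nonAsciiPrintableOnly")); // false
```

The real library may treat whitespace control characters such as tab or newline differently from this sketch.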

numeric

decimal uses decimal numbers when encoding HTML entities, e.g. &#169; (default).
hexadecimal uses hexadecimal numbers when encoding HTML entities, e.g. &#xa9; .
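To make the two numeric forms concrete, here is a minimal sketch of how a numeric character reference can be built from a character's code point. The function name toNumericReference is hypothetical and is not part of the html-entities API.

```javascript
// Hypothetical helper showing the two numeric reference formats.
function toNumericReference(char, numeric = "decimal") {
  const codePoint = char.codePointAt(0);
  return numeric === "hexadecimal"
    ? `&#x${codePoint.toString(16)};` // e.g. U+00A9 -> &#xa9;
    : `&#${codePoint};`;              // e.g. U+00A9 -> &#169;
}

console.log(toNumericReference("©"));                // &#169;
console.log(toNumericReference("©", "hexadecimal")); // &#xa9;
```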

decode(text, options)

Decodes text replacing entities to characters. Unknown entities are left as is.

    import {decode} from 'html-entities';

    decode('&lt; &gt; &quot; &apos; &amp; &#169; &#8710;');
    // -> '< > " \' & © ∆'

    decode('&copy;', {level: 'html5'});
    // -> '©'

    decode('&copy;', {level: 'xml'});
    // -> '&copy;'

level

all alias to html5 (default).
html5 uses HTML5 named references.
html4 uses HTML4 named references.
xml uses XML named references.

scope

body emulates behavior of browser when parsing tag bodies: entities without semicolon are also replaced (default).
attribute emulates behavior of browser when parsing tag attributes: entities without semicolon are replaced when not followed by equality sign = .
strict ignores entities without semicolon.
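The three scopes differ only in how they treat an entity that lacks its closing semicolon. The toy decoder below illustrates that rule as described above. It is a sketch with a five-entry table and a hypothetical name (toyDecode), not the library's algorithm.

```javascript
// Toy decoder: tiny entity table, hypothetical function name.
const TOY_TABLE = { amp: "&", lt: "<", gt: ">", quot: '"', copy: "©" };

function toyDecode(text, scope = "body") {
  return text.replace(
    /&(amp|lt|gt|quot|copy)(;?)/g,
    (match, name, semicolon, offset, whole) => {
      // A properly terminated entity is decoded in every scope.
      if (semicolon === ";") return TOY_TABLE[name];
      // strict: entities without a semicolon are left as is.
      if (scope === "strict") return match;
      // attribute: left as is when directly followed by "=".
      if (scope === "attribute" && whole[offset + match.length] === "=") {
        return match;
      }
      // body (and attribute otherwise): decoded without the semicolon.
      return TOY_TABLE[name];
    }
  );
}

console.log(toyDecode("&copy 2024"));              // © 2024
console.log(toyDecode("&copy 2024", "strict"));    // &copy 2024
console.log(toyDecode("a=1&copy=2", "attribute")); // a=1&copy=2
```

The attribute case matters for query strings inside href values, where a parameter named copy would otherwise be turned into ©.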

decodeEntity(text, options)

Decodes a single HTML entity. An unknown entity is left as is.

    import {decodeEntity} from 'html-entities';

    decodeEntity('&lt;');
    // -> '<'

    decodeEntity('&copy;', {level: 'html5'});
    // -> '©'

    decodeEntity('&copy;', {level: 'xml'});
    // -> '&copy;'

level

all alias to html5 (default).
html5 uses HTML5 named references.
html4 uses HTML4 named references.
xml uses XML named references.

Performance

Statistically significant comparison with other libraries using benchmark.js . Results by this library are marked with * . The source code of the benchmark is available at benchmark/benchmark.ts .

Common

    Initialization / Load speed
        * #1: html-entities x 2,632,942 ops/sec ±3.71% (72 runs sampled)
          #2: entities x 1,379,154 ops/sec ±5.87% (75 runs sampled)
          #3: he x 1,334,035 ops/sec ±3.14% (83 runs sampled)

HTML5

    Encode test
        * #1: html-entities.encode - html5, nonAscii x 415,806 ops/sec ±0.73% (85 runs sampled)
        * #2: html-entities.encode - html5, nonAsciiPrintable x 401,420 ops/sec ±0.35% (93 runs sampled)
          #3: entities.encodeNonAsciiHTML x 401,235 ops/sec ±0.41% (88 runs sampled)
          #4: entities.encodeHTML x 284,868 ops/sec ±0.45% (93 runs sampled)
        * #5: html-entities.encode - html5, extensive x 237,613 ops/sec ±0.42% (93 runs sampled)
          #6: he.encode x 91,459 ops/sec ±0.50% (84 runs sampled)

    Decode test
          #1: entities.decodeHTMLStrict x 614,920 ops/sec ±0.41% (89 runs sampled)
          #2: entities.decodeHTML x 577,698 ops/sec ±0.44% (90 runs sampled)
        * #3: html-entities.decode - html5, strict x 323,680 ops/sec ±0.39% (92 runs sampled)
        * #4: html-entities.decode - html5, body x 297,548 ops/sec ±0.45% (91 runs sampled)
        * #5: html-entities.decode - html5, attribute x 293,617 ops/sec ±0.37% (94 runs sampled)
          #6: he.decode x 145,383 ops/sec ±0.36% (94 runs sampled)

HTML4

    Encode test
        * #1: html-entities.encode - html4, nonAscii x 379,799 ops/sec ±0.29% (96 runs sampled)
        * #2: html-entities.encode - html4, nonAsciiPrintable x 350,003 ops/sec ±0.42% (92 runs sampled)
        * #3: html-entities.encode - html4, extensive x 169,759 ops/sec ±0.43% (90 runs sampled)

    Decode test
        * #1: html-entities.decode - html4, attribute x 291,048 ops/sec ±0.42% (92 runs sampled)
        * #2: html-entities.decode - html4, strict x 287,110 ops/sec ±0.56% (93 runs sampled)
        * #3: html-entities.decode - html4, body x 285,529 ops/sec ±0.57% (93 runs sampled)

XML

    Encode test
          #1: entities.encodeXML x 418,561 ops/sec ±0.80% (90 runs sampled)
        * #2: html-entities.encode - xml, nonAsciiPrintable x 402,868 ops/sec ±0.30% (89 runs sampled)
        * #3: html-entities.encode - xml, nonAscii x 403,669 ops/sec ±7.87% (83 runs sampled)
        * #4: html-entities.encode - xml, extensive x 237,766 ops/sec ±0.45% (93 runs sampled)

    Decode test
          #1: entities.decodeXML x 888,700 ops/sec ±0.48% (93 runs sampled)
        * #2: html-entities.decode - xml, strict x 353,127 ops/sec ±0.40% (92 runs sampled)
        * #3: html-entities.decode - xml, body x 355,796 ops/sec ±1.58% (86 runs sampled)
        * #4: html-entities.decode - xml, attribute x 369,454 ops/sec ±8.74% (84 runs sampled)

Escaping

    Escape test
          #1: entities.escapeUTF8 x 1,308,013 ops/sec ±0.37% (91 runs sampled)
        * #2: html-entities.encode - xml, specialChars x 1,258,760 ops/sec ±1.00% (93 runs sampled)
          #3: he.escape x 822,569 ops/sec ±0.24% (94 runs sampled)
          #4: entities.escape x 434,243 ops/sec ±0.34% (91 runs sampled)

License

Security contact information

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.

html-entities for enterprise

Available as part of the Tidelift Subscription

The maintainers of html-entities and thousands of other packages are working with Tidelift to deliver commercial support and maintenance for the open source dependencies you use to build your applications. Save time, reduce risk, and improve code health, while paying the maintainers of the exact dependencies you use. Learn more.

Keywords


How to Decode HTML Entities Using JavaScript

HTML represents its reserved characters as character entities. A character entity is a short text string that starts with & and ends with ;. Entities are necessary because special characters such as < or > would otherwise be interpreted as HTML markup when you want them shown as plain text, so they are needed for text to render correctly on web pages. Entities can also be used to write characters that are not found on standard keyboards.

Decoding HTML Entities

HTML entities can be decoded by using several different methods involving vanilla JavaScript or JavaScript libraries. This guide will only go through vanilla JavaScript methods for decoding HTML entities as they are easy and straightforward.

Decoding HTML Entities with the DOM Element

The first method uses the textarea element. As the name suggests, the textarea element creates a simple text area in which each character is interpreted as plain text:

    let txt = document.createElement("textarea");

In the above code we first created the textarea element using the document.createElement() method. Then we wrote the string containing HTML entities into the textarea using the innerHTML property. This way the entities are converted to characters while everything else stays plain text. Lastly, we returned the decoded string held by the textarea that is stored in the txt variable.
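Put together, the steps just described might look like the following function. The listing above shows only the first statement, so the rest is a reconstruction from the prose and should be read as a sketch. It needs a browser-like environment in which document exists.

```javascript
// Sketch reconstructed from the description; requires a DOM (browser).
function decode(str) {
  // A detached textarea is never added to the page.
  const txt = document.createElement("textarea");
  // The parser decodes entities here. Textarea content is treated as text,
  // so any tags in the string are not turned into elements.
  txt.innerHTML = str;
  // value holds the decoded plain text.
  return txt.value;
}
```

In a browser, decode("&lt;p&gt;") would be expected to give back the literal text <p> instead of a paragraph element.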

Now if we call the decode function with an HTML entity as parameter it will return it as simple text:

    let decodedStr = decode(encodedStr);

Decoding HTML Entities with the DOMParser.parseFromString() method

The second method uses the DOMParser.parseFromString() method, which takes a string containing HTML and returns it parsed as a document:

    function decode(str) {
        let txt = new DOMParser().parseFromString(str, "text/html");
        return txt.documentElement.textContent;
    }

In the above code we first passed the string as an argument to the DOMParser.parseFromString() method and got it back as a parsed HTML document by specifying the second argument as "text/html". We then returned the text content of that document's root element.

Now calling the decode() function:

    let decodedStr = decode(encodedStr);

Conclusion

HTML entities are necessary for text to display properly on web pages. Some websites show code snippets as plain text; without entities it would be difficult to tell what is HTML code for the page and what is just plain text.

About the author

Shehroz Azam

A Javascript Developer & Linux enthusiast with 4 years of industrial experience and proven know-how to combine creative and usability viewpoints resulting in world-class web applications. I have experience working with Vue, React & Node.js & currently working on article writing and video creation.


What’s the right way to decode a string that has special HTML entities in it? [duplicate]

I’m not sure why that apostrophe is encoded like that (&#39;); all I know is that I want to decode it. Here’s one approach using jQuery that popped into my head:

    function decodeHtml(html) { return $('<div>').html(html).text(); }

7 Answers

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

Ah, seems like basically the same approach I took but without the jQuery dependency (which is nice). Doesn’t it still seem hacky, though? Or should I be perfectly comfortable with it?

Oh wait, I get it: you’re using textarea specifically so that the tags are preserved (as you said) but HTML entities still get decoded. Pretty clever.

It’s acceptable. It’s the best way to decode HTML. No tags are parsed, unlike your original solution, which parses (and thus hides) tags.

Don’t use the DOM to do this if you care about legacy compatibility. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results on non-modern browsers.

For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

    he.decode("We&#39;re unable to complete your request at this time.");
    // → "We're unable to complete your request at this time."

Disclaimer: I’m the author of the he library.
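For the narrow case in the question, a numeric reference such as &#39;, a DOM-free decoder can be sketched in a few lines. This is only an illustration under stated assumptions: the name decodeNumericRefs is hypothetical, it handles decimal and hexadecimal numeric references only, and it ignores named references and the edge cases the HTML Standard defines, which is what a library like he is for.

```javascript
// Hypothetical helper: decodes only numeric references (&#39; or &#x27;).
// Not spec-complete: no named references, no missing-semicolon handling.
function decodeNumericRefs(text) {
  return text.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
    (match, hex, dec) => {
      const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
      // Leave out-of-range values untouched instead of throwing a RangeError.
      if (codePoint > 0x10ffff) return match;
      // fromCodePoint also handles astral symbols above U+FFFF.
      return String.fromCodePoint(codePoint);
    }
  );
}

console.log(decodeNumericRefs("We&#39;re unable to complete your request at this time."));
// We're unable to complete your request at this time.
```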
