- How to convert HTML to DOCX in the browser
- HTML to DOCX conversion – document building or altChunks?
- Client side conversion
- html-docx-js
- html-to-docx
- Summary
- HTML to DOCX Converter
- HTML
- DOCX
- +200 Formats Supported
- Data Security
- High-Quality Conversions
- Powerful API
- HTML to DOCX converter
- How to convert a HTML to a DOCX file?
- Converter
- Convert to HTML
- Convert from HTML
- File Format
- HTML (Hypertext Markup Language with a client-side image map)
- DOCX (Microsoft Word Open XML Document)
- Convert HTML to DOCX (WORD) / URL to DOCX (WORD) online
- Advanced online service for converting html files to DOCX. For mac & windows
- Hypertext Markup Language
- Microsoft Office Open XML
How to convert HTML to DOCX in the browser
DOCX is a file format commonly associated with Microsoft Word. However, not everyone may be aware that it is a standardized open format, not limited by any license. Implemented according to the Office Open XML (OOXML) specification, DOCX shares a similar structure with presentation files (PPTX) and spreadsheets (XLSX). Interestingly, a file in the OOXML format is actually an archive of related XML files. We can easily verify this by unpacking any DOCX file:
t3rmian@wasp:~/$ unzip "test.docx"
Archive:  test.docx
   creating: word/
   creating: word/media/
 extracting: word/media/image-WngyOTnaQ.png
 extracting: word/media/image-Xom8iU2nqh.png
 extracting: word/media/image-2MiVrdV3Lg.png
 extracting: word/media/image-u5K-49bkCE.png
 extracting: word/media/image-AM5Ve0JASj.png
 extracting: word/media/image-L85HC3HelY.png
 extracting: word/media/image-TGo0ZXsleV.png
 extracting: word/media/image-YOBg89XJk0.png
   creating: _rels/
 extracting: _rels/.rels
   creating: docProps/
 extracting: docProps/core.xml
   creating: word/theme/
 extracting: word/theme/theme1.xml
 extracting: word/document.xml
 extracting: word/fontTable.xml
 extracting: word/styles.xml
 extracting: word/numbering.xml
 extracting: word/settings.xml
 extracting: word/webSettings.xml
   creating: word/_rels/
 extracting: word/_rels/document.xml.rels
 extracting: [Content_Types].xml
After unpacking, the word folder contains XML files responsible, among other things, for styles (styles.xml) and document content (document.xml), along with references (_rels/document.xml.rels) to resources such as images (media/*).
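To make the link between document.xml and its resources more concrete, here is a minimal sketch that reads a relationships part and maps each relationship Id to its Target. The sample XML and the parseRelationships helper are assumptions made up for this illustration, and the regex-based parsing is a shortcut that only suits simple, well-formed input; a real tool would use an XML parser.

```javascript
// Hypothetical helper: map relationship Ids to their targets in a .rels part.
function parseRelationships(relsXml) {
  const targets = new Map();
  // Match each <Relationship ...> element and capture its attribute list.
  // The \b keeps the pattern from matching the <Relationships> root element.
  const elementPattern = /<Relationship\b([^>]*)>/g;
  for (const [, attributes] of relsXml.matchAll(elementPattern)) {
    const id = /\bId="([^"]*)"/.exec(attributes);
    const target = /\bTarget="([^"]*)"/.exec(attributes);
    if (id && target) targets.set(id[1], target[1]);
  }
  return targets;
}

// Illustrative content of word/_rels/document.xml.rels (made up for this sketch).
const sampleRels = `<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
</Relationships>`;

const relationships = parseRelationships(sampleRels);
console.log(relationships.get("rId2")); // media/image1.png
```

Targets are relative to the folder that owns the relationships part, so media/image1.png here would resolve to word/media/image1.png inside the archive.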
HTML to DOCX conversion – document building or altChunks?
There are actually two approaches to converting an HTML document to DOCX. We can build such a document by converting individual HTML tags and styles to their equivalents in the DOCX format, or we can use the altChunk feature.
The first approach is self-explanatory, but what is altChunk? The altChunk element is simply a pointer to a file whose contents are processed and imported into the document by an application (e.g. Microsoft Word) that supports the indicated format. This option doesn’t give much control over the resulting document.
Among the most popular applications able to display the DOCX format, only Microsoft Word will correctly display a document built using altChunk. In LibreOffice Writer, Apache OpenOffice Writer, and Google Docs we will see a blank document. Keep this in mind when choosing or implementing a conversion from HTML to OOXML.
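As a rough illustration of the altChunk approach, the sketch below assembles the two pieces that make it work: the altChunk element in word/document.xml and the relationship that points it at the imported file. The buildAltChunkParts helper, the htmlChunk id, and the afchunk.mht target are assumptions chosen for this example; the namespace and relationship type URIs are the ones defined by OOXML.

```javascript
const W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
const R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
const PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships";

// Hypothetical helper: produce the document part and its relationships part.
function buildAltChunkParts(relationshipId, target) {
  // The body holds no paragraphs, only a pointer to the external chunk.
  const documentXml =
    `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
    `<w:document xmlns:w="${W_NS}" xmlns:r="${R_NS}">` +
    `<w:body><w:altChunk r:id="${relationshipId}"/></w:body>` +
    `</w:document>`;
  // The relationship resolves the id to the file the application should import.
  const relsXml =
    `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
    `<Relationships xmlns="${PKG_RELS_NS}">` +
    `<Relationship Id="${relationshipId}" Type="${R_NS}/aFChunk" Target="${target}"/>` +
    `</Relationships>`;
  return { documentXml, relsXml };
}

const parts = buildAltChunkParts("htmlChunk", "afchunk.mht");
console.log(parts.documentXml);
console.log(parts.relsXml);
```

Because the body contains nothing but the pointer, an application that ignores altChunk has no content left to render, which is consistent with the blank documents mentioned above.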
Client side conversion
For web applications, a clear advantage is that the conversion can be implemented on the client (browser) side. This reduces server-side processing and delegates the work to the client, making the application more scalable and closer to a distributed system. Even with a good grasp of the OOXML structure, though, converting an HTML file to DOCX is not an easy task.
Fortunately, among the available solutions we have a choice of two libraries written in JavaScript that implement this complicated process. A solution based on the altChunk feature can be found in the slightly older html-docx-js project, while tag and style conversion is used in the more recent html-to-docx library.
html-docx-js
Using the html-docx-js library is really simple. All we need to do is add its script to our website. If you are using the npm package manager, you can find the library under the same name and install it with the npm i html-docx-js command. It is worth mentioning that html-docx-js also works on the server side. Let’s see how we can use it in the browser:
p Hello HTML
Download
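The browser-side wiring can be sketched roughly as follows. This is a minimal sketch, not the page's original script: attachDocxDownload is a hypothetical helper, and it assumes that the html-docx-js script exposes a global htmlDocx object with an asBlob method and that the page contains a link with the id download. The converter and the URL API are passed in as parameters so the helper can also be exercised outside a browser.

```javascript
// Sketch: convert an HTML string and expose the result through a download link.
// `converter` is expected to offer asBlob(html), as the htmlDocx global does;
// `urlApi` is expected to offer createObjectURL(blob), as the browser's URL does.
function attachDocxDownload(link, html, converter, urlApi) {
  const blob = converter.asBlob(html);
  link.href = urlApi.createObjectURL(blob);
  return link;
}

// Runs only in a page that has already loaded html-docx-js.
if (typeof document !== "undefined" && typeof htmlDocx !== "undefined") {
  window.addEventListener("load", () => {
    attachDocxDownload(
      document.getElementById("download"),
      document.documentElement.outerHTML,
      htmlDocx,
      URL
    );
  });
}
```

Giving the link a download attribute with a file name such as test.docx makes the browser save the blob under that name instead of navigating to it.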
After the page is loaded, it will be converted to DOCX format and made available as a blob link in the href of the download anchor. The unpacked DOCX archive will contain, among others, the word/document.xml file, which refers to an altChunk part. The actual content of the document can be found in the referenced word/afchunk.mht:
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=mhtDocumentPart"

------=mhtDocumentPart
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: file:///C:/fake/document.html

p Hello HTML
Download
------=mhtDocumentPart--
Moreover, images must first be converted to base64 form (example).
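One way to approach the base64 step is sketched below: encode the raw image bytes as a data URI and substitute it for the original src. Both helpers (toDataUri and inlineImages) and the shape of the imagesBySrc map are assumptions for illustration, and the regular expression only handles double-quoted src attributes.

```javascript
// Encode raw image bytes as a data URI that can be used as an <img> src.
function toDataUri(bytes, mimeType) {
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  // btoa exists in browsers and recent Node.js; fall back to Buffer otherwise.
  const base64 = typeof btoa === "function"
    ? btoa(binary)
    : Buffer.from(binary, "binary").toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Replace the src of every <img> whose source is present in `imagesBySrc`,
// a Map from the original src to { bytes, mimeType } (hypothetical shape).
function inlineImages(html, imagesBySrc) {
  const srcPattern = /(<img\b[^>]*?\bsrc=")([^"]*)(")/g;
  return html.replace(srcPattern, (match, before, src, after) => {
    const image = imagesBySrc.get(src);
    // Leave unknown images untouched.
    return image ? before + toDataUri(image.bytes, image.mimeType) + after : match;
  });
}
```

In a browser, the bytes could come from fetch(src) followed by arrayBuffer() wrapped in a Uint8Array, provided the image is served from the same origin or with suitable CORS headers.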
html-to-docx
Converting HTML by building a document is a complicated process that the html-to-docx library does best. In version 1.2.2 (the latest at the time of writing), we will also find some support for generating documents in the browser. Using npm, install the module with the npm i html-to-docx command. Here comes the harder step: importing the library on the website is not as straightforward as in the previous case.
In the ./node_modules/html-to-docx/dist/ folder we find two files, html-to-docx.esm.js and html-to-docx.umd.js, which we can load on the website. The source of the problem turns out to be the dependency on other CJS modules. Such modules must be transpiled before being loaded into the browser, and unfortunately they are not bundled with the library. Often this is not a problem: if we already use a bundler, loading the library usually requires no additional steps.
To familiarize ourselves with this topic, let’s see how to build a browser script from scratch. One of the quickest solutions is to install the webpack bundler: npm i webpack webpack-cli --save-dev. Its latest version does not require any additional configuration. Then, in the src/index.js file, add the code referencing the installed library:
import HTMLtoDOCX from "html-to-docx/dist/html-to-docx.umd"

const link = document.getElementById("download")
HTMLtoDOCX(document.documentElement.outerHTML)
  .then(blob => {
    link.href = URL.createObjectURL(blob)
  })
Next, add polyfills for the necessary features that are not implemented in browsers: npm i util url buffer. The configuration in webpack.config.js and the number of polyfills can differ depending on the dependency and webpack versions (see the demo for Webpack 5). Finally, build the code bundle with the npx webpack command, checking for any missing dependencies. The HTML file will look like this:
Hello HTML
Download
Meanwhile, the word/document.xml, after running the conversion in the browser, will contain specific word processing tags and styles:
Hello HTML Download
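For orientation, tag conversion ultimately means emitting WordprocessingML such as paragraphs (w:p), runs (w:r), and text (w:t). The sketch below shows that idea for a single paragraph; escapeXml and buildParagraph are hypothetical helpers written for this illustration and do not reflect how html-to-docx is implemented internally.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap plain text in a paragraph (w:p) with one run (w:r).
function buildParagraph(text, { bold = false } = {}) {
  // Run properties carry the formatting, e.g. <w:b/> for bold text.
  const runProperties = bold ? "<w:rPr><w:b/></w:rPr>" : "";
  return `<w:p><w:r>${runProperties}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r></w:p>`;
}

console.log(buildParagraph("Hello HTML"));
// <w:p><w:r><w:t xml:space="preserve">Hello HTML</w:t></w:r></w:p>
```

A full converter has to repeat this kind of mapping for every supported tag and style, which is why the document-building approach takes far more work than altChunk but yields files that more applications can read.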
As with html-docx-js, the images will have to be converted to base64 format. Additionally, in the current version, you will have to keep them out of some specific tags; otherwise, they might not be displayed.
Summary
The JavaScript html-docx-js and html-to-docx libraries allow you to convert HTML documents to DOCX in two different ways. The DOCX format itself is not so complicated, and you can view or create your own document as an archive of XML files. When used in production, it is worth remembering that you will not always get the same result in every application that displays the OOXML format, due to implementation differences (e.g. image anchoring in LibreOffice and Microsoft Word).
Do not forget to convert relevant images to the base64 format. If you run into problems with referencing external resources, e.g. complex SVG references, you can consider the canvg library. For other elements that may be unsupported, an interesting approach is to render them as an image using html2canvas. Also consider contributing to the above-mentioned projects if you find a fix for any of the problems you encounter.
2023/04/18: Added a source link to a minimal working example.
HTML to DOCX Converter
CloudConvert is an online document converter. Amongst many others, we support PDF, DOCX, PPTX, XLSX. Thanks to our advanced conversion technology the quality of the output will be as good as if the file was saved through the latest Microsoft Office 2021 suite.
HTML
HTML is a markup language used to create web pages, which web browsers parse and render. The format uses tags to build web content and can embed text, images, headings, tables, and more. Other technologies such as CSS and PHP can be used alongside HTML tags.
DOCX
DOCX is an XML-based word processing format developed by Microsoft. DOCX files differ from DOC files in that they store data in separate compressed files and folders. Versions of Microsoft Office earlier than Office 2007 do not support DOCX files, because those versions save DOC files as a single binary file rather than as XML.
+200 Formats Supported
CloudConvert is your universal app for file conversions. We support nearly all audio, video, document, ebook, archive, image, spreadsheet, and presentation formats. Plus, you can use our online tool without downloading any software.
Data Security
CloudConvert has been trusted by our users and customers since its founding in 2012. No one except you will ever have access to your files. We earn money by selling access to our API, not by selling your data. Read more about that in our Privacy Policy.
High-Quality Conversions
Besides using open source software under the hood, we’ve partnered with various software vendors to provide the best possible results. Most conversion types can be adjusted to your needs such as setting the quality and many other options.
Powerful API
Our API allows custom integrations with your app. You pay only for what you actually use, and there are huge discounts for high-volume customers. We provide a lot of handy features such as full Amazon S3 integration. Check out the CloudConvert API.
HTML to DOCX converter
This online document converter allows you to convert your files from HTML to DOCX in high quality.
We support a lot of different file formats like PDF, DOCX, PPTX, XLSX and many more. By using the online-convert.com conversion technology, you will get very accurate conversion results.
How to convert a HTML to a DOCX file?
- Choose the HTML file you want to convert
- Change quality or size (optional)
- Click on «Start conversion» to convert your file from HTML to DOCX
- Download your DOCX file
Converter
Convert to HTML
Convert from HTML
File Format
HTML (Hypertext Markup Language with a client-side image map)
HTML (HyperText Markup Language) is the standard for creating websites. The idea was proposed in 1989 by physicist Tim Berners-Lee at CERN. Web browsers can read this language to interpret the coding into different texts, colors, formats (headings, p.
DOCX (Microsoft Word Open XML Document)
DOCX is an advanced version of the DOC file format and is much more usable and accessible than the latter at any given time. Unlike the DOC file, the DOCX file is not an extensive file format. Instead, it appears as being a single file while actuall.
Convert HTML to DOCX (WORD) / URL to DOCX (WORD) online
Advanced online service for converting html files to DOCX. For mac & windows
- Image
- Document
- Ebook
- Audio
- Archive
- Video
- Presentation
- Font
- Vector
- CAD
Hypertext Markup Language
HTML is a web file format. HTML source code can be edited in a text editor. HTML files are designed for use in users' web browsers and make it possible to format sites with text, images, and other necessary materials. Files in this format use tags to build web pages. The HTML code is interpreted by the web browser and is generally not shown to the user.
Microsoft Office Open XML
Since 2007, Microsoft has used the docx file format, which is built on the Office Open XML format. This format is a compressed file containing text in XML form, graphics, and other data that can be converted into bit sequences using patent-protected binary formats. It was initially expected that this format would replace the doc format, but both formats are still in use today.