Convert docx to html javascript

Contents

docx2html
installation
example
Feature
ToDo
lalalic/docx2html
widrelk/jsDocxToHtml

docx2html

docx2html is a JavaScript converter from docx to HTML that runs on Node.js and in the browser.

installation

example

    const docx2html = require("docx2html")

    // input is a file <input> element; pass the selected .docx file
    docx2html(input.files[0]).then(html => {
      // the converted html offers further utilities, for example:
      // html.toString()
      // html.asZip(), html.download(), html.save()
    })

docx2html(docx, options) returns a promise. Supported options:
- container: an HTMLElement to append the converted HTML to; defaults to document.body
- asImageURL(data): converts image data to a URL; only required on Node.js

The promise resolves to an object (html in the example above) with:
- content: the converted DOM
- toString(/* options */)
- asZip(options)
- download(options)
- save(options)
- release(): releases image resources

docx2html uses docx4js 1.x to parse the docx file, then uses the docx4js API to traverse the docx models and convert them to HTML elements.

Ideally, each docx model has a specific converter that creates the corresponding HTML elements, so the design is simply a map from docx model type to HTML element constructor.
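
The type-to-converter mapping can be sketched in a few lines. This is only an illustration of the idea, not the library's code: the model shapes (`type`, `text`, `href`, `children`) and the names `converters`, `escapeHtml`, and `convert` are assumptions made for this example and are not docx4js types.

```javascript
// Hypothetical model shape: { type, text?, href?, children? }. Not docx4js types.
const converters = {
  document: (model, render) => `<div class="document">${render(model.children)}</div>`,
  // Paragraphs become <div class="p"> instead of <p>, so they can hold block content.
  paragraph: (model, render) => `<div class="p">${render(model.children)}</div>`,
  hyperlink: (model, render) =>
    `<a href="${escapeHtml(model.href)}">${render(model.children)}</a>`,
  text: (model) => `<span>${escapeHtml(model.text)}</span>`,
};

// Escape the characters that would otherwise break the generated markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Look up the converter by model type; unknown types just render their children.
function convert(model) {
  const renderChildren = (children = []) => children.map(convert).join("");
  const converter = converters[model.type];
  return converter ? converter(model, renderChildren) : renderChildren(model.children);
}

const html = convert({
  type: "document",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", text: "a < b, see " },
        {
          type: "hyperlink",
          href: "https://example.com",
          children: [{ type: "text", text: "link" }],
        },
      ],
    },
  ],
});
console.log(html);
```

Supporting a new model type then only means adding one entry to the map; the traversal itself stays unchanged.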

The difficulty is that some docx models are hard to express in HTML. Luckily, CSS3 makes some rich styles possible in HTML, such as numbering and all 12 kinds of table styles.

Word shapes are drawn with SVG (lines, rectangles, and so on). Only a limited set of shapes is supported so far; the rest is a matter of time.

The p element, according to the HTML specification, cannot contain block containers such as div. For that reason the output has no p tags: every paragraph is a div with paragraph styles, and a small piece of JavaScript rearranges things once the DOM is ready.

The header and footer are kept for every section, but conditional variants, such as odd and even headers/footers, are not handled.

Word fields are kept, but so far only links are supported.

Feature

environment
- section
- header
- footer
- paragraph
- link
- numbering
  - many
  - rect
  - circle
  - round rect
  - h1 ~ h6
  - hyperlink
  - document default
  - named style
  - section style
    - page layout
    - columns
    - column style
    - all(12) word built in styles
    - styles on first/last/even/odd row/column
    - styles on 4 corner cells
    - rotate
    - text direction
    - positioning
      - vertical
        
        - page/margin: top/bottom/absolute
        - page
        - left/right/center/inside/outside/absolute
        - left/right/center/absolute

ToDo
        
License: MIT. I also provide commercial support for tickets and enhancements to pay my rent.
        
jsDocxToHtml: .docx to HTML converter
        
This library converts a blob of a .docx file created in Microsoft Word (other word processors are not tested) to raw HTML. It is designed to reproduce the content of the .docx file as accurately as possible.

You only need to pass a blob; all styles and document structure are handled internally. The result is an HTML string that looks just like the .docx file when rendered. Alongside the HTML you get additional data, such as comments with internal links to document segments.
        
Supported:
- Almost full character styling (except for non-standard underlining)
- Almost full list support (with custom patterns like "Chapter 3.2.1:")
- Full paragraph styling (text alignment, margins)
- Table support (custom borders, merged cells)
- Inline pictures
- Division by pages (if manual or "auto" page breaks are present, see known issues)
- External links

Not supported:
- Table headers
- Non-standard underlining
- "Anchored" pictures
- Excel and other embedded documents
- Numberings with "number maps" (like Roman numerals)
        
Import the library and call its convertToHtml() function. It returns a promise that resolves to an object containing a string with the HTML and an array of the document's comments. Here is an example of using the library inside a React component:
        
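    import { useEffect, useState } from "react";
    import jsDocxToHtml from "@ree_n/jsDocxToHtml";

    // Inside a React component that receives the .docx blob as props.blob:
    const [html, setHtml] = useState("");
    const [comments, setComments] = useState([]);

    useEffect(() => {
      jsDocxToHtml.convertToHtml(props.blob).then((result) => {
        setHtml(result.html);
        // CommentElement is the application's own function for rendering one comment
        setComments(result.comments.map((comment) => CommentElement(comment)));
      });
    }, [props.blob]);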
        
- result.html: a string with all generated HTML
- result.comments: an array of objects with the content of the document's comments
        
These IDs and classes are added to the generated HTML and can be useful:
- Page: "pg" + page index
- Page's content: "content_pg" + page index
- Header: "header_pg" + page index
- Footer: "footer_pg" + page index
- Comment: "comment" + comment ID
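
As a small sketch, the names above can be built with two helpers. The helper names `pageNames` and `commentName` are made up for this example, and the source does not say whether each name is used as an id or a class, or whether page indexes start at 0 or 1, so the code only builds the strings.

```javascript
// Build the names listed above for one page. Whether they are ids or classes
// is not stated in the source, so this only produces the strings.
function pageNames(pageIndex) {
  return {
    page: `pg${pageIndex}`,
    content: `content_pg${pageIndex}`,
    header: `header_pg${pageIndex}`,
    footer: `footer_pg${pageIndex}`,
  };
}

// Build the name used for a comment with the given ID.
function commentName(commentId) {
  return `comment${commentId}`;
}

console.log(pageNames(2).header);
console.log(commentName(7));
```

In a browser, such a name could then be passed to a lookup like document.getElementById or document.getElementsByClassName, depending on how the library emits it.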
        
Known issues

In OOXML, division into pages is left to the application that works with the document and is not stored directly in the .docx file. At the moment, pages are split on the lastRenderedPageBreak tag, which can be found inside a run. This works in most cases, but sometimes the tag is simply absent. That could be solved by calculating the height of the page content inside the library and adjusting afterwards, but from my perspective that is very difficult.
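
Splitting on the page break marker can be sketched as follows. This is not the library's implementation: the model is a simplified assumption in which each paragraph is an array of runs and a run is `{ text, lastRenderedPageBreak }`, with the marker treated as sitting at the start of the run. The function name `splitIntoPages` is also made up for the example.

```javascript
// Hypothetical, simplified model: a paragraph is an array of runs, and a run is
// { text, lastRenderedPageBreak? }. A real parser would read these from w:r elements.
function splitIntoPages(paragraphs) {
  const pages = [[]];
  for (const runs of paragraphs) {
    let current = [];
    for (const run of runs) {
      if (run.lastRenderedPageBreak) {
        // The marker says this run started a new page when Word last laid out the file.
        if (current.length > 0) pages[pages.length - 1].push(current);
        pages.push([]);
        current = [];
      }
      current.push(run.text);
    }
    if (current.length > 0) pages[pages.length - 1].push(current);
  }
  // Drop empty pages, e.g. when the very first run carries a marker.
  return pages.filter((page) => page.length > 0);
}

const pages = splitIntoPages([
  [{ text: "Intro" }],
  [
    { text: "End of page one. " },
    { text: "Start of page two.", lastRenderedPageBreak: true },
  ],
  [{ text: "More" }],
]);
console.log(pages.length);
```

A paragraph that straddles a break ends up as two partial paragraphs, one on each page. A document without any markers comes back as a single page, which is the failure case described above.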
        
A simpler solution is to use Element.clientHeight after the HTML is rendered, but at the moment I don't know how to implement it correctly here.
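
One way the height-based idea might look, as a sketch only: measure each rendered block, then greedily pack blocks into pages. The function `paginateByHeight` and its inputs are assumptions for illustration. In a browser the heights would come from Element.clientHeight of each block; here they are plain numbers so the logic can run anywhere.

```javascript
// blockHeights: heights in pixels, e.g. collected from element.clientHeight.
// pageHeight: usable content height of one page in pixels.
// Returns an array of pages, each an array of block indexes.
function paginateByHeight(blockHeights, pageHeight) {
  const pages = [];
  let current = [];
  let used = 0;
  blockHeights.forEach((height, index) => {
    // Start a new page when the block does not fit, unless the page is still empty
    // (a block taller than a page gets a page of its own).
    if (current.length > 0 && used + height > pageHeight) {
      pages.push(current);
      current = [];
      used = 0;
    }
    current.push(index);
    used += height;
  });
  if (current.length > 0) pages.push(current);
  return pages;
}

console.log(paginateByHeight([400, 300, 500, 200, 900], 1000));
```

This sketch never splits a single block across pages, so long paragraphs and tables would still need separate handling.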
        
Displaying headers correctly requires some additional work. The header height is not stated anywhere in the document; as with division into pages, all the work is done by the application.

Also, sometimes the XML contains a blank header with a w:sdt tag, while in Word the header is in place. It looks like the header is somehow imported or inherited. This is currently not supported.

In OOXML, the table width definition is a bit of a mess. In short, it is defined in the table properties, but it can sometimes be overridden by column widths, and so on. In my tests, landscape tables work just fine, but for tables in footers, rendering with the column widths stated in the XML file often produces a table that does not match what you see in Word. The correct behavior in this case is unknown to me.

Sometimes page breaks inside a table row are inconsistent. Some content of a row can stay on the previous page, while in Word it is moved to the next page.
        
OOXML has many more underline options than HTML. At the moment, not all of them are supported.
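
To illustrate the gap, here is a sketch that maps some values of the OOXML w:u "val" attribute to the CSS text-decoration shorthand. The mapping table and the function name `underlineToCss` are assumptions for this example, not what the library does. Values without a CSS counterpart (for example dotDash) fall back to a plain solid line.

```javascript
// Assumed mapping from a subset of w:u "val" values to CSS text-decoration-style.
const UNDERLINE_STYLES = new Map([
  ["single", "solid"],
  ["words", "solid"], // CSS cannot skip the spaces, so this is an approximation
  ["double", "double"],
  ["dotted", "dotted"],
  ["dash", "dashed"],
  ["wave", "wavy"],
]);

// Convert an underline value (and optional CSS color) to a CSS declaration.
function underlineToCss(val, color) {
  if (!val) return ""; // no underline information: emit nothing
  if (val === "none") return "text-decoration: none;";
  // Unknown or unsupported values fall back to a plain solid underline.
  const style = UNDERLINE_STYLES.get(val) || "solid";
  const parts = ["underline", style];
  if (color) parts.push(color);
  return `text-decoration: ${parts.join(" ")};`;
}

console.log(underlineToCss("wave", "#ff0000"));
console.log(underlineToCss("dotDash"));
```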
        
According to the documentation, page numbering is stated in the section properties, but in my tests a Microsoft Word document with page numbers did not have this tag anywhere. The correct behavior in this case is therefore unknown to me, and page numbering is not supported at the moment.

Line height appears to be calculated differently in OOXML and HTML; this needs further research. At the moment it results in the footer overlapping the page content (it simply takes up more space) and in similar issues.
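
For reference, a sketch of a naive conversion of the OOXML w:spacing attributes to CSS. With lineRule "auto" the w:line value is in 240ths of a line, otherwise it is in twentieths of a point. The function name `lineSpacingToCss` is an assumption for this example. The result may still differ from Word, possibly because Word's single line is derived from the font's metrics while a unitless CSS line-height of 1 equals the font size.

```javascript
// line: the w:line value; lineRule: "auto" (default), "exact" or "atLeast".
function lineSpacingToCss(line, lineRule = "auto") {
  if (lineRule === "auto") {
    // 240 means single spacing, 360 means 1.5 lines, 480 means double spacing.
    return `line-height: ${line / 240};`;
  }
  // "exact" and "atLeast" are given in twentieths of a point.
  // "atLeast" is a minimum in Word; plain CSS line-height cannot express that,
  // so this sketch treats it like "exact".
  return `line-height: ${line / 20}pt;`;
}

console.log(lineSpacingToCss(360));
console.log(lineSpacingToCss(480, "exact"));
```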
        
This library is inspired by Mammoth and somewhat based on it, so thanks to Mammoth's author and contributors.
        