Модуль docx для python

docx 0.2.4

The docx module creates, reads and writes Microsoft Office Word 2007 docx files.

These are referred to as ‘WordML’, ‘Office Open XML’ and ‘Open XML’ by Microsoft.

These documents can be opened in Microsoft Office 2007 / 2010, Microsoft Mac Office 2008, Google Docs, OpenOffice.org 3, and Apple iWork 08.

The module was created when I was looking for a Python support for MS Word .docx files, but could only find various hacks involving COM automation, calling .Net or Java, or automating OpenOffice or MS Office.

The docx module has the following features:

Making documents

Features for making documents include:

  • Paragraphs
  • Bullets
  • Numbered lists
  • Document properties (author, company, etc)
  • Multiple levels of headings
  • Tables
  • Section and page breaks
  • Images

Editing documents

Thanks to the awesomeness of the lxml module, we can:

  • Search and replace
  • Extract plain text of document
  • Add and delete items anywhere within the document
  • Change document properties
  • Run xpath queries against particular locations in the document — useful for retrieving data from user-completed templates.
Читайте также:  Изображения

Getting started

Making and Modifying Documents

  • Just download python docx.
  • Use pip or easy_install to fetch the lxml and PIL modules.
  • Then run:

Congratulations, you just made and then modified a Word document!

Extracting Text from a Document

If you just want to extract the text from a Word file, run:

example-extracttext.py 'Some word file.docx' 'new file.txt'

Ideas & To Do List

  • Further improvements to image handling
  • Document health checks
  • Egg
  • Markdown conversion support

We love forks, changes and pull requests!

  • Check out the [HACKING](HACKING.markdown) to add your own changes!
  • For this project on github
  • Send a pull request via github and we’ll add your changes!

Want to talk? Need help?

License

Short version: this code is copyrighted to me (Mike MacCana), I give you permission to do what you want with it except remove my name from the credits. See the LICENSE file for specific terms.

Источник

python-docx 0.8.11

python-docx is a Python library for creating and updating Microsoft Word (.docx) files.

More information is available in the python-docx documentation.

Release History

0.8.11 (2021-05-15)

0.8.10 (2019-01-08)

  • Revert use of expanded package directory for default.docx to work around setup.py problem with filenames containing square brackets.

0.8.9 (2019-01-08)

0.8.8 (2019-01-07)

0.8.7 (2018-08-18)

  • Add _Row.height_rule
  • Add _Row.height
  • Add _Cell.vertical_alignment
  • Fix #455: increment next_id, don’t fill gaps
  • Add #375: import docx failure on –OO optimization
  • Add #254: remove default zoom percentage
  • Add #266: miscellaneous documentation fixes
  • Add #175: refine MANIFEST.ini
  • Add #168: Unicode error on core-props in Python 2

0.8.6 (2016-06-22)

  • Add #257: add Font.highlight_color
  • Add #261: add ParagraphFormat.tab_stops
  • Add #303: disallow XML entity expansion

0.8.5 (2015-02-21)

  • Fix #149: KeyError on Document.add_table()
  • Fix #78: feature: add_table() sets cell widths
  • Add #106: feature: Table.direction (i.e. right-to-left)
  • Add #102: feature: add CT_Row.trPr

0.8.4 (2015-02-20)

  • Fix #151: tests won’t run on PyPI distribution
  • Fix #124: default to inches on no TIFF resolution unit

0.8.3 (2015-02-19)

0.8.2 (2015-02-16)

  • Fix #94: picture prints at wrong size when scaled
  • Extract docx.document.Document object from DocumentPart Refactor docx.Document from an object into a factory function for new docx.document.Document object . Extract methods from prior docx.Document and docx.parts.document.DocumentPart to form the new API class and retire docx.Document class.
  • Migrate Document.numbering_part to DocumentPart.numbering_part . The numbering_part property is not part of the published API and is an interim internal feature to be replaced in a future release, perhaps with something like Document.numbering_definitions . In the meantime, it can now be accessed using Document.part.numbering_part .

0.8.1 (2015-02-10)

0.8.0 (2015-02-08)

  • Add styles. Provides general capability to access and manipulate paragraph, character, and table styles.
  • Add ParagraphFormat object, accessible on Paragraph.paragraph_format, and providing the following paragraph formatting properties:
    • paragraph alignment (justfification)
    • space before and after paragraph
    • line spacing
    • indentation
    • keep together, keep with next, page break before, and widow control
    • typeface (e.g. ‘Arial’)
    • point size
    • underline
    • italic
    • bold
    • superscript and subscript

    The following issues were retired:

    • Add feature #56: superscript/subscript
    • Add feature #67: lookup style by UI name
    • Add feature #98: Paragraph indentation
    • Add feature #120: Document.styles

    Backward incompatibilities

    Paragraph.style now returns a Style object. Previously it returned the style name as a string. The name can now be retrieved using the Style.name property, for example, paragraph.style.name .

    0.7.6 (2014-12-14)

    0.7.5 (2014-11-29)

    0.7.4 (2014-07-18)

    • Add feature #45: _Cell.add_table()
    • Add feature #76: _Cell.add_paragraph()
    • Add _Cell.tables property (read-only)

    0.7.3 (2014-07-14)

    0.7.2 (2014-07-13)

    0.7.1 (2014-07-11)

    0.7.0 (2014-06-27)

    • Add feature #68: Paragraph.insert_paragraph_before()
    • Add feature #51: Paragraph.alignment (read/write)
    • Add feature #61: Paragraph.text setter
    • Add feature #58: Run.add_tab()
    • Add feature #70: Run.clear()
    • Add feature #60: Run.text setter
    • Add feature #39: Run.text and Paragraph.text interpret ‘n’ and ‘t’ chars

    0.6.0 (2014-06-22)

    • Add feature #15: section page size
    • Add feature #66: add section
    • Add page margins and page orientation properties on Section
    • Major refactoring of oxml layer

    0.5.3 (2014-05-10)

    0.5.2 (2014-05-06)

    0.5.1 (2014-04-02)

    0.5.0 (2014-03-02)

    • Add 20 tri-state properties on Run, including all-caps, double-strike, hidden, shadow, small-caps, and 15 others.

    0.4.0 (2014-03-01)

    • Advance from alpha to beta status.
    • Add pure-python image header parsing; drop Pillow dependency

    0.3.0a5 (2014-01-10)

    0.3.0a4 (2014-01-07)

    0.3.0a3 (2014-01-06)

    0.3.0a1 (2014-01-05)

    • Full object-oriented rewrite
    • Feature-parity with prior version
    • text: add paragraph, run, text, bold, italic
    • table: add table, add row, add column
    • styles: specify style for paragraph, table
    • picture: add inline picture, auto-scaling
    • breaks: add page break
    • tests: full pytest and behave-based 2-layer test suite

    0.3.0dev1 (2013-12-14)

    • Round-trip .docx file, preserving all parts and relationships
    • Load default “template” .docx on open with no filename
    • Open from stream and save to stream (file-like object)
    • Add paragraph at and of document

    Источник

Оцените статью