Python: transliterate Russian to English

transliterate 1.10.2

Bi-directional transliterator for Python. Transliterates (unicode) strings according to the rules specified in the language packs (source script <-> target script).

Comes with language packs for the following languages (listed in alphabetical order):

  • Armenian
  • Bulgarian (beta)
  • Georgian
  • Greek
  • Macedonian (alpha)
  • Mongolian (alpha)
  • Russian
  • Serbian (alpha)
  • Ukrainian (beta)

There are also a number of useful tools included, such as:

  • Simple lorem ipsum generator, which allows lorem ipsum generation in the language chosen.
  • Language detection for the text (if appropriate language pack is available).
  • Slugify function for non-latin texts.

Prerequisites

Installation

Install the latest stable version from PyPI:

pip install transliterate

or install the latest stable version from BitBucket:

pip install https://bitbucket.org/barseghyanartur/transliterate/get/stable.tar.gz

or install the latest stable version from GitHub:

pip install https://github.com/barseghyanartur/transliterate/archive/stable.tar.gz

That’s all. See the Usage and examples section for more.

Usage and examples

Simple usage

Transliteration to Armenian

Transliteration to Georgian

Transliteration to Russian

List of available (registered) languages

Reversed transliterations go from the target language back to the source language (as those terms are defined in the language packs). For reversed transliterations you may leave out the language_code argument, but if you know it beforehand, specify it, since that is faster.

Reversed transliteration from Armenian

Reversed transliteration from Armenian with language_code argument left out

Reversed transliteration from Georgian

Reversed transliteration from Georgian with language_code argument left out

Reversed transliteration from Greek

Reversed transliteration from Greek with language_code argument left out

Reversed transliteration from Russian (Cyrillic)

Reversed transliteration from Russian (Cyrillic) with language_code argument left out

Working with large amounts of data

If you know which language pack will be used for transliteration, especially when working with large amounts of data, it makes sense to get the transliteration function in the following way:

Registering a custom language pack

Basics

Make sure to call the autodiscover function before registering your own language packs if you want to use the bundled language packs along with your own custom ones.

Then define and register the custom language pack.

It’s possible to replace existing language packs with your own ones. By default, existing language packs are not force-installed.

To force-install a language pack, set the force argument to True when registering it. In that case, if a language pack with the same language code has already been registered, it is replaced; if no such language pack exists in the registry, the new one is simply registered.

Forced language packs can’t be replaced or unregistered.

API in depth

There are 7 class properties that you could/should be using in your language pack, of which 4 are various sorts of mappings.

Mappings
  • mapping (tuple): A tuple of two strings, that simply represent the mapping of characters from the source language to the target language. For example, if your source language is Latin and you want to convert “a”, “b”, “c”, “d” and “e” characters to appropriate characters in Russian Cyrillic, your mapping would look as follows:
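
A minimal sketch of such a mapping. The two strings have equal length and are paired position by position. The `str.maketrans` lines use only the standard library to show what the pairing expresses; they are an illustration, not the package's own code.

```python
# Source characters first, target characters second; position i maps to position i.
mapping = (
    u"abcde",
    u"абцде",
)

# Illustration only: build a character table from the two strings.
table = str.maketrans(mapping[0], mapping[1])
converted = u"bead".translate(table)
print(converted)  # беад
```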
Additional
  • character_ranges (tuple): A tuple of character ranges (unicode table). Used in language detection. Works only if the detectable property is set to True. Be aware that language (or, more precisely, script) detection is very basic and is based on characters only.
  • detectable (bool): If set to True, the language pack is used for automatic language detection.
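
To make the idea of character ranges concrete, here is a standard-library sketch of a range check. It is not the package's detection code; the helper `share_in_ranges` is hypothetical, and the ranges are the Cyrillic and Cyrillic Supplement Unicode blocks, chosen for illustration.

```python
# (first, last) code point pairs: Cyrillic and Cyrillic Supplement blocks.
character_ranges = ((0x0400, 0x04FF), (0x0500, 0x052F))


def share_in_ranges(text, ranges):
    """Return the fraction of letters in text whose code point lies in ranges."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if any(first <= ord(ch) <= last for first, last in ranges)
    )
    return hits / len(letters)


print(share_in_ranges(u"Лорем ипсум", character_ranges))  # 1.0
print(share_in_ranges(u"Lorem ipsum", character_ranges))  # 0.0
```

A check of this kind identifies a script rather than a language: Russian, Bulgarian and Ukrainian text all fall into the same Cyrillic ranges.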

Using the lorem ipsum generator

Note that, because the original lorem-ipsum-generator package is incompatible with Python 3, transliterate uses its own simplified fallback lorem ipsum generator when run under Python 3 (which still does the job).

Generating paragraphs in Armenian

Generating sentence in Georgian

Generating sentence in Greek

Generating sentence in Russian (Cyrillic)

Language detection

Detect Russian (Cyrillic) text

Slugify

Slugify Russian (Cyrillic) text

Missing a language pack?

Missing a language pack for your own language? Contribute to the project by making one and it will appear in a new version (which will be released very quickly).

Writing documentation

Keep the following hierarchy.

    =====
    title
    =====

    header
    ======

    sub-header
    ----------

    sub-sub-header
    ~~~~~~~~~~~~~~

    sub-sub-sub-header
    ^^^^^^^^^^^^^^^^^^

    sub-sub-sub-sub-header
    ++++++++++++++++++++++

    sub-sub-sub-sub-sub-header
    **************************

License

Support

For any issues contact me at the e-mail given in the Author section.
