Python: Cyrillic to Latin

Iuliia

Transliteration means representing Cyrillic data (mainly names and geographic locations) with Latin letters. It is used for international passports, visas, green cards, driving licenses, mail and goods delivery, and so on.

Iuliia makes transliteration as easy as a single call to iuliia.translate(), passing the source text and a schema such as iuliia.WIKIPEDIA.

Why use Iuliia

  • 20 transliteration schemas (rule sets), including all main international and Russian standards.
  • Correctly implements not only the base mapping, but also all the special rules for letter combinations and word endings (as far as the author knows, Iuliia is the only library that does so).
  • Simple API and zero third-party dependencies.
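The difference between a base mapping and the special rules can be illustrated with a small sketch. The rule tables below are hypothetical and deliberately tiny; they are not one of Iuliia's schemas, and the code is not Iuliia's implementation, only one way such rules could be layered on top of a per-letter mapping.

```python
import re

# Hypothetical rule set for illustration; NOT one of Iuliia's real schemas.
BASE = {
    "а": "a", "в": "v", "е": "e", "и": "i", "й": "y", "к": "k",
    "н": "n", "о": "o", "р": "r", "с": "s", "ы": "y", "ь": "",
}
# Letter-combination rule: (previous letter, letter) -> replacement for the letter.
PREV = {("ь", "е"): "ye"}
# Word-ending rule: last two letters of a word -> replacement.
ENDING = {"ий": "y", "ый": "y"}


def translate_word(word: str) -> str:
    """Transliterate one lowercase word: ending rule first, then letter by letter."""
    stem, tail = word, ""
    if len(word) > 2 and word[-2:] in ENDING:
        stem, tail = word[:-2], ENDING[word[-2:]]
    out = []
    prev = ""
    for ch in stem:
        # A combination rule wins over the base mapping; unknown letters pass through.
        out.append(PREV.get((prev, ch), BASE.get(ch, ch)))
        prev = ch
    return "".join(out) + tail


def translate(text: str) -> str:
    """Apply translate_word to every run of lowercase Cyrillic letters."""
    return re.sub(r"[а-яё]+", lambda m: translate_word(m.group()), text)


print(translate("новый карьер"))  # novy karyer
```

With the base mapping alone, the same input would come out as "novyy karer"; the ending and combination rules are what change the result.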

For schema details and other information, see iuliia.ru (in Russian).

Installation

$ pip install iuliia

Usage

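List all supported schemas by iterating over iuliia.Schemas.items(); each entry pairs a schema name with a schema object that has a description.

Transliterate using a specified schema, for example from the command line:

$ iuliia icao_doc_9303 "Юлия Щеглова"
Iuliia Shcheglova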

Development setup

$ python3 -m venv env
$ . env/bin/activate
$ make deps schemas
$ tox

$ make help
Usage: make [task]

task                 help
------               ----
changelog            Generate changelog
coverage             Run tests with coverage
deps                 Install dependencies
lint                 Lint and static-check code
pull                 Pull code and schemas
push                 Push commits and tags
schemas              Update schemas
test                 Run tests
help                 Show help message

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Make sure to add or update tests as appropriate.

Use Black for code formatting and Conventional Commits for commit messages.

Transliterate Cyrillic → Latin in every possible way

nalgeon/iuliia-py

Iuliia makes transliteration as easy as:

>>> import iuliia
>>> iuliia.translate("Юлия Щеглова", schema=iuliia.WIKIPEDIA)
'Yuliya Shcheglova'

List all supported schemas:

>>> import iuliia
>>> for name, schema in iuliia.Schemas.items():
...     print("{:20}{}".format(name, schema.description))
...
ala_lc              ALA-LC transliteration schema.
ala_lc_alt          ALA-LC transliteration schema.
bgn_pcgn            BGN/PCGN transliteration schema
bgn_pcgn_alt        BGN/PCGN transliteration schema
bs_2979             British Standard 2979:1958 transliteration schema
bs_2979_alt         British Standard 2979:1958 transliteration schema
gost_16876          GOST 16876-71 (aka GOST 1983) transliteration schema
gost_16876_alt      GOST 16876-71 (aka GOST 1983) transliteration schema
gost_52290          GOST R 52290-2004 transliteration schema
gost_52535          GOST R 52535.1-2006 transliteration schema
gost_7034           GOST R 7.0.34-2014 transliteration schema
gost_779            GOST 7.79-2000 (aka ISO 9:1995) transliteration schema
gost_779_alt        GOST 7.79-2000 (aka ISO 9:1995) transliteration schema
icao_doc_9303       ICAO DOC 9303 transliteration schema
iso_9_1954          ISO/R 9:1954 transliteration schema
iso_9_1968          ISO/R 9:1968 transliteration schema
iso_9_1968_alt      ISO/R 9:1968 transliteration schema
mosmetro            Moscow Metro map transliteration schema
mvd_310             MVD 310-1997 transliteration schema
mvd_310_fr          MVD 310-1997 transliteration schema
mvd_782             MVD 782-2000 transliteration schema
scientific          Scientific transliteration schema
telegram            Telegram transliteration schema
ungegn_1987         UNGEGN 1987 V/18 transliteration schema
wikipedia           Wikipedia transliteration schema
yandex_maps         Yandex.Maps transliteration schema
yandex_money        Yandex.Money transliteration schema

Transliterate using specified schema:

>>> source = "Юлия Щеглова"
>>> iuliia.translate(source, schema=iuliia.ICAO_DOC_9303)
'Iuliia Shcheglova'

>>> schema = iuliia.Schemas.get("wikipedia")
>>> iuliia.translate(source, schema)
'Yuliya Shcheglova'

$ iuliia icao_doc_9303 "Юлия Щеглова"
Iuliia Shcheglova

transliterate 1.10.2

Bi-directional transliterator for Python. Transliterates (Unicode) strings according to the rules specified in the language packs (source script <-> target script).

Comes with language packs for the following languages (listed in alphabetical order):

  • Armenian
  • Bulgarian (beta)
  • Georgian
  • Greek
  • Macedonian (alpha)
  • Mongolian (alpha)
  • Russian
  • Serbian (alpha)
  • Ukrainian (beta)

There are also a number of useful tools included, such as:

  • Simple lorem ipsum generator, which allows lorem ipsum generation in the language chosen.
  • Language detection for the text (if appropriate language pack is available).
  • Slugify function for non-Latin texts.

Prerequisites

Installation

Install the latest stable version from PyPI:

pip install transliterate

or install the latest stable version from BitBucket:

pip install https://bitbucket.org/barseghyanartur/transliterate/get/stable.tar.gz

or install the latest stable version from GitHub:

pip install https://github.com/barseghyanartur/transliterate/archive/stable.tar.gz

That’s all. See the Usage and examples section for more.

Usage and examples

Simple usage

Transliteration to Armenian

Transliteration to Georgian

Transliteration to Russian

List of available (registered) languages

Reversed transliterations are transliterations made from the target language to the source language (in the terms in which they are defined in the language packs). For reversed transliterations you may leave out the language_code argument, although if you know it beforehand you should specify it, since transliteration works faster that way.

Reversed transliteration from Armenian

Reversed transliteration from Armenian with language_code argument left out

Reversed transliteration from Georgian

Reversed transliteration from Georgian with language_code argument left out

Reversed transliteration from Greek

Reversed transliteration from Greek with language_code argument left out

Reversed transliteration from Russian (Cyrillic)

Reversed transliteration from Russian (Cyrillic) with language_code argument left out

Working with large amounts of data

If you know which language pack will be used for transliteration, especially when working with large amounts of data, it makes sense to obtain the transliteration function once, with get_translit_function, and reuse it for every string.
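The reasoning behind this advice can be sketched with the standard library alone. In the sketch below, PACKS and make_translit_function are hypothetical names and the mappings are heavily truncated; the point is only that the pack lookup and table construction happen once, and the returned function is then reused for every row.

```python
# Hypothetical, heavily truncated packs; real language packs are far more complete.
PACKS = {
    "ru": ("abvgd", "абвгд"),
    "el": ("abgde", "αβγδε"),
}


def make_translit_function(language_code):
    """Resolve the pack and build its table once; return a function bound to it."""
    source, target = PACKS[language_code]
    table = str.maketrans(source, target)

    def translit(text):
        # No per-call lookup or table building; only the translation itself.
        return text.translate(table)

    return translit


translit_ru = make_translit_function("ru")
rows = ["baba", "dva", "gad"]
converted = [translit_ru(row) for row in rows]
print(converted)  # ['баба', 'два', 'гад']
```

In transliterate itself, the function returned by get_translit_function plays the role of translit_ru here.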

Registering a custom language pack

Basics

Make sure to call the autodiscover function before registering your own language packs if you want to use the bundled language packs along with your own custom ones.

Then define and register the custom language pack.

It’s possible to replace existing language packs with your own ones. By default, existing language packs are not force-installed.

To force-install a language pack, set the force argument to True when registering it. In that case, if a language pack with the same language code has already been registered, it is replaced; if no such language pack exists in the registry, the new one is simply registered.

Forced language packs can’t be replaced or unregistered.
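The registration rules above can be summarized in a short sketch. The Registry class below is a hypothetical stand-in written only to mirror the described behavior (no replacement by default, replacement with force, forced packs locked); it is not the library's own registry code.

```python
class Registry:
    """Toy registry mirroring the force semantics described above."""

    def __init__(self):
        self._packs = {}      # language code -> language pack
        self._forced = set()  # codes that were force-installed

    def register(self, code, pack, force=False):
        if code in self._forced:
            return False  # forced packs cannot be replaced
        if code in self._packs and not force:
            return False  # existing packs are kept unless force is set
        self._packs[code] = pack
        if force:
            self._forced.add(code)
        return True

    def unregister(self, code):
        if code in self._forced or code not in self._packs:
            return False  # forced packs cannot be unregistered
        del self._packs[code]
        return True

    def get(self, code):
        return self._packs.get(code)


registry = Registry()
print(registry.register("ru", "bundled pack"))             # True
print(registry.register("ru", "custom pack"))              # False
print(registry.register("ru", "custom pack", force=True))  # True
print(registry.unregister("ru"))                           # False
```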

API in depth

There are 7 class properties that you could/should be using in your language pack, of which 4 are various sorts of mappings.

Mappings
  • mapping (tuple): A tuple of two strings that represent the mapping of characters from the source language to the target language, position by position. For example, if your source language is Latin and you want to convert the characters “a”, “b”, “c”, “d” and “e” to the corresponding Russian Cyrillic characters, your mapping would be the two strings “abcde” and “абцде”.
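A two-string mapping of this kind can be applied with Python's built-in str.maketrans, as in the sketch below. This is an illustration of the data shape under the assumption that both strings have equal length; it is not necessarily how transliterate applies the mapping internally.

```python
# Two equal-length strings: source characters and their targets, position by position.
mapping = ("abcde", "абцде")

# Forward table (source -> target) and, by swapping the strings, the reversed one.
forward = str.maketrans(mapping[0], mapping[1])
backward = str.maketrans(mapping[1], mapping[0])

print("bed".translate(forward))   # бед
print("бед".translate(backward))  # bed
```

Swapping the two strings is also what makes a reversed transliteration possible from the same mapping.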
Additional
  • character_ranges (tuple): A tuple of character ranges (from the Unicode table). Used in language detection. Works only if the detectable property is set to True. Be aware that language (or, more precisely, script) detection is very basic and is based on characters only.
  • detectable (bool): If set to True, language pack would be used for automatic language detection.
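Character-based detection of this kind can be sketched as counting how many characters of the text fall into each pack's ranges. The CHARACTER_RANGES table and detect_script function below are hypothetical; the ranges are standard Unicode blocks chosen for illustration, not the values used by the bundled language packs.

```python
# Hypothetical range tables (Unicode blocks); real packs define their own character_ranges.
CHARACTER_RANGES = {
    "ru": ((0x0400, 0x04FF), (0x0500, 0x052F)),  # Cyrillic, Cyrillic Supplement
    "hy": ((0x0530, 0x058F),),                   # Armenian
    "el": ((0x0370, 0x03FF),),                   # Greek and Coptic
}


def detect_script(text):
    """Return the code whose ranges cover the most characters, or None."""
    counts = {code: 0 for code in CHARACTER_RANGES}
    for ch in text:
        cp = ord(ch)
        for code, ranges in CHARACTER_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                counts[code] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


print(detect_script("Лорем ипсум"))  # ru
print(detect_script("Λορεμ ιπσυμ"))  # el
print(detect_script("Lorem ipsum"))  # None
```

Because only characters are inspected, any text written in Cyrillic would be reported as "ru" here, which is why this is better described as script detection than language detection.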

Using the lorem ipsum generator

Note that, because the original lorem-ipsum-generator package is incompatible with Python 3, transliterate uses its own simplified fallback lorem ipsum generator when run under Python 3 (which still does the job).

Generating paragraphs in Armenian

Generating a sentence in Georgian

Generating a sentence in Greek

Generating a sentence in Russian (Cyrillic)

Language detection

Detect Russian (Cyrillic) text

Slugify

Slugify Russian (Cyrillic) text

Missing a language pack?

Missing a language pack for your own language? Contribute to the project by making one and it will appear in a new version (which will be released very quickly).

Writing documentation

Keep the following hierarchy.

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************

License

Support

For any issues contact me at the e-mail given in the Author section.
