Lezgi ↔ Russian

Open Translation Tools for Lezgi and Russian

A practical machine translation (MT) project built around real Lezgi texts. This hub brings together models, datasets, and a live Telegram bot to support bilingual communication and language technology for a low‑resource language.

What’s included

Bidirectional Lezgi↔Russian translation with curated parallel data and a deployment-ready model.

NLLB-based MT
Parallel corpora
Telegram bot
Lezgi (Cyrillic)
Russian

Project Overview

Model

Fine‑tuned NLLB model for bidirectional Lezgi↔Russian translation.

View model

Dataset

Parallel corpora from Lezgi Wikipedia and Lezgi Gazet texts, with Russian translations.

View dataset

Telegram Bot

Instant translations via @LezgiTranslatorBot in Telegram.

Open bot

Why this matters

Low‑resource focus

Lezgi has limited digital resources. Open tools help communities, researchers, and educators.

Reproducible pipeline

Datasets, training notes, and model cards make the work transparent and reusable.

Practical access

A Telegram bot makes translation available without installing any software.

Bot Examples

Translator mode

Uses the fine‑tuned NLLB model for sentence translation. You can also forward a message from another chat, and the bot will translate it. In auto mode the bot detects the source language, but very short texts can be misdetected.
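To illustrate why short inputs are hard to classify, here is a minimal sketch of one possible detection heuristic. It is not the bot's actual method, which is not documented here. The function name `guess_source_language`, the marker list, and the `"lez"`/`"ru"` labels are all assumptions made for this example. The idea is that Lezgi Cyrillic spelling uses letter combinations and the palochka character that do not normally occur in Russian words.

```python
# Hypothetical heuristic for illustration only; not the bot's real detector.
# Markers: digraphs common in Lezgi spelling, plus the palochka in both cases.
LEZGI_MARKERS = ("гъ", "къ", "кь", "уь", "хь", "\u04cf", "\u04c0")


def guess_source_language(text: str) -> str:
    """Return "lez" if the text contains a Lezgi-specific marker, else "ru"."""
    lowered = text.lower()
    if any(marker in lowered for marker in LEZGI_MARKERS):
        return "lez"
    # No marker found: fall back to Russian. This is the weak spot for
    # short Lezgi inputs that happen to contain no distinctive letters.
    return "ru"


if __name__ == "__main__":
    for sample in ("Дагълар", "Горы", "Перем"):
        print(sample, "->", guess_source_language(sample))
```

Under this heuristic "Дагълар" is recognized as Lezgi because it contains "гъ", while the Lezgi word "Перем" has no distinctive marker and falls through to Russian. A longer sentence gives any detector more evidence, which is why single words and very short phrases are the risky case.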

Translator mode screenshot

Dictionary mode

Activates when you send a single word. It uses the M. M. Gadzhiev dictionary, with normalization and fuzzy search. For example, the plural forms "Горы/Дагълар" (mountains) can still match the entries "Гора/Дагъ" (mountain) after morphological normalization. For Lezgi input, fuzzy suggestions are also shown (e.g., "Пирем" -> "Перем", Russian "рубашка", shirt).
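The lookup order described above (exact match, then normalization, then fuzzy suggestion) can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's implementation: the names `choose_mode`, `lookup`, and `PLURAL_SUFFIXES` are hypothetical, the suffix stripping is a crude stand-in for real morphological normalization, and Python's standard `difflib` stands in for whatever fuzzy matcher the bot uses. The two dictionary entries are taken from the examples in the text.

```python
import difflib

# Toy entries from the examples above; the real dictionary is far larger.
LEZGI_TO_RUSSIAN = {"дагъ": "гора", "перем": "рубашка"}

# Crude, hypothetical plural-suffix list (longer suffixes first).
PLURAL_SUFFIXES = ("лар", "лер", "ар", "ер")


def choose_mode(message: str) -> str:
    """Single-word messages go to the dictionary, everything else to MT."""
    return "dictionary" if len(message.split()) == 1 else "translator"


def lookup(word, entries=LEZGI_TO_RUSSIAN):
    """Return (headword, translation, match_kind), or None if nothing matches."""
    key = word.strip().lower()
    # 1. Exact headword match.
    if key in entries:
        return key, entries[key], "exact"
    # 2. Normalization: strip a plural suffix and retry.
    for suffix in PLURAL_SUFFIXES:
        stem = key[: -len(suffix)]
        if key.endswith(suffix) and stem in entries:
            return stem, entries[stem], "normalized"
    # 3. Fuzzy suggestion: closest headword above a similarity cutoff.
    close = difflib.get_close_matches(key, list(entries), n=1, cutoff=0.75)
    if close:
        return close[0], entries[close[0]], "fuzzy"
    return None


if __name__ == "__main__":
    print(choose_mode("Дагълар"))  # dictionary
    print(lookup("Дагълар"))       # ('дагъ', 'гора', 'normalized')
    print(lookup("Пирем"))         # ('перем', 'рубашка', 'fuzzy')
```

Trying the exact match first keeps headwords that happen to end in a plural-looking suffix from being mangled. The fuzzy step runs last so that it only produces suggestions when the stricter steps fail.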

Dictionary mode screenshot

Get involved

Contribute data

Have Lezgi texts to share? Reach out to contribute or improve the dataset.

Evaluate quality

Report translation issues or domain gaps to help prioritize improvements.

Build on top

Use the model and dataset to create tools for education, media, and research.