Contribute to the development of Pashto


Areas to contribute

There are many ways you can help. Here are a few suggestions for lifting Pashto from a low-resource language to a web-rich one.

Common Voice

One of the biggest challenges for content creation in the Pashto language is typing grammatically correct sentences using the available keyboards.

Our first goal is to create an automatic speech recognition system for Pashto that transcribes spoken words into written Pashto. Building such a system requires training data in the Pashto language. This kind of training data is usually collected through another open-source project, Mozilla Common Voice. Unfortunately, Pashto is one of the few languages with no data in the Common Voice project.

Our top challenges, in order of priority, are as follows:

  1. Complete translation of the Common Voice portal to Pashto
  2. Create sentences in the Pashto language for Common Voice
  3. Record utterances for the sentences collected for Common Voice
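
As a rough illustration of step 2, the sketch below filters a list of candidate Pashto sentences before they are submitted. The word limit and the no-digits rule are assumptions modeled on commonly cited Common Voice sentence guidelines, and the function names are hypothetical; please check the current Common Voice guidelines before relying on these values.

```python
import unicodedata

# Assumed limit; verify against the current Common Voice sentence guidelines.
MAX_WORDS = 14


def normalize(sentence: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", sentence).split())


def is_candidate(sentence: str) -> bool:
    """Accept non-empty sentences within the word limit that contain no digits."""
    words = sentence.split()
    if not words or len(words) > MAX_WORDS:
        return False
    # str.isdigit() is also true for Arabic-Indic digits such as U+06F3.
    return not any(ch.isdigit() for ch in sentence)


def clean_sentences(lines):
    """Return normalized, de-duplicated candidate sentences in input order."""
    seen = set()
    result = []
    for line in lines:
        sentence = normalize(line)
        if is_candidate(sentence) and sentence not in seen:
            seen.add(sentence)
            result.append(sentence)
    return result
```

Calling clean_sentences on the lines of a text file yields a list that can be reviewed by a native speaker before submission. A filter like this only catches mechanical problems; spelling and grammar still need human review.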

Pashto Automatic Speech Recognition System

  1. Add a new model either by training from scratch or fine-tuning a pre-trained model
  2. Create a production-ready version of the model for use on mobile devices
  3. Add a new preprocessed dataset

Pashto Corpus Creation

  1. Devise a unified approach to Pashto language corpus creation
  2. Design a language model based on step 1

Community Development

  1. Write blogs to enhance participation
  2. Suggest improvements to the website
  3. Create documentation and videos for awareness

Code of Ethics

We are an open community. If you reuse a model, code, or a tutorial, please credit the original authors and check the copyright notice.