Contribute to the development of Pashto
Areas to contribute
There are many ways you can help us. Here are a few suggestions for lifting Pashto from a low-resource language to a web-rich one.
Common Voice
One of the biggest challenges for content creation in the Pashto language is typing grammatically correct sentences with the available keyboards.
Our first goal is to create an automatic speech recognition system for Pashto that transcribes spoken words into written Pashto. Building such a system requires training data in Pashto. This kind of training data is commonly collected through Mozilla Common Voice, an open-source project. Unfortunately, Pashto is one of the few languages that has no data in the Common Voice project.
Our top challenges, in order of priority, are as follows:
- Complete the translation of the Common Voice portal into Pashto
- Write sentences in Pashto for Common Voice
- Record utterances of the sentences submitted to Common Voice
Pashto Automatic Speech Recognition System
- Add a new model either by training from scratch or fine-tuning a pre-trained model
- Create a production-ready version of the model for use on mobile devices
- Add a new preprocessed dataset
Pashto Corpus Creation
- Devise a unified approach to Pashto language corpus creation
- Design a language model based on the corpus produced in the previous step
Community Development
- Write blog posts to encourage participation
- Suggest improvements to the website
- Create documentation and videos to raise awareness
Code of Ethics
We are an open community. If you reuse a model, code, or tutorial, please credit the original authors and check the copyright notice.