First-ever Pashto dataset released on Common Voice 14!

Authors
pashto-dao

Today is a historic day for the Pashto language.

On June 28, 2023, the Pashto language community celebrated a significant milestone with the release of the first-ever Pashto dataset on the Mozilla Common Voice project, version 14.

The inclusion of Pashto, an official language of Afghanistan, on the Common Voice platform is a momentous achievement, made possible by the tireless efforts of dedicated volunteers and language enthusiasts. Over the past year, the Pashto community has contributed actively to the project, submitting thousands of voice samples and transcripts and working to ensure the dataset's quality and diversity.

"This is a proud moment for the Pashto language community," said Hanif Rahman, the Pashto project contributor on Common Voice. "The availability of a Pashto dataset on this renowned platform will not only benefit the development of speech recognition technologies for the Pashto language but also promote its use and preservation in the digital age."

The Pashto dataset released in Common Voice version 14 comprises 2 hours of high-quality recordings. The dataset covers a wide range of topics, including everyday conversations, news, and educational content, ensuring a comprehensive representation of the Pashto language.