By Nirmal Patel

Speech Recognition for EdTech: Ideas for Products and Features

Updated: Dec 5, 2021

This article will share some exciting ways to use the latest speech recognition technology to create novel learning experiences! Speech recognition technology has improved significantly in the last decade due to the advent of deep learning methods. Higher accuracy levels of the machine learning models have enabled people to develop reliable speech-based educational products. These products are now giving students enhanced learning experiences and helping them be successful.

Using speech technology has also become much easier. Several out-of-the-box solutions, libraries, and services let us quickly build working prototypes and products. For example, companies like SoapBox Labs provide speech recognition services designed specifically for children's voices. Mozilla's DeepSpeech is an open-source library for training speech models, and Google Cloud and AWS offer ready-to-use speech recognition APIs. There are also open-source speech models pre-trained on many hours of speech data; for example, Vakyansh is an open-source repository of speech models for Indian languages. The easy availability of speech software and its higher accuracy have made speech recognition prevalent in high-tech learning products.

We have put together several ways to use speech technology to create potentially effective learning experiences. Some of these ideas are close to being products, while others are better viewed as features of an existing technology platform.

Oral Reading Fluency

One of the most widely known uses of speech technology in the classroom is automated reading assessment (see NWEA’s MAP Reading Fluency). To measure students' reading ability, we can use speech recognition to score their oral reading automatically. From the speech data, we can also derive metrics such as words read per minute, word error rate, and time taken to read the text. These metrics can help teachers personalize reading instruction in their classrooms. We can also make speech technology available through messaging bots. For example, we could build a WhatsApp bot that sends students a reading passage and receives an audio recording of their reading in return. The AI can then give the student immediate feedback based on predefined metrics. Check out our open-source Python-based prototype for Oral Reading Fluency here:
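To make the metrics above concrete, here is a minimal sketch of how a passage and a speech-recognition transcript could be turned into fluency scores. It assumes the transcript has already come back from an ASR service as plain text; the function names (`score_reading`, `word_errors`, `words_matched`) are hypothetical and not taken from our prototype. Word error rate is the word-level edit distance divided by the passage length, and "words correct" is approximated as the longest run of passage words read in order.

```python
from dataclasses import dataclass


@dataclass
class FluencyReport:
    words_correct: int
    word_error_rate: float
    words_correct_per_minute: float


def normalize(text: str) -> list[str]:
    """Lowercase and strip punctuation so 'Mat.' matches 'mat'."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w]


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # passage word skipped
                          curr[j - 1] + 1,     # extra word inserted
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1]


def words_matched(reference: list[str], hypothesis: list[str]) -> int:
    """Longest common subsequence: passage words read correctly, in order."""
    prev = [0] * (len(hypothesis) + 1)
    for ref_word in reference:
        curr = [0]
        for j, hyp_word in enumerate(hypothesis, start=1):
            if ref_word == hyp_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score_reading(passage: str, transcript: str, seconds: float) -> FluencyReport:
    """Score one oral reading given the passage, ASR transcript, and duration."""
    ref, hyp = normalize(passage), normalize(transcript)
    wer = word_errors(ref, hyp) / len(ref) if ref else 0.0
    correct = words_matched(ref, hyp)
    wcpm = correct * 60 / seconds if seconds > 0 else 0.0
    return FluencyReport(correct, wer, wcpm)
```

For example, `score_reading("The cat sat on the mat.", "the cat sat on a mat", seconds=30)` counts 5 words correct, a word error rate of 1/6, and 10 words correct per minute. A production scorer would also need to handle self-corrections and ASR mistakes, which this sketch ignores.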

Voice-based UI Navigation

Accessibility is often a challenge for large-scale edtech platforms. UI design issues can make it hard for teachers and students to find content or features. In such cases, voice-based UI navigation can help users and make the platform more accessible. With voice commands, teachers and students can immediately find what they need. For example, a teacher can say, “show me all of the ungraded tests for my 3rd grade reading class”, and the platform can take them to the correct page. A more complex example is scheduling an assignment by voice. Similarly, a student can say “take me to today’s homework” to go directly to their homework activity. In some use cases, voice-based login might also help users.
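A simple way to prototype this is to map the transcribed command to a page using pattern matching. The sketch below is only an illustration: the intent patterns and URL paths (`/teacher/grading`, `/student/homework/today`) are hypothetical, and a real platform would more likely use a natural-language-understanding service than hand-written regular expressions.

```python
import re
from typing import Optional

# Hypothetical intents: a regex over the lowercased transcript and a URL template
# whose placeholders are filled from the regex capture groups.
INTENTS = [
    (re.compile(r"\bungraded (?:tests|quizzes)\b.*\b(\d+)(?:st|nd|rd|th) grade (\w+)"),
     "/teacher/grading?status=ungraded&grade={0}&subject={1}"),
    (re.compile(r"\b(?:today'?s|my) homework\b"),
     "/student/homework/today"),
]


def route_command(transcript: str) -> Optional[str]:
    """Return the page a spoken command should open, or None if no intent matches."""
    # ASR output may use curly apostrophes; normalize them before matching.
    text = transcript.lower().replace("\u2019", "'").strip()
    for pattern, template in INTENTS:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return None
```

Returning `None` for unrecognized commands lets the interface fall back to asking the user to rephrase instead of guessing.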

Adaptive Reading Instruction

We can use speech-based assessments to give students leveled reading materials. As students read more text, we can keep adjusting the difficulty of the content. Using recent NLP techniques, we can now estimate readability metrics for almost any content on the internet. We can build a large-scale inventory of reading passages from open-source content on the internet and rate their reading difficulty using machine learning models. Students can then get leveled content from this repository when they need it.
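The leveling loop can be sketched as: estimate each passage's difficulty, adjust a target level from the student's last reading result, and pick the closest passage. As a simple, well-known stand-in for the machine learning models mentioned above, this sketch uses the Flesch-Kincaid grade level formula with a rough syllable heuristic. The accuracy thresholds (0.95 and 0.90) and the `next_passage` helper are hypothetical choices for illustration.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per vowel group, minus a silent final 'e'."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def next_passage(passages: list[str], current_grade: float, last_accuracy: float,
                 step: float = 0.5) -> str:
    """Pick the passage whose estimated grade is closest to the adjusted target."""
    if last_accuracy >= 0.95:      # hypothetical: read accurately, so step up
        target = current_grade + step
    elif last_accuracy < 0.90:     # hypothetical: struggled, so step down
        target = current_grade - step
    else:
        target = current_grade
    return min(passages, key=lambda p: abs(fk_grade(p) - target))
```

A learned readability model could replace `fk_grade` without changing the selection logic, since `next_passage` only needs some numeric difficulty score per passage.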

Read-Along Tutors

Speech recognition can help us build read-along tutors that help students read word by word. One of the best-known examples is Project LISTEN, developed by Dr. Jack Mostow at Carnegie Mellon University. In this tutor, the speech recognition system tracked each student's reading word by word and gave personalized help whenever the student struggled. There are several commercial read-along tutors on the market today, such as Fast ForWord by Scientific Learning.
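The core of word-by-word tracking can be sketched as a cursor over the passage that advances when the recognizer hears the expected word and triggers help after repeated misreads or a long pause. This is not Project LISTEN's actual method; it is a minimal sketch that assumes the recognizer streams one word at a time with a timestamp. The `ReadAlongTracker` class and its thresholds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReadAlongTracker:
    passage: list[str]
    stall_seconds: float = 3.0  # hypothetical: pause length before the tutor steps in
    max_misreads: int = 2       # hypothetical: wrong attempts before the tutor steps in
    position: int = 0
    last_progress: float = 0.0
    misreads: int = 0
    helped_words: list[str] = field(default_factory=list)

    def expected(self) -> Optional[str]:
        """The next passage word the student should read, or None when done."""
        if self.position < len(self.passage):
            return self.passage[self.position]
        return None

    def _help(self, timestamp: float) -> str:
        word = self.expected()
        self.helped_words.append(word)
        self.position += 1  # the tutor reads the word aloud, so move past it
        self.misreads = 0
        self.last_progress = timestamp
        return f"help: {word}"

    def on_word(self, word: str, timestamp: float) -> Optional[str]:
        """Feed one word from the speech recognizer; return a help prompt or None."""
        target = self.expected()
        if target is None:
            return None
        if word.lower() == target.lower():
            self.position += 1
            self.misreads = 0
            self.last_progress = timestamp
            return None
        self.misreads += 1
        return self._help(timestamp) if self.misreads >= self.max_misreads else None

    def on_tick(self, timestamp: float) -> Optional[str]:
        """Call periodically; offer help if the student has stalled on a word."""
        if self.expected() is not None and timestamp - self.last_progress > self.stall_seconds:
            return self._help(timestamp)
        return None
```

The `helped_words` list doubles as a record of which words the student needed help with, which a tutor could use to choose follow-up practice.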

Classroom Conversation Analytics

This is an emerging area in which the dynamics of learning are captured by analyzing classroom conversation. It can help answer questions such as:

  • How much time is spent on conversation versus instruction?

  • Who is participating in the classroom?

  • What type of questions get asked?

These are all critical questions for understanding what is happening in the classroom. In some instances, speaker recognition algorithms can help us identify the actual speakers in the audio and reconstruct the entire conversational context of the class. Learning scientists can then analyze this context to understand the correlates of effective teaching and learning.
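Once speakers have been separated, the questions above reduce to simple aggregation over labeled segments. The sketch assumes a hypothetical input format: a list of segments, each with a speaker label (for example from a speaker diarization step), start and end times in seconds, and an ASR transcript. Teacher talk share serves as a rough proxy for instruction versus conversation, and a question is detected only by a trailing "?", which depends on the ASR engine producing punctuation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # hypothetical label, e.g. "teacher" or "student_1"
    start: float   # seconds
    end: float     # seconds
    text: str      # ASR transcript of this segment


def classroom_summary(segments: list[Segment], teacher: str = "teacher") -> dict:
    """Aggregate talk time, participants, and question counts per speaker."""
    talk_time: dict[str, float] = defaultdict(float)
    questions: dict[str, int] = defaultdict(int)
    for seg in segments:
        talk_time[seg.speaker] += seg.end - seg.start
        # Crude heuristic: a segment ending in "?" counts as a question.
        if seg.text.rstrip().endswith("?"):
            questions[seg.speaker] += 1
    total = sum(talk_time.values())
    return {
        "teacher_talk_share": talk_time.get(teacher, 0.0) / total if total else 0.0,
        "participants": sorted(s for s in talk_time if s != teacher),
        "talk_time": dict(talk_time),
        "questions": dict(questions),
    }
```

Classifying question types (for example open-ended versus factual) would need an additional text classifier on top of this summary.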

Group Learning Support

We can apply conversation analytics to group learning, where each group has a voice device. The device can support students learning together by giving them directions, and it can help ensure that everyone in the group participates. The teacher can also get real-time information about what is happening in each group on their own device and see where to focus. Potentially, this can allow the teacher to work with more groups at a time.
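One way the teacher's device could decide where to focus is to flag group members whose share of the group's talk time falls below a threshold. This sketch assumes each group's device already reports talk seconds per registered member (with 0 for members who have not spoken); the `participation_alerts` function and the 15% threshold are hypothetical.

```python
def participation_alerts(group_talk_seconds: dict[str, dict[str, float]],
                         min_share: float = 0.15) -> dict[str, list[str]]:
    """Return {group: [quiet members]} for members below min_share of their group's talk."""
    alerts: dict[str, list[str]] = {}
    for group, talk in group_talk_seconds.items():
        total = sum(talk.values())
        if total == 0:
            alerts[group] = sorted(talk)  # nobody in the group has spoken yet
            continue
        quiet = sorted(member for member, secs in talk.items() if secs / total < min_share)
        if quiet:
            alerts[group] = quiet
    return alerts
```

Groups that do not appear in the result need no attention under this rule, so a dashboard could list only the flagged groups.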

Math Facts Learning

We can design voice-based apps that help students master math facts through practice. The app speaks a question, and the student answers aloud. This type of repeated interaction can help students build mastery. We can also deliver this interface through Alexa or other voice platforms that support third-party applications.
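A practice loop like this needs two pieces: turning the recognizer's output into a number, and deciding which fact to ask next. The sketch below assumes the ASR returns either digits ("42") or number words ("forty-two"), and it uses a hypothetical mastery rule: a multiplication fact counts as mastered after a set number of correct answers in a row. Text-to-speech for reading the question aloud is left to the voice platform.

```python
import random
from typing import Optional

NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
    "hundred": 100,
}


def parse_spoken_number(transcript: str) -> Optional[int]:
    """Turn '42', 'forty two' or 'forty-two' into 42; None if unparseable."""
    text = transcript.lower().replace("-", " ").strip(" .")
    if text.isdigit():
        return int(text)
    tokens = text.split()
    if not tokens:
        return None
    total = 0
    for token in tokens:
        value = NUMBER_WORDS.get(token)
        if value is None:
            return None
        # "hundred" multiplies what came before it ("one hundred" -> 100).
        total = (total or 1) * 100 if value == 100 else total + value
    return total


class FactDrill:
    """Asks unmastered multiplication facts; hypothetical streak-based mastery rule."""

    def __init__(self, max_factor: int = 12, mastery_streak: int = 3,
                 seed: Optional[int] = None):
        self.rng = random.Random(seed)
        self.max_factor = max_factor
        self.mastery_streak = mastery_streak
        self.streaks: dict[tuple[int, int], int] = {}

    def next_question(self) -> Optional[tuple[int, int]]:
        factors = range(2, self.max_factor + 1)
        unmastered = [(a, b) for a in factors for b in factors
                      if self.streaks.get((a, b), 0) < self.mastery_streak]
        return self.rng.choice(unmastered) if unmastered else None

    def check(self, fact: tuple[int, int], transcript: str) -> bool:
        a, b = fact
        correct = parse_spoken_number(transcript) == a * b
        # A wrong answer resets the streak for this fact.
        self.streaks[fact] = self.streaks.get(fact, 0) + 1 if correct else 0
        return correct
```

An app would speak something like "What is 7 times 8?" for the pair returned by `next_question`, pass the recognized reply to `check`, and stop once `next_question` returns `None`.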

Identifying Speech Impairments

Speech and language disorders can create many learning challenges for students. For example, dyslexia is widespread and affects nearly 20% of students (source). We can use speech recognition to detect potential issues in students’ speech. It’s crucial to identify these impairments early in childhood so that we can provide evidence-based interventions.

Challenges

The biggest challenge with speech technology is achieving unbiased recognition accuracy. For example, a speech engine may recognize some accents much better than others. Such problems have already been identified in commercial speech engines: in 2019, a Harvard Business Review article described biases in Google’s speech engine [link to the article]. The good news is that this is now a known issue in the machine learning community, and researchers and companies are actively working to reduce bias in their models.
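One common way to check a speech engine for this kind of bias is to compare word error rate across speaker groups reading the same test passages. The sketch assumes you already have, for each recording, a group label (for example an accent), the reference text, and the engine's transcript; the function names and group labels are hypothetical. WER is pooled per group (total errors over total reference words), and the gap between the worst and best group gives a single number to track over time.

```python
from collections import defaultdict


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def wer_by_group(samples) -> dict[str, float]:
    """samples: iterable of (group, reference_text, asr_transcript) tuples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, transcript in samples:
        ref, hyp = reference.lower().split(), transcript.lower().split()
        errors[group] += word_errors(ref, hyp)
        words[group] += len(ref)
    # Pooled WER per group: total errors divided by total reference words.
    return {g: errors[g] / words[g] for g in words if words[g]}


def wer_gap(wers: dict[str, float]) -> float:
    """Difference between the worst and best group WER (0.0 means parity)."""
    return max(wers.values()) - min(wers.values())
```

A meaningful audit needs enough recordings per group for the differences to be reliable; with only a handful of samples, a gap may reflect noise rather than bias.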

Future of AI in education

AI technology can enhance human capacity, but we must use it responsibly and equitably. When we use AI in education, we have to ensure that its benefits reach all students and teachers, not just privileged ones. This way, we can get the most value from it without creating digital divides between learners.

Have more ideas?

If you have any interesting ideas about using speech technology in learning, please share them in the comments!

Thank you for reading!

Written by Nirmal Patel and Aditya Sharma from Playpower Labs.
