Add Speech to Text Feature using Kotlin

Introduction

Speech recognition has become an integral part of many applications, giving users a more natural and convenient way to interact with them. Adding a Speech-to-Text (STT) feature to your Android application can significantly improve the user experience. In this blog, I'll guide you through integrating a speech recognition feature into your Android app, similar to the one in Google Assistant.

What is Speech-to-Text (STT)?

Speech-to-Text (STT) is the process of converting spoken words into text. It is a hard problem, because the recognizer has to cope with the nuances of human speech, such as accents, dialects, and different pronunciations. Modern recognizers handle this with NLP and machine-learning models, and on Android you can use them through the platform's SpeechRecognizer API without building any of that yourself.

Now, let's add this feature to your Android application so that your users can speak to your app instead of typing.

Implementation

#1 : Getting Required Permissions

To capture speech, your app needs the RECORD_AUDIO permission. It must be declared in AndroidManifest.xml, and because it is a runtime permission, the user must also grant it while the app is running. If the permission has not been granted yet, we ask the user for it. If it has already been granted, we initialize the application's view.

if(ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO) 
    == PackageManager.PERMISSION_GRANTED) {
    // Initialize your view
} else {
    ActivityCompat.requestPermissions(this, 
        arrayOf(Manifest.permission.RECORD_AUDIO), PERMISSION_REQ_CODE)
}

PERMISSION_REQ_CODE is an integer constant that you define yourself. Android passes it back to us with the result of the request, so we can check that the result belongs to the request we made. This is the standard way to handle permission requests in Android.

Now, add the code below to handle the result of the permission request made above. If the user grants the permission, we initialize the application's view.

override fun onRequestPermissionsResult(
        requestCode: Int,
        permissions: Array<out String>,
        grantResults: IntArray
) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults)
    if(requestCode == PERMISSION_REQ_CODE && grantResults.isNotEmpty()) {
        if(grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            // Initialize your view
        }
    }
}
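The check inside this callback can also be pulled into a small pure function, which makes it easy to unit test without a device. The sketch below is only an illustration under a few assumptions: the value 101 for PERMISSION_REQ_CODE is arbitrary (any integer unique within your app works), isAudioPermissionGranted is a hypothetical helper name, and the local PERMISSION_GRANTED constant mirrors PackageManager.PERMISSION_GRANTED (which is 0) so that the snippet needs no Android imports.

```kotlin
// Arbitrary request code (assumption): any integer unique within your app works.
const val PERMISSION_REQ_CODE = 101

// Mirrors PackageManager.PERMISSION_GRANTED so this sketch needs no Android imports.
const val PERMISSION_GRANTED = 0

// Hypothetical helper: true only when the callback belongs to our request
// and the single permission we asked for was granted.
fun isAudioPermissionGranted(requestCode: Int, grantResults: IntArray): Boolean =
    requestCode == PERMISSION_REQ_CODE &&
        grantResults.isNotEmpty() &&
        grantResults[0] == PERMISSION_GRANTED
```

Inside onRequestPermissionsResult you could then write if (isAudioPermissionGranted(requestCode, grantResults)) and initialize the view in that branch. An empty grantResults array means the request was interrupted, so the helper treats it as not granted.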

#2 : Initializing a Speech Recognizer

Now that we have permission to record the user's audio, we are all set to add the STT feature. For this, we will initialize a Speech Recognizer, which listens to the user's speech and gives us the recognized text.

The speech recognizer needs a speech recognizer intent to start listening to the user's speech. So we will first create an intent for the same.

// Creating the Speech Recognizer Intent
val speechRecognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
speechRecognizerIntent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true)
speechRecognizerIntent.putExtra(
    RecognizerIntent.EXTRA_LANGUAGE_MODEL,
    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
)
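// EXTRA_LANGUAGE expects an IETF language tag string such as "en-US",
// not a Locale object, so convert the default locale to its tag
speechRecognizerIntent.putExtra(
    RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toLanguageTag())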

// Initializing Speech Recognizer
val speechRecognizer = SpeechRecognizer.createSpeechRecognizer(context)
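Speech recognition is provided by a service on the device, and not every device has one installed. As a minimal sketch, assuming you want the app to fail gracefully in that case, the recognizer can be created only when SpeechRecognizer.isRecognitionAvailable reports that a service exists. The helper name createRecognizerOrNull is hypothetical and not part of the code above.

```kotlin
import android.content.Context
import android.speech.SpeechRecognizer

// Hypothetical helper: returns a recognizer, or null when the device
// has no speech recognition service available.
// Call this from the main thread, like createSpeechRecognizer itself.
fun createRecognizerOrNull(context: Context): SpeechRecognizer? =
    if (SpeechRecognizer.isRecognitionAvailable(context)) {
        SpeechRecognizer.createSpeechRecognizer(context)
    } else {
        null
    }
```

A null result is a reasonable point to hide or disable the microphone button instead of letting the user tap something that cannot work.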

#3 : Start Listening to User

Now we are all set to make the user talk to us. We will now start listening to the user using our Speech Recognizer initialized above. Also, we have to set up a RecognitionListener for our Speech Recognizer so that we can respond to output received from our Speech Recognizer.

// Set up the Recognition Listener first: startListening should only be
// called after a listener is registered, otherwise no callbacks are delivered
speechRecognizer.setRecognitionListener(object : RecognitionListener {
    override fun onReadyForSpeech(params: Bundle?) {}

    override fun onBeginningOfSpeech() {
        // This callback is called once the user starts speaking
    }

    override fun onRmsChanged(rmsdB: Float) {}

    override fun onBufferReceived(buffer: ByteArray?) {}

    override fun onEndOfSpeech() {
        // This callback is called once the user's speech ends
        speechRecognizer.stopListening()
    }

    override fun onError(error: Int) {
        // error is one of the SpeechRecognizer.ERROR_* codes
    }

    override fun onResults(results: Bundle?) {
        // speechResults holds the top recognition result,
        // or null if nothing was recognized
        val speechResults = results?.getStringArrayList(
                SpeechRecognizer.RESULTS_RECOGNITION)?.firstOrNull()
    }

    override fun onPartialResults(partialResults: Bundle?) {
        // partialText holds the intermediate result received
        // while the user is still speaking
        val partialText = partialResults?.getStringArrayList(
                SpeechRecognizer.RESULTS_RECOGNITION)?.firstOrNull()
    }

    override fun onEvent(eventType: Int, params: Bundle?) {}
})

// To start listening, call startListening and pass the speechRecognizerIntent
speechRecognizer.startListening(speechRecognizerIntent)
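The onError callback above is left empty, but in practice it fires often, for example when the user stays silent or the network drops. As a rough sketch, the error code can be mapped to a readable message using the ERROR_* constants defined on SpeechRecognizer. The function name describeSpeechError and the message texts are my own assumptions, not part of the Android API.

```kotlin
import android.speech.SpeechRecognizer

// Hypothetical helper: maps the code passed to onError to a short message.
// The message wording is illustrative only.
fun describeSpeechError(error: Int): String = when (error) {
    SpeechRecognizer.ERROR_AUDIO -> "Audio recording error"
    SpeechRecognizer.ERROR_CLIENT -> "Client side error"
    SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS -> "Missing RECORD_AUDIO permission"
    SpeechRecognizer.ERROR_NETWORK -> "Network error"
    SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Network timeout"
    SpeechRecognizer.ERROR_NO_MATCH -> "No speech could be matched"
    SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Recognizer is busy"
    SpeechRecognizer.ERROR_SERVER -> "Server error"
    SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "No speech input"
    else -> "Unknown error ($error)"
}
```

Inside onError you could log describeSpeechError(error) or show it to the user. When the screen that owns the recognizer goes away, calling speechRecognizer.destroy() releases it.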

Conclusion

So, you have added an STT feature to your Android application in just 3 easy steps. This should noticeably improve your users' experience with the app.

Bonus Tip: Below is the code to copy the text to the clipboard, which can be used to copy the user's speech to the clipboard.

val clipboardManager = 
    context.getSystemService(Context.CLIPBOARD_SERVICE) as ClipboardManager
clipboardManager.setPrimaryClip(
    ClipData.newPlainText("LABEL", textToCopy)
)

Thank you for reading. Do follow for more Android tutorials and documentation.
"You learn more from failure than from success."
