In the name of Allah, the Most Gracious, the Most Merciful
I recently posed a question on Fosstodon asking:
Do you believe writing (or blogging) is antiquated/outdated in some ways?
In the past it served as a vital tool for preservation of knowledge. But as our modern blogs usually 'die' within 12 months of us passing on (unless archived somewhere) & the fact that we have adequate audio/video tools to record our spoken words, would vlogging/audio-recording not be as effective as writing?
Whilst I still remain conflicted about blogging itself, I thought I would share how to blog using your voice on Linux.
The title may seem like click-bait, but I am using my voice to write this blog (mostly). This is made possible by advancements in a technology called Speech-to-Text (STT), the increase in advanced Speech-to-Text models (Whisper, Vosk, Coqui and DeepSpeech) and the development of an open-source application called Speech Note.
Install or enable Flatpak
The first step is to install or enable Flatpak. Flatpak is a packaging and software distribution tool that is independent of the various Linux distros.
To install or enable it, follow the instructions for your distro at this link: Flatpak Linux Install
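Before moving on, it may help to confirm that Flatpak and the Flathub remote are actually set up. The following is a minimal sketch, not an official procedure: the function name flatpak_status is made up for illustration, and the script only reports status without changing anything. The remote-add command it prints as a hint is the one given in the Flathub setup instructions.

```shell
#!/bin/sh
# Print the Flatpak/Flathub status of this machine without changing anything.
# flatpak_status is a hypothetical helper name used only in this sketch.
flatpak_status() {
    if ! command -v flatpak >/dev/null 2>&1; then
        echo "flatpak: missing"
        return 0
    fi
    echo "flatpak: installed"
    # List the configured remotes by name and look for a "flathub" entry.
    if flatpak remotes --columns=name 2>/dev/null | grep -qw "flathub"; then
        echo "flathub: enabled"
    else
        echo "flathub: missing"
        echo "hint: flatpak remote-add --if-not-exists flathub https://dl.flathub.org/repo/flathub.flatpakrepo"
    fi
}

flatpak_status
```

If the script reports that Flatpak is missing, the link above is still the place to start, since the install steps differ per distro.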
Install required software
Speech Note can be installed using your graphical package manager or via the command line.
From a terminal (such as Konsole), install Speech Note with:
flatpak install flathub net.mkiol.SpeechNote
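Once the install finishes, a quick check like the following can confirm that the application is present before you go looking for it in the menu. This is a minimal sketch, assuming the Flathub application ID is net.mkiol.SpeechNote; the function names are made up for illustration.

```shell
#!/bin/sh
# Assumed Flathub application ID for Speech Note.
APP_ID="net.mkiol.SpeechNote"

# Succeeds only if flatpak exists and knows about the application.
speech_note_installed() {
    command -v flatpak >/dev/null 2>&1 && flatpak info "$APP_ID" >/dev/null 2>&1
}

# Print a one-line status message, including the launch command when installed.
report_speech_note() {
    if speech_note_installed; then
        echo "$APP_ID is installed; start it with: flatpak run $APP_ID"
    else
        echo "$APP_ID is not installed yet"
    fi
}

report_speech_note
```

Launching from the terminal with flatpak run can also be handy because any error messages from the application show up right there.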
Open the Speech Note application, navigate to Languages > English, and filter the list to show only Speech-to-Text models.
Install the Whisper (medium) model. I find this model to be effective even though I have a South African accent. If you do not speak English natively, you could consider the other models. The larger models should be more accurate, but they require significant disk space, and some of them may also require a high-end GPU.
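Since the larger models need a fair amount of disk space, it can be worth checking how much is free before downloading one. The sketch below rests on two assumptions: that the models end up somewhere under your home directory, and that 2048 MB is a sensible threshold. That number is a placeholder, not a figure from Speech Note, so replace it with the size the application shows for the model you want.

```shell
#!/bin/sh
# Hypothetical threshold in MB; adjust it to the size shown for your chosen model.
REQUIRED_MB=2048

# Print the free space (in MB) on the filesystem that holds the given path.
free_mb() {
    # df -Pk gives POSIX-format output in 1024-byte blocks; field 4 is "Available".
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# Assumption: models are stored under the home directory.
avail=$(free_mb "${HOME:-/}")
if [ "$avail" -ge "$REQUIRED_MB" ]; then
    echo "ok: ${avail} MB free"
else
    echo "low: only ${avail} MB free (wanted ${REQUIRED_MB} MB)"
fi
```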
Any mid-range microphone (that works on Linux) should work with this application. After installing the Whisper (medium) model, navigate to the main application page and select "English (whisper Medium)" at the bottom. Click on "Listen" and speak into your microphone.
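If nothing happens when you speak, one thing to rule out is that Linux does not see the microphone at all. The following is a small sketch, assuming the ALSA utilities (which provide arecord) are installed; the function name list_mics is made up for illustration, and systems that use a different audio stack may need a different tool.

```shell
#!/bin/sh
# List the capture (recording) devices that ALSA can see.
# list_mics is a hypothetical helper name used only in this sketch.
list_mics() {
    if command -v arecord >/dev/null 2>&1; then
        # arecord -l prints the available capture hardware devices;
        # errors (such as no sound cards found) are shown as well.
        arecord -l 2>&1
    else
        echo "arecord: missing (install your distro's ALSA utilities package)"
    fi
    return 0
}

list_mics
```

If your microphone appears in the list but Speech Note still hears nothing, the input device or volume in your desktop's sound settings is the next place to look.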
The text should be generated reasonably quickly (and possibly even faster if a supported GPU is being used). Thereafter you just need to copy and paste the text into your preferred editor and edit the content as needed.
For any errors, send me an email or contact me via Mastodon.
If you don't know how to use RSS and want email updates on my new content, consider Joining my Newsletter
The original content of this blog is a Waqf solely for the Pleasure of Allah. You are hereby granted full permission to copy, download, distribute, publish and share this content without modification under condition that full attribution is given to this author by creating a link either above or below the content that links back to the original source of the content. For any questions or ambiguity, you are requested to contact me via email for clarification.