List of speech recognition software


Speech recognition software is available for many computing platforms, operating systems, use models, and software licenses. This list groups such software by engine, platform, and type of use.

Acoustic models and speech corpus (compilation)

The following list presents notable speech recognition software engines with a brief synopsis of characteristics.

| Application name | Description | Open-source | License | Operating system | Programming language | Supported language, note | Offline or online |
|---|---|---|---|---|---|---|---|
| CMU Sphinx | HMM | Yes | BSD style | Cross-platform | Java | English, German, French, Mandarin, Russian | Offline |
| HTK | HMM neural net | No | HTK specific | Cross-platform | C | English; version 3.5 released December 2015 | |
| Julius | HMM trigrams | Yes | BSD style, non-commercial | Cross-platform | C | Japanese, English[1] | Offline |
| Kaldi | Neural net | Yes | Apache | Cross-platform | C++ | English | |
| RWTH ASR | RWTH Aachen University | No | RWTH ASR, non-commercial use only | Linux, macOS | C++ | English | |
| Whisper | Encoder/decoder transformer | Yes | MIT license | Cross-platform | Python | Multilingual | Online (through API) and Offline |
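
The table above can also be read as structured data. The following is a minimal sketch, not part of any listed engine, that encodes three of the table's columns and selects the engines marked both open source and offline-capable. The `Engine` class, its field names, and the `open_source_offline` helper are hypothetical names introduced only for this illustration; a blank "Offline or online" cell is represented as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Engine:
    """One row of the engine table (hypothetical structure for illustration)."""
    name: str
    open_source: bool
    mode: Optional[str]  # None where the table leaves the cell blank


# Values copied from the table above.
ENGINES = [
    Engine("CMU Sphinx", True, "Offline"),
    Engine("HTK", False, None),
    Engine("Julius", True, "Offline"),
    Engine("Kaldi", True, None),
    Engine("RWTH ASR", False, None),
    Engine("Whisper", True, "Online (through API) and Offline"),
]


def open_source_offline(engines: List[Engine]) -> List[str]:
    """Return names of engines listed as open source and offline-capable."""
    return [
        e.name
        for e in engines
        if e.open_source and e.mode is not None and "offline" in e.mode.lower()
    ]


print(open_source_offline(ENGINES))
```

Engines whose mode cell is blank in the table are excluded here because the list does not state whether they run offline.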

Macintosh

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Dragon for Mac (discontinued 2018) | macOS; by Nuance | No | Proprietary | | |
| Dragon Dictate (discontinued) | macOS; by Nuance | No | Proprietary | | |
| MacSpeech Scribe (discontinued) | Transcription from recorded text; acquired by Nuance | | | | |
| iListen (discontinued) | PowerPC Macintosh; discontinued by MacSpeech; acquired by Nuance | | | | |
| Speakable items | Included with macOS | | | | |
| ViaVoice (discontinued) | IBM product; acquired by Nuance | | | | |
| Voice Navigator | Original GUI voice control; 1989 | | | | |
| Weesper Neon Flow | Voice-dictation software for macOS 11+; 100% offline processing; cross-app dictation | No | Proprietary | €5/month (free trial) | Provides local speech-to-text on device; available for macOS and Windows[1] |

Cross-platform web apps based on Chrome

The following list presents notable speech recognition software that operates in a Chrome browser as web apps. These apps make use of the HTML5 Web Speech API.[2]

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Speechmatics[3] | Cloud-based and on-premise automatic speech recognition | No | Proprietary | From £0.06 per minute of audio | |

Mobile devices and smartphones

Many mobile phone handsets, including feature phones and smartphones such as iPhones and BlackBerrys, have basic dial-by-voice features built in. Many third-party apps have implemented natural-language speech recognition support, including:

| Application name | Description | Open-source | License | Price | Note |
|---|---|---|---|---|---|
| Assistant.ai | Assistant for Android, iOS and Windows Phone | No | Proprietary, freeware | Free | Discontinued |
| Dragon Dictation | | No | Proprietary, freeware | Free | |
| Google Now | Android voice search | No | Proprietary, freeware | Free | |
| Google Voice Search | | No | Proprietary, freeware | Free | |
| Microsoft Cortana | Microsoft voice search | No | Proprietary, freeware | Free | |
| Siri Personal Assistant | Apple's virtual personal assistant | No | Proprietary, freeware | Free | |
| Alexa – Amazon Echo | Amazon's personal assistant | No | Proprietary | | |
| SILVIA | Android and iOS | No | | | |
| Vlingo | | | | | |

Windows

Windows built-in speech recognition

Windows Speech Recognition version 8.0 by Microsoft is built into Windows Vista, Windows 7, Windows 8 and Windows 10. It is available only in English, French, Spanish, German, Japanese, Simplified Chinese, and Traditional Chinese, and only in the corresponding language version of Windows: the speech recognition engine for one language cannot be used on a version of Windows in another language. Windows 7 Ultimate and Windows 8 Pro allow the system language to be changed, which in turn changes which speech engine is available. Windows Speech Recognition evolved into Cortana, a personal assistant included in Windows 10.
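
The language restriction described above amounts to a simple matching rule. The following minimal sketch models it, assuming language names are compared as plain strings. The names `SPEECH_LANGUAGES` and `can_use_engine` are hypothetical and do not correspond to any Windows API.

```python
# Languages for which the article says Windows Speech Recognition is available.
SPEECH_LANGUAGES = frozenset({
    "English",
    "French",
    "Spanish",
    "German",
    "Japanese",
    "Simplified Chinese",
    "Traditional Chinese",
})


def can_use_engine(engine_language: str, windows_language: str) -> bool:
    """Model the rule that an engine works only on the matching Windows language.

    This is an illustration of the rule stated in the text, not a query
    against a real system.
    """
    return (
        engine_language in SPEECH_LANGUAGES
        and engine_language == windows_language
    )


# A German engine on German Windows satisfies the rule.
print(can_use_engine("German", "German"))
# A German engine on English Windows does not.
print(can_use_engine("German", "English"))
```

On editions that allow the system language to be changed, switching `windows_language` is what makes a different engine usable under this rule.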

Windows 7, 8, 10, 11 third-party speech recognition

  • Braina – Dictate into third-party software and websites,[4] fill web forms and execute vocal commands.[5]
  • Dragon NaturallySpeaking from Nuance Communications – Successor to the older DragonDictate product. Focus on dictation. 64-bit Windows support since version 10.1.
  • Tazti – Create speech command profiles to play PC games and control applications and programs. Create speech commands to open files, folders, webpages and applications. Versions are available for Windows 7, Windows 8 and Windows 8.1.[6]
  • Voice Finger – Software that extends the Windows speech recognition system with several additions, enabling control of the mouse and keyboard by voice alone. It is especially useful for helping users overcome disabilities or recover from computer-related injuries.

Microsoft Speech API

The first version of the Microsoft Speech API was released for Windows NT 3.51 and Windows 95 in 1995; it was then part of Windows up to Windows Vista. This initial version already contained Direct Speech Recognition and Direct Text To Speech APIs, which applications could use to control engines directly, as well as simplified "higher-level" Voice Command and Voice Talk APIs. Speech recognition functionality was included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but because the SDK is aimed at developers building speech applications, the SDK by itself lacks any user interface (numerous applications were available) and is thus unsuitable for end users.

Built-in software

Interactive voice response

The following are interactive voice response (IVR) systems:

Unix-like x86 and x86-64 speech transcription software

  • Janus Recognition Toolkit (JRTk)[8][9]
  • Mozilla DeepSpeech – an open-source speech-to-text engine, formerly developed by Mozilla,[10] based on Baidu's Deep Speech research paper.[11]
  • Weesper Neon Flow – professional voice-dictation software that provides offline speech-to-text processing on macOS and Windows using local AI models. It is not open source and offers a paid subscription after a 15‑day free trial.[12]

Discontinued software

See also

References
