Debian-Accessibility - Software
Speech Synthesis and related APIs
A thorough list is available on the speechsynthesis task page
EFlite
A speech server for Emacspeak and yasr (or other screen readers) that allows them to interface with Festival Lite, a free text-to-speech engine developed at the CMU Speech Center as an off-shoot of Festival.
Due to limitations inherited from its backend, EFlite only provides support for English at the moment.
eSpeak
eSpeak/eSpeak-NG is a software speech synthesizer for English and some other languages.
eSpeak produces good quality English speech. It uses a different synthesis method from other open source text to speech (TTS) engines (no concatenative speech synthesis, so it also has a very small footprint), and sounds quite different. It is perhaps not as natural or smooth, but some find the articulation clearer and easier to listen to for long periods.
It can run as a command line program to speak text from a file or from stdin. It also works well as a Talker with the KDE text to speech system (KTTS), as an alternative to Festival for example. As such, it can speak text which has been selected into the clipboard, or directly from the Konqueror browser or the Kate editor.
- Includes different voices, whose characteristics can be altered.
- Can produce speech output as a WAV file.
- Can translate text to phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
- Potential for other languages. Rudimentary (and probably humorous) attempts at German and Esperanto are included.
- Compact size. The program and its data total about 350 kbytes.
- Written in C++.
eSpeak can also be used with Speech Dispatcher.
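As a rough illustration of the command-line interface described above, the following sketch drives espeak from Python (assuming the espeak binary is installed and on the PATH). The options used are documented espeak flags: -v selects a voice, -w writes a WAV file instead of playing audio, and -x together with -q prints phoneme mnemonics without speaking.

    import subprocess

    # Speak a sentence with the default English voice.
    subprocess.run(["espeak", "Hello from Debian"], check=True)

    # Select a voice variant and write the output to a WAV file
    # instead of playing it.
    subprocess.run(
        ["espeak", "-v", "en+f3", "-w", "greeting.wav", "Hello from Debian"],
        check=True,
    )

    # Translate text to eSpeak's phoneme mnemonics (-x) without speaking
    # it (-q); this is the mode that lets eSpeak serve as a front end for
    # another synthesis engine.
    result = subprocess.run(
        ["espeak", "-q", "-x", "Hello"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)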
Festival Lite
A small, fast run-time speech synthesis engine. It is the latest addition to the suite of free software synthesis tools that includes the University of Edinburgh's Festival Speech Synthesis System and Carnegie Mellon University's FestVox project (tools, scripts and documentation for building synthetic voices). However, flite itself does not require either of these systems to run.
It currently only supports the English language.
Festival
A general multi-lingual speech synthesis system developed at the Centre for Speech Technology Research (CSTR) of the University of Edinburgh.
Festival offers a full text to speech system with various APIs, as well as an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control.
Besides research into speech synthesis, Festival is useful as a stand-alone speech synthesis program. It is capable of producing clearly understandable speech from text.
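Because control goes through the Scheme interpreter, other programs can drive Festival by piping commands to it. A minimal sketch, assuming the festival binary is installed (SayText is part of Festival's standard command set):

    import subprocess

    # Festival reads Scheme commands from stdin when started in pipe mode,
    # so any program can drive the synthesizer through its interpreter.
    scheme = '(SayText "Welcome to the Festival speech synthesis system.")'
    subprocess.run(["festival", "--pipe"], input=scheme, text=True, check=True)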
Speech Dispatcher
Provides a device independent layer for speech synthesis. It supports various software and hardware speech synthesizers as backends and provides a generic layer for synthesizing speech and playing back PCM data via those different backends to applications.
Various high-level concepts, like enqueueing vs. interrupting speech and application-specific user configurations, are implemented in a device-independent way, freeing the application programmer from having to reinvent the wheel yet again.
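For instance, a client might look like the following sketch, which uses the speechd Python module (packaged as python3-speechd). The output module name is an assumption and depends on which synthesizers are installed:

    import speechd

    client = speechd.SSIPClient("demo")    # identify this connection to the daemon
    client.set_output_module("espeak-ng")  # backend choice; depends on your setup
    client.set_language("en")
    client.set_priority(speechd.Priority.TEXT)  # queued text, not interrupting
    client.speak("First sentence, placed in the queue.")
    client.speak("Second sentence, spoken after the first.")
    client.close()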
Internationalised Speech Synthesis
All the currently available free solutions for software-based speech synthesis seem to share one common deficiency: they are mostly limited to English, providing only very marginal support for other languages, or in most cases none at all. Among all the free software speech synthesizers for Linux, only CMU Festival supports more than one natural language: it can synthesize English, Spanish and Welsh, but not German, French or Russian. When internationalization and localization are the trends in software and web services, is it reasonable to require blind people interested in Linux to learn English just to understand their computer's output and to conduct all their correspondence in a foreign tongue?
Unfortunately, speech synthesis is not really Jane Hacker's favourite homebrew project. Creating an intelligible software speech synthesizer involves time-consuming tasks. Concatenative speech synthesis requires the careful creation of a phoneme database containing all the possible combinations of sounds for the target language. Rules that determine the transformation of the text representation into individual phonemes also need to be developed and fine-tuned, usually requiring the division of the stream of characters into logical groups such as sentences, phrases and words. Such lexical analysis requires a language-specific lexicon seldom released under a free license.
One of the most promising speech synthesis systems is Mbrola, with phoneme databases for several dozen languages. The synthesis engine itself is free software, but unfortunately the phoneme databases are restricted to non-military and non-commercial use. We lack free phoneme databases that could be used in the Debian operating system.
Without a broadly multi-lingual software speech synthesizer, Linux cannot be accepted by assistive technology providers and people with visual disabilities. What can we do to improve this?
There are basically two approaches possible:
- Organize a group of people willing to help in this regard, and try to actively improve the situation. This might get a bit complicated, since it requires a lot of specific knowledge about speech synthesis, which is not easy to acquire on one's own. However, this should not discourage you. If you think you can motivate a group of people large enough to achieve some improvements, it would be worthwhile to do so.
- Obtain funding and hire an institute which already has the know-how to create the necessary phoneme databases, lexica and transformation rules. This approach has a better chance of producing quality results, and it should achieve some improvements much earlier than the first approach. Of course, the license under which all resulting work would be released should be agreed on in advance, and it should meet the DFSG requirements. The ideal solution would of course be to convince some university to undertake such a project at its own expense and contribute the results to the Free Software community.
Last but not least, it seems most of the commercially successful speech synthesis products nowadays no longer use concatenative speech synthesis, mainly because the sound databases consume a lot of disk space. This is not desirable for small embedded products, like for instance speech on a mobile phone. Recently released free software like eSpeak seems to take this approach, which might be very worthwhile to look at.
Screen review extensions for Emacs
Emacspeak
A speech output system that allows someone who cannot see to work directly on a UNIX system. Once you start Emacs with Emacspeak loaded, you get spoken feedback for everything you do. Your mileage will vary depending on how well you can use Emacs. There is nothing that you cannot do inside Emacs :-). This package includes speech servers written in Tcl to support the DECtalk Express and DECtalk MultiVoice speech synthesizers. For other synthesizers, look for separate speech server packages such as Emacspeak-ss or eflite.
speechd-el
Emacs client for speech synthesizers, Braille displays and other alternative output interfaces. It provides a full speech and Braille output environment for Emacs. It is aimed primarily at visually impaired users who need non-visual communication with Emacs, but it can be used by anybody who needs sophisticated speech or other kinds of alternative output from Emacs.
Console (text-mode) screen readers
A thorough list is available on the console screen readers task page
BRLTTY
A daemon which provides access to the Linux console for a blind person using a refreshable braille display. It drives the braille terminal and provides complete screen review functionality.
The braille devices supported by BRLTTY are listed in the BRLTTY device documentation.
BRLTTY also provides a client/server based infrastructure for applications wishing to utilize a braille display. The daemon process listens for incoming TCP/IP connections on a certain port. A shared object library for clients is provided in the package libbrlapi. A static library, header files and documentation are provided in the package libbrlapi-dev. This functionality is for instance used by Orca to provide support for display types which are not yet supported by Gnopernicus directly.
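Client code can use this infrastructure directly. The following sketch assumes the Python bindings for libbrlapi (packaged separately as python3-brlapi) and a running brltty daemon:

    import brlapi

    conn = brlapi.Connection()           # connect to the local brltty daemon
    conn.enterTtyMode()                  # take over braille output for this tty
    conn.writeText("Hello from BrlAPI")  # show a message on the braille display
    conn.readKey()                       # block until a display key is pressed
    conn.leaveTtyMode()
    conn.closeConnection()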
Yasr
A general-purpose console screen reader for GNU/Linux and other UNIX-like operating systems. The name yasr is an acronym that can stand for either Yet Another Screen Reader or Your All-purpose Screen Reader.
Currently, yasr attempts to support the Speak-out, DEC-talk, BNS, Apollo, and DoubleTalk hardware synthesizers. It is also able to communicate with Emacspeak speech servers and can thus be used with synthesizers not directly supported, such as Festival Lite (via eflite) or FreeTTS.
Yasr works by opening a pseudo-terminal and running a shell, intercepting all input and output. It looks at the escape sequences being sent and maintains a virtual window containing what it believes to be on the screen. It thus does not use any features specific to Linux and can be ported to other UNIX-like operating systems without too much trouble.
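The same technique can be demonstrated in a few lines of Python with the standard pty module. This toy sketch merely records the intercepted output instead of parsing escape sequences into a virtual window:

    import os
    import pty

    intercepted = []

    def master_read(fd):
        # Called for every chunk the child writes to its terminal. A real
        # screen reader would parse escape sequences here to update its
        # virtual window; this sketch just records the raw bytes.
        data = os.read(fd, 1024)
        intercepted.append(data)
        return data  # pass the output through to the real terminal

    # Run a shell under a pseudo-terminal, intercepting all of its output.
    pty.spawn(["/bin/sh"], master_read)
    print("intercepted", sum(map(len, intercepted)), "bytes of terminal output")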
Graphical User Interfaces
Accessibility of graphical user interfaces on UNIX platforms has only recently seen a significant upswing, thanks to the various development efforts around the GNOME Desktop, especially the GNOME Accessibility Project.
GNOME Accessibility Software
A thorough list is available on the Gnome accessibility task page
Assistive Technology Service Provider Interface
This package contains the core components of GNOME Accessibility. It allows assistive technologies such as screen readers to query all applications running on the desktop for accessibility-related information, and provides bridging mechanisms to support toolkits other than GTK.
Bindings to the Python language are provided in package python-at-spi.
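As a small illustration of these queries, the Python bindings let a client enumerate the accessible applications on the desktop. A sketch, assuming a running AT-SPI registry:

    import pyatspi

    # Walk every accessible application registered on the desktop; this is
    # the same query channel a screen reader uses.
    desktop = pyatspi.Registry.getDesktop(0)
    for app in desktop:
        if app is not None:
            print(app.name, app.getRoleName())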
The ATK accessibility toolkit
ATK is a toolkit providing accessibility interfaces for applications or other toolkits. By implementing these interfaces, those other toolkits or applications can be used with tools such as screen readers, magnifiers, and other alternative input devices.
The runtime part of ATK, needed to run applications built with it, is available in package libatk1.0-0. Development files for ATK, needed for compilation of programs or toolkits which use it, are provided by package libatk1.0-dev. Ruby language bindings are provided by package ruby-atk.
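To show what implementing these interfaces buys an application: with GTK 3 and PyGObject (an assumption here; the packages above cover the C library and Ruby bindings), every widget exposes an ATK object whose name and description a screen reader will announce:

    import gi
    gi.require_version("Gtk", "3.0")
    from gi.repository import Gtk

    window = Gtk.Window(title="ATK demo")
    button = Gtk.Button(label="OK")

    accessible = button.get_accessible()      # the button's ATK object
    accessible.set_name("Confirm selection")  # what a screen reader announces
    accessible.set_description("Confirms the selection and closes the window")

    window.add(button)
    window.show_all()
    Gtk.main()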
gnome-accessibility-themes
The gnome-accessibility-themes package contains some high accessibility themes for the GNOME desktop environment, designed for the visually impaired.
A total of 7 themes are provided, offering combinations of high, low or inverted contrast, as well as enlarged text and icons.
gnome-orca
Orca is a flexible and extensible screen reader that provides access to the graphical desktop via user-customizable combinations of speech, braille, and/or magnification. Under development by the Sun Microsystems, Inc., Accessibility Program Office since 2004, Orca has been created with early input from and continued engagement with its end users.
Orca can use Speech Dispatcher for delivering speech output to the user. BRLTTY is used for braille display support (and for seamless console and GUI braille review integration).
KDE Accessibility Software
A thorough list is available on the KDE accessibility task page
kmag
Magnify a part of the screen just as you would use a lens to magnify the fine print of a newspaper or a photograph. This application is useful for a variety of people: from researchers to artists to web designers to people with low vision.
Non-standard input methods
A thorough list is available on the Input methods task page
Dasher
Dasher is an information-efficient text-entry interface, driven by natural continuous pointing gestures. Dasher is a competitive text-entry system wherever a full-size keyboard cannot be used - for example,
- on a palmtop computer
- on a wearable computer
- when operating a computer one-handed, by joystick, touchscreen, trackball, or mouse
- when operating a computer with zero hands (e.g., by head-mouse or by eyetracker).
The eyetracking version of Dasher allows an experienced user to write text as fast as normal handwriting - 25 words per minute; using a mouse, experienced users can write at 39 words per minute.
Dasher uses a more advanced prediction algorithm than the T9(tm) system often used in mobile phones, making it sensitive to surrounding context.
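The difference is easy to see with a toy character-level model (only an illustration of context-sensitive prediction, not Dasher's actual language model): the probability of the next letter is conditioned on the letters already written.

    from collections import Counter, defaultdict

    def train(text, order=2):
        # Count which character follows each context of `order` characters.
        model = defaultdict(Counter)
        for i in range(order, len(text)):
            model[text[i - order:i]][text[i]] += 1
        return model

    model = train("the quick brown fox jumps over the lazy dog " * 3)
    # After the context "th", the model ranks likely continuations;
    # a keypad system like T9 has no such notion of context.
    for char, count in model["th"].most_common(3):
        print(repr(char), count)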
Caribou
Caribou is an input assistive technology intended for switch and pointer users. It provides a configurable on-screen keyboard with scanning mode.