Wednesday, February 19, 2014

Speech-driven Programming: Are We Prepared? Part 2


As mentioned in part 1, most speech recognition infrastructure is built around natural language, which exists to communicate ideas between people. Given that foundation, can speech recognition methods grow beyond simple dictation?

One of the important features I want in Eleuthera is the ability for users to make their own phonetic choices.  One person may be perfectly happy saying "less than symbol" to get the output "<", while someone else may prefer a simple chirp to get the same result.  This is all well and good, but for me it rests on one very large assumption: that Eleuthera should continue the industry's current method of programming, character-by-character text manipulation.
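
To make that idea concrete, here is a minimal sketch in JavaScript of what a user-defined phonetic map might look like. Everything here (the phoneticMap object, the expandPhrase function, and the example phrases) is my own illustration, not part of Eleuthera:

// Hypothetical sketch: a user-defined map from spoken phrases to output text.
// The keys are whatever the user chooses to say; the values are what gets typed.
var phoneticMap = {
    "less than symbol": "<",
    "chirp": "<",          // another user might bind a short sound instead
    "open paren": "(",
    "close paren": ")"
};

// Look up a recognized phrase and return the text to insert, or null if unknown.
function expandPhrase(spokenPhrase) {
    var key = spokenPhrase.toLowerCase();
    return phoneticMap.hasOwnProperty(key) ? phoneticMap[key] : null;
}

expandPhrase("less than symbol"); // returns "<"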

Natural Language vs. Programming Language.


As programming languages have evolved over the decades, they have come closer and closer to sounding like natural spoken language, apart from their heavy use of symbols and shorthand.  For example, JavaScript is a high-level programming language, and a simple code sample looks something like this:

var name = "Bob";
(function(name){
    if(name != "Bob"){
        alert("Your name is not Bob");
    } else {
        alert("Your name is Bob");
    }
})(name);


The above example could easily be spoken out in natural language like so: "variable 'name' equals Bob; if name is not equal to Bob, alert user 'your name is not Bob', else alert user 'your name is Bob'."  Under the current speech recognition paradigm, which assumes natural language spoken between humans, coding simple programs like the one above could be possible.

However, in a world driven by productivity, all of that extra "speech" would take more time to say than it would to type, given that most modern text editors incorporate auto-completion and expanders like Zen Coding (Emmet).  Likewise, more complex coding would be extremely difficult in this natural form because it removes the granularity of editing individual characters.  To convert the above function from an anonymous, self-executing one, as it is now, into one with its own identifier, I would need the ability to navigate and speak individual characters.  Speech systems like Google's voice API do not take this into consideration: something spoken like "A B C" is checked against a natural language library, which tries to guess your intention and may return "a bee sea" instead.  The current system assumes your intent and returns what it "thinks" you were trying to say in terms of natural language, not computer language.
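
For reference, the edit described above would turn the earlier example into something like the following (the identifier checkName is just an illustration). It is exactly the kind of character-level surgery (removing the wrapping parentheses, adding a name, changing the invocation) that pure natural-language dictation makes difficult:

// The same logic, rewritten as a named function that is called explicitly.
function checkName(name){
    if(name != "Bob"){
        alert("Your name is not Bob");
    } else {
        alert("Your name is Bob");
    }
}

checkName(name);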

These are my current thoughts: if we are to seek true progress in speech-driven programming, we cannot force programming language to be checked against natural language dictionaries, as Google and Apple currently do, nor can we force natural language into a programmatic paradigm.  Opening up how programmers speak their code to personal preference may sound like I'm advocating complete anarchy and the removal of standards, but if the resulting output conforms to industry standards, does it matter how it is "typed" or "spoken"?

As it stands, my plan is to stay as close as possible to the granular, character-level control most programmers are used to for text editing, though this may mean a lot of trial and error in defining which words fit within the "Goldilocks zone": big enough to be a real word, but small enough to be productive.

Should speech programming focus on the sum of an object's parts rather than on character-by-character dictation?  For example, saying "function" could automatically produce the semantically correct code "(function(){});", which makes the result very software dependent.  Or should semantic control remain with the speaking user?
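
As a rough sketch of the first option, the software-dependent one, a spoken keyword could expand into a full template with a marked cursor position. This is illustrative JavaScript only; the templates object, the expandKeyword function, and the "|" cursor marker are my own inventions, not features of any existing tool:

// Hypothetical sketch: expanding a spoken keyword into a complete code template.
// The "|" marks where the cursor would land after insertion.
var templates = {
    "function": "(function(){|});",
    "if": "if(|){}",
    "variable": "var | = ;"
};

// Return the template for a spoken keyword, or the keyword itself if none exists.
function expandKeyword(keyword) {
    return templates.hasOwnProperty(keyword) ? templates[keyword] : keyword;
}

expandKeyword("function"); // returns "(function(){|});"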

Tuesday, February 18, 2014

Speech-driven Programming: Are We Prepared? Part 1


Speech-driven apps have been the single biggest craze since Apple introduced Siri in 2011, giving voice recognition a much-needed spotlight.  For the last decade, speech recognition has been locked down to proprietary, OS-based devices and largely confined to voice dictation.  The flexibility of controlling a device through voice commands has been paraded in Hollywood blockbusters for years, but now that speech recognition has gone mobile, we are finally questioning its limits.  With most speech recognition infrastructure built on language intended to communicate ideas, is it possible to grow speech recognition beyond simple dictation?

Eleuthera: Speech-driven Programming Application.


In 2012, I was afforded the opportunity to participate in Google Summer of Code with an amazing group of people at the Inclusive Design Institute (IDI).  Still in school, with a disability beginning to affect the joints in my hands, and with no real programming experience outside of the classroom, I opted to take on the most challenging concept of all: developing an application, written purely in JavaScript, that would allow individuals to program by voice on any mobile device.

No one else had even applied for the position at that time, nor has there been much research (that I'm aware of) been done on it since.  I jumped straight in with what I called Eleuthera (freedom from chains),  and quickly found myself drowning in all of the complexities involved in the project.  Unfortunately, the project never really got off the ground because Google's open source speech API was still within its infancy, and microphone access still relied on Flash.  Two years later, however, the landscape has changed and has breathed new life into Eleuthera.  Over the next couple of months, I hope to continue to release posts about my encounters as a disabled programmer designing an application with the potential to aid other disabled programmers.