Image credit: http://voicegal.wordpress.com/
As mentioned in part 1, most speech recognition infrastructure is built on the concept of natural language, which is intended to communicate ideas between people. Is it possible to grow speech recognition methods beyond simple dictation?
One of the important features desired within Eleuthera is the ability for users to make their own phonetic choices. One person may be perfectly fine with saying "less than symbol" to get the output "<", while someone else may prefer to make a simple chirp to get the same result. While this feature is all well and good, it rests on one very large assumption for me: that Eleuthera should continue the current industry method of programming, character-by-character text manipulation.
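To make that idea a little more concrete, here is a minimal sketch, in JavaScript since that is the language used later in this post, of what a user-configurable phonetic mapping might look like. The trigger phrases and the expandUtterance helper are hypothetical, invented purely for illustration; nothing here reflects how Eleuthera is actually implemented.
// A sketch of a user-defined mapping from spoken triggers to output text.
// The trigger phrases below are hypothetical examples.
var userPhonetics = {
    "less than symbol": "<",
    "greater than symbol": ">",
    "chirp": "<"  // another user might bind a short chirp sound to the same output
};

function expandUtterance(utterance) {
    // Fall back to the raw utterance if the user has not bound it to anything.
    return userPhonetics.hasOwnProperty(utterance) ? userPhonetics[utterance] : utterance;
}

console.log(expandUtterance("less than symbol")); // "<"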
Natural Language vs. Programming Language.
As programming languages have evolved over the decades, they have come closer and closer to sounding like natural spoken language, apart from their heavy use of symbols and shorthand words. For example, JavaScript is a high-level programming language, and a simple code sample looks something like this:
var name = "Bob";
(function(name){
if(name != "Bob"){
alert("Your name is not Bob");
}else{
alert("Your name is Bob");
}
});
The above example could easily be spoken out in natural language like so: "variable 'name' equals Bob, if name is not equal to Bob alert user 'your name is not Bob', else alert user 'your name is Bob'." Under the current speech recognition paradigm, which assumes natural language and communication between humans, coding simple programs like the one above could be possible.
However, in a world driven by productivity, the extra "speech" would take more time to say than it would to type, given that most modern text editors incorporate auto-completion/expanders like Zen Coding (Emmet). Likewise, more complex coding would be extremely difficult in this natural form because it removes the granularity of editing individual characters. To convert the above function from an anonymous, auto-executing one, as it is now, to one with its own identifier, I would need the ability to navigate and speak individual characters. Speech systems like Google's voice API do not take this into consideration: saying something like "A B C" is checked against a natural language library that tries to assess your intention, possibly returning "a bee sea". The current system assumes your intentions and returns what it "thinks" you were trying to say in relation to natural language, not computer language.
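One way around the "a bee sea" problem would be to bypass the natural language model entirely while the user is in a character-level dictation mode, and instead check each recognized word against a fixed spelling vocabulary. The sketch below only illustrates that idea; it assumes the recognizer hands us plain word transcripts, and the letterMap entries and dictateCharacters helper are hypothetical.
// Sketch: a character-level dictation mode that maps recognized words
// directly to single characters, instead of asking a natural language
// model to guess the intent. The word choices here are hypothetical.
var letterMap = {
    "alpha": "a", "bravo": "b", "charlie": "c",
    "bang": "!", "equals": "=", "open paren": "(", "close paren": ")"
};

function dictateCharacters(words) {
    // Unrecognized words produce nothing rather than a natural-language guess.
    return words.map(function(word){ return letterMap[word] || ""; }).join("");
}

console.log(dictateCharacters(["alpha", "bravo", "charlie"])); // "abc", not "a bee sea"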
These are my current thoughts: if we are to seek true progress in the area of speech-driven programming, then we cannot force programming language to be checked against natural language dictionaries, as Google and Apple currently do, nor can we force natural language into a programmatic paradigm. Opening up how programmers speak their code to personal preference sounds like I'm advocating complete anarchy and the removal of standards, but if the resulting output conforms to industry standards, does it matter how it is being "typed or spoken"?
As it stands, my plan is to stay as close as possible to the granular, character-level control most programmers are used to in text editing, though this may mean quite a lot of trial and error when defining which words fit within the "Goldilocks zone": big enough to be a real word, but small enough to be productive.
Should speech programming focus more on the sum of an object's parts, rather than maintaining character-by-character dictation? For example, saying "function" would automatically produce the semantically correct code "(function(){});", making it very software dependent. Or should semantic control reside with the speaking user?
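As a rough illustration of the first option, a spoken keyword could expand into a complete template, much like Emmet abbreviations do for typed text. The template table and the expandSpoken helper below are hypothetical, and deciding which templates exist is exactly the kind of software-dependent choice described above.
// Sketch: spoken keywords expanding into complete code templates,
// similar to Emmet-style abbreviations. These templates are hypothetical.
var spokenTemplates = {
    "function": "(function(){});",
    "if else": "if(){} else {}"
};

function expandSpoken(keyword) {
    // Unknown keywords fall back to plain dictation of the word itself.
    return spokenTemplates[keyword] || keyword;
}

console.log(expandSpoken("function")); // "(function(){});"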