Dave Grundgeiger is a self-professed artificial intelligence guy. A fan of fictional AI systems such as J.A.R.V.I.S. in the Iron Man movies and Samantha in Spike Jonze’s Her, Grundgeiger, a senior software engineer and project architect at SOLOMO Technology, wants to see conversational artificial intelligence come of age.
And that’s just what he’s setting out to do with his Sofilia software project.
“You know what it’s like to have a great conversation,” reads the Sofilia Kickstarter page. “The other person gets you. You exchange ideas, not just words. It’s mentally and emotionally satisfying.”
With Sofilia, Grundgeiger is building on existing natural-language software to make human-computer conversation possible. Sofilia would do this by using an augmented grammar that converts natural-language text in both directions (more on that below), to and from what’s called a “semantic triple” database (a “triplestore”), which allows computers to “store, manipulate, and reason with meaning.”
“A semantic triple is nothing more than a relationship between two things, like ‘DJ, loves, pizza,’ which consists of two ‘entities’ (DJ and pizza) and a relationship between them (loves),” Grundgeiger explained by way of example. “The order matters, so ‘pizza, loves, DJ’ is not the same. It turns out that it’s a flexible and powerful way to model the world.”
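A triplestore can be sketched in a few lines of Python. This is a minimal illustration, not Sofilia’s implementation: the `Triple` and `TripleStore` names are hypothetical, and a real triplestore would add indexes and a query language such as SPARQL. It does show the point Grundgeiger makes about ordering: a triple is an ordered tuple, so (“DJ”, “loves”, “pizza”) and (“pizza”, “loves”, “DJ”) are different facts.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement, e.g. ("DJ", "loves", "pizza")."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical minimal in-memory triplestore: a set of ordered triples."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def contains(self, subject: str, predicate: str, obj: str) -> bool:
        # Triples compare position by position, so the order of terms matters.
        return Triple(subject, predicate, obj) in self._triples

    def about(self, entity: str) -> list[Triple]:
        """Every stored triple in which `entity` is the subject or the object."""
        return sorted(t for t in self._triples if entity in (t.subject, t.obj))


store = TripleStore()
store.add("DJ", "loves", "pizza")
store.add("pizza", "is_a", "food")

print(store.contains("DJ", "loves", "pizza"))  # True
print(store.contains("pizza", "loves", "DJ"))  # False: order matters
```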
As Grundgeiger notes, current natural-language software lets users do things like set a timer on their phone, but it doesn’t enable actual conversations between humans and artificial intelligence.
The core difference between Sofilia and most natural-language software, according to Grundgeiger, is that the latter works by analyzing word patterns. The internet’s growth over the last 15 years has produced a huge amount of searchable text. If a user asks a search engine a question, it’s a simple grammatical exercise for the engine to rephrase the question as a statement, then search billions of web pages for a sentence that matches that statement. Given the sheer volume of written work now available, there’s a good chance that someone, somewhere has already answered your question directly on a web page.
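The text-matching approach can be caricatured in a short Python sketch. Everything here is a hypothetical simplification: the three-sentence `CORPUS` stands in for the web, and the single regular expression handles only questions of the form “What is X?”. Real search engines are far more sophisticated; the sketch only illustrates the rephrase-then-match idea.

```python
import re
from typing import Optional

# Hypothetical mini-corpus standing in for "billions of web pages".
CORPUS = [
    "Paris is the capital of France.",
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
]


def question_to_statement(question: str) -> Optional[str]:
    """Rephrase 'What is X?' as the statement prefix 'X is' (the only form handled)."""
    found = re.fullmatch(r"what is (.+)\?", question.strip(), flags=re.IGNORECASE)
    if found is None:
        return None
    return f"{found.group(1)} is"


def answer_by_text_matching(question: str, corpus: list[str]) -> Optional[str]:
    """Return the rest of the first sentence that begins with the rephrased statement."""
    prefix = question_to_statement(question)
    if prefix is None:
        return None
    for sentence in corpus:
        if sentence.lower().startswith(prefix.lower() + " "):
            return sentence[len(prefix) + 1:].rstrip(".")
    return None


print(answer_by_text_matching("What is the capital of France?", CORPUS))   # Paris
print(answer_by_text_matching("What is the capital of Germany?", CORPUS))  # None
```

The Germany question fails even though the corpus contains the answer, because that sentence is phrased differently. The matcher has no notion of what a capital or a country is; it only compares strings.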
Sofilia throws this type of text matching out the window. Instead, it aims for a full semantic “understanding” of the sentences people type or say. It does this with a proprietary technique, built around a new type of knowledge graph, that can “transform the grammatical structure of sentences into a knowledge graph representing the real-world things and relationships that are described in the sentence.”
This knowledge graph is created entirely by Sofilia in response to human-language input. Google, by contrast, curates a database it calls Knowledge Graph, which contains information about things in the world and the relationships among them. Other examples exist, but all of them are curated by people.
Sofilia builds its knowledge graph internally, on the fly. The proprietary mechanism that lets her do this is an augmented, bidirectional grammar: a mechanism that translates natural-language input into a knowledge graph and vice versa. Grundgeiger said that searches are performed using sub-graphs (pieces of the knowledge graph) that represent a question.
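Sofilia’s grammar is proprietary, so the following Python sketch is only a toy stand-in showing what “bidirectional” and “sub-graph search” mean in principle. Its single grammar rule (a three-word sentence of the form subject, verb, object) and the names `parse`, `generate`, `question_to_pattern`, and `match` are hypothetical. The same rule reads sentences into triples and writes triples back out as sentences, and a question becomes a one-triple sub-graph with a variable (`?who`) that is matched against the graph.

```python
# Toy stand-in for an augmented, bidirectional grammar (hypothetical; not Sofilia's).
Triple = tuple[str, str, str]

# The single hypothetical grammar rule covers "<subject> <verb> <object>" for these verbs.
VERBS = {"loves", "owns", "hates"}


def parse(sentence: str) -> Triple:
    """Language -> graph: 'DJ loves pizza.' becomes ('DJ', 'loves', 'pizza')."""
    subject, verb, obj = sentence.rstrip(".").split()
    if verb not in VERBS:
        raise ValueError(f"no grammar rule for verb {verb!r}")
    return (subject, verb, obj)


def generate(triple: Triple) -> str:
    """Graph -> language: the same rule, run in reverse."""
    subject, verb, obj = triple
    return f"{subject} {verb} {obj}."


def question_to_pattern(question: str) -> Triple:
    """'Who loves pizza?' becomes the one-triple sub-graph ('?who', 'loves', 'pizza')."""
    _who, verb, obj = question.rstrip("?").split()
    if verb not in VERBS:
        raise ValueError(f"no grammar rule for verb {verb!r}")
    return ("?who", verb, obj)


def match(pattern: Triple, graph: set[Triple]) -> list[dict[str, str]]:
    """Return one variable binding for each triple in the graph that fits the pattern."""
    results = []
    for triple in sorted(graph):
        binding: dict[str, str] = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:
                    break  # the same variable is already bound to something else
            elif p != t:
                break  # a fixed term does not match
        else:
            results.append(binding)
    return results


graph = {parse(s) for s in ["DJ loves pizza.", "Ana loves sushi.", "Sam loves pizza."]}
pattern = question_to_pattern("Who loves pizza?")
for binding in match(pattern, graph):
    # Substitute the binding into the pattern, then generate an answer sentence.
    answer = tuple(binding.get(term, term) for term in pattern)
    print(generate(answer))  # prints "DJ loves pizza." then "Sam loves pizza."
```

A real grammar would have to handle multi-word names, tense, pronouns, and far richer structure. The point is only that one rule runs in both directions and that a question can be answered by matching graph structure rather than strings.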
“Search is just the beginning,” he said. “Reasoning about knowledge is done by applying transformation rules to the knowledge structure. Whole conversations are also represented and related to each other in the graph. Sofilia is a platform for machine conversation and comprehension, not just search.”
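Sofilia’s transformation rules are not described in detail, so the sketch below substitutes a standard, well-known technique: forward chaining over if-then rules, repeated until no new triples appear. The rules themselves (a transitive `is_a` and a “loves a food, therefore likes food” rule) and all function names are hypothetical examples, not Sofilia’s.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]
Rule = tuple[list[Triple], Triple]  # (premise patterns, conclusion pattern)

# Hypothetical transformation rules; terms starting with "?" are variables.
RULES: list[Rule] = [
    # "is_a" is transitive: pizza is_a dish, dish is_a food => pizza is_a food.
    ([("?a", "is_a", "?b"), ("?b", "is_a", "?c")], ("?a", "is_a", "?c")),
    # Anyone who loves something that is a food likes food.
    ([("?x", "loves", "?y"), ("?y", "is_a", "food")], ("?x", "likes", "food")),
]


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `fact`, or return None."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def solutions(premises: list[Triple], facts: set[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all premise patterns simultaneously."""
    if not premises:
        yield binding
        return
    for fact in facts:
        extended = unify(premises[0], fact, binding)
        if extended is not None:
            yield from solutions(premises[1:], facts, extended)


def forward_chain(facts: set[Triple], rules: list[Rule]) -> set[Triple]:
    """Apply every rule repeatedly until no new triples appear (a fixed point)."""
    known = set(facts)
    while True:
        derived = {
            tuple(binding.get(term, term) for term in conclusion)
            for premises, conclusion in rules
            for binding in solutions(premises, known, {})
        }
        new = derived - known
        if not new:
            return known
        known |= new


facts = {("DJ", "loves", "pizza"), ("pizza", "is_a", "dish"), ("dish", "is_a", "food")}
closure = forward_chain(facts, RULES)
print(("DJ", "likes", "food") in closure)  # True
```

Starting from three stated facts, the first pass derives that pizza is a food, and the second pass uses that new triple to conclude that DJ likes food, something no input sentence stated directly.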
So, will the Sofilia experience actually be anything like talking to Samantha or J.A.R.V.I.S.? Eventually, Grundgeiger said, Sofilia will be able to support such conversations, but he emphasized that the initial release will be more modest. The first version of Sofilia will hold conversations at the grammar and reasoning level of a first-grader. That is the target set by Microsoft Research’s Machine Comprehension Test, and no current software has come close to it.
And if you’re wondering what Sofilia’s voice might sound like, Grundgeiger said that initially it will rely on third-party speech libraries and sound like the voices people already know from Siri and Cortana. But the same technology that transforms the internal world model into human-language text can also generate speech sounds directly, using a “grammar” of speech acts instead of just words, which should result in a much more natural voice. Grundgeiger said this technology can drive any communication channel, such as facial animation and body language.
What exactly would the Kickstarter campaign fund? Grundgeiger and his team have tested Sofilia on small datasets using grammars written by hand. With funding, they would be able to build a larger grammar file for Sofilia. They could also have Sofilia evaluated on an industry-standard test, and then let ordinary users try it out.
“Every single word in every sentence is parsed, understood, and related to the rest of the sentence, the conversation, and the world,” writes Grundgeiger. “This is not text-pattern-matching software, but true natural-language understanding. Sofilia technology will enable not only better Siri-style software, but true JARVIS-style software—real dialog, real conversations.”