It’s by now a given that one of the best ways to eat better, whether that means consuming fewer calories and/or more vegetables and/or whatever, is to just actually pay attention to what we eat. This is the big secret behind dieting—any old fad diet is probably going to have a positive effect simply because the dieter is paying more attention to what they put into their face. And just by virtue of paying attention, they will probably eat healthier.
The fad diet usually gets the credit, but just caring at all goes a very long way.
Paying attention is hard, however. Even now, with nutrition information plastered all over the place and a million apps ready to log your every bite, staying on top of food intake is a real pain in the ass. How well can you quantify everything you’ve eaten over the past week?
Nutrition researchers from Tufts University, together with engineers from MIT, have a new solution to this problem of accurately and consistently logging food intake. Their prototype system, which was presented last week at the IEEE International Conference on Acoustics, Speech, and Signal Processing, is based on the machine learning subfield of natural language processing. Its basic operation is to take a user’s sloppy human description of a meal and convert it into well-quantified nutritional information.
The first challenge here is the same one facing any natural language processing system: making sense of human speech at all. At the bare minimum, it has to be able to figure out that, say, “ice cream” is something different from frozen cream or cream with ice in it. Or that the user is eating what is in the bowl and not the bowl itself. Even getting this far is pretty tricky.
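To get a feel for what that labeling problem looks like, here is a minimal sketch in the style of span tagging. The toy tagger and its tiny phrase list are hypothetical stand-ins, not the Tufts/MIT model, which learns these distinctions from data rather than from a hard-coded list.

```python
# Toy illustration of the labeling the system has to get right:
# "ice cream" should come out as one food item, not "ice" plus "cream",
# and "bowl" should not be tagged as food at all.

KNOWN_FOOD_PHRASES = {("ice", "cream"), ("peanut", "butter")}  # hypothetical

def tag_foods(tokens):
    """Assign B-FOOD/I-FOOD/O tags, preferring known two-word phrases."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if (tokens[i].lower(), nxt) in KNOWN_FOOD_PHRASES:
            tags[i], tags[i + 1] = "B-FOOD", "I-FOOD"
            i += 2
        else:
            i += 1
    return tags

tokens = "I had a bowl of ice cream".split()
print(list(zip(tokens, tag_foods(tokens))))
# ('ice', 'B-FOOD'), ('cream', 'I-FOOD') -- while "bowl" stays O,
# since the user ate what was in the bowl, not the bowl itself.
```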
Next, the system has to be able to make nutritional sense of the decoded speech. Here, the researchers turned to Amazon’s Mechanical Turk microlabor service, where they collected some 10,000 meal logs from Turk workers. Each log consisted of a meal description and a set of labels assigned to each food item, such as brand and quantity. This is the data the researchers trained their system on.
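A rough sketch of what one of those crowd-labeled logs might look like is below. The field names and values are illustrative guesses, not the actual Mechanical Turk annotation schema the researchers used.

```python
# One hypothetical labeled meal log: a free-form description paired with
# per-item labels. Records like this serve as supervised training data --
# the description is the input, the labels are what the tagger learns
# to predict.

labeled_meal_log = {
    "description": "for breakfast I had a bowl of Cheerios with half a cup of milk",
    "items": [
        {"food": "Cheerios", "brand": "General Mills", "quantity": "1 bowl"},
        {"food": "milk", "brand": None, "quantity": "0.5 cup"},
    ],
}
```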
So, the trained system was able to take a spoken description of a meal from a user and assign tags or labels to its different components: type of food, brand, quantity, and so forth. With those pieces isolated, it’s just a matter of looking up the corresponding nutritional information in the USDA and Nutritionix databases.
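The last step amounts to a lookup and a sum, roughly like the sketch below. The toy nutrient table stands in for the USDA and Nutritionix databases; the figures and the lookup interface are illustrative, not the real data or real APIs.

```python
# Hypothetical nutrient table keyed by (food, quantity); values are
# placeholder figures, not USDA or Nutritionix data.
NUTRIENT_TABLE = {
    ("oatmeal", "1 cup"): {"calories": 154, "protein_g": 6},
    ("milk", "0.5 cup"): {"calories": 61, "protein_g": 4},
}

def summarize(tagged_items):
    """Sum nutrition facts over a list of (food, quantity) pairs."""
    totals = {"calories": 0, "protein_g": 0}
    for food, quantity in tagged_items:
        for key, value in NUTRIENT_TABLE.get((food, quantity), {}).items():
            totals[key] += value
    return totals

print(summarize([("oatmeal", "1 cup"), ("milk", "0.5 cup")]))
# {'calories': 215, 'protein_g': 10}
```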
MIT team member James Glass cautions that this is hardly a perfect system and that we should probably not be expecting it to appear in app stores anytime soon. “Speech recognition doesn’t work the way humans work,” he tells IEEE Spectrum. “That’s the direction it needs to go into, but it’s not there yet.”