In an April 2018 paper coauthored with collaborators from the University of Washington and DeepMind, the Google-owned artificial intelligence company, [computational linguist Sam] Bowman introduced a battery of nine reading-comprehension tasks for computers called GLUE (General Language Understanding Evaluation). The test was designed as “a fairly representative sample of what the research community thought were interesting challenges,” said Bowman, but also “pretty straightforward for humans.”
…
The machines bombed. Even state-of-the-art neural networks scored no higher than 69 out of 100 across all nine tasks: a D-plus, in letter grade terms.
…
Their appraisal would be short-lived. In October of 2018, Google introduced a new method nicknamed BERT (Bidirectional Encoder Representations from Transformers). It produced a GLUE score of 80.5. On this brand-new benchmark designed to measure machines’ real understanding of natural language — or to expose their lack thereof — the machines had jumped from a D-plus to a B-minus in just six months.
…
But is AI actually starting to understand our language — or is it just getting better at gaming our systems?
…
“We know we’re somewhere in the gray area between solving language in a very boring, narrow sense, and solving AI,” Bowman said.
Read full, original post: Machines Beat Humans on a Reading Test. But Do They Understand?