How can we determine in practice whether a machine can understand? In 1950, the computing pioneer Alan Turing tried to answer this question with his famous “imitation game,” now called the Turing test…. Unfortunately, Turing underestimated the propensity of humans to be fooled by machines.
In a 2012 paper, the computer scientists Hector Levesque, Ernest Davis and Leora Morgenstern proposed a more objective test, which they called the Winograd schema challenge. This test has since been adopted in the AI language community as one way, and perhaps the best way, to assess machine understanding. Each item in the test consists of a pair of sentences, differing by exactly one word, each followed by a question. Here are two examples:
Sentence 1: I poured water from the bottle into the cup until it was full.
Question: What was full, the bottle or the cup?
Sentence 2: I poured water from the bottle into the cup until it was empty.
Question: What was empty, the bottle or the cup?
…
In each sentence pair, the one-word difference can change which thing or person a pronoun refers to. Answering these questions correctly seems to require commonsense understanding.
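The structure described above can be made concrete with a small sketch. This is not the authors' benchmark code; the class name, field names, and the pair-level scoring rule (a system must answer both halves of a pair correctly to get credit, which prevents lucky one-sided guessing) are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class WinogradPair:
    """One Winograd schema: two sentences differing by a single word,
    each paired with a question whose correct answer flips between
    the two candidate referents."""
    sentence1: str
    question1: str
    answer1: str
    sentence2: str
    question2: str
    answer2: str
    candidates: tuple  # the two possible referents of the pronoun

# The bottle/cup example from the text, encoded as one schema.
pair = WinogradPair(
    sentence1="I poured water from the bottle into the cup until it was full.",
    question1="What was full, the bottle or the cup?",
    answer1="the cup",
    sentence2="I poured water from the bottle into the cup until it was empty.",
    question2="What was empty, the bottle or the cup?",
    answer2="the bottle",
    candidates=("the bottle", "the cup"),
)

def score(resolver, pairs):
    """Hypothetical pair-level scoring: credit only when BOTH halves
    of a schema are answered correctly."""
    correct = 0
    for p in pairs:
        a1 = resolver(p.sentence1, p.question1, p.candidates)
        a2 = resolver(p.sentence2, p.question2, p.candidates)
        if a1 == p.answer1 and a2 == p.answer2:
            correct += 1
    return correct / len(pairs)

# A baseline that always picks the first candidate scores 0 on this
# schema, because the one-word change flips the correct answer.
baseline = lambda sentence, question, candidates: candidates[0]
print(score(baseline, [pair]))  # 0.0
```

Because the two halves of a schema demand opposite answers, any strategy that ignores the changed word cannot score above chance, which is what makes the one-word difference a probe of understanding rather than pattern matching.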