The idea is simple: if an NLP model is designed to converse with humans, then what better way to see how well it performs than by talking to it? Dubbed Dynabench (as in “dynamic benchmarking”), this system relies on people to ask a series of NLP algorithms probing and linguistically challenging questions in an effort to trip them up. The less often an algorithm can be fooled, the better it is at doing its job.
What’s more, this dynamic benchmarking system is largely unaffected by the issues that plague static benchmarks. “The process cannot saturate, it will be less prone to bias and artifacts, and it allows us to measure performance in ways that are closer to the real-world applications we care most about,” FAIR researcher Douwe Kiela wrote in the post.
“The nice thing about Dynabench is that if a bias exists in previous rounds and people find a way to exploit these models…” Kiela told Engadget, “we collect a lot of examples that can be used to train the model so that it doesn’t make that mistake anymore.”
What’s really cool is that anyone can give Dynabench a try; it’s open to the public. Users simply have to log into the Dynabench portal to start chatting (via text, of course) with a group of NLP models. No experience is required beyond a basic grasp of the English language. Moving forward, Kiela and his team hope to expand the system’s capabilities with more models, more modalities, and additional languages.