August 19, 2022
Is our machine learning yet?

Over the past year, machine learning and artificial intelligence technology have made significant strides. Specialized algorithms, including OpenAI’s DALL-E, have demonstrated the ability to generate images based on text prompts with increasing canniness. Natural language processing (NLP) systems have grown closer to approximating human writing and text. And some people even think that an AI has attained sentience. (Spoiler alert: It has not.)

And as Ars’ Matt Ford recently pointed out, artificial intelligence may be artificial, but it’s not “intelligence,” and it certainly isn’t magic. What we call “AI” is dependent upon the construction of models from data using statistical approaches developed by flesh-and-blood humans, and it can fail just as spectacularly as it succeeds. Build a model from bad data and you get bad predictions and bad output; just ask the developers of Microsoft’s Tay Twitterbot about that.

For a much less spectacular failure, just look to our back pages. Readers who have been with us for a while, or at least since the summer of 2021, will remember that time we tried to use machine learning to do some analysis—and didn’t exactly succeed. (“It turns out ‘data-driven’ is not just a joke or a buzzword,” said Amazon Web Services Senior Product Manager Danny Smith when we checked in with him for some advice. “‘Data-driven’ is a reality for machine learning or data science projects!”) But we learned a lot, and the biggest lesson was that machine learning succeeds only when you ask the right questions of the right data with the right tool.

Those tools have evolved. A growing class of “no-code” and “low-code” machine learning tools is making a number of ML tasks increasingly approachable, taking the powers of machine learning analytics that were once the sole province of data scientists and programmers and making them accessible to business analysts and other non-programming end users.

While the work on DALL-E is amazing and will have a significant impact on the manufacture of memes, deep fakes, and other imagery that was once the domain of human artists (using prompts like “[insert celebrity name] in the style of Edvard Munch’s The Scream”), easy-to-use machine learning analytics involving the sorts of data that businesses and individuals create and work with every day can be just as disruptive (in the most neutral sense of that word).

ML vendors tout their products as being an “easy button” for finding relationships in data that may not be obvious, uncovering the correlation between data points and overall outcomes, and pointing people to solutions that would take humans days, months, or years to uncover through traditional statistical or quantitative analysis.

We set out to perform a John Henry-esque test: to find out whether some of these no-code-required tools could outperform a code-based approach, or at least deliver results that were accurate enough to make decisions at a lower cost than a data scientist’s billable hours. But before we could do that, we needed the right data—and the right question.