What Can We Really Say About Generative AI?

If you’ve been in tech for more than a few years, you’re surely aware that vendors and the tech press will jump on anything that gains user interest or traction. They’ll lay claim to the hot concepts even if it means stretching facts to (and some would say “beyond”) the limits of plausibility. Would it surprise you to hear that this is being done with generative AI? It shouldn’t, and that means we really need to look at the technology to see what could reasonably be expected.

Anything new in tech has to pass two basic tests. First, is there a value proposition that would promote its use? Second, could providers make money from it? If the answer to either of these questions is “No!” then we can assume that we’re riding a hype wave that will eventually break. How we qualify a “Yes!” answer would determine how fast and how far the technology could go. Let’s apply this test to generative AI.

Generative AI is a combination of two technical elements. One is the ability to parse a plain-language query and determine what’s being asked, and the second is the ability to gather information from a knowledge base to provide an accurate answer. I’ve tried generative AI in two specific forms (ChatGPT and Google’s Bard) and derived my own view of how well we’re doing at each of these two things.

It’s possible to frame a query that generative AI seems to understand without too much difficulty, as long as the query is fairly simple. By that I mean the query includes a minimal number of levels, which we’d represent in a logical expression by nesting IF/THEN statements in parentheses so that one condition depends on the result of the one above it. As the complexity of the query grows, the chance that it will be interpreted to match the user’s intent drops quickly. Many of the most egregious generative AI failures I’ve seen result from this.
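To make “levels” concrete, here’s a hypothetical Python illustration; the records, fields, and queries are invented for the example. A one-level query applies a single test to each record, while a two-level query makes one test depend on the outcome of another.

```python
# Hypothetical illustration of query "levels"; the records, fields, and
# queries are invented for the example.
routers = [
    {"os": "ios-xe", "role": "core", "ports": 52},
    {"os": "ios-xe", "role": "edge", "ports": 20},
    {"os": "junos",  "role": "core", "ports": 64},
]

# One level: "Which routers run IOS-XE?" -- a single test per record.
one_level = lambda r: r["os"] == "ios-xe"

# Two levels: "Of the routers running IOS-XE, which have more than 48 ports
# IF they're core routers, and more than 24 ports otherwise?" -- the port
# test depends on the outcome of the role test.
two_level = lambda r: r["os"] == "ios-xe" and (
    r["ports"] > 48 if r["role"] == "core" else r["ports"] > 24
)

print([r["ports"] for r in routers if one_level(r)])  # [52, 20]
print([r["ports"] for r in routers if two_level(r)])  # [52]
```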

The second area, the ability to analyze a knowledge base to get an answer, is much more difficult to address. I’ve done queries that have produced in seconds what would have taken me up to an hour to research using traditional search engine technology. I’ve also done queries that produced totally wrong results, so wrong that they defied logic. Over the last month, many end users of AI have emailed me with their own stories, which largely match my own, and we’ve seen articles describing the problem too. There is no question that generative AI can make major mistakes in analysis.

What makes things worse given both of the generative AI limitations I’ve described is the fact that it’s essentially impossible to check the results because you don’t know how they were derived. Decades ago, I designed an information query system for a publishing company that accepted parenthesized IF/THEN statements, converted them to reverse Polish format for processing, and then executed them on a knowledge base. When I did the execution, I created a log that showed what the derived reverse Polish expression was and how it was evaluated, with each step showing how many items passed the query at that point. If you selected everything or nothing, or just more or less than expected, you knew where you messed up.
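Here’s a minimal sketch of that kind of logged evaluation, assuming simple field=value terms joined by AND/OR rather than the original IF/THEN syntax; the records, query language, and log format are my own illustration, not the publishing company’s system.

```python
from collections import deque

# Invented knowledge base for the example.
RECORDS = [
    {"id": 1, "topic": "ai",      "year": 2023},
    {"id": 2, "topic": "ai",      "year": 2021},
    {"id": 3, "topic": "network", "year": 2023},
    {"id": 4, "topic": "network", "year": 2020},
]

def to_rpn(tokens):
    """Shunting-yard: convert an infix boolean query to reverse Polish order."""
    prec = {"AND": 2, "OR": 1}
    out, ops = [], []
    for tok in tokens:
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()                          # discard the "("
        elif tok in prec:
            while ops and ops[-1] in prec and prec[ops[-1]] >= prec[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)                    # a field=value term
    while ops:
        out.append(ops.pop())
    return out

def match(term):
    """Return the set of record ids that satisfy a single field=value term."""
    field, value = term.split("=")
    return {r["id"] for r in RECORDS if str(r[field]) == value}

def evaluate(rpn):
    """Evaluate the RPN query, logging how many items pass at each step."""
    stack = deque()
    for tok in rpn:
        if tok in ("AND", "OR"):
            b, a = stack.pop(), stack.pop()
            stack.append(a & b if tok == "AND" else a | b)
        else:
            stack.append(match(tok))
        print(f"{tok:15} -> {len(stack[-1])} item(s) pass")
    return stack.pop()

query = ["(", "topic=ai", "OR", "topic=network", ")", "AND", "year=2023"]
rpn = to_rpn(query)
print("RPN:", rpn)
print("Result ids:", evaluate(rpn))
```

The point of the log is exactly what I described: if a step shows everything passing, nothing passing, or far more or fewer items than you expected, you know where the query went wrong.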

You don’t get that with popular generative AI tools, so unless you have a feel for what the results should be, you can’t spot even a major error. Even if you have a rough idea, you can still get a result that barely passes your sniff test and pay a price. That’s one of the biggest problems users have reported with generative AI, regardless of the mission. I saw it when trying to get the number of leased lines in service in the US; the results were totally outlandish and I didn’t know how the package came by them.

These problems are almost surely a result of immature technology. We expect a lot from generative AI, more than we realistically should, given the amount of experience the market has with the technology. Pretty much all AI is based on a rule set that describes how information is to be examined and how results are correlated, analogous to my reverse Polish parsing. Get the rules wrong, even if all you do is miss some important relationship, and it’s garbage in, garbage out. We’re getting more refined with generative AI every day, and the results are getting better every day, but right now any package I’ve looked at or had reported to me will throw a major flier occasionally, and that makes it hard to trust. Packages that log their rule application are the best ones, because you can at least try to see whether the package did logical things; but of course, if you have to research everything you ask a package in traditional ways, why use it?

OK, where do we stand on this point? I think the basic technology concept behind generative AI (or any other form of AI, in fact) is sound. What’s required is a bit of maturing, and a mechanism for defending or explaining results. Cite sources!

The business model side is even more complicated. In order for someone to make money on generative AI, meaning profit from their investment in it at any level, vendor or user, somebody has to pay something. As my Latin teacher might have said or a prosecutor might ask, cui bono? Who benefits? The answer to that depends less on generative AI as a technology and more on the mission we set for it.

Most people experience generative AI as an alternative to traditional Internet searching. That mission, and any mission that’s related to much of what ordinary people do on the Internet, is ad-sponsored. The problem is that generative AI doesn’t really offer much of an opportunity for ad insertion compared to traditional search. Getting right to the answer is great for the one asking the question, but not so much for those trying to profit from answering it.

The easy solution is to say that the questioner would pay explicitly, but that flies in the face of Internet tradition, and a company that promoted the approach likely wouldn’t be given much credibility in the financial markets. This is why generative AI isn’t likely to kill search unless somebody figures out how to monetize it.

The next issue is the knowledge base. Search engines crawl the web to index pages, and since most people who publish web content want it found, few opt to limit the crawling. Still, you can do that. Does a generative AI package do its own crawl, or take advantage of the crawling a search engine does? Does the fact that you have a website, or you’re a blogger as I am, mean that you’re surrendering your content to be used to develop a knowledge base? We’ve already had legal complaints on that point.
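For what it’s worth, the usual mechanism for limiting crawling is a site’s robots.txt file, and Python’s standard library can show how a well-behaved crawler is supposed to honor it. The sketch below uses OpenAI’s published GPTBot crawler token as the example agent; the site rules are invented, and whether any given generative AI package actually consults robots.txt before building its knowledge base is exactly the open question.

```python
# Sketch of how a crawler is supposed to honor robots.txt. The rules below
# are invented; GPTBot is OpenAI's published crawler token.
from urllib import robotparser

rules = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/post/1"))     # False: AI crawl blocked
print(rp.can_fetch("Googlebot", "https://example.com/post/1"))  # True: search crawl allowed
```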

Some of these issues can be resolved if we assume a company uses generative AI on its own data, and that the package it uses provides the logging of query processing needed to validate the methodology. However, these applications are only now evolving, and user experience with them is both limited and mixed. What I’m hearing most often, though, is that the technology isn’t “revolutionary”, meaning that more standardized analytics tools or traditional AI/ML tools work just as well with less risk.

Enterprises seem to think that’s the case too. Companies that reported extensive analytics tool usage, without AI augmentation, expressed only a quarter of the level of interest in generative AI as those with no significant analytics commitment. That reinforces what I think is the summation of the issues I’ve cited here. Yes, generative AI could be a valuable tool. No, it hasn’t yet proved itself to enterprises, and it’s hard to say how long that proof might take.