What is AI Good For?
The last thing you want to read is another uninformed opinion about AI, so here is another uninformed opinion about AI. If you want more informed opinions about AI maybe read Thomas Ptacek or Ludic.
I do not use AI much but I have typed text into a prompt, and I was exposed to some underlying principles of machine learning in grad school two decades ago. Thanks to the Dunning-Kruger effect, I have some thoughts about the topic.
I hoped I could gloss over the differences between "AI" and "machine learning" and "large language models" (LLMs), but unfortunately the distinctions matter. There are many different things called "AI", but I will mostly focus on generative neural networks and LLMs. In the next section I will touch on my understanding of these terms, but you might want to DuckDuckGo them to learn the distinctions.
Some people feel we will have superhuman-level artificial general intelligence by 2027. I am not inclined to believe this. Mostly I am an AI pessimist, because I am old and think all new technology is bad. I am also upset at the electricity consumption used by these technologies. I think we are in a hype bubble that will pop sooner or later, but once the bubble pops there will be some uses for AI that stick around.
Even though I have been exposed to some of the underlying theory AI seems like magic to me. When I was in grad school people characterized problems as being "NLP Complete", which means that solving a problem was as difficult as getting computers to process natural language. It seems to me that natural language processing has now been solved. We can now type arbitrary sentences into LLMs and get reasonable (although certainly not flawless) responses back. We can talk to Siri and Alexa and Hey Google and they understand us, in the sense that they can parse our speech and respond appropriately.
Even more impressive to me are the image and video generation tools. We have tired of their telltale outputs (the uncanny lighting, the incorrect numbers of fingers), but regardless these image generators seem like magic to me. I suspect that we will figure out how to use image and video generation in creative and fulfilling ways sooner or later.
I agree that LLMs are stochastic parrots and spicy autocomplete: given a sequence of tokens they generate plausible completions. That makes it seem as if the LLMs are doing intelligent things. Many critics say that this is not true intelligence. I agree in the sense that the LLMs don't seem to understand the subjects they are parroting. I disagree in that I am not sure to what degree human beings are intelligent, and what the actual differences are. If you give me inputs I respond in particular ways, and so do LLMs. It's Searle's Chinese Room thought experiment all over again. LLMs may not be intelligent, but if they act sufficiently as if they were, that may not matter.
I am more of an AI doomer than an AI apologist, but I have to acknowledge that there are a bunch of smart people who find AI useful. I find it difficult to believe that people I genuinely admire for their intelligence are deluded or lying when they say they find AI useful.
For the purposes of this post I will pretend that AIs are not resource consumption monsters that slurp up electricity and water and GPUs at enormous environmental cost. If your view is that these environmental considerations outweigh any possible good that AIs can produce (which is mostly my view) then much of the analysis below is moot. The counterarguments to this position are (a) maybe AIs using up so much electricity is a good thing because the increased demand is spurring energy innovation; (b) as with computers, it is possible that AIs will become more powerful and efficient while using fewer resources -- a cellphone today is more powerful than the most powerful mainframe computer 40 years ago, and those mainframes used up orders of magnitude more energy to run; (c) maybe there will be some breakthrough produced by AIs that will make all the resource usage worthwhile. I am starting to believe that (a) might be true, but I am skeptical of (b) and (c).
I feel it is worth thinking through the ways in which AI works and the ways it doesn't. The purpose of this blog post is to explore different claims people make about AI and evaluate whether they make sense to me.
My Model of LLMs
My conceptual model is influenced by my grad school experiences, this essay by Stephen Wolfram, and a bunch of stuff from the Aboard podcast.
Somebody (maybe Paul Ford?) characterized LLMs as a lossy database of all knowledge. I think this is approximately true: we feed the entire internet into a neural network, and it stores that information in an imperfect way. There is a high-dimensional representation of everything that is possible in the world, and a subset of that space (which computer science people call a "manifold") that lives in the bigger space and represents what is actually true. Ideally, the training data I feed into the LLM represent points on that manifold, and then the LLM's job is to produce data on the manifold. In an ideal world the LLM's modelling of the manifold would match reality exactly, so that any point the LLM found on the manifold represented some real thing. In practice this is not possible; instead, points on the manifold represent plausible things that may or may not exist in the real world.
This is very abstract, so here is a physical analogy. Let's describe a machine learning model that is not quite an LLM, but which is close. (A machine learning model is a computer program which takes in "training data" that it then uses to make predictions about data it has not seen. There are many different kinds of machine learning models, but the popular approach these days is the "neural network".) An LLM's job is to take a series of words and then predict the next word(s). Our example model's job is to represent a piñata of an ICE agent that is hanging in the air (and for some reason does not move in space). The piñata is hollow, so there is an (approximately) 2D surface that exists in 3D space. We will refer to that 2D surface as the "manifold". There is a real 2D manifold (the actual piñata) and the learned manifold (which is what the machine learning model is coming up with).
Now I train a neural network by feeding 3D coordinates into the model. For each coordinate, I include the label "on the piñata" or "not on the piñata". The model is given only individual data points, and from those it has to figure out the underlying shape of the piñata. As a human you could probably do this yourself -- given enough points, a 3D visualization tool, and the "on the piñata or not" information, you could plot each point and see the shape emerge. Then if I pointed to an arbitrary new 3D coordinate you had not seen before, you would be able to guess whether that new point was part of the piñata. Machine learning models can do the same thing.
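For the programmers in the audience, here is a minimal sketch of what that training step might look like, using a sphere as a stand-in for the piñata surface and scikit-learn's off-the-shelf neural network. The library, the shape, and the thresholds are my own choices for illustration, nothing more.

```python
# A minimal sketch of the piñata idea: label random 3D points as "on" or
# "off" a surface (a sphere stands in for the piñata), train a small neural
# network on those labeled points, and ask it about points it has never seen.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def on_surface(points, radius=1.0, tolerance=0.25):
    """Label a point 1 if it lies near the sphere's surface, else 0."""
    distance = np.linalg.norm(points, axis=1)
    return (np.abs(distance - radius) < tolerance).astype(int)

# Training data: random 3D coordinates plus their "on the piñata or not" labels.
train_points = rng.uniform(-2, 2, size=(20_000, 3))
train_labels = on_surface(train_points)

model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
model.fit(train_points, train_labels)

# Ask about brand-new points the model has never seen.
new_points = rng.uniform(-2, 2, size=(5, 3))
print(model.predict(new_points))   # the model's guesses: on the surface or not
print(on_surface(new_points))      # what is actually true
```

The model never sees the sphere itself, only labeled points, which is exactly the situation described above.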
In addition to the 3D coordinates and "on the piñata or not" information, we might include other information with each data point. For points on the piñata, we might include the color of the point on the piñata, and what body part of the ICE officer that point represents. Then for any 3D point, the model could tell us the following:
- Is that point on the piñata?
- If the point is on the piñata, what color is it?
- If the point is on the piñata, what body part of the ICE agent does it represent?
Note that the machine learning model has not seen EVERY point on the piñata, because there are infinitely many of those points. But for every point you ask of it, it can offer a guess. If most guesses correspond to reality (i.e., they predict the actual piñata correctly) then the model is good; if many guesses are incorrect then the model is not so good. There are two things to note here: first, the map is not the territory, so we cannot expect the representation of the manifold to be perfect; and second, we should not expect every guess the model makes to be perfect -- even good machine learning models will be incorrect sometimes.
Once we have trained the model, we can ask it even stranger things:
- Give me the coordinates of a point not on the piñata.
- Give me the coordinates of a brown point on the piñata.
- Give me the coordinates of a point on the piñata that is part of the ICE agent's ear.
All of these are perfectly acceptable prompts to the machine learning model. The model is "generative" in the sense that it is working "backwards": we are asking for specific attributes and it is giving us 3D coordinates back, as opposed to only giving it 3D coordinates and asking for the attributes at that point.
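Continuing the earlier sketch (and reusing its `model` and `rng`), here is a deliberately dumb way to "run the model backwards": propose random coordinates and keep one the classifier believes is on the surface. Real generative models are enormously cleverer than this rejection loop, but the direction of the query is the same -- attributes in, coordinates out.

```python
# "Generate me a point on the piñata": propose random candidates and keep one
# that the trained model believes is on the surface. This is brute-force
# rejection sampling, purely for illustration.
def generate_point_on_surface(model, rng, max_tries=100_000):
    for _ in range(max_tries):
        candidate = rng.uniform(-2, 2, size=(1, 3))
        if model.predict(candidate)[0] == 1:
            return candidate[0]
    raise RuntimeError("no plausible point found")

print(generate_point_on_surface(model, rng))
```

Note that the generated point is only plausible according to the learned manifold; it may or may not lie on the real piñata.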
My understanding of LLMs is that they are similar to a program modelling a piñata hanging in space, except that instead of a handful of dimensions (x, y, z coordinates, color, the body part of the ICE agent) the model works in a space with a vast number of dimensions and is defined by billions of parameters. The other big difference is that instead of predicting individual points on a piñata or individual attributes, an LLM is optimized to complete sequences of text: you give it some characters (which presumably represent some real text) and the LLM's job is to use what it has learned about language structure to plausibly add phrases to the text you have offered.
The word "plausible" is important here. Just as the piñata model does not know for certain where every 3D point corresponding to the piñata is, an LLM does not know for certain how every sentence in the English language ought to be completed. It is making guesses. When those guesses do not correspond to reality, we often call the results "hallucinations". But in some sense every guess made by an LLM is a hallucination based on plausibility.
The text you offer an LLM might be English sentences and paragraphs, perhaps with a request to render them in a different language: say, German. In that case the LLM's job is to translate the text by generating ("guessing") a sequence of German words that correspond to the English ones.
Unlike a search engine, an LLM does not reproduce its training data exactly. If I feed an essay into an LLM, and later query the LLM for the essay back, it will reconstruct the essay based on probabilities for what is plausible. This reconstruction is lossy, which is why an LLM can be fed the entire Internet but end up only a few terabytes in size.
Back when I was a grad student we used to worry a lot about "overfitting" and "the curse of dimensionality". Overfitting meant that your model fit the training data too closely, so it would give you answers that corresponded too closely to the quirks of the training data and not closely enough to reality.
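Here is a small illustration of overfitting, again with invented data: give an unconstrained decision tree some noisy labels and it will memorize the training set perfectly while doing noticeably worse on points it has not seen.

```python
# Overfitting in miniature: an unconstrained decision tree memorizes noisy
# training labels perfectly, but generalizes worse to held-out data.
# The dataset and noise level are arbitrary choices for the sketch.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
points = rng.uniform(-2, 2, size=(2_000, 3))
labels = (np.linalg.norm(points, axis=1) < 1.0).astype(int)
# Flip 10% of the labels to simulate quirky, noisy training data.
noise = rng.random(len(labels)) < 0.10
labels[noise] = 1 - labels[noise]

X_train, X_test, y_train, y_test = train_test_split(points, labels, random_state=0)

tree = DecisionTreeClassifier()   # no depth limit: free to memorize everything
tree.fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # ~1.0: memorized the quirks
print("test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```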
The curse of dimensionality had to do with the number of different criteria you were learning in a model (in the piñata model so far that dimensionality is 6: 3 spatial coordinates, 1 "on the piñata or not" label, 1 color of the point, 1 body part of the ICE officer). As the number of dimensions went up, the amount of training data you needed for a good model was supposed to increase exponentially. However, modern LLMs work with billions of parameters and very high-dimensional representations, so something (maybe transformers? maybe deep learning?) has rendered the curse of dimensionality irrelevant.
I don't claim that this is a good explanation of LLMs -- I am sure there are many details and subtleties I am missing. But if this explanation is approximately correct, then we can start thinking about good and bad uses of LLM AIs.
A frequent theme (maybe the only theme) in these evaluations is the principle that the output of LLMs on their own cannot be trusted, because those results correspond to plausibility, not reality. When there is an external mechanism that can evaluate the generated results against reality, then the use of AIs may be okay; if we are treating the output of those AIs exclusively as truth, then the results are probably going to be very bad.
Search Engines
People try to use LLMs as replacements for search engines like Google. Even Google uses LLMs as replacements for Google, trying to synthesize knowledge from the web. To me this seems really dumb. The outputs generated by LLMs do not necessarily correspond to reality, only to plausibility. So they can produce something that sounds right, but unless there is some other mechanism to independently verify whether the answer is true, we cannot trust the correctness of these results.
People try to get around this by asking the LLMs for references to their answers, but this fails because the LLMs then just generate references that sound plausible! You need some third party mechanism to test whether those references exist and are correct. Currently we leave that as a responsibility of the user, which seems dumb because then search engines become labor-creating devices instead of labor-saving ones.
I think a search engine should be a search engine. A search engine's job is to index the Internet and produce actual links to actual websites. That is based in reality, in the sense that webpages are real. An LLM will tell us about hypothetical websites that plausibly ought to exist, but we care about actual websites that exist in reality.
Brainstorming
I think brainstorming is one of the best uses of an LLM. There are lots of ways in which people get stuck in mental ruts when trying to come up with possible solutions to problems. If you can't think of anything you might ask an LLM to brainstorm a list of possibilities for you. Then you can go through that list and figure out which ones are possibly useful. All of the ideas will be plausible (because they all come from the manifold the LLM has learned). They won't necessarily exist, but that's okay -- when brainstorming you don't necessarily need every solution to be realistic. The ideas an LLM generates can help you think laterally and avoid getting stuck in thought loops.
The downside is that an LLM will probably not help you brainstorm a genius idea nobody has thought of before, because by definition an LLM is learning a manifold, and genius ideas are usually far off the manifold. So you might get some ideas that do not exist yet, but you won't get anything that is paradigm-busting.
The important thing here is that the outcome of brainstorming is a lot of ideas. It is okay if some of those ideas are incorrect and stupid. The next step is to evaluate those ideas and pick out the best ones. If a human goes through those judging steps, then it does not matter if the AI hallucinates ridiculous things.
Training Data
Somehow it seems as if we are running out of real data on which to train LLMs. In addition, people are pumping lots of AI-generated content onto the Internet, and AI companies are slurping up this generated data to feed into the next generation of LLMs. Is this okay?
The worry is that this will create Habsburg AIs, which drift further and further from reality as they learn from hallucinated training data. I think this is a real worry, and since there is currently no reliable mechanism to distinguish AI-generated data from non-AI data, training runs will in fact slurp up this bad data and the models will get worse.
I think there is an important caveat to this. Not all AI-generated results are slop. Some AI generated results are carefully curated by humans. (The old Deep Leffen account comes to mind.) The results are frequently entertaining and/or insightful, and thus transmute from AI slop into actual reality (where "reality" is defined as "genuinely valuable to humans"). I don't think that training AIs on this data is any worse than training AIs on data that is exclusively-human-generated. But note the caveat here -- these results are curated. Human beings evaluate them and determine them to be worthy. That is what distinguishes such artifacts from slop. The endless slop that is generated via LLMs and never curated by humans seems like terrible training data to me.
Programming
This is an interesting one. On the one hand, AI-generated code terrifies me. LLMs are trained on all kinds of computer code, and according to Sturgeon's Law most of that code is terrible. Thus we would expect the median LLM-generated code to be terrible too. From a cybersecurity perspective this seems like a disaster. One small coding error can lead to a CVE with a severity score of 7. So how can we expect generated code to be secure?
The good news is that programming is a domain that has lots of potential guardrails. Programming languages that are strict about their inputs and fail fast are much better here than ones that are permissive. So Rust and Haskell might be good candidates for LLM-generated code, and Perl or JavaScript might be bad ones. The more a compiler can reject code that is unsafe, the more trustworthy the LLM-generated code can be. A compiler and an LLM can have a back-and-forth conversation about code, with the LLM iteratively submitting code to a strict compiler until it compiles.
Testing frameworks are another potential guardrail here. You can have the LLMs generate code and then run that code against test suites. When the code fails its test suite the LLM gets information that its code is incorrect, and it has to resubmit code to pass the tests. But there are some dangers here: firstly, a clever LLM can write code that passes the tests and nothing else. Secondly, programmers find writing tests boring, and so might outsource their unit tests to an LLM. That can be fine if the programmer then carefully checks and approves the generated tests, but if the programmer is blindly trusting the test suite an LLM generates then the programmer is in for a bad time.
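As a sketch of that back-and-forth (not anybody's real tooling): wrap the generate-compile-retry loop in a script, where `ask_llm` is a placeholder for whatever model API you use, not a real library call, and `rustc` stands in for any strict compiler. A test runner could slot into the same loop. Note that "it compiles" or "it passes the tests" still does not mean "it is correct and secure".

```python
# A sketch of the guardrail loop: the compiler, not the LLM, decides when to
# stop. `ask_llm` is a placeholder, not a real API.
import subprocess

def ask_llm(prompt: str) -> str:
    """Placeholder: send a prompt to some LLM and return generated code."""
    raise NotImplementedError("wire this up to your model of choice")

def generate_until_it_compiles(task: str, max_attempts: int = 5) -> str:
    prompt = f"Write a Rust program that does the following: {task}"
    for _ in range(max_attempts):
        code = ask_llm(prompt)
        with open("candidate.rs", "w") as f:
            f.write(code)
        result = subprocess.run(
            ["rustc", "--edition", "2021", "candidate.rs"],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return code  # the compiler accepted it (which is not the same as "correct")
        # Feed the compiler's complaints back into the next prompt and try again.
        prompt = f"This code failed to compile:\n{code}\nErrors:\n{result.stderr}\nFix it."
    raise RuntimeError("no compiling version after several attempts")
```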
Overcoming Skill Gaps
One important question with respect to LLM-assisted programming is whether the programmer learns anything. My own feeling is that this can go in two ways. If a programmer is willing to put in the work to understand what the LLM is generating, and learns from that to improve their own programming practices in the future, then LLMs can be helpful. But if the programmer blindly trusts what the LLM generates (or does not have the domain knowledge to determine whether the LLM is generating good code or not) and does not put in the effort to develop that domain knowledge, then LLM-assisted programming seems like a terrible idea.
We know that LLMs are getting better at generating code. Say that a particular code snippet is 85% correct, but is still not working. A good test here is to observe what the programmer does in this situation. If the programmer takes that 85% code and then manipulates it (maybe manually, maybe by looking up or generating code to fix the known problem spots) to get that last 15%, then maybe the programmer is gaining knowledge. If instead the programmer feeds the 85% code back into the LLM with some modified prompt in the hopes of getting 100% correct code, then that is a bad smell.
Unfortunately when presented with a magic prompt that can generate anything, the temptation is to keep throwing queries at the prompt until it gives you what you want, instead of taking a step back and carefully evaluating the best next step. This is a baby programmer mistake, and I have seen similar things happen again and again when teaching programming courses (namely, the baby programmer keeps typing code into the computer instead of stepping back and re-evaluating). If we are reinforcing that bad habit via LLM-generated code then we are in for a bad time.
I am not sure what this new trend of "vibe coding" means, exactly, but it sounds really sketchy to me.
The other question with respect to skill gaps is whether there will be any paid opportunities for junior programmers in the future. I do not have a good sense of this. I think that the existing concept of a junior programmer will probably disappear; I do not know whether it will be replaced with somebody who is a "junior prompt engineer".
Programming as Management
I have sometimes heard LLMs described as "enthusiastic junior programmers". [citation needed] They will try hard at whatever task you throw at them, but unless you supervise them closely and break down tasks into tiny little bits for them to perform, they will go off the rails.
Let's say this is correct. This means every programmer using LLM-assisted coding becomes a manager, supervising one or more enthusiastic, error-prone junior programmers. Maybe this makes people more productive, but it sounds quite unfun to me.
Furthermore, human junior programmers do something magical: they learn from their mistakes, and mature into intermediate programmers. Do AI junior programmers do the same, or are you forever supervising enthusiastic junior programmers who keep making the same mistakes again and again? One form of "upgrade" might be AI model improvements from the giant LLM companies -- as their models improve, maybe the junior programmers become more skilled. In addition, as programmers get better at writing their prompts, maybe their AI junior engineers do fewer stupid things (or at least do more interesting stupid things). But it is not clear to me that the payoff for training a junior AI programmer is as good as the payoff for training a junior human programmer.
Prompt Semantics and Prompt Engineering
Sometimes it seems that the main task of LLM-assisted programmers these days is coming up with more and more elaborate prompts -- giant walls of text about things to avoid and tones to take and pretending to be experts in certain areas. This all seems gross to me, but if it turns you into a 10x programmer (and gets you 10x the salary?) then maybe it is fine. My aesthetic sense suggests that these elaborate prompts are boilerplate, and in my opinion this boilerplate should be baked into the LLM in the first place.
What is interesting to me about prompts is that they are a form of programming language. They are a set of constraints programmers devise to get their magic black box LLMs to do the right thing. Unfortunately, both the syntax and semantics of this programming language are ill-defined, and the outputs are nondeterministic: you can put the same prompt into an LLM twice and get different results each time.
This seems insane to me. For years we have learned that computers will do exactly what you tell them, and the art of programming was understanding how to get the computer to do the things you want in a repeatable and testable way. Now that philosophy has changed? Or maybe it hasn't. Maybe "prompt engineering" is just a higher-level programming language that people have to learn, in the same way that people write Python code instead of programming in assembler. In that sense, maybe programming has finally achieved the dreams of "beginner-friendly" "non-technical" programming we have promised since FLOW-MATIC and COBOL -- regular people with no previous training can express what they want the computer to do, and then it just does it. But in that case why are we supposed to spend an hour each day practicing our prompt engineering so that the LLM genie does not grant our wishes in unexpected ways?
TRACE
I do not know where Paul Ford's mnemonic TRACE fits into this essay, but he proposed it in the context of programming, so here it is. TRACE is a heuristic for evaluating whether an LLM is serving you well or badly. It stands for:
- Transparent
- Repeatable
- Actionable
- Clear
- Efficient
The podcast elaborates on the components of this acronym. I found it interesting because it synthesizes the things we want out of LLMs, and why they frustrate us when they fail.
Domain Exploration
Say I know nothing about a particular domain. Can an LLM help? My feeling is that as long as you are not seeking very specific knowledge, LLMs might work for this use case. You could ask the AI for the broad strokes of the domain, and you might receive vocabulary terms in response. That vocabulary could tell you what to look for in subsequent web searches. As usual, if you blindly trust all the domain information the LLM gives you without external validation, you are asking for trouble.
It is also possible that LLMs could help people understand broad domain concepts. I think that good explainers of domain concepts are both invaluable and rare, and thus LLMs might fill a niche here. But again I do not know how much I would trust the results.
Tutoring
This is a tricky one. In some sense you should not need an LLM to help you with tutoring. Wikipedia exists, as does Khan Academy. For many topics there are plenty of learning resources online. By this standard we should all be supergeniuses. Unfortunately, the Internet also offers a bunch of distractions, so instead of spending my time exploring topics deeply and gaining practical skills, I watch dumb videos for dopamine hits.
In a similar way, I think an LLM could be useful for tutoring, but I am not sure most people will use the LLM wisely. The big advantage an LLM tutor might have over Wikipedia (or even a textbook) is that an LLM can generate exercises and sample questions for you on demand. Whether it can solve those exercises correctly is another question, but having a bunch of exercises to work through can be very helpful, especially if those exercises are matched to your current level of understanding.
The big trap here is being lazy and letting the LLM solve your exercises instead of just using the LLM to generate exercises for you. Why bother thinking and learning when the LLM does it all better than you do anyways?
Institutional Education
By "institutional education" I mean structured learning at schools and universities. As far as I can tell LLM plagiarism machines have destroyed institutional education. That may not be a surprise, given Sam Altman's contempt for university learning, but it feels like a violation.
Here's how things used to work: some poor instructor (sometimes me) would try to set assignments. The purpose of an assignment is to get students to internalize whatever skills or content I am trying to teach. In order to do this, as the instructor I have to set assignments aimed at the right skill level, and because this is an educational institution that grades students, I have to create assignments that are possible to mark consistently. In computer science, this usually meant that I could expect most student submissions to be similar. There was some fairly mediocre plagiarism software that checked for identical or overly-similar submissions (hello, changed variable names), and often the most effective way to identify cheating was spotting sets of solutions that failed in similarly-suspicious ways. But correct solutions usually received little scrutiny. This system has lots and lots of flaws, but for a good fraction of students it did the job -- the students went through the exercises, they internalized some of the knowledge and skills they were supposed to learn, and they received a mark that was loosely correlated with that skill development.
Then along came the plagiarism machines, and everything fell apart. The plagiarism detection software doesn't work anymore, because the output of an LLM is nondeterministic and thus will be different for every plagiarising student. Meanwhile, if I expect students to solve the exercises the "right way" then those exercises have to be at a manageable level, which means they are easy enough for LLMs to solve trivially.
In my experience, many students (including me when I was a university student) care much less about learning than they do about grades, and if LLMs mean they get excellent grades they are happy not to learn anything. If a student wants to be honest but all of their peers are cheating, the pressure to cheat becomes very strong, which is precisely why pursuing plagiarism cases is worthwhile. So to a first approximation, we can assume that most or all students will just have LLM plagiarism machines do their homeworks for them.
Maybe those students are learning other skills, like prompt engineering? My guess is that they aren't. In large classes like the one I used to teach, well-constructed assignments need to be unambiguous and precisely defined. These kinds of assignment specifications can be fed into LLMs more-or-less verbatim, and I would expect the LLM to do a reasonable job.
Having written all this, I note that people are still teaching university and high school classes, and presumably they are still offering assignments. I don't know how they do it. Instructors are clever and I am sure there is a vast literature on how to cope with LLMs, but I have not read that literature. Here are the options that occur to me:
- Pretend that LLMs don't exist, and carry on as before. My guess is that most students end up getting LLMs to do their homework and don't learn anything.
- Cleverly integrate LLMs into assignments, or mandate LLM use for the course. I guess this could work, but I have trouble understanding how it would be useful for the introductory programming courses I used to teach. One useful exercise could be getting students to correct LLM-generated code, but in my experience these kinds of exercises do not scale to large classes because they are difficult to mark.
- Escalate the arms race by making assignments too difficult for LLMs to solve. In this case you have just made assignments too difficult for honest students to solve, so they turn to whatever assistance they can find.
- Test learning in some other way -- for example via oral interviews. This has several disadvantages: (a) it does not scale, (b) once students know what the questions in the oral interviews are, they can memorize the answers, and (c) some students are good at doing assignments and bad at oral interviews, and those students will be punished.
- Forget about computers and get everybody to do all assignments with pen and paper. In addition to being a pain to mark, this does not solve the problem: a student can just get an LLM to solve the homework, then transcribe the LLM's answer.
I am sure that smart instructors know what to do now that we have (non-consensually!) had LLMs imposed on us. But I have never been a smart instructor, and in this domain I feel despair. I imagine it might be even worse in the humanities, where the usual artifacts from assignments are essays.
Summaries
Lots of people want to use LLMs to summarize blocks of text they don't want to read. I think this is a pretty bad use. When reading a block of text I am most interested in the information I don't already know, and that will be correlated with information that is not in the training set of the LLM and might even be far off the learned manifold. I am not confident that asking an LLM to summarize a document actually captures what matters in that document, but I don't know this for sure.
For example, say that I want to understand some proposed legislation, or a website's terms of use policy. Small changes of wording mean a great deal in these domains, and unless I can trust the LLM's summary to pick up on those subtleties I cannot trust the summary.
Translation
Many people use LLMs to translate text from one language to another. I think that using this to get a broad sense of the text might be okay, but again depending on the LLM to get the details right seems dumb. Small changes of wording mean a lot. Context means a lot. Tone means a lot. Unless the LLM can pick up on these subtleties it cannot be trusted.
Transcription
As opposed to converting between one language and another, I feel that LLMs are pretty good at converting speech to text and vice versa. I feel this solves a real problem. Once upon a time I tried to release some training videos, but doing so meant generating subtitles for every word that was said. This was so tedious and slow that I became reluctant to create training videos. But an AI that could make a reasonable first draft of the subtitles would work well. People would still have to read through the subtitles to make sure they were correct, but in this case they do not need deep domain knowledge to check whether the LLM did things correctly: they can judge whether the transcription is accurate fairly easily.
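For example, a first-draft subtitle workflow might look like the sketch below, assuming the open-source openai-whisper package; the file name is invented, and a human still has to read and correct the output.

```python
# A sketch of the "first draft of subtitles" workflow, assuming the
# open-source `openai-whisper` package (pip install openai-whisper).
import whisper

model = whisper.load_model("base")
result = model.transcribe("training-video.mp4")   # made-up file name

# Each segment has a start time, an end time, and the guessed text --
# enough to draft subtitles that a human can then correct.
for segment in result["segments"]:
    print(f"{segment['start']:8.2f} --> {segment['end']:8.2f}  {segment['text'].strip()}")
```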
Bullet Points
Some people use LLMs to expand bullet points into a longer document. Some people take longer documents and summarize them into bullet points. The classic example of this is cover letters.
In one sense, this seems like the stupidest use of computers ever. Why would you use lossy expansion to generate a cover letter and then use lossy compression to read it? In this case the problem is the expectation that people have cover letters. We should just send each other short lists of bullet points. Unfortunately, that does not work because of decorum, so we are stuck in this terrible situation of writing documents nobody wants to read, only to have machines compress those documents to imperfect summaries.
If you are taking a concise list of bullet points and getting an LLM to expand them into a full document, then what additional information does the LLM introduce to pad out the length? Can this additional verbiage be trusted? My own bias is to say that if you are not even willing to read a document that you generate for accuracy, then nobody should be expected to read that document.
I also worry about the usual problem of domain knowledge. Say I am not a lawyer, but I want to draft a legal document that expands upon some bullet points. I am pretty sure I will get something legal-ish from the LLM. Should I trust that this is a good legal document even if it is written in legalese? Without some legal training of my own, it seems that this is a dangerous strategy.
Slaves
The elephant in the room is that nobody wants to employ human beings. Human beings are irritating and cost a lot of money. It would be better to have an army of slaves doing your bidding. If those slaves are sycophantic computers who respond enthusiastically to your requests, so much the better.
It is quite clear to me that we are moving towards a future in which slavery (of computers and people) is becoming more socially acceptable. We are pretty close when it comes to gig workers already.
Would we be better off with a bunch of AI slaves? My feeling is that it would make everything cluttered and full of friction. Techbros talk about having "agents" (aka slaves) that will schedule flights and haggle the best prices with contractors, but the flight-schedulers and contractors will have their own AI agents haggling for the other side. It is also not clear to me that if I am renting AI slaves from some giant platform, those slaves will work in my interests, as opposed to working in the best interests of the techbros who own the giant platforms.
We see this promise again and again. Gig work was supposed to be "flexible" and liberate a lot of people via extra income; instead many people (maybe most people?) go broke on the gig platforms because (a) they have to compete with others who undercut them, (b) the platforms offload all the costs (e.g., maintaining an Airbnb house, maintaining an Uber/Lyft car) onto those gig workers, and (c) the platforms reserve the right to drop any worker they do not like, for pretty much any reason.
Similarly, if I have an AI slave that is supposed to book hotel reservations for me, why do I believe that this slave will book me a nice hotel at the best price, as opposed to the hotel that gives the AI slave's tech platform the best kickback?
Idea Bros
There is a cliché of the would-be tech founder who has a great idea and "just needs a technical co-founder to code it up", at which point the aforementioned tech founder will be rich enough to purchase US presidencies. Unfortunately, there is another cliché: genius is 1% inspiration and 99% perspiration. The ever-optimistic founders think that LLMs solve this problem. Do you have a great idea, or even a "great" idea? Fire up loveable.ai and get a prototype of that idea in minutes. All done! Riches galore!
I think that AI will make it much easier for these Idea Bros to pump out prototypes. I don't think it will make most of them successful enough to purchase US presidencies. For one thing, if you have a great idea then that idea is probably floating around the noosphere, and somebody else has the same idea. So now you are back to competitive capitalism, which presumably requires more than a tech demo to win. Secondly, these great ideas tend to skip the details, and that's where the devils live. The products that are successful are the ones that vanquish those devils -- and that requires domain knowledge, skill in navigating those difficulties, and dollops of luck.
Powerful LLMs have been around for a few years now. We should be awash in world-changing products that were built by AI. Have we been? I am not sure, but I am going to guess we haven't. Why not? Were the LLMs just not advanced enough?
I think that using LLMs for prototyping is probably fine. Startups sometimes raise millions of dollars with less. Fast prototypes mean that you can try lots of things and discard the things that don't work. But at some point I feel you need to dig deep and solve the hard problems that others have not been able to tackle, and I think that requires hard work.
Completing the Last 10%
One frustrating pattern I have seen is that LLMs can generate a demo that looks slick but has problems under the hood. Maybe with skilled prompt engineering those demos can be 90% done. Who gets the last 10% done? Who has the domain knowledge to do it? And how do they develop that domain knowledge for the 10% when they have let the LLM do the learning/thinking for them for the 90%?
Stealing Time at Work
There are idiotic memes about getting AI to solve your work problems in minutes, so you can spend hours relaxing. Say that these claims are true, and that an LLM can transform you from a 1x employee to a 60x programmer, so that you can get all your day's work done in 8 minutes instead of 8 hours. Do you think that your boss will be happy with that level of productivity? Or will your boss expect 60x the output from you?
I am a firm disbeliever that productivity gains benefit workers much. I think back to the revolution in desktop publishing. Suddenly instead of making posters with Zipatone sheets and manually pasting text on a page, we could use desktop publishing programs to make posters. It was much more efficient! And then the expectation became that in addition to everything else people did when advertising events, they would make nice computer-generated posters as well. When color printing became common the problem just got worse.
We don't have paper posters any more, but now we have Instagram reels and YouTube shorts and Fediverse posts to generate for any event. The amount of work expected of us went up, thus costing us additional time despite our greater efficiencies.
Research
This is an interesting one. The purpose of research is to discover things that are true, but which nobody to this point has been able to demonstrate is true. When an LLM learns a manifold, it is trying to learn all the things that are probably true. Some of the points on the manifold are known to be true; these correspond to the LLM's training data. Some points on the manifold have not been explored by humans. If an LLM "hallucinates" these points but humans can prove they are correct, then those humans have accomplished research.
However, there are also some true things which will not be easy for LLMs to explore, because there is no training data corresponding to them. Many big advances in research happen when people make weird observations that don't fit existing models, and then have an insight which causes them to re-evaluate and re-form the model into something that is more true. Maybe LLMs can help identify some of those weird outlier observations, although in my experience of machine learning, outliers are usually thrown out of training sets. Maybe the LLM can even find patterns in the outlier data that suggest a new hypothesis. But I don't think an LLM can do all the research here; humans will still be needed to carry out experiments independent of the LLM to confirm those hypotheses.
There are probably even wilder hypotheses which fit weird data but which LLMs probably will not be able to hypothesize. The question then is whether humans (or maybe humans equipped with non-LLM tools) will be able to form and test these hypotheses.
That all sounds pretty sunny, but in practice I think LLMs will ruin research just as thoroughly as they are ruining education. Journals will be filled with plausible but incorrect hallucinated AI slop. Researchers searching for references will be given hallucinated citations to papers that don't exist. And if/when LLMs become trendy in research, then researchers will tailor the questions they ask to ones that the LLMs can answer, rather than tailoring questions to what is unknown in the world.
Conclusion
It seems to me that people are using LLMs in a lot of stupid ways, but what do I know?
It seems that the only way to use LLMs wisely is to vet their output, which means somebody still needs to retain enough domain knowledge to tell whether the LLM is on or off the rails.
It seems to me that machine learning for classification is one of the big wins. Such classifiers have been around since prehistoric ages (that is, when I was in grad school) and in principle should be getting better and better. Generative AI is just running these classification models in reverse. But I am not hearing much about these applications from the AI bros.
I am quite worried that human beings will be further de-skilled as they come to rely on AI. Given that LLMs are inherently unreliable, this seems bad to me.