Every person and their dog is now talking about ChatGPT, and for good reason. Those who have used it have had a glimpse into the future.
What if a machine could write that email draft, or help brainstorm ideas for a marketing tagline? What if we could get a 5-bullet summary of a 200-page document, with enough information to decide if it was worth reading any further? It’s everything Clippy dreamed of growing up to become.
The reality is that it’s already useful enough to do those things. The complaints we have today, such as it being bad at maths or hallucinating facts, are not insurmountable problems, and can actually be fixed with the tech we have today: through re-prompting (taking the output of a prompt and asking the model where it could look up and reference that information), or by adding a specialised interpreter for addition, subtraction and multiplication, which computers are already very good at.
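To make the arithmetic case concrete, here’s a minimal sketch in Python (with a deliberately simple regex and a made-up function name) of handing sums off to a deterministic interpreter rather than trusting the model:

```python
import re

def fix_arithmetic(model_output: str) -> str:
    """Recompute simple arithmetic in model output deterministically,
    rather than trusting the LLM's own maths."""
    def evaluate(match: re.Match) -> str:
        a, op, b = float(match.group(1)), match.group(2), float(match.group(3))
        result = {"+": a + b, "-": a - b, "*": a * b}[op]
        return f"{result:g}"  # trim trailing .0 for whole numbers

    # Matches e.g. "127 + 398"; a real system would parse full expressions.
    pattern = r"(\d+(?:\.\d+)?)\s*([+\-*])\s*(\d+(?:\.\d+)?)"
    return re.sub(pattern, evaluate, model_output)

print(fix_arithmetic("That works out to 127 + 398 in total."))
# -> "That works out to 525 in total."
```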
There is real empirical evidence that business professionals’ productivity improved by 59% when using ChatGPT, and that the rated quality of the documents they produced was much higher. See ChatGPT Lifts Business Professionals’ Productivity and Improves Work Quality.
With the advent of Plugins, Large Language Models will be able to look things up, book that romantic meal for two, and even help plan a trip abroad with up-to-the-minute flight pricing and availability to hand.
The underlying model takes a bunch of words as input and predicts the next word. The words are converted to tokens: a common word is one token, while more complicated words are split into two or more tokens. OpenAI prices its latest model at $0.03 per 1,000 input tokens and $0.06 per 1,000 output tokens, which makes it very affordable even for larger automation workloads, especially compared to manual processes.
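As a rough illustration, assuming OpenAI’s open-source tiktoken tokeniser and the prices quoted above, you can estimate what a prompt will cost before sending it:

```python
import tiktoken  # OpenAI's open-source tokeniser: pip install tiktoken

# Prices per 1,000 tokens, as quoted above (these change over time).
INPUT_PRICE, OUTPUT_PRICE = 0.03, 0.06

enc = tiktoken.encoding_for_model("gpt-4")
prompt = "Summarise the attached 200-page document in 5 bullet points."
n_tokens = len(enc.encode(prompt))

print(f"{n_tokens} input tokens, about ${n_tokens / 1000 * INPUT_PRICE:.4f}")
```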
There have been many write-ups on ChatGPT and Large Language Models in general.
The best write-up is Stephen Wolfram’s piece: he’s able to simplify the concept while keeping it truthful, unlike some others.
The model calculates the odds that a certain word will appear next. Always picking the most likely one results in very dull, flat writing that isn’t much fun to read.
Researchers have found that adding a certain amount of randomness to the word that is picked more closely mimics human-like output. In LLMs this is called ‘temperature’.
Set to zero, or cold, the output is deterministic: you’re always going to get the same result. This is good for use cases involving scoring, tagging and categorisation, or for more complex cases like checking content for terms-of-service violations, such as sharing emails or phone numbers on platforms that don’t allow users to circumvent the in-app contact systems (whether for safety, or more often for commercial and insurance reasons).
Set to hot (1.0), it’s a lean, mean poem-writing machine, coming up with a stream of ideas like an enthusiastic intern who has somehow drunk 10,000 coffees. Not all of it will be good, but as humans we’re quite good at sifting through and picking out the gold.
Somewhere around 0.7 seems to be the sweet spot, good for generating professional document outlines and templates.
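As a toy sketch of what temperature actually does, here’s next-word sampling over a handful of made-up scores (real models work over vocabularies of tens of thousands of tokens):

```python
import math
import random

def sample_next_word(scores: dict[str, float], temperature: float) -> str:
    """Pick the next word from raw model scores, scaled by temperature."""
    if temperature == 0:
        return max(scores, key=scores.get)  # cold: always the likeliest word
    # Softmax with temperature: higher values flatten the distribution,
    # giving unlikely (more "creative") words a better chance.
    weights = {w: math.exp(s / temperature) for w, s in scores.items()}
    return random.choices(list(weights), weights=list(weights.values()))[0]

scores = {"the": 2.0, "a": 1.5, "dull": 0.4, "iridescent": 0.1}
print(sample_next_word(scores, 0.0))  # deterministic: always "the"
print(sample_next_word(scores, 1.0))  # hot: occasionally "iridescent"
```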
There is some nervousness around job displacement, but the world of work is not a zero-sum game. One person winning doesn’t mean another person loses. Ultimately, the list of things humanity wants to do is infinitely long, and any boosts in productivity mean we work through it quicker. Will you ever complete all the tasks in your backlog? Or will you continue to iterate and improve the products you’re working on?
There may be some short-term turbulence as the market shakes up and adjusts, but ultimately we will end up doing less boring stuff, and more of the things that we want to do and are good at.
In software engineering, web applications will be able to self-heal by suggesting bug fixes based on real-world errors. That glue code between two platforms can be generated automatically, and that security vulnerability you didn’t spot will be picked up by an automatic pull request review bot. All of these things just improve our output and don’t replace the person doing the work.
While AI won’t replace you or your role, a person or company that uses AI in an effective way almost certainly will.
Writing a single sentence and expecting an amazing report or think-piece out the other end is completely underestimating the effort and thought that goes into creating a piece of work. Much as proper artists will baulk at the idea of DALL-E replacing artists, copywriters will baulk at the idea that ChatGPT is going to replace them.
These are tools, and the output is only going to be as good as what you put in.
A bespoke legal document could be generated on the fly, but it won’t give you the years of experience a legal professional has to ask the right questions and unpick the detail.
As a professional using AI to assist their work, you need to be mindful that everyone else will have access to the same tools too. At first the passable become good, the good become great, but then the new bar is set – and expectations adjust. The genie is out of the bottle, and it’s not going back in. Not using AI to aid your process is going to put you at a disadvantage, but writing a lazy prompt and hoping for the AI to do your work for you is not going to cut it.
There is also the problem of bias in the training data, which brings us neatly on to our next point.
We live in a biased and flawed world, which can be mirrored in the training data for any large model.
Think of a naive implementation of an LLM as being like a child learning from its environment: continued exposure to bias or prejudice means it’s likely to adopt the same biases too, simply mimicking what it sees in the world around it.
AI researchers know this, so they work hard to make sure that the data they feed it is as unbiased as possible. They also test the trained models for biases in their output, then use debiasing algorithms to reweight the model against them.
At a higher level, if you have a pre-existing LLM to work with, you can take any potentially biased output you want to send to a user and re-run it through the model with an additional prompt that checks for biases. The model will then either try to remove them or refuse to answer.
You can add specific instructions to the original prompt in certain cases, though these can sometimes be worked around. You may have seen that ChatGPT refuses to give instructions on how to make a bomb, but it used to happily write a play about someone making one. The ‘re-prompting’ approach fixes these issues, but does increase cost and latency (both of which will come down in time).
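Here’s a minimal sketch of that re-prompting step, assuming the openai Python package and a checking prompt of our own invention:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def debias(draft: str) -> str:
    """Re-run a draft reply through the model with a bias-checking prompt."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # cold: we want a consistent, repeatable verdict
        messages=[
            {"role": "system", "content": (
                "Rewrite the user's text to remove any bias or prejudice. "
                "If that isn't possible, reply with exactly: REFUSE")},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

print(debias("The ideal candidate for this role is a young, energetic man."))
```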
The risk with current language models isn’t around them ‘taking over the world’; it’s in how AI is applied to decisions that a human would or could make.
An example of unacceptable risk would be the social credit scoring system devised by the Chinese government. This score is used to allow or deny access to certain government services, and a bias or flaw in the calculation could exclude people unintentionally. China doesn’t have the best track record here either.
Many on the internet have claimed that Large Language Model AI will never be able to think and reason the way a human would, and that its knowledge can never be up to date. This is already demonstrably false.
Simply asking the AI to ‘show its working’ (yes, like your old maths teacher insisted) dramatically improves the model’s ability to solve problems. There are several papers that explore the power of Chain of Thought Reasoning with LLMs.
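In practice the change can be as small as one extra sentence; the ‘step by step’ phrasing below is the classic example from the chain-of-thought literature:

```python
# A bare question often earns a confident but wrong answer:
prompt = ("A bat and a ball cost £1.10 in total. The bat costs £1 more "
          "than the ball. How much does the ball cost?")

# Asking for the working dramatically improves the odds of the right one:
prompt += " Show your working step by step before giving the final answer."
```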
If you ask the model not to lie, and let it know it can look things up via an API (providing a brief description of that API), it will respond with the API call it wants to run. You run this on the model’s behalf by parsing the call out of the text and executing it, then feed the output back into the model.
This is how OpenAI’s plugin system works: a short plain-English description of each plugin is inserted into the prompt, and when the model decides a plugin is relevant, the full API description is fed in, with any API output passed back into the model.
This is a quick way to give models access to internal databases, public information available on the internet, or anything else you could think of.
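Below is a minimal sketch of that parse-and-execute loop, with a hand-rolled CALL convention and a made-up flight_search API standing in for OpenAI’s actual plugin format:

```python
import json
import re

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

SYSTEM = (
    "Do not make facts up. If you need live data, reply only with a line "
    'like: CALL {"api": "flight_search", "params": {"route": "LBA-JFK", '
    '"date": "2023-06-01"}}\n'
    "flight_search: returns current prices and availability for a route."
)

def flight_search(route: str, date: str) -> dict:
    """Hypothetical stand-in for a real flight-pricing API."""
    return {"route": route, "date": date, "price_gbp": 389, "seats_left": 4}

def answer_with_tools(question: str) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": question}]
    while True:
        reply = client.chat.completions.create(model="gpt-4", messages=messages)
        text = reply.choices[0].message.content
        call = re.search(r"CALL (\{.*\})", text, re.DOTALL)
        if not call:
            return text  # no API call requested, so this is the final answer
        # Run the requested call on the model's behalf...
        params = json.loads(call.group(1))["params"]
        result = flight_search(**params)
        # ...then feed the output back in for the model to carry on with.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user", "content": f"RESULT: {json.dumps(result)}"})

print(answer_with_tools("How much is a flight from Leeds to New York in June?"))
```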
The gloves are well and truly off. Pandora’s box has been opened, so the only thing you can do is embrace this new wave of Large Language Model tech as quickly as possible.
The team at Parallax is building an AI product which will allow you to chat with a bot that has read entire Confluence spaces. If you think you’d be interested in this or any other use cases for your business then do get in touch.
James co-founded Parallax and leads on technology. He is a technology leader able to solve the most complex challenges, with a wealth of experience across software engineering, solution architecture and mobile app development.
James has extensive experience in complex problem-solving and has worked with global organisations such as UEFA and NASA. He’s a tech pioneer and was recently recognised in the BIMA100 for his achievements. James also features on the Hey! presents Off Script podcast.