Almost
every day I see new developments in the field of Artificial Intelligence (AI),
and new opinions about it. I have tried to keep my posts current, but
inevitably, some of what I say will be outdated, possibly within just a few
days.
AI Problems
In this
post I will discuss several problems that have impeded the development of
neural network-based AI systems in the past and will likely continue to do so
in the future.
The Training Data Problem
I think that
compiling the training data set poses the most formidable obstacle to
creating practical AI systems. Large networks need large volumes of data.
GPT-3, for example, was trained on roughly 300 billion tokens of text. Until
the Internet matured, it would have been impossible to find the volume of text
needed to train complex AI systems like ChatGPT.
When I took
an Artificial Intelligence class back in the 1990s, we were warned about the need
to ensure that the training data set was of high quality. Any errors or
mistakes in the data would contaminate the AI, leading to poor-quality results.
A major part of compiling the data is checking it and, if necessary, cleaning
it.
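To make this concrete, here is a minimal sketch of what a cleaning pass over text records might look like. The specific rules and thresholds are my own assumptions for illustration; real pipelines use far more elaborate filtering and deduplication.

```python
import re

def clean_text_records(records, min_words=5, max_non_ascii_ratio=0.2):
    """Keep only records that pass some simple quality checks.

    These rules are illustrative, not what any particular AI
    company actually does.
    """
    seen = set()
    cleaned = []
    for text in records:
        text = text.strip()
        if len(text.split()) < min_words:
            continue  # too short to be useful training text
        non_ascii = sum(1 for ch in text if ord(ch) > 127)
        if non_ascii / max(len(text), 1) > max_non_ascii_ratio:
            continue  # likely encoding garbage
        key = re.sub(r"\s+", " ", text.lower())
        if key in seen:
            continue  # drop near-exact duplicates
        seen.add(key)
        cleaned.append(text)
    return cleaned

sample = [
    "Hello world, this is a clean sentence for training.",
    "Hello   world, this is a clean sentence for training.",
    "ok",
    "\x9d\x8f garbled \x81 bytes \x90\x9c here \x8a\x8b now",
]
print(clean_text_records(sample))  # only the first record survives
```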
The Intellectual Property Problem
The huge
demand for training data has started to run into legal challenges. Writers,
other creative people, and owners of intellectual property are concerned that
AI companies are not compensating them when their work is used to train AI
systems. My own work may well have been used to train AI systems. I have no
idea how I would find out if it had.
OpenAI has said
that it cannot create a useful AI without using copyrighted
material. https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/. It is likely that some kind of
limitation on the use of copyrighted material will emerge, which could restrict
the development of AI systems and increase their cost. While there is a large
volume of material available in the public domain, that material is often old
and outdated.
The Bias Problem
Bias in the
training data is part of the broader question of how clean the data is, but
bias deserves special consideration. If the data has a bias, so will
the AI. The course I took in the 1990s reinforced this point time and
time again. Sadly, bias has been a problem with many AI systems. While it can
have humorous results, it can also cause real harm.
In one case,
an AI created to identify skin cancers learned to rely on the fact that images
of actual skin cancers happened to include a ruler for scale, while non-cancer
images did not. https://venturebeat.com/business/when-ai-flags-the-ruler-not-the-tumor-and-other-arguments-for-abolishing-the-black-box-vb-live/.
Many
articles have been published about bias in AI models for law enforcement. For
example: https://daily.jstor.org/what-happens-when-police-use-ai-to-predict-and-prevent-crime/. The problem of bias is not limited
to policing, though.
Bias creeps
in during the creation of the training data set. If there is bias in how a law
is enforced, the data available about that law will contain that bias. It is
essential that the data be checked for bias before training. Not only can the
original data be biased, but the people checking for bias may have their own
biases, which they may be unaware of.
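As a simple illustration of checking before training, the sketch below compares outcome rates across groups in a hypothetical data set. The field names and the data are invented, and a gap between groups is a flag for human review, not proof of bias; a real audit would be far more involved.

```python
from collections import defaultdict

def outcome_rates_by_group(records, group_key, outcome_key):
    """Compute the rate of a positive outcome for each group.

    A large gap between groups does not prove the data is biased,
    but it flags something a human reviewer should examine.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for rec in records:
        counts[rec[group_key]][0] += int(rec[outcome_key])
        counts[rec[group_key]][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

# Hypothetical arrest records: the label reflects enforcement, not crime.
data = [
    {"neighborhood": "A", "arrested": 1},
    {"neighborhood": "A", "arrested": 1},
    {"neighborhood": "A", "arrested": 0},
    {"neighborhood": "B", "arrested": 0},
    {"neighborhood": "B", "arrested": 0},
    {"neighborhood": "B", "arrested": 1},
]
print(outcome_rates_by_group(data, "neighborhood", "arrested"))
# {'A': 0.667, 'B': 0.333} -- a gap worth investigating before training
```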
There was a
recent case where attempts by Google to correct bias in an AI model resulted in
a different bias. https://globalnews.ca/news/10311428/google-gemini-image-generation-pause/.
It is easy
to call out bias in AI systems; controlling it is clearly difficult.
Discovering the best ways to address bias in AI systems will continue to
be a major challenge.
As an
aside, I suspect that the underlying cause of the biases people have may well
be the same as the cause of bias in AI systems. Perhaps learning how to deal
with bias in AI systems will help us deal with bias in people.
The Black Box Problem
I spent
most of my working career developing and applying transportation forecasting
models. These were used to predict what traffic would be like in the future,
and those forecasts were then used to plan the transportation system.
Many people
criticized the forecasts the models produced because they saw the models as
black boxes. They couldn't see how a model worked, so they tended to distrust
what it predicted. While I felt that the models could be explained, the
explanations were complicated, and few people had the time or patience needed
to understand them.
The problem
with neural networks is that they truly are black boxes. We can see the inputs
and the outputs. We can even look at the parameters inside the AI. But with AI
systems that can have 175 billion parameters, it is not practical for people to
understand, let alone explain, how the AI got the answer it did.
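A back-of-the-envelope calculation shows why. Even a toy fully connected network has millions of parameters, and none of those individual numbers tells you why the network produced a particular answer. (The layer sizes here are arbitrary.)

```python
# Parameter count for a toy fully connected network: each layer
# contributes (inputs x outputs) weights plus one bias per output.
layer_sizes = [1000, 1000, 1000, 1000]
params = sum(n_in * n_out + n_out
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(f"{params:,} parameters")  # 3,003,000 -- GPT-3 has ~175,000,000,000
```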
The black
box problem makes it very difficult to fix an AI that isn't acting the way you
want it to. It can't be debugged the way a conventional computer program can,
and it can't be reasoned with the way a person can.
It appears
to me that the current approach is to revisit the training data set and modify
it before retraining the model. It may be necessary to revise the data and
retrain the AI system many times before the users and developers are satisfied.
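In outline, that workflow looks something like the sketch below. The function names are placeholders of my own, since each organization wires in its own training and review steps.

```python
def develop_model(dataset, train, evaluate, revise, acceptable):
    """Iterative data-revision loop: train, review, fix the data, retrain.

    `train`, `evaluate`, `revise`, and `acceptable` are placeholder
    callables standing in for a real pipeline's stages.
    """
    model = train(dataset)
    report = evaluate(model)
    while not acceptable(report):
        # The model itself isn't patched; the data set is revised and
        # the model is retrained from scratch on the modified data.
        dataset = revise(dataset, report)
        model = train(dataset)
        report = evaluate(model)
    return model
```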
This post is a mirror from my main blog
http://www.dynamiclethargyfilms.ca/blog