Sunday, March 3, 2024

Artificial Intelligence (AI) and Me – Part 2

Almost every day I see new developments in the field of Artificial Intelligence (AI), and new opinions that people have about AI. I have tried to keep current in my posts, but inevitably, some of what I say will be outdated, possibly even in just a few days.

AI Problems

In this post I will talk about several problems that have impeded the development of neural network-based AI systems in the past. These same problems will likely continue to be problems for AI in the future.

The Training Data Problem

I think that compiling the training data set poses the most formidable obstacle to creating practical AI systems. Large networks need a large volume of data. GPT-3, the model family that ChatGPT grew out of, was reportedly trained on roughly 300 billion tokens (words and word fragments). Until the Internet matured, it would have been impossible to find the volume of text needed to train complex AI systems like ChatGPT.

When I took the Artificial Intelligence class back in the 1990s, they warned us about the need to ensure that the training data set was of high quality. Any errors or mistakes in the data would contaminate the AI, leading to poor quality results. A major part of compiling the data would be to check, and, if necessary, clean the data.
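As a rough illustration of what checking and cleaning can involve, here is a minimal Python sketch. The function name clean_records, the ten-character minimum, and the sample sentences are all made up for illustration. Real data pipelines are far more elaborate than this.

```python
def clean_records(records):
    """Drop empty, very short, and duplicate text records.

    The 10-character minimum is an arbitrary, hypothetical threshold.
    """
    seen = set()
    cleaned = []
    for text in records:
        normalized = " ".join(text.split())  # collapse runs of whitespace
        if len(normalized) < 10:             # skip empty or very short records
            continue
        key = normalized.lower()             # compare without regard to case
        if key in seen:                      # skip duplicates already kept
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


sample = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate, different case and spacing
    "",                                               # empty
    "ok",                                             # too short
    "Traffic volumes rise during the morning peak.",
]

for line in clean_records(sample):
    print(line)
```

Only two of the five sample records survive. Simple rules like these catch obvious junk, but they say nothing about whether the remaining text is accurate, which is the harder part of the job.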

The Intellectual Property Problem

The huge demand for training data has started to run into legal challenges. Writers, other creative people, and owners of intellectual property are concerned that the AI companies are not compensating them when their work is used to train AI systems. My own work may well have been used to train AI systems. I have no idea how I would find out if it had.

OpenAI has said that it cannot create useful AI models without the use of copyrighted material (https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/). It is likely that some kind of limitation on the use of copyrighted material will emerge. This could restrict the development of AI systems and increase their cost. While there is a large volume of material available in the public domain, much of it is old and outdated.

The Bias Problem

Bias in the training data is part of the issue of how clean the data is. However, bias is something that deserves special consideration. If the data has a bias, so will the AI. In the course I took in the 1990s, they reinforced this issue time and time again. Sadly, bias has been a problem with many AI systems. While this can have humorous results, it can also result in negative outcomes.

In one case, an AI created to identify skin cancers relied on the fact that images of actual skin cancers happened to include a ruler for scale, while non-cancer images did not (https://venturebeat.com/business/when-ai-flags-the-ruler-not-the-tumor-and-other-arguments-for-abolishing-the-black-box-vb-live/).

Many articles have been published about bias in AI models for law enforcement, for example: https://daily.jstor.org/what-happens-when-police-use-ai-to-predict-and-prevent-crime/. The problem of bias is not limited to policing, though.

Bias creeps in during the creation of the training data set. If there is bias in how a law is enforced, the data available about that law will contain that bias. It is essential that the data be checked for bias before training. Not only can the original data be biased, but the people checking for bias may have their own biases, which they may be unaware of.
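One simple check, sketched below in Python, is to compare how often a label appears for each group in the data before any training happens. The function name positive_rate_by_group, the group names, and the records are invented for illustration. A gap between groups does not prove bias on its own, but it flags something a person should look into.

```python
from collections import defaultdict


def positive_rate_by_group(rows):
    """Return the share of positive labels (label == 1) for each group.

    rows is a list of (group, label) pairs, with label equal to 0 or 1.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in rows:
        totals[group] += 1
        positives[group] += label
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical enforcement records: (neighbourhood, 1 if a charge was recorded).
rows = [
    ("north", 1), ("north", 1), ("north", 0), ("north", 1),
    ("south", 0), ("south", 0), ("south", 1), ("south", 0),
]

rates = positive_rate_by_group(rows)
for group, rate in sorted(rates.items()):
    print(group, rate)
```

In this made-up data the positive rate is three times higher in one neighbourhood than the other. Whether that reflects real differences or differences in how the law was enforced is exactly the question the numbers cannot answer by themselves.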

There was a recent case where attempts by Google to correct bias in an AI model resulted in a different bias. https://globalnews.ca/news/10311428/google-gemini-image-generation-pause/.

It is easy to call out the bias in AI systems. However, it is clear that controlling bias is difficult. Discovering the best ways to address bias in AIs will continue to be a major challenge.

As an aside, I suspect that the underlying cause of the biases people have may well be the same as the cause of bias in AI systems. Learning how to deal with bias in AI systems may help us deal with bias in people.

The Black Box Problem

I spent most of my working career developing and applying transportation forecasting models. These models were used to predict what traffic would be like in the future, and the predictions were then used to plan the transportation system.

Many people criticized the forecasts the model produced because they saw the model as a black box. They couldn’t see how it worked, so they tended to distrust what it predicted. While I felt that the model could be explained, the explanations were complicated. Few people had the time or patience needed to understand the explanations.

The problem with neural networks is that they truly are black boxes. We can see the inputs and the outputs. We can even look at the parameters inside the AI. But with AI systems that can have 175 billion parameters, it is not practical for people to understand, let alone explain, how the AI got the answer it did.
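To give a feel for how quickly the parameter count grows, here is a small Python sketch that counts the weights and biases in a plain fully connected network. The function name count_parameters and the layer sizes are chosen only for illustration, and large language models use more complex layer types than this.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units in each layer, input first.
    Each pair of adjacent layers contributes n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# A toy network: 4 inputs, one hidden layer of 8 units, 3 outputs.
print(count_parameters([4, 8, 3]))  # prints 67

# A modest image classifier: 784 inputs, two hidden layers of 512, 10 outputs.
print(count_parameters([784, 512, 512, 10]))
```

Even the modest second network has hundreds of thousands of parameters, each one just a number with no label attached. Scaling that up to 175 billion makes it clear why reading the parameters directly tells us so little.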

The black box problem makes it very difficult to fix an AI that isn’t acting the way you want it to. It can’t be debugged in the same way as a computer program. It can’t be reasoned with in the same way as a person.

It appears to me that the current approach is to revisit the training data set and modify it before retraining the model. It may be necessary to revise the data and retrain the AI system many times before the users and developers are satisfied.

This post is a mirror from my main blog http://www.dynamiclethargyfilms.ca/blog
