Sunday, March 3, 2024

Artificial Intelligence (AI) and Me – Part 2

Almost every day I see new developments in the field of Artificial Intelligence (AI), and new opinions about it. I have tried to keep my posts current, but inevitably some of what I say will be outdated, possibly within just a few days.

AI Problems

In this post I will talk about several problems that have impeded the development of neural network-based AI systems in the past and will likely continue to challenge AI in the future.

The Training Data Problem

I think that compiling the training data set poses the most formidable obstacle to creating practical AI systems. Large networks need a large volume of data: the training data set for GPT-3, the model behind the original ChatGPT, contained roughly 300 billion tokens. Until the Internet matured, it would have been impossible to find the volume of text needed to train complex AI systems like ChatGPT.

When I took an artificial intelligence class back in the 1990s, we were warned about the need to ensure that the training data set was of high quality. Any errors in the data would contaminate the AI, leading to poor-quality results. A major part of compiling the data is checking it and, if necessary, cleaning it.
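
As a toy illustration of that checking and cleaning step, here is a minimal Python sketch (the records and labels are made up) that drops empty, mislabeled, and duplicate entries before training:

```python
# Minimal sketch of a data-cleaning pass over a labeled data set.
# The records and the label set here are made up for illustration.
VALID_LABELS = {"A", "B", "C"}

records = [
    {"text": "first example",  "label": "A"},
    {"text": "",               "label": "B"},   # empty input
    {"text": "first example",  "label": "A"},   # duplicate
    {"text": "second example", "label": "Q"},   # invalid label
    {"text": "third example",  "label": "C"},
]

seen = set()
cleaned = []
for rec in records:
    text = rec["text"].strip()
    label = rec["label"]
    if not text or label not in VALID_LABELS or text in seen:
        continue  # drop empty, mislabeled, and duplicate records
    seen.add(text)
    cleaned.append((text, label))

print(cleaned)  # [('first example', 'A'), ('third example', 'C')]
```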

The Intellectual Property Problem

The huge demand for training data has started to run into legal challenges. Writers, other creative people, and owners of intellectual property are concerned that AI companies are not compensating them when their work is used to train AI systems. My own work may well have been used to train AI systems, and I have no idea how I would find out if it had.

OpenAI has said that it cannot create a functional AI without the use of copyrighted material. https://arstechnica.com/information-technology/2024/01/openai-says-its-impossible-to-create-useful-ai-models-without-copyrighted-material/. It is likely that some kind of limitation on the use of copyrighted material will emerge, which could restrict the development of AI systems and increase their cost. While there is a large volume of material available in the public domain, it is often old and outdated.

The Bias Problem

Bias in the training data is part of the larger question of how clean the data is, but it deserves special consideration: if the data has a bias, so will the AI. The course I took in the 1990s reinforced this point time and time again. Sadly, bias has been a problem with many AI systems. While it can have humorous results, it can also cause real harm.

In one case, an AI created to identify skin cancers learned to key on the fact that images of actual skin cancers happened to include a ruler for scale, while non-cancer images did not. https://venturebeat.com/business/when-ai-flags-the-ruler-not-the-tumor-and-other-arguments-for-abolishing-the-black-box-vb-live/.

Many articles have been published about bias in AI models used for law enforcement, for example: https://daily.jstor.org/what-happens-when-police-use-ai-to-predict-and-prevent-crime/. The problem of bias is not limited to policing, though.

Bias creeps in during the creation of the training data set. If there is bias in how a law is enforced, the data about that law's enforcement will contain that bias. It is essential that the data be checked for bias before training. Not only can the original data be biased, but the people checking for bias may have biases of their own, of which they may be unaware.
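
One crude first check is to look at how outcomes are distributed across groups in the data. Here is a minimal Python sketch, with made-up field names and records, that tabulates outcome rates by group to flag obvious imbalances:

```python
# Minimal sketch: tabulate outcome rates by group to flag obvious
# imbalances in a labeled data set. Field names and data are made up.
from collections import Counter

def outcome_rates_by_group(records):
    totals = Counter()
    positives = Counter()
    for rec in records:
        group = rec["group"]
        totals[group] += 1
        if rec["outcome"] == 1:
            positives[group] += 1
    return {g: positives[g] / totals[g] for g in totals}

records = [
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 0},
    {"group": "B", "outcome": 1},
    {"group": "B", "outcome": 1},
]
print(outcome_rates_by_group(records))  # {'A': 0.5, 'B': 1.0}
```

A large gap between groups does not prove the data is biased, but it is a signal that it deserves a closer look.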

There was a recent case where attempts by Google to correct bias in an AI model resulted in a different bias. https://globalnews.ca/news/10311428/google-gemini-image-generation-pause/.

It is easy to call out bias in AI systems; controlling it is clearly much harder. Discovering the best ways to address bias in AI will continue to be a major challenge.

As an aside, I suspect that the underlying cause of the biases people have may well be the same as the cause of bias in AI systems. Perhaps learning how to deal with bias in AI systems will help us deal with bias in people.

The Black Box Problem

I spent most of my working career developing and applying transportation forecasting models. These were used to predict what traffic would be like in the future; those forecasts were then used to plan the transportation system.

Many people criticized the forecasts the models produced because they saw the models as black boxes. They couldn't see how a model worked, so they tended to distrust what it predicted. While I felt that the models could be explained, the explanations were complicated, and few people had the time or patience to work through them.

The problem with neural networks is that they truly are black boxes. We can see the inputs and the outputs. We can even look at the parameters inside the AI. But with AI systems that can have 175 billion parameters, it is not practical for people to understand, let alone explain, how the AI arrived at the answer it did.
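
To get a feel for how quickly the parameters pile up, here is a minimal sketch that counts the weights and biases in a small fully connected network; the layer sizes are arbitrary:

```python
# Minimal sketch: count parameters in a small fully connected network.
# Each layer of size n feeding a layer of size m contributes n*m weights
# plus m biases. The layer sizes here are arbitrary.
def count_parameters(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

print(count_parameters([784, 128, 64, 10]))  # 109386
```

If a toy network like this already has over a hundred thousand parameters, tracing the role of each of 175 billion is plainly hopeless.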

The black box problem makes it very difficult to fix an AI that isn't acting the way you want it to. It can't be debugged the way a conventional computer program can, and it can't be reasoned with the way a person can.

It appears to me that the current approach is to revisit the training data set and modify it before retraining the model. It may be necessary to revise the data and retrain the AI system many times before the users and developers are satisfied.

This post is a mirror from my main blog http://www.dynamiclethargyfilms.ca/blog

Sunday, February 18, 2024

Artificial Intelligence (AI) and Me - Part 1

Artificial Intelligence (AI) was all the rage in 2023, and it looks like it will continue to be a big thing in 2024. I have mixed feelings about AI: I can see its value as well as the dangers and challenges it brings. In this series of blog posts I will try to clarify my own thoughts about it.

This post focuses on my experiences with the concept of artificial intelligence before the recent developments in the field, background that I feel will be useful later. In later posts I will look at the technical challenges of AI, the problems of applying it, and my own use of AI systems.

My exposure to AI began in the 1960s, when I started reading science fiction. Later, in the 1990s, I took a course on expert systems and neural networks.


Artificial Intelligence (AI) in Science Fiction

Artificial Intelligence has long been a feature of Science Fiction stories. That is where I was first exposed to the idea.

You can’t really talk about AI without referring to the movie “2001: A Space Odyssey.” It is my favorite movie. Although AI was common in earlier science fiction, “2001” brought the concept to a wider audience. The benefits and dangers of AI are an important aspect of the story.

One comment that I find of particular interest comes about an hour and three minutes into the film. A TV interviewer asks Dave Bowman whether HAL, the onboard computer, has real emotions. He replies, “Well, he acts like he has genuine emotions. Um, of course he's programmed that way to make it easier for us to talk to him. But as to whether he has real feelings is something I don't think anyone can truthfully answer.”

Many of Isaac Asimov’s robot stories deal with the opportunities and dangers of AI. One that sticks in my mind nearly 50 years after I last read it is “Galley Slave,” about a robotic proofreader alleged to have ruined a writer’s reputation by making changes to the galleys of his book. Since people are now using AI systems for proofreading, this nearly-70-year-old story is still relevant (https://en.wikipedia.org/wiki/Galley_Slave). I think anyone who has experienced frustration with autocomplete will empathize with the main character.

“The Tunnel Under the World” by Frederik Pohl is another story that is relevant to AI. I have never read the story, but I heard a radio play based on it. [Spoiler Alert] In the story, tiny intelligent robots that stand in for real people live in a model of a city. They are used to test how the real people might react to different advertising campaigns as they go about their daily activities. https://en.wikipedia.org/wiki/The_Tunnel_under_the_World


Expert Systems and Neural Networks

In the mid-1990s I took a night class on expert systems and neural networks. Neural networks are the underlying technology behind today’s artificial intelligence systems.

In the 1990s I worked on the creation and application of computer models to forecast traffic for planning the transportation system. I took the class to better understand a new type of transportation model that, like “The Tunnel Under the World,” used AI simulations of individual people to predict the behavior of real people when faced with changes to the transportation system. These newer models didn’t come into use until after I left the field, so I don’t know how they worked out.

Expert systems were an earlier attempt to develop artificial intelligence. However, by the mid-1990s, neural network technology was beginning to supplant the older expert systems.

Neural networks are loosely based on a simulation of how the human brain works. The brain consists of interconnected neurons, and it is the strength of these connections that determines how a brain thinks. In a neural network, nodes represent the neurons and parameters represent the strengths of the connections.

A human brain may have 100 billion neurons and something like 700 trillion connections. By comparison, GPT-3 had about 175 billion parameters. By that crude measure (700 trillion ÷ 175 billion), the human brain is about 4,000 times as complex as GPT-3.

After a neural network is set up, it is trained using a training data set, which pairs the inputs you have with the outputs you want the network to produce.

This is a simplified explanation. You can get a more detailed explanation here: https://en.wikipedia.org/wiki/Neural_network


How To Create a Neural Network

There are two major steps in creating an AI system using a neural network. The first is setting up the network; the second is training it.

The neural networks we covered in the class I took were quite simple, with only a few dozen nodes. The structure consisted of an input layer, an output layer, and one or more intermediate layers; a rough sketch of such a structure appears below. The systems we use today are far more complex.
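
As an illustration, here is a minimal Python sketch that sets up a layered network of this kind with random parameters. The layer sizes are arbitrary, chosen only to keep the example readable:

```python
# Minimal sketch: set up a small layered neural network with random
# weights and zero biases. Layer sizes are arbitrary; real systems
# are vastly larger.
import random

def make_network(layer_sizes):
    """Return one (weights, biases) pair per connection between layers."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[random.gauss(0.0, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

# An input layer of 4 nodes, one intermediate layer of 6, and an
# output layer of 2 nodes.
network = make_network([4, 6, 2])
print(len(network), "sets of connections")  # 2
```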

The more interesting and understandable step to me is training the network.

Before a neural network can be trained, you must create the training data set. This consists of paired observations: the inputs you have and the outputs you want the network to produce. An example could be images of handwritten letters paired with the letters they represent.

Creating a useful training data set is a major challenge. Not only must you find the data, but you need to ensure it covers a broad enough range of possibilities. You also need to ensure that the data is clean, that is, that the outputs match the inputs. For example, all the images of handwritten “A”s must be labeled as “A”s in the output. In the course, the teacher warned us that any bias in the data will result in a biased AI.

The neural network is calibrated by repeatedly running the input data through it and comparing the results to the corresponding observed outputs. Based on how well the network replicates those outputs, its parameters are adjusted, and the process repeats until the outputs match closely enough. This can take a long time.
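
Here is a minimal sketch of that calibration loop. To keep it readable it fits a single "neuron" with just two parameters to a handful of made-up observations, but the idea (predict, compare, adjust, repeat) is the same one scaled up in real systems:

```python
# Minimal sketch of the calibration loop: repeatedly run the inputs
# through the model, compare to the observed outputs, and nudge the
# parameters to reduce the error. Data and learning rate are made up.
inputs  = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]   # generated by y = 2x + 1

w, b = 0.0, 0.0                  # the two "parameters"
learning_rate = 0.05

for epoch in range(2000):
    for x, y in zip(inputs, targets):
        prediction = w * x + b
        error = prediction - y
        w -= learning_rate * error * x   # adjust parameters
        b -= learning_rate * error       # in proportion to the error

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```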

In the 1990s, it was very difficult to assemble a useful training data set. As the Internet grew and more information became available in digital form, it became easier to compile the kind of large data sets that neural networks need for calibration. There are still major barriers to preparing a useful training data set, though.

The computers available in the 1990s were much slower, which limited the size of neural networks. Even the fastest supercomputers of the 1990s would struggle to calibrate today's large neural networks. Many modern home computers can outperform the supercomputers of the 1990s. This has dramatically expanded the size of the neural networks we can work with, and the abilities of AI systems have expanded accordingly.

This post is a mirror from my main blog http://www.dynamiclethargyfilms.ca/blog

Sunday, February 4, 2024

“The Notion of the Dirty Pot” Posted

I wrote, recorded, and posted my writing exercise: “The Notion of the Dirty Pot.”

Image created using Image Creator from Microsoft Bing

Rachel, an archaeology graduate student, finds an unusual pot on a dig that baffles her. Professor Albert Shannon is baffled too. https://soundcloud.com/dynamiclethargy/the-notion-of-the-dirty-pot

The recording is also available on the audio page of my website. https://dynamiclethargyfilms.ca/audio/

This is a writing exercise based on a random title generated at https://dynamiclethargyfilms.ca/random-title-generator/random-title-generator-for-things/

Character voices by Voice.ai: Politician, Sam-v30, and Betty-White.


This post is a mirror from my main blog http://www.dynamiclethargyfilms.ca/blog