Keith Douglas
Last time, I posed six families of questions concerning current events surrounding artificial intelligence (AI). I received some feedback encouraging me to explore some of the aspects already presented in more detail; thanks to that feedback, I will do so in a moment. Before I do, however, I would encourage more readers to engage with the column! I would love to take questions, debate what I have presented, and so on.
ChatGPT Continued
I was encouraged by some readers to continue topic 1, about the explainability problems. So this month’s discussion expands on that topic. (I first wrote some of this material for a presentation I gave to my current work unit at Statistics Canada. Credit to the Canadian taxpayer, then, for sponsoring this investigation in a minor way.) I will begin with a bit more history. It may be interesting in its own right, but I think it also bears some critical scrutiny.
These particular forms of AI use what are called artificial neural networks (ANNs), sometimes historically also called “connectionist networks” or “parallel distributed processing.” They were first discussed in the 1950s under still other names. It was not until the 1980s that academic researchers, primarily interested in theoretical psychology and neuroscience, adopted the approach. At the time, ANNs were laboriously simulated on conventional, usually modestly parallel computers. I learned about them primarily from The Computational Brain, the classic by Terrence Sejnowski and Patricia Churchland. (There are two philosophers named Churchland: husband and wife Paul and Patricia.)
Paul Churchland thinks that ANNs are incomprehensible. He originally thought that was a good thing. There is no contention on the first point, incidentally, and that is precisely why people are “!?ing,” in the way I put it last time. To explain all this, I will have to get a bit more involved than I usually do, so please bear with me!
The basic way to understand an ANN, in outline, is as a classifier. An early example was used in text-to-speech: classify written words according to the phonemes used to say them (and in order, of course). The next basic fact is that ANNs are not programmed in anything like the usual way. They are instead systems that learn from examples and from what could be called anti-examples: e.g., “This is NOT a picture of a dog.”
From this a network of nodes and edges is built up, with a parameter on each edge called a “weight.” A weight represents how much the two connected nodes are associated or anti-associated. Thus we get a massive parallel processor of sorts, complete with an input layer, hidden layers (the processor proper), and an output layer, which can be a single bit if that’s all that is needed: e.g., “Word can be pronounced in English”, “Photograph is of a terrorist”, “Text is understandable by a 12-year-old student from Kenya”, etc.
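To make that structure concrete, here is a minimal sketch in Python. It is my own toy illustration, not the architecture of any system mentioned above: three input nodes feed a hidden layer over weighted edges, and the hidden layer feeds a single-bit output. The weights here are random, i.e., the network has not yet been trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A toy network: 3 input nodes, 4 hidden nodes, 1 output node.
# Each entry of a weight matrix sits on one edge and says how strongly
# the two connected nodes are associated (negative = anti-associated).
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(3, 4))   # input layer  -> hidden layer
W_output = rng.normal(size=4)        # hidden layer -> output node

def classify(features):
    """Forward pass: input -> hidden layer (the 'processor proper') -> one output bit."""
    hidden = sigmoid(features @ W_hidden)
    return int(sigmoid(hidden @ W_output) > 0.5)   # e.g. 1 = "is a picture of a dog"

print(classify(np.array([0.2, 0.9, 0.1])))  # prints 0 or 1
```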
It so happens that intervening layers are needed to calculate any interesting functions, and it is precisely this innovation that saved the approach from obscurity and mere historical curiosity in the 1980s. Here’s a scan of a diagram from T. Sejnowski’s recent book:
This is a sample diagram of how one particular learning algorithm for ANNs works. There are three distinct steps in the process of creating an ANN (sketched in code after the list below):
1. Designing and building the initial structure of a network and selecting its learning algorithm(s)
2. Training it with examples and anti-examples, using the algorithms selected
3. Using it to perform whatever classification task is of interest
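Here is a minimal sketch of those three steps, again my own toy illustration rather than anyone’s production system. The task is XOR (output 1 exactly when the two inputs differ), which also illustrates the earlier point about intervening layers: without a hidden layer, no setting of the weights computes it.

```python
import numpy as np

rng = np.random.default_rng(1)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: design the structure (2 inputs, 4 hidden nodes, 1 output)
# and choose a learning algorithm (plain gradient descent with backpropagation).
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

# Step 2: train on examples and anti-examples.
# Label 1 marks the examples, label 0 the anti-examples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

lr = 0.5
for _ in range(10000):
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Backpropagate the error and nudge every weight a little.
    d_out = (output - y) * output * (1 - output)
    d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
    W2 -= lr * hidden.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;       b1 -= lr * d_hid.sum(axis=0)

# Step 3: use the trained network as a classifier.
predictions = (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print(predictions.ravel())   # typically [0 1 1 0] once training has converged
```

Note that nothing in the finished network looks like a rule “output 1 when the inputs differ”; what it has learned is spread across the weights, which is the incomprehensibility point in miniature.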
So what gets created during this process? Here is where another important bit of detail matters: The hidden layers store basis vectors in at least one vector space. This is a bit oversimplified, but for the purposes of illustration it will do. What’s a vector space? What are basis vectors?
Let’s think back to how something can be located in space: We select an origin and a distance unit, and then say that the thing (or event) is so many units away from the origin in the vertical dimension, so many in the horizontal dimension, and so many in the front-and-back dimension. This approach generalizes so long as we can describe, somehow, how much of something we have and how far it is from something else. Indeed, examples like faces are what Churchland has used for years.
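A minimal sketch of the idea, with made-up names and numbers of my own:

```python
import numpy as np

# Locating something in ordinary space: pick an origin, pick a distance unit,
# and give coordinates along three basis vectors.
up      = np.array([0.0, 0.0, 1.0])   # vertical
right   = np.array([1.0, 0.0, 0.0])   # horizontal
forward = np.array([0.0, 1.0, 0.0])   # front-and-back

# "Two units right, three forward, one up" is a weighted sum of the basis vectors:
location = 2 * right + 3 * forward + 1 * up
print(location)                          # [2. 3. 1.]

# The same trick works for any features we can quantify; "similar" things
# end up close together, measured by distance in the space.
other = np.array([2.5, 3.0, 1.0])
print(np.linalg.norm(location - other))  # 0.5 -- the two points are nearby
```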
Imagine one dimension being how far apart the eyes are. Another could be where on the head’s vertical axis the ears are. Another could be a primary reflectance wavelength for the hair colour. All of this sounds over-simplified, but it is important to realize two things: (1) the network itself might create the relevant categories and (2) using fewer dimensions might still “work,” just less reliably. I’ve included a fragment of these basis vectors, called holons, from one of the face recognition networks (taken from The Computational Brain):
These are just six of the 80 that the network used. And here we get our first case of incomprehensibility: I for one barely even see a face in the bottom left square! Certainly it is not easy to give a name to the features picked out. Now imagine doing this in a situation where our cognition is even harder to understand (face recognition seems to have a special subsystem in human brains). In general, there is no reason for the holons to be understandable.
Why, then, is this incomprehensible nature a good thing? I remember that in grade eight our teacher asked us to write a small journal entry on whether we think in pictures or in words. I know now that neither is in general true, and that the notion of thinking in pictures can be very misleading in some contexts.
Many people at first blush think that cognition takes place verbally, or at least that a lot of it does. This is why logical notation concerns itself with something that looks like a language. It is that same logical influence that held sway in many early views on AI. Over the years there have been dissenters who were not convinced that this was the only possibility.
It can be shown fairly easily that the ANNs described above are also universal computing models, much like the Turing machine (TM) with its supposed commitment to “symbolism.” But this proof was not appreciated, nor is it necessary. The relevant point, then, is this: Cognitively interesting tasks seem not to require anything like what we take to be ordinary human thought.
That gets us toward a plausibilistic argument and a very interesting conclusion: Our “minds” are not at all what we think them to be. That’s interesting even if we are committed materialists. This is thus an argument for “eliminative materialism.” For better or for worse — and without much regard to the debate over this conclusion — Google, Microsoft, etc., have gone ahead with marketing these techniques to the world.
I would like to close by encouraging the reader to think about:
1) What sort of cognitive architecture do humans have?
2) How does that relate to that of other animals?
3) Can one stably hold the partial eliminativist position (i.e., that eliminative materialism is correct for some aspects of how we work, but not others)?
4) Bunge has argued against the relevance of ANN models for understanding humans, by appealing to neuroplasticity and to neuromodulators. Are these good arguments?
5) Should people studying data science or other fields where ANNs are now common tools also study the philosophical history? Why or why not?
An earlier version read “probabilistic argument”; it should read “plausibilistic”. I do not know whether I slipped, or the editor “corrected” me. I am not a Bayesian, so I do not think that arguments (as Locke would have it) or propositions (or similar items, as contemporaries would have it) have probabilities.
I have updated the post to correct this.