Monday, October 22, 2018

Consciousness and Information (or: Why I am a Cartesian Dualist)

I.
One of the major points of confusion I see in many modern theories of consciousness, including Integrated Information Theory (IIT), Global Neuronal Workspace theory (GNW), and others, is an unjustified jump from information and the processing thereof to conscious, subjective experience of that information. The strong versions of these theories tend to make the mistake of saying that once you have the right type of information processed in the right way, subjective experience will emerge. On these theories, the brain, which processes a lot of information and either combines different kinds of information with each other (IIT) or selectively focuses on a particular subset of information (GNW), thereby produces consciousness.

I think that underlying these ideas is a fundamental misconception of what information is and what it can do. Information is a mathematical concept, not a physical one. We can use physical systems to represent information, just as we can use a pair of gloves to represent the number "two". But that is just a matter of cognitive convenience; if I wanted, the pair of gloves could represent the number "ten" by counting the fingers on the gloves. Information is the same in this respect: it is a property of random variables, not of matter. I can use a coin to represent a random variable by saying "this coin has two states, heads and tails; flipping the coin assigns a Bernoulli distribution with p = 0.5 to the random variable, and the side on which the coin lands determines the outcome of the random variable after the experiment." But this is, again, a matter of convention. I could, for example, treat the number of times the coin flips in the air as the random variable, which would then have a different number of states (in principle, the set of all natural numbers) and a different probability distribution. So the coin itself doesn't contain information intrinsically; the information depends on what we, as observers, choose the coin to represent. (Things become a bit more nuanced with subatomic particles, which have physical states that seem to be at least somewhat well-defined and restricted in terms of the information that they convey, and there's also the issue of Landauer's principle, which needs to be addressed, but I'll leave those aside for the moment.)
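To make the coin example concrete, here is a minimal Python sketch that computes the entropy of the same coin under the two conventions described above. The rotation-count distribution is invented purely for illustration (and truncated to five outcomes); only the fair heads/tails distribution comes from the text.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Convention 1: the random variable is the side the coin lands on.
side_distribution = {"heads": 0.5, "tails": 0.5}

# Convention 2: the random variable is the number of times the coin flips
# in the air. These probabilities are hypothetical, chosen for illustration.
rotation_distribution = {1: 0.125, 2: 0.25, 3: 0.25, 4: 0.25, 5: 0.125}

side_entropy = entropy_bits(side_distribution.values())
rotation_entropy = entropy_bits(rotation_distribution.values())

print(f"side of coin:     {side_entropy} bits")
print(f"rotations in air: {rotation_entropy} bits")
```

Same coin, same toss, two different amounts of information: the number depends on which random variable the observer decides the coin stands for.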

[Image: a pair of gloves. Caption: A primitive computer]

In addition to physical objects being able to represent different information depending on how the observer chooses to define the state space, information is substrate independent; in other words, you can store the same information in a variety of physical media and it will be identical from a mathematical standpoint. Two rocks and two socks can both convey the number "two". Let's take another example: the string "Hello" is equivalent to the binary string 01001000 01100101 01101100 01101100 01101111 in ASCII encoding, where every letter of the English alphabet is assigned a binary code, conventionally stored as 8 bits. A printer can convert the above binary string stored in transistors on your computer to the English string written in ink on a piece of paper. Different physical media, same information.
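The ASCII mapping can be checked directly. This short Python sketch uses only the standard library to turn "Hello" into the bit string above and back again; the variable names are my own.

```python
text = "Hello"

# Encode each character as its ASCII code, written out as 8 binary digits.
encoded_bytes = text.encode("ascii")
bit_string = " ".join(format(byte, "08b") for byte in encoded_bytes)
print(bit_string)  # 01001000 01100101 01101100 01101100 01101111

# Decode: read each 8-digit group as a number, then map it back to a character.
decoded = bytes(int(chunk, 2) for chunk in bit_string.split()).decode("ascii")
print(decoded)  # Hello
```

The round trip loses nothing, which is the sense in which the two representations carry the same information.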

Let us posit, for the sake of argument, that ASCII was never invented. In other words, no one ever created a mapping between English letters and 8-digit binary numbers. Does 01001000 01100101 01101100 01101100 01101111 still mean the same thing as "Hello"? Well, this is kind of a trick question. In information theory, entropy, the standard measure of information, doesn't answer questions of the form "does A mean the same thing as B". Instead, entropy measures the amount of information in a probability distribution, but it doesn't tell you anything about "meaning." So, for example, the average letter in the English language has about 2-3 bits of entropy (after compressing via word frequency and so forth), meaning that if you want a binary system that can represent any arbitrary string in English, you'd need, say, 15 bits to encode all 5 letters of "Hello". So entropy tells us that 01001000 01100101 01101100 01101100 01101111 could represent "Hello" with some bits to spare if we wanted it to; we would just need to create the encoding scheme that performs the appropriate mapping. But there's no lossless mapping from the whole of the English language to binary that could encode "Hello" in a single bit, because a single bit can only distinguish between two messages.
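The arithmetic here can be sketched in a few lines of Python. The figure of 3 bits per letter is the upper end of the rough estimate given above, not a measured value, and the helper `min_fixed_length_bits` is a hypothetical name introduced for this example.

```python
def min_fixed_length_bits(num_messages):
    """Fewest bits needed to give each of num_messages a distinct fixed-length code."""
    return (num_messages - 1).bit_length()

letters_in_hello = 5
bits_per_letter_estimate = 3  # upper end of the rough 2-3 bit estimate in the text

entropy_budget = letters_in_hello * bits_per_letter_estimate  # 15 bits
ascii_bits = letters_in_hello * 8                             # 40 bits

# With no statistical compression at all, labelling every possible
# 5-letter string over a 26-letter alphabet takes this many bits:
uncompressed_bits = min_fixed_length_bits(26 ** letters_in_hello)

# A single bit can label only two distinct messages.
messages_per_single_bit = 2 ** 1

print(entropy_budget, uncompressed_bits, ascii_bits, messages_per_single_bit)
```

So the 40 bits of the ASCII string are more than enough to carry "Hello", while one bit could only ever tell two messages apart.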

II.

Information theory can actually do a bit more for us, though. There's a measure called mutual information, which does indeed tell us how much information A contains about B (and vice versa; mutual information happens to be symmetric). However, mutual information requires some additional knowledge about A and B, namely the joint probability distribution, i.e. the probability that in a given experiment A will have value x and B will have value y. So, for example, there is non-zero mutual information between a person's height and weight, because height is at least somewhat predictive of weight. In this sense, mutual information is similar to correlation, but it is a stronger measure, because correlation only captures linear relationships between A and B, whereas mutual information tells you the maximum information you can extract from A about B using an optimal function.
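The difference between correlation and mutual information shows up in a toy case. In the following Python sketch (my own example, not one from any of the theories discussed), A is uniform on -1, 0, 1 and B is A squared: the covariance is zero, so a linear measure sees nothing, yet the mutual information is clearly positive.

```python
import math
from collections import defaultdict

def mutual_information_bits(joint):
    """I(A;B) in bits, given a dict mapping (a, b) -> joint probability."""
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    for (a, b), p in joint.items():
        p_a[a] += p  # marginal distribution of A
        p_b[b] += p  # marginal distribution of B
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for (a, b), p in joint.items()
        if p > 0
    )

def covariance(joint):
    """Covariance of A and B under the same joint distribution."""
    mean_a = sum(a * p for (a, _), p in joint.items())
    mean_b = sum(b * p for (_, b), p in joint.items())
    return sum((a - mean_a) * (b - mean_b) * p for (a, b), p in joint.items())

# A is uniform on {-1, 0, 1}; B = A squared (a purely nonlinear dependence).
joint = {(-1, 1): 1 / 3, (0, 0): 1 / 3, (1, 1): 1 / 3}

mi = mutual_information_bits(joint)
cov = covariance(joint)
print(f"covariance: {cov:.3f}, mutual information: {mi:.3f} bits")
```

Note that the function cannot even be called without the joint distribution, which is the additional knowledge the paragraph above refers to.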

If we go back to our ASCII example: if we already have a computer that translates binary to English, we can calculate the joint probability between the binary code stored in its memory and the words that it prints on its screen or on a piece of paper, and from there we can determine that the mutual information between the ASCII code and English is maximal. If we don't already have such a computer, though, then the joint probability between ASCII and English is simply not defined and the mutual information can't be calculated.
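One way to read "maximal" here is that the mutual information equals the entropy of the text itself, since an invertible code loses nothing. A rough Python sketch of that check follows, using empirical frequencies from an arbitrary sample string; the sample and the helper `to_bits` are assumptions made for illustration.

```python
import math
from collections import Counter

sample_text = "hello world"  # arbitrary sample, chosen only for illustration

def to_bits(char):
    """8-digit binary form of a character's code, as in the ASCII example."""
    return format(ord(char), "08b")

# Observe (letter, code) pairs, as if watching memory and screen together.
pairs = [(char, to_bits(char)) for char in sample_text]
n = len(pairs)
joint_counts = Counter(pairs)
letter_counts = Counter(char for char, _ in pairs)
code_counts = Counter(bits for _, bits in pairs)

# Empirical mutual information between letters and their binary codes.
mutual_information = sum(
    (count / n)
    * math.log2((count / n) / ((letter_counts[c] / n) * (code_counts[b] / n)))
    for (c, b), count in joint_counts.items()
)

# Empirical entropy of the letters alone.
letter_entropy = -sum((k / n) * math.log2(k / n) for k in letter_counts.values())

print(mutual_information, letter_entropy)
```

The two numbers agree because each letter always appears with the same code. Without the observed pairing, `joint_counts` could not be built at all, which mirrors the point that the quantity is undefined when no mapping exists.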

Our mutual information measure also doesn't really approach anything resembling "meaning". All we've said is that it is possible to convert one string of symbols into another string of symbols without losing information. Because I know that my computer uses a well-defined mapping from bits to letters, I can reconstruct text from documents stored on a hard drive. That's great. But if English-speaking humans weren't around to understand the semantics of English, this would be a pointless exercise: a meaningless conversion of one string of symbols to another. The same is true, by the way, when it comes to information processing. By performing a mathematical operation on some data, I'm simply converting one string of symbols to another string of symbols by means of a function (i.e. via an algorithm that a Turing machine could run). I could have a string of symbols a trillion trillion bits long, and I could perform a trillion trillion operations on it (if you want, I can even make the operations behave like a network, because that seems to be something that people think is important), and at the end I'd still be left with... a string of symbols. There is no Turing-computable function of which I am aware that can take a string of symbols, perform a mathematical operation on it, and return something other than a string of symbols.

III.



If you want to get something other than a string of symbols out of another string of symbols, you have to leave the realm of mathematics and return to the world of physics, with particles that bump into each other and that sort of thing. The best example here is from molecular biology. In the classic (extremely simplified) central dogma of molecular biology, DNA is transcribed into RNA, which is then translated into a protein. DNA is a code equivalent to binary, except that it uses a base-4 system (A, T, G, C) instead of binary's base-2 system or our more commonly used base-10 system. All of these systems are of course basically the same; the differences are merely representational. DNA is transcribed into RNA, which has the same bases as DNA except that T is replaced by U. The RNA strand is complementary to the DNA strand, which means, for example, that wherever G appears in the DNA strand a C appears in the RNA strand, but the two strands are informationally equivalent, because there is a simple algorithm to reconstruct the DNA string from the RNA string and vice versa. The string of As, Ts, Gs, and Cs only matters, though, when the RNA is translated into amino acids, which make up proteins, because proteins actually do stuff in the cell. Proteins help to catalyze chemical reactions, transfer materials within and between cells, and so on. So the DNA only has "meaning" once it's converted into a protein. And, for what it's worth, information is actually lost when RNA is translated into a protein, because multiple RNA trigrams (codons) can code for the same amino acid, meaning that you can't tell from looking at a protein exactly what RNA sequence created it.
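The symbolic half of this story can be sketched in Python. The example below is deliberately as simplified as the description above: it ignores strand direction, uses only a handful of entries from the standard genetic code, and the two template sequences are made up for illustration.

```python
# Complementary base pairing from a DNA template strand to RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
RNA_TO_DNA = {rna: dna for dna, rna in DNA_TO_RNA.items()}

# A small subset of the standard genetic code (codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GGA": "Gly", "GGC": "Gly",
    "UAA": "STOP",
}

def transcribe(template_dna):
    """DNA -> RNA: a one-to-one symbol substitution, so nothing is lost."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)

def reverse_transcribe(rna):
    """RNA -> DNA: the exact inverse of transcribe."""
    return "".join(RNA_TO_DNA[base] for base in rna)

def translate(rna):
    """RNA -> amino acid names: many-to-one, so information is lost."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

# Two hypothetical template strands that differ in two codons.
dna_one = "TACAAACCTATT"
dna_two = "TACAAGCCGATT"

rna_one, rna_two = transcribe(dna_one), transcribe(dna_two)
print(rna_one, translate(rna_one))
print(rna_two, translate(rna_two))
```

Both strands yield Met-Phe-Gly, so the protein alone cannot tell you which RNA produced it, whereas `reverse_transcribe` recovers each DNA string exactly. Of course, the code only shuffles symbols; the part that matters for the argument, the protein actually doing something in a cell, is exactly what it cannot show.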

[Image: ribosome animation. Caption: RNA Translation]

If I were to create a DNA strand a trillion trillion bases long by randomly concatenating bases and transfecting it into your genome, it probably wouldn't do anything especially useful (actually, there's a good chance it would kill you). And again, the same goes if I were to process the information in that strand a trillion trillion times. The amount of information in a string, or the amount of processing applied to it, is not highly correlated with the usefulness or "meaning" of that information. The important step for "meaning" is the translation of the information to a physical substrate, not the symbolic representation itself. The choice of symbolic representation is arbitrary; it just has to be long enough to encode whatever physical information you need to "read out".

IV.

So far I've been tossing around the word "meaning" without defining it, because this is where consciousness comes in. When we look at the word "Hello" written on a screen and we see it -- I mean experientially -- that cannot be information processing. Information processing can turn "Hello" into "Cello" or it can translate "Hello" into 01001000 01100101 01101100 01101100 01101111 (in the case of the brain, into the code of firing neuronal action potentials). It can associate "Hello" with other strings of information, such as an image of a waving hand - and by "associate" I mean "perform some mathematical operation whereby the symbolic representation of the waving hand and the symbolic representation of the word 'Hello' are combined to produce a new string of symbols." But information processing - even in network structures - cannot remove "Hello" from the world of strings of symbols to the world of conscious experience. And if it can, you have to explain how, because such a claim very much looks like a category error.

In my view, if consciousness is anything, it is unlikely to be information or to emerge from information, because we have no examples of things that are not strings of symbols emerging from strings of symbols. Consciousness (or perhaps more precisely, qualia) is a substrate which interacts with information; it is not the information itself. And what is interesting about consciousness is not the information that it reads, but rather the fact that it can read anything at all, even if the information it reads is simplistic. The brain does a lot of processing, but the purpose of this is not to create consciousness; it is simply to prepare the information for consciousness's interaction with it.



Consciousness is not necessarily a physical substrate, though I find the idea of a "consciousness particle" a lot more plausible than the idea that consciousness is something that spontaneously emerges from information processing. I also believe that attempts to ground consciousness in the combination of information processing and the "causal networks" of physical interactions in which that information is stored (as in IIT) are misguided. The best approach, in my view, is to treat consciousness as a categorically independent object of inquiry and to distinguish consciousness from the information with which it interacts, the latter lending itself to explanations in terms of network computation.

Once we divorce consciousness from information, we are left with something very small: a consciousness that does not contain memory, personality, or anything that could be considered persistent. Persistent information is stored in the brain and may be projected to consciousness, but consciousness does not store information for more than a negligible amount of time. This version of consciousness is so small that, in the absence of information, it is indistinguishable (as far as we know) from being absent. To sharpen this point, consider the following question - is a sleeping person actually unconscious, or is his brain simply not projecting any information to the conscious substrate? If consciousness is synonymous with information and the processing thereof, the question is meaningless. But if consciousness is a substrate of information, consciousness can be present in the absence of information, just like a computer can be on in the absence of data held in memory. You might not see anything on the screen, but it's still there, awaiting informational input.
