"Did language evolve to fit the brain, or did the brain evolve to fit language?" is a question Geoffrey Hinton has answered, and I felt it might be interesting to share his answer and elaborate on it a little.
1st, the symbolic view: cognition consists of having strings of symbols in some clean, unambiguous logical language and applying rules of inference. Cognition is just symbolic manipulation of things that are like language symbols.
2nd view: once you get inside the head, it's all vectors. Symbols come in, you convert them into big vectors, and if you want output, you produce symbols again. There was a turning point in machine translation in 2014, when people used recurrent neural nets (RNNs): words keep coming in, and a hidden state keeps accumulating information. By the end of the sentence, the network has a big hidden vector that captures the meaning of that sentence, which is then used to produce the sentence in the other language; that vector is called the thought vector. You convert the language into big vectors (nothing like language), and that is what cognition is all about.
And the 3rd view: you take the symbols and convert them into embeddings, and you use multiple layers of that, so you get very rich embeddings, but the embeddings are still tied to the symbols. You get a big vector for this symbol and a big vector for that symbol, and these vectors interact to produce the symbol for the next word. That's what understanding is: knowing how to convert the symbols into those vectors, and knowing how the vectors should interact. You stay with the symbols, but you interpret them with these big vectors. It's not that you abandon the symbols altogether; you turn the symbols into big vectors while staying with the surface structure of the symbols.
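To make the 3rd view a bit more concrete, here is a minimal, made-up sketch in Python: a tiny vocabulary with hand-picked 4-D embeddings, where dot-product interactions between the vectors score which symbol comes next. The vocabulary, the numbers, and the softmax mixing are all invented for illustration; this is not Hinton's actual model, just the shape of the idea (symbols in, vectors interact, symbol out).

```python
import numpy as np

# Toy vocabulary with made-up 4-D embeddings (purely illustrative numbers)
vocab = ["the", "cat", "sat"]
E = np.array([
    [0.1, 0.0, 0.2, 0.1],   # "the"
    [0.8, 0.5, 0.9, 0.2],   # "cat"
    [0.3, 0.7, 0.1, 0.6],   # "sat"
])

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Convert the symbol sequence into vectors (staying tied to the symbols)
tokens = [0, 1]             # "the cat"
x = E[tokens]               # shape (2, 4): one vector per symbol

# Let the vectors interact: dot products measure how strongly symbols relate
weights = softmax(x @ x.T)  # (2, 2): how much each symbol attends to the others
context = weights @ x       # each symbol's vector, enriched by its context

# Score every vocabulary symbol as a candidate "next word"
logits = E @ context[-1]
next_word = vocab[int(np.argmax(logits))]
print(next_word)
```

With these untrained toy numbers the prediction is meaningless; the point is only the pipeline: symbols become vectors, vectors interact, and the interaction produces a symbol again.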
We can see 3 concepts from Hinton when he explains these 3 different views of language and how they relate to cognition, so we ask:
1) What are symbols, vectors, and embeddings?
symbols - words
vectors - representation of words
embeddings - capture the semantic relationship between symbols by placing them in a high-dimensional vector space
For example, consider the symbol "cat":
On this 2D plane, "cat" and "dog" might be close to each other because they are both animals.
"Fish" is further away because it is a different type of animal.
"Car" is far from all animals, indicating it's conceptually different.
easy peasy!
The word "cat" might now be at (0.8, 0.5, 0.9), indicating it's a domestic animal of small size.
The word "dog" could be at (0.75, 0.6, 0.85), very close to "cat" but slightly larger.
"Fish" could be at (0.2, 0.4, 0.1), reflecting that it's not domestic, not mammalian, and lives in water. But still an animal!
"Car" might be at (0.9, 0.1, 0.0), far from all the animals, highlighting that it's a machine.
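We can check that these toy coordinates behave as claimed by measuring cosine similarity between the vectors. The numbers are the made-up ones above, not learned embeddings, so take the exact values lightly; what matters is the ordering.

```python
import numpy as np

# The toy 3-D embeddings from the example above
# (dimensions loosely read as: domestic-ness, size, mammal-ness)
embeddings = {
    "cat":  np.array([0.80, 0.50, 0.90]),
    "dog":  np.array([0.75, 0.60, 0.85]),
    "fish": np.array([0.20, 0.40, 0.10]),
    "car":  np.array([0.90, 0.10, 0.00]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat = embeddings["cat"]
for word in ("dog", "fish", "car"):
    print(f"cat vs {word}: {cosine_similarity(cat, embeddings[word]):.2f}")
```

Running this gives "dog" the highest similarity to "cat", "fish" a middling one, and "car" the lowest, matching the intuition in the example.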
2) RNNs, hidden states, and hidden vectors in machine translation, as mentioned in the 2nd view.
Before we beat ourselves up over our lack of prior knowledge (what an RNN is, what a hidden state is, and so on), let's try to be curious, superficially.
What was the historical context of machine translation in 2014? What was the turning point of WHAT?
In 2014, a significant breakthrough in machine translation (the WHAT) occurred with the advent of RNNs, particularly sequence-to-sequence (seq2seq) models.
Ok that sounds interesting, but I'm still keeping my superficiality:
what's the big difference being made here?
Now let's dive into an example of how an RNN captures the meaning of the sentence using the hidden vector and then uses it to produce a translated sentence. Oh, there you are Mr "Hidden Vector"!
Input sentence (English): "The cat sits on the mat."
Target sentence (French): "Le chat est assis sur le tapis."
The RNN processes each word in the input sentence sequentially:
"The" → hidden state 1
"cat" → hidden state 2
"sits" → hidden state 3
"on" → hidden state 4
"the" → hidden state 5
"mat" → hidden state 6
Oh wait, that reminds me of another English-French translation example of "the European economic area".
Ok, let us shift to this for the time being. The RNN processes the input sentence word by word, translating "The agreement on the European Economic Area was signed in August 1992." into French.
As each word is processed, the RNN updates its hidden state. This hidden state is a vector that captures the context and meaning of the sentence so far.
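A rough sketch of this update, assuming a plain (vanilla) RNN cell with random, untrained toy weights. The dimensions and numbers are invented purely to show how the hidden state folds in one word at a time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 3-D word vectors, 4-D hidden state (real models are far larger)
d_in, d_hidden = 3, 4
W_x = rng.normal(scale=0.5, size=(d_hidden, d_in))      # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(d_hidden, d_hidden))  # hidden-to-hidden weights

def rnn_step(h, x):
    """One vanilla-RNN update: fold the new word into the running hidden state."""
    return np.tanh(W_h @ h + W_x @ x)

# Pretend each of the 6 words of "The cat sits on the mat" is already a 3-D vector
sentence = rng.normal(size=(6, d_in))
h = np.zeros(d_hidden)       # the hidden state starts empty
for x in sentence:
    h = rnn_step(h, x)       # context accumulates word by word

print("final hidden state:", np.round(h, 2))
```

After the loop, `h` is the single vector that has "seen" the whole sentence: this is the final hidden state (the "thought vector") the decoder starts from.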
WTF does this mean?
Suppose our FINAL hidden state is a vector of 4 numbers for simplicity (in reality, it would be much larger...). After processing the entire English sentence, it might look something like this:
[0.8, -0.3, 0.6, 0.1] - This vector doesn't directly correspond to words, but rather to abstract features that the network has learned to represent meaning. Remember our 3 dimensional features!
In our example, these numbers encode various aspects of the sentence as a whole. The RNN doesn't directly "read" this vector to produce words; instead, it uses it as a starting point for generating the translation. Here's a simplified process:
a) The network starts with the final hidden state: [0.8, -0.3, 0.6, 0.1]
b) It uses this to predict the first word "L'accord": [0.8, -0.3, 0.6, 0.1] -> RNN -> "L'accord"
c) After outputting "L'accord", it updates the hidden state, maybe to: [0.75, -0.2, 0.55, 0.15]
d) It uses this new state to predict "sur": [0.75, -0.2, 0.55, 0.15] -> RNN -> "sur"
e) This process continues, with the hidden state updating after each word.
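The steps a) through e) can be sketched as a greedy generation loop. Everything here is a toy: the French mini-vocabulary, the random untrained weights, and the weight shapes are all made up, so the emitted words are meaningless; the point is the loop structure, where the hidden state is updated after every emitted word.

```python
import numpy as np

rng = np.random.default_rng(1)

d_hidden, vocab_size = 4, 6
french_vocab = ["L'accord", "sur", "la", "zone", "économique", "<eos>"]

# Untrained toy weights; a real decoder learns these from parallel text
W_h = rng.normal(scale=0.5, size=(d_hidden, d_hidden))      # state update
W_emb = rng.normal(scale=0.5, size=(d_hidden, vocab_size))  # word -> vector
W_out = rng.normal(scale=0.5, size=(vocab_size, d_hidden))  # vector -> word scores

h = np.array([0.8, -0.3, 0.6, 0.1])   # the encoder's final hidden state
output = []
for _ in range(5):                     # generate at most 5 words
    scores = W_out @ h                 # score every French word
    idx = int(np.argmax(scores))       # greedy pick of the best-scoring word
    word = french_vocab[idx]
    output.append(word)
    if word == "<eos>":                # stop once the end-of-sentence token appears
        break
    h = np.tanh(W_h @ h + W_emb[:, idx])  # fold the emitted word back into the state

print(" ".join(output))
```

A trained model would have learned `W_h`, `W_emb`, and `W_out` so that this same loop emits a fluent French sentence.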
Without the hidden state, the RNN would have no context to work with; it would be guessing each word in isolation. The hidden state allows it to carry the accumulated context of the whole sentence into every prediction.
For example, when translating "European Economic Area", the hidden state's encoding of the economic context helps the network choose "zone économique" rather than just translating word by word, right?
I hope now we could more or less imagine what's hidden from the French cat sitting on the mat.
(Thanks to my friend @太空喵, this article is also published on my newly established raw website: https://blog-nine-omega-56.vercel.app/2024/08/19/llm/)