third_person [2024/12/25 12:40] (current) – pedroortega
  - and reinforcing consequences (rewards and punishments).
These elements are essential for learning through direct interaction with the environment.

But learning like this is extremely limited. Most of what we know about the world does not come from first-person experience!

[[https://en.wikipedia.org/wiki/Imitation|Imitation]] is another form of learning that is ubiquitous in animals((In addition, there is evidence suggesting animals have dedicated neural circuitry for imitation; see e.g. mirror neurons (Kilner and Lemon, 2012).)). Imitation learning, however, involves translating third-person (observed) experiences into first-person (self) knowledge. This process requires the learner to infer causal relationships from observations, effectively reconstructing the underlying principles behind observed behaviors. Such a transformation is challenging because third-person observations lack the direct causal feedback inherent in personal experience.
This implies that $P(Y|X)$ predicts well what happens when the demonstrator chooses $X$, but not what happens when the learner chooses $X$. The two differ because the learner's choices, even when imitating, are based on the learner's own subjective information state. That state is ignorant of the unobserved intention $\theta$, so the learner cannot reproduce the causal dependency between $X$ and $\theta$ that the demonstrator implemented.
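
A small simulation can make this gap concrete. The Python sketch below assumes a hypothetical toy model that is not taken from the text. The intention $\theta \in \{0, 1\}$ is uniform. The demonstrator sees $\theta$ and picks $X=\theta$ with probability 0.9. The outcome is $Y=1$ exactly when $X=\theta$. The learner copies the action $X=1$ without access to $\theta$. The names `outcome`, `demonstrator_action`, `success_rate` and `compare` are illustrative only.

```python
import random


def outcome(x: int, theta: int) -> int:
    """Hypothetical outcome Y: 1 if the action matches the hidden intention."""
    return int(x == theta)


def demonstrator_action(theta: int, rng: random.Random, accuracy: float = 0.9) -> int:
    """Hypothetical demonstrator: sees theta and acts on it with prob. `accuracy`."""
    return theta if rng.random() < accuracy else 1 - theta


def success_rate(pairs, x: int) -> float:
    """Empirical estimate of P(Y=1 | X=x) from a list of (x, y) pairs."""
    ys = [y for xi, y in pairs if xi == x]
    return sum(ys) / len(ys)


def compare(n: int = 100_000, seed: int = 0):
    rng = random.Random(seed)
    demo, learner = [], []
    for _ in range(n):
        theta = rng.randint(0, 1)  # hidden intention, uniform prior (assumed)
        x_demo = demonstrator_action(theta, rng)
        demo.append((x_demo, outcome(x_demo, theta)))
        x_learner = 1  # learner copies "choose X=1" but is blind to theta
        learner.append((x_learner, outcome(x_learner, theta)))
    # Return the estimate from demonstrations and the learner's actual success.
    return success_rate(demo, 1), success_rate(learner, 1)


if __name__ == "__main__":
    p_demo, p_learner = compare()
    print(f"P(Y=1 | X=1) from demonstrations: {p_demo:.3f}")  # close to 0.9
    print(f"Learner's success when choosing X=1: {p_learner:.3f}")  # close to 0.5
```

Under these assumptions, the demonstrations suggest $X=1$ succeeds about 90% of the time. The learner who chooses $X=1$ succeeds only about half the time, because its choice no longer tracks $\theta$.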

==== The math: why does this happen? ====

To understand what happens when we replace the demonstrator with the learner, we need $P(Y|\text{do}(X))$. This is the distribution over $Y$ when $X$ is set independently of its usual causes, such as the demonstrator's intention $\theta$. In causal lingo, it is the effect on $Y$ of the //intervention// on $X$.
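
For a toy model, both quantities can be computed exactly. The sketch below assumes a hypothetical model and causal graph, neither taken from the text. The graph is $\theta \to X$, $\theta \to Y$, $X \to Y$, with $\theta$ uniform on $\{0,1\}$. The demonstrator picks $X=\theta$ with probability $9/10$, and $Y=1$ exactly when $X=\theta$. Under that graph, the intervention cuts the arrow $\theta \to X$, giving $P(Y|\text{do}(X)) = \sum_\theta P(\theta)\,P(Y|X,\theta)$ (the truncated factorization). The observational $P(Y|X)$ instead weights by $P(\theta|X)$. The code uses the full model, including $\theta$, which the learner does not actually have.

```python
from fractions import Fraction

# Hypothetical toy model (assumed graph: theta -> X, theta -> Y, X -> Y).
P_THETA = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # prior over the intention
ACCURACY = Fraction(9, 10)  # demonstrator picks X = theta with this probability


def p_x_given_theta(x: int, theta: int) -> Fraction:
    """Demonstrator's policy P(X=x | theta)."""
    return ACCURACY if x == theta else 1 - ACCURACY


def p_y1_given_x_theta(x: int, theta: int) -> Fraction:
    """Outcome model P(Y=1 | X=x, theta): success iff the action matches theta."""
    return Fraction(1) if x == theta else Fraction(0)


def p_y1_given_x(x: int) -> Fraction:
    """Observational P(Y=1 | X=x) = sum_theta P(theta | x) P(Y=1 | x, theta)."""
    joint = {t: P_THETA[t] * p_x_given_theta(x, t) for t in P_THETA}
    p_x = sum(joint.values())
    return sum(joint[t] * p_y1_given_x_theta(x, t) for t in P_THETA) / p_x


def p_y1_do_x(x: int) -> Fraction:
    """Interventional P(Y=1 | do(X=x)) = sum_theta P(theta) P(Y=1 | x, theta)."""
    return sum(P_THETA[t] * p_y1_given_x_theta(x, t) for t in P_THETA)


if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: P(Y=1|X=x)={p_y1_given_x(x)}, P(Y=1|do(X=x))={p_y1_do_x(x)}")
```

In this toy model, the observational probability is $9/10$ for either action while the interventional one is $1/2$. The difference comes entirely from the dependency between $X$ and $\theta$ that the intervention removes.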