Differences

This shows you the differences between two versions of the page.

--- uiai [2026/03/17 01:30] – [Definition: Third-party action] pedroortega
+++ uiai [2026/03/17 10:30] (current) – [Definition: Third-party action] pedroortega
@@ Line 398: / Line 398: @@
 Generate $(\dot{\gamma}_j,\dot{x}_j)_{j \ge k}$ as follows:
-- **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$, $\dot{x}_{\le k-1} := x_{\le k-1}$.
+  * **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$, $\dot{x}_{\le k-1} := x_{\le k-1}$.
-- **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_k := 1$.
+  * **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_k := 1$.
-- **Evolve branch chronologically:** For $j \ge k$, first sample the next substrate symbol by $\dot{x}_j \sim \mu(\cdot \mid \underline{a\hat{o}}_{<t} a_t\,w\,\dot{x}_{k:j-1})$, so $\mu$ emits the content of the forced $\mathcal{A}$-block in the branch, conditioned on the shared past and the already-emitted branch block prefix. Then sample the next gate value by
+  * **Evolve branch chronologically:** For $j \ge k$, first sample the next substrate symbol by $\dot{x}_j \sim \mu(\cdot \mid \underline{a\hat{o}}_{<t} a_t\,w\,\dot{x}_{k:j-1})$, so $\mu$ emits the content of the forced $\mathcal{A}$-block in the branch, conditioned on the shared past and the already-emitted branch block prefix. Then sample the next gate value by
 $$
   \dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, \dot{x}_{\le j}).
@@ Line 416: / Line 416: @@
 Note that $k'$ is determined inside the branch and therefore the length of $\dot{a}_{t+1}$ need not match the length of the on-path $\mathcal{A}$-token written by the agent starting at $k$.
+{{ ::uiai-cf-action.png?600 |Counterfactual Action}}
 **Diagram note.**
-The intended picture is that after the shared on-path prefix ending at $a_3,o_3$, the on-path transcript has factual action $a_4$, while the counterfactual branch replaces that with the world-generated block $\dot{a}_4$ occupying the same would-be action slot.
+The diagram shows that after the shared on-path prefix ending at $a_3,o_3$, the on-path transcript has factual action $a_4$, while the counterfactual action $\dot{a}_4$ is spawned from the same prefix but generated by the world.
 ==== Definition: Third-party action ====
@@ Line 426: / Line 428: @@
 To define the world’s $\mathcal{A}$-continuation at $k$, run the following tokenization procedure, initialized from the already-written on-path transcript up to $k-1$. Let $(\dot{\gamma}_j)_{j \ge k}$ be generated as follows:
-- **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$.
+  * **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$.
-- **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_{k} := 1$.
+  * **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_{k} := 1$.
-- **Read transcript chronologically:** For $j \ge k$, let $x_j$ be the next substrate symbol generated by the world on-path. Then sample the next gate value by
+  * **Read transcript chronologically:** For $j \ge k$, let $x_j$ be the next substrate symbol generated by the world on-path. Then sample the next gate value by
 $$
   \dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, x_{\le j}).
@@ Line 448: / Line 450: @@
 o_t = w\,\dot{a}_{t+1}\,v.
 $$
+{{ :uiai-third-party.png?600 |Third-Party Action}}
 **Diagram note.**
-The intended picture is that inside a long world-written observation token, there is an embedded block $\dot{a}_4$ that occupies an $\mathcal{A}$-position under the tokenization convention, even though on-path it is still written by the world and therefore appears as evidence.
+The diagram illustrates a third-party action. Inside a long observation token $o_3$, one can identify an embedded block $\dot{a}_4$ that is interpreted as a third-party action. Because it is written by the world, it counts as evidence.
 It is not hard to see that, for a given potential index $k$, the counterfactual action $\dot{a}_{t+1}$ and the third-party action $\dot{a}_{t+1}$ are the same random block: the difference is only whether the gate sampled $\gamma_k = 1$ (counterfactual, not observed) or $\gamma_k = 0$ (third-party, observed) at position $k$. The precise distinction between the different types of $\mathcal{A}$-tokens is important; we will also refer to them as //factual// (first-person, agent-generated), //counterfactual//, and //third-party// $\mathcal{A}$-tokens.
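The two tokenization procedures above differ only in where the next substrate symbol comes from. A minimal sketch, assuming hypothetical callables `next_symbol` (standing in for $\mu$, or for reading the on-path transcript) and `gate` (standing in for $\Gamma$), 0-based positions, and a fixed number of rollout steps in place of the block end $k'$, which is not modelled here:

```python
from typing import Callable, List, Sequence, Tuple

Symbol = str
# Hypothetical stand-ins: next_symbol plays the role of mu (or of the on-path
# transcript), gate plays the role of Gamma.
SymbolSource = Callable[[Sequence[Symbol]], Symbol]
GateSampler = Callable[[Sequence[int], Sequence[Symbol]], int]


def roll_out_branch(gates: Sequence[int],
                    symbols: Sequence[Symbol],
                    k: int,
                    next_symbol: SymbolSource,
                    gate: GateSampler,
                    steps: int) -> Tuple[List[int], List[Symbol]]:
    """Extend the shared prefix (positions 0..k-1) by `steps` positions."""
    g = list(gates[:k])      # shared prefix of gate values
    x = list(symbols[:k])    # shared prefix of substrate symbols
    g.append(1)              # force an A-block start at position k
    for _ in range(steps):
        x.append(next_symbol(x))   # symbol at position j
        g.append(gate(g, x))       # gate value at position j + 1
    return g, x


def read_on_path(transcript: Sequence[Symbol]) -> SymbolSource:
    """Third-party variant: symbols are read from the on-path transcript."""
    return lambda history: transcript[len(history)]
```

Passing a sampler for `next_symbol` gives the counterfactual branch, while `read_on_path(transcript)` gives the third-party reading of the same positions. Conditioning $\mu$ on the plain symbol history is a simplification of the conditioning used in the text.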
@@ Line 545: / Line 549: @@
 Assume $(\Sigma,\Gamma,\pi,\mu)$ is an interaction system where $\pi := M$ is the //universal semimeasure// and $\mu$ is a //primitive measure//. Let $(k_i)_{i \ge 1}$ be an action-slot schedule. The following conditions hold:
-- **Action-slot is chosen by coin flip.**
+  * **Action-slot is chosen by coin flip.** At each $k_i$, the gate draws $\gamma(k_i) \sim \mathrm{Bernoulli}(\rho_i)$ with $\rho_i \in (0,1)$, where $\rho_i$ is a chronological function of the agent-visible history $h_i$. Conditional on $h_i$, the bit $\gamma(k_i)$ is independent of the world’s $\mathcal{A}$-token $\dot{a}^{(k_i)}$ at $k_i$.
-  At each $k_i$, the gate draws
-  $$
-  \gamma(k_i) \sim \mathrm{Bernoulli}(\rho_i),
-  \qquad
-  \rho_i \in (0,1),
-  $$
-  where $\rho_i$ is a chronological function of the agent-visible history $h_i$. Conditional on $h_i$, the bit $\gamma(k_i)$ is independent of the world’s $\mathcal{A}$-token $\dot{a}^{(k_i)}$ at $k_i$.
-- **Gate held fixed through action-slot.**
+  * **Gate held fixed through action-slot.** The gate holds the value of $\gamma(k_i)$ fixed throughout the $\mathcal{A}$-token beginning at $k_i$. If $\gamma(k_i)=0$, the world writes the $\mathcal{A}$-token, so it is a third-party action. If $\gamma(k_i)=1$, the agent writes the $\mathcal{A}$-token, so it becomes an intervention $\hat{a}$ from the agent’s view.
-  The gate holds the value of $\gamma(k_i)$ fixed throughout the $\mathcal{A}$-token beginning at $k_i$. If $\gamma(k_i)=0$, the world writes the $\mathcal{A}$-token, so it is a third-party action. If $\gamma(k_i)=1$, the agent writes the $\mathcal{A}$-token, so it becomes an intervention $\hat{a}$ from the agent’s view.
-- **Infinitely many agent-written slots.**
+  * **Infinitely many agent-written slots.** With probability $1$, $\gamma(k_i)=1$ occurs for infinitely many $i$.
-  With probability $1$, $\gamma(k_i)=1$ occurs for infinitely many $i$.
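The three conditions above can be illustrated with a toy simulation of the gate. This is only a sketch under stated assumptions: the agent-visible history $h_i$ is replaced by the list of earlier gate bits, and the names `draw_gate_bits`, `writer`, and `rho` are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def draw_gate_bits(n_slots: int,
                   rho: Callable[[Sequence[int]], float],
                   seed: int = 0) -> List[int]:
    """Draw gamma(k_i) ~ Bernoulli(rho_i) for i = 1, ..., n_slots."""
    rng = random.Random(seed)
    bits: List[int] = []
    for _ in range(n_slots):
        p = rho(bits)  # rho_i depends only on the (toy) history so far
        if not 0.0 < p < 1.0:
            raise ValueError("rho_i must lie strictly between 0 and 1")
        # Fresh randomness only: the draw never looks at the world's token.
        bits.append(1 if rng.random() < p else 0)
    return bits


def writer(bit: int) -> str:
    """The gate value is held fixed for the whole A-token at the slot."""
    return "agent" if bit == 1 else "world"


bits = draw_gate_bits(20, rho=lambda history: 0.5)
agent_slots = [i for i, b in enumerate(bits, start=1) if b == 1]
```

Here `writer(0)` corresponds to a third-party action and `writer(1)` to an intervention. With a constant `rho`, the draws are independent and agent-written slots occur infinitely often with probability $1$ as `n_slots` grows.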
 **Induced agent interventions and world targets.**
-Before we proceed, we need to clarify the indexing of action slots, and in particular their substrate position versus agent-time. According to the standard setup, the schedule specifies substrate positions $k_1 < k_2 < \cdots$. Then $\dot{a}^{(k_i)} \in \mathcal{A}$ denotes the $\mathcal{A}$-token the world would write starting at $k_i$. If $\gamma(k_i)=0$ this token is realized on-path as an embedded third-party action; if $\gamma(k_i)=1$ it is only a counterfactual target. To index only the factual actions, the slots assigned to the agent, let
+Before we proceed, we need to clarify the indexing of action slots, and in particular their substrate position versus agent-time. According to the standard setup, the schedule specifies substrate positions $k_1 < k_2 < \cdots$. Then $\dot{a}^{(k_i)} \in \mathcal{A}$ denotes the $\mathcal{A}$-token the world would write starting at $k_i$. If $\gamma(k_i)=0$ this token is realized on-path as an embedded third-party action; if $\gamma(k_i)=1$ it is only a counterfactual target. To index only the factual actions (the slots assigned to the agent), let $i_1 < i_2 < \cdots$ be the random indices with $\gamma(k_{i_t}) = 1$. For each $t \ge 1$, define $a_{t+1} \in \mathcal{A}$ as the $\mathcal{A}$-token the agent actually writes at $k_{i_t}$, and define the corresponding counterfactual target by $\dot{a}_{t+1} := \dot{a}^{(k_{i_t})}$.
-$$
-i_1 < i_2 < \cdots
-$$
-be the random indices with $\gamma(k_{i_t}) = 1$. For each $t \ge 1$, define $a_{t+1} \in \mathcal{A}$ as the $\mathcal{A}$-token the agent actually writes at $k_{i_t}$, and define the corresponding counterfactual target by
-$$
-\dot{a}_{t+1} := \dot{a}^{(k_{i_t})}.
-$$
 Notice that in this case the previous observation token was completed, and hence
@@ Line 580: / Line 565: @@
 **Deviation measures.**
-To quantify how closely intrinsic completion tracks the target continuation, we use $D_{\mathrm{KL}}$ and $\mathrm{TV}$. Since $M(\cdot \mid \cdot)$ may have missing mass, we complete it by adding a stop outcome $\bot \notin \mathcal{A}$ and writing
+To quantify how closely intrinsic completion tracks the target continuation, we use $D_{\mathrm{KL}}$ and $\mathrm{TV}$. Since $M(\cdot \mid \cdot)$ may have missing mass, we complete it by adding a stop outcome $\bot \notin \mathcal{A}$ and writing $\overline{\mathcal{A}} := \mathcal{A} \cup \{\bot\}$. Define $\overline{M}(a \mid \cdot) := M(a \mid \cdot)$ for $a \in \mathcal{A}$, and $\overline{M}(\bot \mid \cdot) := 1 - \sum_{a \in \mathcal{A}} M(a \mid \cdot)$.
-$$
+For the measure $\mu$, set $\overline{\mu}(a \mid \cdot) := \mu(a \mid \cdot)$ for $a \in \mathcal{A}$, and $\overline{\mu}(\bot \mid \cdot) := 0$.
-\overline{\mathcal{A}} := \mathcal{A} \cup \{\bot\}.
-$$
-Define
-$$
-\overline{M}(a \mid \cdot) := M(a \mid \cdot) \quad \text{for } a \in \mathcal{A},
-$$
-and
-$$
-\overline{M}(\bot \mid \cdot) := 1 - \sum_{a \in \mathcal{A}} M(a \mid \cdot).
-$$
-For the measure $\mu$ set
-$$
-\overline{\mu}(a \mid \cdot) := \mu(a \mid \cdot) \quad \text{for } a \in \mathcal{A},
-$$
-and
-$$
-\overline{\mu}(\bot \mid \cdot) := 0.
-$$
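The completion step can be sketched for a single conditional distribution. This assumes a finitely supported toy dictionary in place of $M(\cdot \mid \cdot)$; the names `STOP` and `complete` are hypothetical.

```python
from typing import Dict

STOP = "⊥"  # hypothetical label for the stop outcome, assumed not in A


def complete(semi: Dict[str, float], tol: float = 1e-12) -> Dict[str, float]:
    """Add the stop outcome carrying the missing mass of a semimeasure."""
    if STOP in semi:
        raise ValueError("the stop outcome must not already be an action")
    total = sum(semi.values())
    if total > 1.0 + tol:
        raise ValueError("total mass exceeds 1, so this is not a semimeasure")
    completed = dict(semi)
    completed[STOP] = max(0.0, 1.0 - total)  # missing mass goes to the stop outcome
    return completed


# Toy values, not taken from the text.
M_bar = complete({"a": 0.5, "b": 0.25})   # semimeasure with missing mass 0.25
mu_bar = complete({"a": 0.5, "b": 0.5})   # proper measure: stop outcome gets mass 0
```

Both completed dictionaries are probability distributions on $\overline{\mathcal{A}}$, so divergences between them are well defined.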
 For distributions $P,Q$ on a countable set, define
@@ Line 1133: / Line 1093: @@
 Notice that we can combine a variety of schema rules that instantiate well-known decision principles and other preference structures, such as:
-- //Bayes-optimal finite-horizon POMDP control:// $u$ encodes an executable finite POMDP, horizon or discount, and reward parameters; $f$ outputs a Bayes-optimal adaptive controller.
+  * //Bayes-optimal finite-horizon POMDP control:// $u$ encodes an executable finite POMDP, horizon or discount, and reward parameters; $f$ outputs a Bayes-optimal adaptive controller.
-- //Safety-first / constrained control:// $u$ specifies dynamics plus a hard safety predicate, or budget constraint, and a secondary objective; $f$ outputs a controller that enforces the constraint when feasible and otherwise follows the specified fallback rule.
+  * //Safety-first / constrained control:// $u$ specifies dynamics plus a hard safety predicate, or budget constraint, and a secondary objective; $f$ outputs a controller that enforces the constraint when feasible and otherwise follows the specified fallback rule.
-- //Multi-objective tradeoffs:// $u$ provides multiple reward components and weights, or a specified scalarization; $f$ outputs the optimal controller under that tradeoff.
+  * //Multi-objective tradeoffs:// $u$ provides multiple reward components and weights, or a specified scalarization; $f$ outputs the optimal controller under that tradeoff.
-- //Choice from comparisons:// $u$ contains a finite set of candidates plus computable pairwise comparisons or rankings; $f$ outputs the candidate, or action program, selected by a computable revealed-preference rule.
+  * //Choice from comparisons:// $u$ contains a finite set of candidates plus computable pairwise comparisons or rankings; $f$ outputs the candidate, or action program, selected by a computable revealed-preference rule.
-- //Rule- / constitution-following:// $u$ encodes a finite set of rules, hard constraints, plus a computable tie-breaker; $f$ outputs an action or program satisfying the rules when feasible, and otherwise follows an explicitly encoded fallback.
+  * //Rule- / constitution-following:// $u$ encodes a finite set of rules, hard constraints, plus a computable tie-breaker; $f$ outputs an action or program satisfying the rules when feasible, and otherwise follows an explicitly encoded fallback.
-- //Program synthesis / tool protocol:// $u$ encodes a specification together with a computable evaluator or tool interface; $f$ outputs a program, or macro-action, that passes the evaluator, or an explicit next repair step in an iterative protocol.
+  * //Program synthesis / tool protocol:// $u$ encodes a specification together with a computable evaluator or tool interface; $f$ outputs a program, or macro-action, that passes the evaluator, or an explicit next repair step in an iterative protocol.
-- //Norms / dialogue acts:// $u$ encodes an interaction context together with a computable norm taxonomy; $f$ outputs an appropriate dialogue act, such as apologize, clarify, refuse, or defer, consistent with the norms and the stated context.
+  * //Norms / dialogue acts:// $u$ encodes an interaction context together with a computable norm taxonomy; $f$ outputs an appropriate dialogue act, such as apologize, clarify, refuse, or defer, consistent with the norms and the stated context.
 On prompts of the corresponding type, the agent will behave “as if” following the schema.
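As one illustration of such a schema rule, the //choice from comparisons// item can be sketched with a simple counting rule. The Copeland-style rule below is our illustrative choice of a computable revealed-preference rule, not one fixed by the text; `choose_by_copeland`, `candidates`, and `prefers` are hypothetical names, and together the two arguments play the role of $u$.

```python
from typing import Dict, List, Tuple


def choose_by_copeland(candidates: List[str],
                       prefers: Dict[Tuple[str, str], bool]) -> str:
    """Return the candidate with the most pairwise wins.

    prefers[(c, d)] is True when c is preferred to d; missing pairs count
    as no win. Ties are broken by the order of `candidates`.
    """
    def wins(c: str) -> int:
        return sum(1 for d in candidates
                   if d != c and prefers.get((c, d), False))

    # max() returns the first maximal element, which gives the tie-breaker.
    return max(candidates, key=wins)


# Toy input: x beats y and z, y beats z.
u_candidates = ["x", "y", "z"]
u_prefers = {("x", "y"): True, ("y", "z"): True, ("x", "z"): True}
choice = choose_by_copeland(u_candidates, u_prefers)
```

Any other computable selection rule could be substituted for the counting rule without changing the shape of the schema.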