uiai

Differences

This shows you the differences between two versions of the page.

uiai [2026/03/17 01:30] – [Definition: Third-party action] pedroortega
uiai [2026/03/17 01:40] (current) – [Definition: Counterfactual action] pedroortega
Line 398: Line 398:
 Generate $(\dot{\gamma}_j,\dot{x}_j)_{j \ge k}$ as follows:
  
-**Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$, $\dot{x}_{\le k-1} := x_{\le k-1}$.+  * **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$, $\dot{x}_{\le k-1} := x_{\le k-1}$.
  
-**Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_k := 1$.+  * **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_k := 1$.
  
-**Evolve branch chronologically:** For $j \ge k$, first sample the next substrate symbol by $\dot{x}_j \sim \mu(\cdot \mid \underline{a\hat{o}}_{<t} a_t\,w\,\dot{x}_{k:j-1})$, so $\mu$ emits the content of the forced $\mathcal{A}$-block in the branch, conditioned on the shared past and the already-emitted branch block prefix. Then sample the next gate value by+  * **Evolve branch chronologically:** For $j \ge k$, first sample the next substrate symbol by $\dot{x}_j \sim \mu(\cdot \mid \underline{a\hat{o}}_{<t} a_t\,w\,\dot{x}_{k:j-1})$, so $\mu$ emits the content of the forced $\mathcal{A}$-block in the branch, conditioned on the shared past and the already-emitted branch block prefix. Then sample the next gate value by
 $$
   \dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, \dot{x}_{\le j}).
Line 426: Line 426:
 To define the world’s $\mathcal{A}$-continuation at $k$, run the following tokenization procedure, initialized from the already-written on-path transcript up to $k-1$. Let $(\dot{\gamma}_j)_{j \ge k}$ be generated as follows:
  
-**Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$.+  * **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$.
  
-**Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_{k} := 1$.+  * **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_{k} := 1$.
  
-**Read transcript chronologically:** For $j \ge k$, let $x_j$ be the next substrate symbol generated by the world on-path. Then sample the next gate value by +  * **Read transcript chronologically:** For $j \ge k$, let $x_j$ be the next substrate symbol generated by the world on-path. Then sample the next gate value by 
 $$
   \dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, x_{\le j}).
Line 545: Line 545:
 Assume $(\Sigma,\Gamma,\pi,\mu)$ is an interaction system where $\pi := M$ is the //universal semimeasure// and $\mu$ is a //primitive measure//. Let $(k_i)_{i \ge 1}$ be an action-slot schedule. The following conditions hold:
  
-**Action-slot is chosen by coin flip.**   +  * **Action-slot is chosen by coin flip.** At each $k_i$, the gate draws $\gamma(k_i) \sim \mathrm{Bernoulli}(\rho_i)$ with $\rho_i \in (0,1)$, where $\rho_i$ is a chronological function of the agent-visible history $h_i$. Conditional on $h_i$, the bit $\gamma(k_i)$ is independent of the world’s $\mathcal{A}$-token $\dot{a}^{(k_i)}$ at $k_i$.
-  At each $k_i$, the gate draws +
-  $$+
-  \gamma(k_i) \sim \mathrm{Bernoulli}(\rho_i), +
-  \qquad +
-  \rho_i \in (0,1), +
-  $$ +
-  where $\rho_i$ is a chronological function of the agent-visible history $h_i$. Conditional on $h_i$, the bit $\gamma(k_i)$ is independent of the world’s $\mathcal{A}$-token $\dot{a}^{(k_i)}$ at $k_i$.+
  
-**Gate held fixed through action-slot.**   +  * **Gate held fixed through action-slot.** The gate holds the value of $\gamma(k_i)$ fixed throughout the $\mathcal{A}$-token beginning at $k_i$. If $\gamma(k_i)=0$, the world writes the $\mathcal{A}$-token, so it is a third-party action. If $\gamma(k_i)=1$, the agent writes the $\mathcal{A}$-token, so it becomes an intervention $\hat{a}$ from the agent’s view.
-  The gate holds the value of $\gamma(k_i)$ fixed throughout the $\mathcal{A}$-token beginning at $k_i$. If $\gamma(k_i)=0$, the world writes the $\mathcal{A}$-token, so it is a third-party action. If $\gamma(k_i)=1$, the agent writes the $\mathcal{A}$-token, so it becomes an intervention $\hat{a}$ from the agent’s view.+
  
-**Infinitely many agent-written slots.**   +  * **Infinitely many agent-written slots.** With probability $1$, $\gamma(k_i)=1$ occurs for infinitely many $i$.
-  With probability $1$, $\gamma(k_i)=1$ occurs for infinitely many $i$.+
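
The three gate conditions above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the probabilities $\rho_i$ are passed in as fixed numbers rather than computed from the agent-visible history $h_i$, and the names `draw_gate_bits` and `agent_slot_indices` are hypothetical helpers, not part of the page's formalism.

```python
import random


def draw_gate_bits(rhos, seed=0):
    """Draw one gate bit per action slot: bit i is 1 with probability rhos[i].

    Assumption: each rho_i is given as a constant; on the page it is a
    chronological function of the agent-visible history h_i.
    """
    rng = random.Random(seed)
    bits = []
    for rho in rhos:
        if not 0.0 < rho < 1.0:
            raise ValueError("each rho_i must lie strictly in (0, 1)")
        # gamma(k_i) = 1: the agent writes the token (an intervention);
        # gamma(k_i) = 0: the world writes it (a third-party action).
        bits.append(1 if rng.random() < rho else 0)
    return bits


def agent_slot_indices(bits):
    """Return the 1-based slot indices i with gamma(k_i) = 1."""
    return [i for i, b in enumerate(bits, start=1) if b == 1]


rhos = [0.5] * 20
bits = draw_gate_bits(rhos)
agent_slots = agent_slot_indices(bits)
world_slots = [i for i in range(1, len(bits) + 1) if i not in agent_slots]
print("agent-written slots:", agent_slots)
print("world-written slots:", world_slots)
```

A finite run cannot verify the third condition (infinitely many agent-written slots); it only shows how the gate bit partitions the schedule into interventions and third-party actions.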
  
 **Induced agent interventions and world targets.**  
-Before we proceed, we need to clarify the indexing of action slots, and in particular their substrate position versus agent-time. According to the standard setup, the schedule specifies substrate positions $k_1 < k_2 < \cdots$. Then $\dot{a}^{(k_i)} \in \mathcal{A}$ denotes the $\mathcal{A}$-token the world would write starting at $k_i$. If $\gamma(k_i)=0$ this token is realized on-path as an embedded third-party action; if $\gamma(k_i)=1$ it is only a counterfactual target. To index only the factual actions, the slots assigned to the agent, let +Before we proceed, we need to clarify the indexing of action slots, and in particular their substrate position versus agent-time. According to the standard setup, the schedule specifies substrate positions $k_1 < k_2 < \cdots$. Then $\dot{a}^{(k_i)} \in \mathcal{A}$ denotes the $\mathcal{A}$-token the world would write starting at $k_i$. If $\gamma(k_i)=0$ this token is realized on-path as an embedded third-party action; if $\gamma(k_i)=1$ it is only a counterfactual target. To index only the factual actions, the slots assigned to the agent, let $i_1 < i_2 < \cdots$ be the random indices with $\gamma(k_{i_t}) = 1$. For each $t \ge 1$, define $a_{t+1} \in \mathcal{A}$ as the $\mathcal{A}$-token the agent actually writes at $k_{i_t}$, and define the corresponding counterfactual target by $\dot{a}_{t+1} := \dot{a}^{(k_{i_t})}$.
- +
-$$+
-i_1 < i_2 < \cdots +
-$$ +
- +
-be the random indices with $\gamma(k_{i_t}) = 1$. For each $t \ge 1$, define $a_{t+1} \in \mathcal{A}$ as the $\mathcal{A}$-token the agent actually writes at $k_{i_t}$, and define the corresponding counterfactual target by +
- +
-$$+
-\dot{a}_{t+1} := \dot{a}^{(k_{i_t})}. +
-$$+
  
 Notice that in this case the previous observation token was completed, and hence
Line 580: Line 561:
  
 **Deviation measures.**  
-To quantify how closely intrinsic completion tracks the target continuation, we use $D_{\mathrm{KL}}$ and $\mathrm{TV}$. Since $M(\cdot \mid \cdot)$ may have missing mass, we complete it by adding a stop outcome $\bot \notin \mathcal{A}$ and writing+To quantify how closely intrinsic completion tracks the target continuation, we use $D_{\mathrm{KL}}$ and $\mathrm{TV}$. Since $M(\cdot \mid \cdot)$ may have missing mass, we complete it by adding a stop outcome $\bot \notin \mathcal{A}$ and writing $\overline{\mathcal{A}} := \mathcal{A} \cup \{\bot\}$. Define $\overline{M}(a \mid \cdot) := M(a \mid \cdot) \quad \text{for } a \in \mathcal{A}$, and $\overline{M}(\bot \mid \cdot) := 1 - \sum_{a \in \mathcal{A}} M(a \mid \cdot)$.
  
-$$ +For the measure $\mu$ set $\overline{\mu}(a \mid \cdot) := \mu(a \mid \cdot) \quad \text{for } a \in \mathcal{A}$, and $ 
-\overline{\mathcal{A}} := \mathcal{A} \cup \{\bot\}. +\overline{\mu}(\bot \mid \cdot) := 0$.
-$$ +
- +
-Define +
- +
-$$ +
-\overline{M}(a \mid \cdot) := M(a \mid \cdot) \quad \text{for } a \in \mathcal{A}, +
-$$ +
- +
-and +
- +
-$$ +
-\overline{M}(\bot \mid \cdot) := 1 - \sum_{a \in \mathcal{A}} M(a \mid \cdot). +
-$$ +
- +
-For the measure $\mu$ set +
- +
-$$+
-\overline{\mu}(a \mid \cdot) := \mu(a \mid \cdot) \quad \text{for } a \in \mathcal{A}, +
-$$ +
- +
-and +
- +
-$$+
-\overline{\mu}(\bot \mid \cdot) := 0. +
-$$+
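
The completion step can be checked numerically on a toy example. The sketch below assumes a finite alphabet and represents a conditional distribution as a plain dictionary; `complete`, `kl`, and `tv` are hypothetical helper names. The divergences use the standard textbook forms (KL in nats, total variation as half the $\ell_1$ distance) and the direction $D_{\mathrm{KL}}(\overline{\mu} \,\|\, \overline{M})$, all of which are assumptions that may differ from the definitions used on the page.

```python
import math

STOP = "STOP"  # stands in for the stop outcome, assumed not to be in the alphabet


def complete(cond, alphabet):
    """Add the stop outcome carrying whatever mass cond leaves unassigned."""
    total = sum(cond.get(a, 0.0) for a in alphabet)
    if total > 1.0 + 1e-12:
        raise ValueError("a semimeasure cannot assign total mass above 1")
    completed = {a: cond.get(a, 0.0) for a in alphabet}
    completed[STOP] = max(0.0, 1.0 - total)
    return completed


def kl(p, q):
    """Standard KL divergence D(p || q) in nats; infinite if q misses mass of p."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total


def tv(p, q):
    """Total variation distance, taken here as half the l1 distance."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


alphabet = ["a", "b"]
M_bar = complete({"a": 0.5, "b": 0.25}, alphabet)   # semimeasure: 0.25 missing
mu_bar = complete({"a": 0.5, "b": 0.5}, alphabet)   # measure: no missing mass
print(M_bar[STOP], mu_bar[STOP])                    # 0.25 0.0
print(tv(mu_bar, M_bar), kl(mu_bar, M_bar))
```

In this example the missing mass of the semimeasure shows up as $\overline{M}(\bot \mid \cdot) = 0.25$, while the measure gets $\overline{\mu}(\bot \mid \cdot) = 0$, so both completed objects are proper distributions on the extended alphabet and the divergences are well defined.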
  
 For distributions $P,Q$ on a countable set, define
Line 1133: Line 1089:
 Notice that we can combine a variety of schema rules that instantiate well-known decision principles and other preference structures, such as:
  
-//Bayes-optimal finite-horizon POMDP control:// $u$ encodes an executable finite POMDP, horizon or discount, and reward parameters; $f$ outputs a Bayes-optimal adaptive controller. +  * //Bayes-optimal finite-horizon POMDP control:// $u$ encodes an executable finite POMDP, horizon or discount, and reward parameters; $f$ outputs a Bayes-optimal adaptive controller. 
-//Safety-first / constrained control:// $u$ specifies dynamics plus a hard safety predicate, or budget constraint, and a secondary objective; $f$ outputs a controller that enforces the constraint when feasible and otherwise follows the specified fallback rule. +  * //Safety-first / constrained control:// $u$ specifies dynamics plus a hard safety predicate, or budget constraint, and a secondary objective; $f$ outputs a controller that enforces the constraint when feasible and otherwise follows the specified fallback rule. 
-//Multi-objective tradeoffs:// $u$ provides multiple reward components and weights, or a specified scalarization; $f$ outputs the optimal controller under that tradeoff. +  * //Multi-objective tradeoffs:// $u$ provides multiple reward components and weights, or a specified scalarization; $f$ outputs the optimal controller under that tradeoff. 
-//Choice from comparisons:// $u$ contains a finite set of candidates plus computable pairwise comparisons or rankings; $f$ outputs the candidate, or action program, selected by a computable revealed-preference rule. +  * //Choice from comparisons:// $u$ contains a finite set of candidates plus computable pairwise comparisons or rankings; $f$ outputs the candidate, or action program, selected by a computable revealed-preference rule. 
-//Rule- / constitution-following:// $u$ encodes a finite set of rules, hard constraints, plus a computable tie-breaker; $f$ outputs an action or program satisfying the rules when feasible, and otherwise follows an explicitly encoded fallback. +  * //Rule- / constitution-following:// $u$ encodes a finite set of rules, hard constraints, plus a computable tie-breaker; $f$ outputs an action or program satisfying the rules when feasible, and otherwise follows an explicitly encoded fallback. 
-//Program synthesis / tool protocol:// $u$ encodes a specification together with a computable evaluator or tool interface; $f$ outputs a program, or macro-action, that passes the evaluator, or an explicit next repair step in an iterative protocol. +  * //Program synthesis / tool protocol:// $u$ encodes a specification together with a computable evaluator or tool interface; $f$ outputs a program, or macro-action, that passes the evaluator, or an explicit next repair step in an iterative protocol. 
-//Norms / dialogue acts:// $u$ encodes an interaction context together with a computable norm taxonomy; $f$ outputs an appropriate dialogue act, such as apologize, clarify, refuse, or defer, consistent with the norms and the stated context.+  * //Norms / dialogue acts:// $u$ encodes an interaction context together with a computable norm taxonomy; $f$ outputs an appropriate dialogue act, such as apologize, clarify, refuse, or defer, consistent with the norms and the stated context.
  
 On prompts of the corresponding type, the agent will behave “as if” following the schema.
  • uiai.1773711050.txt.gz
  • Last modified: 2026/03/17 01:30
  • by pedroortega