uiai [2026/03/17 01:34] – [Assumption: Standard setup] pedroortega → uiai [2026/03/17 01:40] (current) – [Definition: Counterfactual action] pedroortega
Generate $(\dot{\gamma}_j,\dot{x}_j)_{j \ge k}$ as follows:

* **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$ and $\dot{x}_{\le k-1} := x_{\le k-1}$.

* **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_k := 1$.

* **Evolve the branch chronologically:** For $j \ge k$, first sample the next substrate symbol $\dot{x}_j \sim \mu(\cdot \mid \underline{a\hat{o}}_{<t} a_t\,w\,\dot{x}_{k:j-1})$. That is, $\mu$ emits the content of the forced $\mathcal{A}$-block in the branch, conditioned on the shared past and on the branch-block prefix emitted so far. Then sample the next gate value as
$$
\dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, \dot{x}_{\le j}).
$$
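
The branch construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the page's formal definition. It assumes, hypothetically, that $\mu$ and $\Gamma$ are available as sampling callables `mu(context, branch_prefix)` and `gate(gammas, xs)`. It also collapses the conditioning history $\underline{a\hat{o}}_{<t} a_t\,w$ into one opaque `context` argument. Positions are 0-based, so index `len(x_prefix)` plays the role of $k$.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical sampling interfaces (illustrative assumptions, not from the page):
#   mu(context, branch_prefix) -> one sampled substrate symbol
#   gate(gammas, xs)           -> one sampled gate bit from Gamma
MuSampler = Callable[[object, List[object]], object]
GateSampler = Callable[[List[int], List[object]], int]


def counterfactual_branch(
    gamma_prefix: Sequence[int],   # on-path gate values gamma_{<=k-1}
    x_prefix: Sequence[object],    # on-path symbols x_{<=k-1}
    context: object,               # stands in for the conditioning history before the branch
    mu: MuSampler,
    gate: GateSampler,
    steps: int,                    # how many branch symbols to emit
) -> Tuple[List[int], List[object]]:
    if len(gamma_prefix) != len(x_prefix):
        raise ValueError("gate and symbol prefixes must have the same length")
    # Shared prefix: copy the on-path values up to position k-1.
    gammas = list(gamma_prefix)
    xs = list(x_prefix)
    k = len(xs)  # 0-based index of position k
    # Force an A-block start at position k.
    gammas.append(1)
    for _ in range(steps):
        # Next substrate symbol, conditioned on the shared context and the branch-block prefix x_{k:j-1}.
        xs.append(mu(context, xs[k:]))
        # Next gate value gamma_{j+1}, conditioned on all gates and symbols up to j.
        gammas.append(gate(gammas, xs))
    return gammas, xs
```

For example, `counterfactual_branch([0, 0], ["o1", "o2"], None, lambda ctx, pre: f"s{len(pre)}", lambda gs, xs: 0, 3)` returns `([0, 0, 1, 0, 0, 0], ["o1", "o2", "s0", "s1", "s2"])`. Each symbol is drawn before the gate value that follows it, so the gate list always has one more entry than the symbol list. This mirrors $\dot{\gamma}_{j+1}$ being conditioned on $\dot{x}_{\le j}$.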

To define the world’s $\mathcal{A}$-continuation at $k$, run the following tokenization procedure, initialized from the on-path transcript already written up to $k-1$. Let $(\dot{\gamma}_j)_{j \ge k}$ be generated as follows:

* **Shared prefix:** Set $\dot{\gamma}_{\le k-1} := \gamma_{\le k-1}$.

* **Force an $\mathcal{A}$-block start:** Set $\dot{\gamma}_{k} := 1$.

* **Read the transcript chronologically:** For $j \ge k$, let $x_j$ be the next substrate symbol generated on-path by the world. Then sample the next gate value as
$$
\dot{\gamma}_{j+1} \sim \Gamma(\cdot \mid \dot{\gamma}_{\le j}, x_{\le j}).
$$
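
Unlike the branch construction, this on-path tokenization resamples only the gate. The substrate symbols are read from the world's transcript as they are. The minimal sketch below assumes the same hypothetical `gate(gammas, xs)` interface for $\Gamma$. The name `onpath_symbols` is an illustrative stand-in for $x_k, x_{k+1}, \dots$.

```python
from typing import Callable, Iterable, List, Sequence

GateSampler = Callable[[List[int], List[object]], int]  # hypothetical: one draw from Gamma


def world_continuation_gates(
    gamma_prefix: Sequence[int],       # on-path gate values gamma_{<=k-1}
    x_prefix: Sequence[object],        # on-path symbols x_{<=k-1}
    onpath_symbols: Iterable[object],  # x_k, x_{k+1}, ... as generated by the world
    gate: GateSampler,
) -> List[int]:
    if len(gamma_prefix) != len(x_prefix):
        raise ValueError("gate and symbol prefixes must have the same length")
    gammas = list(gamma_prefix)  # shared prefix
    xs = list(x_prefix)
    gammas.append(1)             # force an A-block start at position k
    for x_j in onpath_symbols:
        xs.append(x_j)           # read the world's symbol; nothing is resampled
        gammas.append(gate(gammas, xs))  # sample gamma_{j+1} given gates and symbols up to j
    return gammas
```

Consider a toy gate that returns 1 right after a `"."` symbol. With it, `world_continuation_gates([0], ["o"], ["h", "i", ".", "x"], lambda gs, xs: 1 if xs[-1] == "." else 0)` returns `[0, 1, 0, 0, 1, 0]`.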
Assume $(\Sigma,\Gamma,\pi,\mu)$ is an interaction system in which $\pi := M$ is the //universal semimeasure// and $\mu$ is a //primitive measure//. Let $(k_i)_{i \ge 1}$ be an action-slot schedule. The following conditions hold:

* **Action-slot writer is chosen by coin flip.** At each $k_i$, the gate draws $\gamma(k_i) \sim \mathrm{Bernoulli}(\rho_i)$ with $\rho_i \in (0,1)$, where $\rho_i$ is a chronological function of the agent-visible history $h_i$. Conditional on $h_i$, the bit $\gamma(k_i)$ is independent of the world’s $\mathcal{A}$-token $\dot{a}^{(k_i)}$ at $k_i$.

* **Gate is held fixed through the action slot.** The gate holds the value of $\gamma(k_i)$ fixed throughout the $\mathcal{A}$-token beginning at $k_i$. If $\gamma(k_i)=0$, the world writes the $\mathcal{A}$-token, so it is a third-party action. If $\gamma(k_i)=1$, the agent writes the $\mathcal{A}$-token, so from the agent’s view it is an intervention $\hat{a}$.

* **Infinitely many agent-written slots.** With probability $1$, $\gamma(k_i)=1$ for infinitely many $i$.
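
The following is a minimal sketch of how a single action slot might be filled under these conditions. The callables `rho`, `agent_token`, and `world_token` are hypothetical stand-ins for three things: $\rho_i$ as a function of $h_i$, the agent's $\mathcal{A}$-token, and the world's $\mathcal{A}$-token. The coin uses its own random generator and reads only the history. This is one simple way, assumed here, to make $\gamma(k_i)$ independent of $\dot{a}^{(k_i)}$ given $h_i$.

```python
import random
from typing import Callable, List, Sequence, Tuple

Token = List[str]


def fill_action_slot(
    history: Sequence[str],
    rho: Callable[[Sequence[str]], float],          # hypothetical: rho_i = rho(h_i)
    agent_token: Callable[[Sequence[str]], Token],  # hypothetical agent-written A-token
    world_token: Callable[[Sequence[str]], Token],  # hypothetical world-written A-token
    rng: random.Random,
) -> Tuple[int, Token, List[int]]:
    """Fill one action slot k_i; return (gamma(k_i), token, per-symbol gate values)."""
    p = rho(history)
    if not 0.0 < p < 1.0:
        raise ValueError("rho_i must lie strictly between 0 and 1")
    # Coin flip: depends only on h_i and a private RNG, never on the world's token.
    bit = 1 if rng.random() < p else 0
    # bit == 1: the agent writes the token (an intervention); bit == 0: the world writes it.
    token = agent_token(history) if bit == 1 else world_token(history)
    # The gate value is held fixed for every symbol of the A-token.
    return bit, token, [bit] * len(token)


if __name__ == "__main__":
    rng = random.Random(0)
    history: List[str] = []
    agent_slots = 0
    for _ in range(100):
        bit, token, _gates = fill_action_slot(
            history, lambda hist: 0.5, lambda hist: ["A"], lambda hist: ["W", "W"], rng
        )
        agent_slots += bit
        history.extend(token)
    print("agent-written slots:", agent_slots, "of 100")
```

With a constant $\rho_i = \rho \in (0,1)$ the draws are i.i.d. and $\sum_i \rho_i = \infty$, so the second Borel-Cantelli lemma yields the third condition automatically. A history-dependent $\rho_i$ needs a separate argument, for instance a uniform lower bound $\rho_i \ge \epsilon > 0$.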

**Induced agent interventions and world targets.**
Notice that we can combine a variety of schema rules that instantiate well-known decision principles and other preference structures, such as:

* //Bayes-optimal finite-horizon POMDP control:// $u$ encodes an executable finite POMDP, a horizon or discount, and reward parameters; $f$ outputs a Bayes-optimal adaptive controller.
* //Safety-first / constrained control:// $u$ specifies dynamics plus a hard safety predicate (or budget constraint) and a secondary objective; $f$ outputs a controller that enforces the constraint when feasible and otherwise follows the specified fallback rule.
* //Multi-objective tradeoffs:// $u$ provides multiple reward components together with weights (or a specified scalarization); $f$ outputs the optimal controller under that tradeoff.
* //Choice from comparisons:// $u$ contains a finite set of candidates plus computable pairwise comparisons or rankings; $f$ outputs the candidate (or action program) selected by a computable revealed-preference rule.
* //Rule- / constitution-following:// $u$ encodes a finite set of rules (hard constraints) plus a computable tie-breaker; $f$ outputs an action or program that satisfies the rules when feasible and otherwise follows an explicitly encoded fallback.
* //Program synthesis / tool protocol:// $u$ encodes a specification together with a computable evaluator or tool interface; $f$ outputs a program (or macro-action) that passes the evaluator, or an explicit next repair step in an iterative protocol.
* //Norms / dialogue acts:// $u$ encodes an interaction context together with a computable norm taxonomy; $f$ outputs an appropriate dialogue act, such as apologize, clarify, refuse, or defer, consistent with the norms and the stated context.

On prompts of the corresponding type, the agent will behave “as if” it were following the schema.
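
To make the schema idea concrete, here is a minimal sketch of two of the rules listed above, each written as a plain computable function $f$ of the data in an encoded prompt $u$. Several details are illustrative assumptions; the page does not fix a particular rule. These assumptions are the revealed-preference rule (a Copeland-style count of pairwise wins), the "lower is better" tie-breaking convention, and the function names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def choice_from_comparisons(candidates: Sequence[T], prefers: Callable[[T, T], bool]) -> T:
    """Copeland-style rule (an assumption): pick the candidate with the most pairwise wins.

    Ties go to the earliest-listed candidate, since max() returns the first maximal item.
    """
    if not candidates:
        raise ValueError("u must contain at least one candidate")

    def wins(i: int) -> int:
        return sum(1 for j, other in enumerate(candidates) if j != i and prefers(candidates[i], other))

    return candidates[max(range(len(candidates)), key=wins)]


def rule_following(
    actions: Sequence[T],
    rules: Sequence[Callable[[T], bool]],  # hard constraints encoded in u
    tie_break: Callable[[T], float],       # computable tie-breaker: lower is better (assumption)
    fallback: T,                           # explicitly encoded fallback
) -> T:
    """Return a rule-satisfying action when one exists, otherwise the fallback."""
    feasible: List[T] = [a for a in actions if all(rule(a) for rule in rules)]
    if feasible:
        return min(feasible, key=tie_break)
    return fallback


if __name__ == "__main__":
    score = {"x": 1, "y": 3, "z": 2}
    print(choice_from_comparisons(["x", "y", "z"], lambda a, b: score[a] > score[b]))  # y
    print(rule_following(range(1, 7), [lambda a: a % 2 == 0, lambda a: a > 2], float, 0))  # 4
```

Any other computable rule could be swapped in for the Copeland count or the tie-breaker. The point is only that each schema is a fixed computable map from $u$ to an output.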