Differences
This shows you the differences between two versions of the page.
| bmas [2026/06/19 15:26] – pedroortega | bmas [2026/06/20 13:18] (current) – [Safe AI Should be Bounded and Multi-Agent] pedroortega | ||
|---|---|---|---|
| Line 3: | Line 3: | ||
| **David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, | **David Hyland, Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Olivia Macmillan-Scott, | ||
| + | //Keywords: AI safety, bounded agency, multi-agent systems, modularity, verification, | ||
| - | //Keywords: AI safety, bounded agency, multi-agent systems, modularity, governance. // | + | {{ bmas_position.pdf | Position Paper, }} June 2026 |
| + | ===== Abstract ===== | ||
| - | Position Paper\\ | + | The scaling paradigm treats bounds on compute, memory, information, |
| - | June 2026\\ | + | |
| - | ===== Summary ===== | + | The safety claim is architectural. If unsafe behaviour requires a conjunction of capabilities, |
| - | The dominant story of recent AI progress is scaling: make models larger, train on more data, give them more compute, and hope that broader capability emerges. This paper argues for a complementary design principle. Instead of treating boundedness as a limitation to overcome, we should treat it as a safety-relevant feature to engineer. | + | ===== Introduction ===== |
| - | The proposal is **Bounded Multi-Agent Systems** (BMAS): | + | Current frontier |
| - | ===== Core idea ===== | + | BMAS starts from a different decomposition. A task induces a capability profile: reasoning, knowledge, coding, planning, tool use, verification, |
| - | A monolithic AI system tends to concentrate many capacities | + | The BMAS proposal is to design systems |
| - | BMAS instead asks whether capability can be distributed across specialized components. A planner can plan, a retriever can retrieve information, | + | ===== Bounded agents ===== |
| - | This is not a claim that multi-agent systems are always better than monolithic systems. The paper’s position is more careful: different architectures are appropriate for different tasks, budgets, risks, and assurance requirements. BMAS expands the design space for safe AI rather than replacing every existing approach. | + | A bounded |
| - | ===== Why boundedness helps ===== | + | Let $H$ be an unsafe behaviour. Suppose $H$ requires private information, |
| - | The capability argument | + | This is the basic logic of bounded agency. Safety is improved by removing direct causal paths from dangerous inputs |
| - | The safety argument is that bounded systems shift some risks into more familiar engineering territory. Narrower agents may be easier to understand and predict. Explicit interfaces make interactions easier to monitor. Local verification becomes possible at component boundaries. Redundancy can make the system more robust to the failure of any single | + | ===== Bounded multi-agent systems ===== |
| - | The practicality argument | + | A bounded multi-agent system is a collection of bounded agents with designed interfaces. |
| - | ===== A motivating safety example ===== | + | The interfaces are central. |
| - | The paper highlights | + | The relevant unit of design is therefore |
| - | The important move is not merely adding more agents. It is designing the interfaces so that dangerous combinations of information and affordances are deliberately avoided. | + | ===== Capability argument ===== |
| - | ===== Research agenda ===== | + | BMAS can create system-level capability without assigning broad capability to every component. This is a standard fact about organised systems. Firms, laboratories, |
| - | BMAS raises many open questions. We need better methods for designing verifiable interfaces and contracts between agents; orchestration mechanisms for assigning tasks to bounded specialists; | + | The same structure applies |
| - | The paper also points toward an institutional research agenda. AI systems | + | There is also a learning argument. General models |
| - | ===== Takeaway ===== | + | This gives a concrete capability mechanism. A general model discovers a procedure; a bounded component stores or executes it; a verifier checks its outputs; an orchestrator decides when to invoke it. Capability is preserved through reuse, while the broad model need not retain all authority at execution time. |
| - | The central claim is simple: safe AI should not be pursued only by scaling individual models and then trying to control them after the fact. Boundedness should be part of the architecture. | + | ===== Safety argument ===== |
| - | A bounded multi-agent | + | BMAS changes the structure of failure. In a monolithic system, unsafe behaviour may arise from an internal trajectory that is difficult to observe. In a BMAS, the corresponding trajectory must pass through messages, tool calls, delegation decisions, verifier outputs, and execution gates. These are observable events. |
| + | |||
| + | This makes monitoring more precise. | ||
| + | |||
| + | The “lethal trifecta” gives the clearest example. Private data, untrusted content, and external communication form a dangerous conjunction. A system that reads private mail, browses arbitrary web pages, and sends messages can be induced to leak secrets if untrusted text controls the action channel. A BMAS can separate the three functions. The private-data agent summarises under a restrictive contract. The untrusted-content agent works in a sandbox. The communication agent receives only approved content. A verifier mediates transfers. The unsafe path now requires a failure of the interface policy, not merely a failure of model judgment. | ||
| + | |||
| + | The same reasoning applies | ||
| + | |||
| + | ===== Verification and compositionality ===== | ||
| + | |||
| + | BMAS makes verification local. A verifier need not certify an entire intelligent system. It can certify that a proof follows | ||
| + | |||
| + | Local verification has a clear mathematical form. Suppose a component contract states that inputs in class $X$ must produce outputs in class $Y$. The verification problem is to test membership in $Y$ conditional on an input in $X$. This is easier than verifying arbitrary behaviour over the full state space of a general model. | ||
| + | |||
| + | The global problem remains compositional. If components satisfy | ||
| + | |||
| + | $$ | ||
| + | P_1 \wedge \cdots \wedge P_n \nRightarrow G. | ||
| + | $$ | ||
| + | |||
| + | A science of BMAS therefore requires composition theorems: conditions under which local guarantees survive routing, delegation, aggregation, | ||
| + | |||
| + | Redundancy also needs formal treatment. Multiple agents improve reliability only when their errors are sufficiently decorrelated. If a generator and verifier share the same blind spot, verification | ||
| + | |||
| + | ===== Governance argument ===== | ||
| + | |||
| + | BMAS gives governance concrete objects. A component can be audited. An interface can be specified. A permission can be revoked. A verifier can be benchmarked. A log can identify which agent proposed, checked, approved, and executed an action. | ||
| + | |||
| + | This matters for accountability. | ||
| + | |||
| + | Privacy also becomes architectural. Data minimisation is enforced by giving each component only the data required for its contract. A medical-data component need not communicate externally. A communication component need not inspect raw records. A planning component can operate on summaries. These restrictions reduce the harm caused | ||
| + | |||
| + | BMAS also supports distributed ownership. Data, tools, verifiers, and agents can be controlled by different parties. This matters for pluralistic alignment because different agents can represent different users, institutions, | ||
| + | |||
| + | ===== Risks specific to BMAS ===== | ||
| + | |||
| + | BMAS introduces risks that monolithic systems do not expose in the same form. | ||
| + | |||
| + | First, coordination can fail. A decomposition may omit a necessary dependency, duplicate work, or route subtasks to inappropriate specialists. | ||
| + | |||
| + | Second, interfaces can be porous. An agent may encode forbidden information in an allowed channel. A planner may smuggle instructions through a retrieval query. A verifier may approve an output outside its competence. | ||
| + | |||
| + | Third, agents can collude. Collusion is especially serious when agents share objectives, training data, or communication conventions. Monitoring must therefore inspect both content and communication patterns. | ||
| + | |||
| + | Fourth, capabilities may recombine. Even when no component individually has the capability profile required for harm, the system | ||
| + | |||
| + | These risks do not undermine the BMAS proposal. They specify its technical agenda. The object of study is the architecture-induced relation between local bounds and global behaviour. | ||
| + | |||
| + | ===== Open problems ===== | ||
| + | |||
| + | The paper identifies several problems that need theory and benchmarks. | ||
| + | |||
| + | **Task decomposition.** | ||
| + | Given a task, a resource budget, and an assurance requirement, | ||
| + | |||
| + | **Agent composition.** | ||
| + | Characterise how capabilities combine under hierarchy, debate, voting, markets, delegation, and redundancy. | ||
| + | |||
| + | **Multi-agent risk.** | ||
| + | Measure harms that arise from interaction rather than from any single component: collusion, drift, cascading failure, manipulation, | ||
| + | |||
| + | **Compositional safety.** | ||
| + | Prove conditions under which local component guarantees imply global system guarantees. | ||
| + | |||
| + | **Recoverability.** | ||
| + | Design systems whose failures are detectable, containable, | ||
| + | |||
| + | **Benchmarks.** | ||
| + | Compare BMAS and monolithic systems under matched task distributions, | ||
| + | |||
| + | ===== Conclusion ===== | ||
| + | |||
| + | BMAS treats boundedness as an architectural primitive. Bounds on information, | ||
| + | |||
| + | The research programme is precise. Identify the capability profile required by the task. Identify the capability profile required for unsafe behaviour. Design bounded agents whose composition covers the former while controlling paths to the latter. Prove that the interface rules preserve the intended safety properties. Evaluate the resulting architecture against monolithic baselines. | ||
| + | |||
| + | Safe AI requires this level of architectural analysis. Scaling determines what a model can do. BMAS determines which components may do what, with which information, | ||
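
The first three steps of the research programme in the conclusion can be illustrated with a small capability-profile check. The sketch below is not taken from the paper: the agent names, the capability labels, and the helper functions `covers_task` and `agents_holding_conjunction` are all hypothetical, and capability profiles are assumed to be representable as plain sets of labels. It models the "lethal trifecta" as the unsafe profile and compares a monolithic assistant with a three-agent decomposition.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical labels for the "lethal trifecta" described in the text.
UNSAFE_PROFILE: FrozenSet[str] = frozenset(
    {"private_data", "untrusted_content", "external_communication"}
)


def covers_task(agents: Dict[str, Set[str]], task_profile: Set[str]) -> bool:
    """Return True if the union of all agent capabilities covers the task profile."""
    combined: Set[str] = set().union(*agents.values())
    return task_profile <= combined


def agents_holding_conjunction(
    agents: Dict[str, Set[str]], unsafe_profile: FrozenSet[str]
) -> List[str]:
    """Return the names of agents that individually hold every unsafe capability."""
    return [name for name, caps in agents.items() if unsafe_profile <= caps]


# A monolithic assistant holds the whole conjunction in one component.
monolith = {
    "assistant": {"private_data", "untrusted_content", "external_communication"},
}

# A bounded decomposition assigns each capability to a separate agent.
bmas = {
    "mail_summariser": {"private_data"},
    "web_reader": {"untrusted_content"},
    "sender": {"external_communication"},
}

# The task here is assumed to need all three capabilities at system level.
task_profile = set(UNSAFE_PROFILE)

print(covers_task(monolith, task_profile), agents_holding_conjunction(monolith, UNSAFE_PROFILE))
print(covers_task(bmas, task_profile), agents_holding_conjunction(bmas, UNSAFE_PROFILE))
```

Both architectures cover the task, but only the monolith has a single component holding the full unsafe conjunction. This check covers per-agent bounds only. It says nothing about whether capabilities recombine through the interfaces, which the section on BMAS-specific risks treats as a separate problem requiring interface policies and composition results.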