Organizations today grapple with information overload—a deluge of messages, meetings, and documents that contain critical decisions and knowledge, but in unstructured, noisy forms. Conventional approaches, such as retrieval-augmented generation (RAG), enhance large language models (LLMs) with document recall, yet fall short on temporal awareness, ownership tracking, and actionability. We introduce Signal–Context Architecture (SCA), a model-agnostic AI framework that separates live information "signals" from static knowledge "context," then fuses them with a validation step to produce executive-grade answers with provenance. In SCA, an Insights Agent first processes high-velocity signals (e.g. meeting transcripts, chats, emails) into a structured Decision Ledger of accomplishments, action items, and decisions. Meanwhile, a Knowledge Layer curates context from institutional memory (documents, wikis, policies), weighting evidence by recency and source authority. A Deep Research Agent then composes answers by combining the Decision Ledger and Knowledge Layer, ensuring that every answer is grounded in both recent events and trusted documentation. SCA directly addresses key gaps of standard RAG pipelines—time sensitivity, accountability, follow-through, and provenance—by binding events to evidence before generation. We detail the SCA design principles, system pipeline, and data model. Finally, we discuss an initial implementation and measurements in an enterprise setting, illustrating how SCA delivers precise recall (e.g. "Who decided what, when, and why?") and board-level synthesis (e.g. drafting quarterly plans) with citations, timestamps, and deep links to source data. The result is an AI assistant that turns conversation chaos into actionable, audit-ready intelligence, future-proofed by a model-agnostic approach and human-in-the-loop governance.
Introduction
Modern enterprises run on continuous flows of information—from rapid-fire Slack threads and meeting discussions to vast archives of documents and knowledge bases. Extracting reliable, decision-ready insight from this noisy, ever-changing data is a grand challenge. Generative AI offers potential solutions, but off-the-shelf large language models struggle with factuality and recency, especially in high-stakes business contexts. Retrieval-Augmented Generation (RAG) techniques improve knowledge recall by feeding LLMs with relevant documents, allowing models to cite sources and reduce hallucinations. However, simply augmenting models with document chunks has proven insufficient for organizational intelligence.
Several critical gaps remain when deploying AI assistants for executives and teams:
- Temporal Awareness: Business decisions are time-sensitive; what changed yesterday often matters more than a static document from last year. Traditional RAG typically relies on a fixed index (e.g. a snapshot of Wikipedia or a company drive) and thus may miss recent updates. Even with periodic index updates, answers about evolving situations (e.g. "Who is the current president?" or "What was decided in last week's meeting?") are challenging. Recent research has begun incorporating temporality into retrieval models, confirming that time-aware processing is needed for up-to-date answers.
- Ownership and Accountability: Executives ask not only what happened, but who decided it, when, and why. Standard LLM or RAG approaches do not inherently track decision provenance (e.g. which person or meeting produced a decision). In multi-party work streams, critical context lies in dialogue acts like commitments or decisions. Identifying such acts is non-trivial—decisions can be phrased ambiguously ("Let's go with Option B" vs. an explicit "Decision: Choose Option B"). Prior work in NLP shows that with appropriate models and training, dialogues can be parsed for decisions and action items. Yet, few production AI systems attempt to capture decision genealogy — linking a decision to its origin and stakeholders.
- Follow-through (Actionability): Answers that merely summarize information have limited value if they don't facilitate action. In an organizational setting, a helpful answer should not only state facts but also highlight action items (what needs to be done, by whom) or surface next steps. For instance, an executive asking "What's blocking Project X this week?" expects a response that identifies blockers and owners, not just a generic summary. Purely retrieval-based systems return relevant text passages but don't synthesize to-do lists or status updates. Even advanced meeting summarization systems often miss translating decisions into actionable tasks. This gap between insight and action means work doesn't move forward.
- Provenance and Trust: In high-stakes environments, uncited claims don't get adopted. A CEO or board will rightly distrust an AI-generated recommendation if it's not backed by evidence. RAG methods introduced the ability for models to cite sources (analogous to footnotes) to improve user trust. However, not all implementations enforce strict citation of both recent events and long-term knowledge. Ensuring that every answer is traceable to a transcript line or document paragraph is essential for credibility. Moreover, the quality of sources matters: information pulled from an outdated policy or a random wiki page can mislead. Thus, provenance must be coupled with source validation (e.g. preferring authoritative or recent sources).
In summary, existing AI assistants often operate as monolithic black-box models trying to "understand everything at once." This fails to handle the dynamic, noisy reality of organizational data. Signal–Context Architecture (SCA) tackles these challenges by separating the problem into two streams—"hot" signals and "cold" context—and then unifying them with a verification step. By first structuring the live signal stream (to capture timelines, owners, and actions) and separately curating a trusted context store (to provide evidence), SCA can generate answers that are timely, accountable, actionable, and verifiable. We hypothesize that this dual-stream approach will significantly outperform vanilla RAG on executive decision support queries.
This paper presents SCA's design and an initial implementation. In Section 2, we situate SCA relative to related work in retrieval-augmented LLMs, meeting understanding, and knowledge management. Section 3 details the architecture of SCA, including the Signal pipeline with the Insights Agent, the Context pipeline with the Knowledge Layer, and the Deep Research Agent that fuses information. We describe how decisions are captured in a Decision Ledger and linked to supporting evidence. Section 4 outlines key design principles and implementation notes that make SCA durable and model-agnostic. We also provide a qualitative comparison of SCA vs. a traditional RAG system to highlight capability differences. Section 5 discusses our evaluation approach and early results, including metrics like decision recall time and action item extraction coverage. Finally, Section 6 concludes with limitations and future directions, such as integrating recommendation engines and multi-step workflow orchestration on top of SCA's structured knowledge.
Background and Related Work
Retrieval-Augmented Generation (RAG)
RAG is a family of techniques that combine LLMs with information retrieval from external data sources. Instead of relying solely on a model's parametric knowledge, a RAG system fetches relevant documents (usually via vector similarity search or dense retrievers) and provides them as context to the LLM during generation. Lewis et al. (2020) introduced the term "RAG" in their seminal work, and showed that augmenting a generative model with retrieved Wikipedia passages improved factual question-answering. The approach has since been widely adopted in hundreds of papers and many commercial services. The appeal of RAG lies in its modularity (one can update the external knowledge source without retraining the model) and its propensity to produce answers with traceable sources. By giving models "footnotes" to cite, RAG helps mitigate hallucinations and builds user trust. For instance, Nvidia's RAG reference architecture emphasizes that citing sources makes the LLM's responses more reliable for users.
Despite these strengths, basic RAG implementations have limitations that motivate SCA. Traditional RAG typically treats the knowledge base as a static corpus (e.g. a fixed snapshot of company documents). As discussed, this makes it slow to capture temporal changes. Researchers have proposed extensions like TempRAG or Time-aware RALM to incorporate temporality by indexing multiple versions of documents or adding time metadata to queries. These approaches confirm that simply swapping in a new corpus is not always sufficient for domains with frequent updates. SCA addresses temporality not just by keeping context up-to-date, but by maintaining an event log of what recently happened (the Decision Ledger). Another limitation of RAG is that it usually retrieves text passages that loosely match the query, but it does not know about structured events (like "decision made in Meeting X"). Our approach can be seen as injecting a knowledge graph of decisions and actions into the retrieval loop, rather than retrieving raw text alone. Lastly, while RAG can provide citations to documents, it doesn't inherently capture why that document is relevant (e.g. was it the spec that a decision was based on, or a policy that constrains it?). SCA's validation step explicitly links events to evidence, aiming to ensure that the evidence is supportive and contextual (analogous to efforts in verifiable QA where the system checks that retrieved docs truly support the answer).
Meeting Summarization and Action Item Extraction
There is a growing body of work in applying NLP and LLMs to meeting transcripts and workplace conversations. Recent LLM-based systems can generate meeting recaps, often focusing on summaries of discussion points or highlighting key decisions and action items. For example, Wu et al. (2023) design an LLM-powered meeting recap system that produces two kinds of outputs: important highlights and structured minutes (with sections for decisions, tasks, etc.). Their user study found the approach promising for efficiency, but noted limitations: the automated summaries sometimes missed important details, mis-attributed information, or failed to gauge what was important to participants. These shortcomings underline the need for improved accuracy in identifying truly salient information like decisions and commitments. Traditional NLP approaches have treated decision detection and action item extraction as classification problems on dialogue acts. As one example, Bhattacharya (2020) explored fine-tuning BERT to identify dialogue acts such as decisions and action items in multi-party meetings, achieving state-of-the-art results on benchmark datasets. Such research shows it is feasible to parse raw transcripts into structured records, although it requires careful handling of ambiguity and context. SCA's Insights Agent builds on these ideas by using advanced language models to structure the raw conversations into a machine-interpretable ledger. Unlike a generic meeting summarizer, the Insights Agent is specialized to extract a Decision Ledger comprising clearly labeled Decisions (with who/what/when/why), Action Items (with owner and due date), and Accomplishments (deliverables or milestones achieved). This structured approach echoes the emerging practice in some teams of maintaining "decision logs" or "decision registers" for meetings and projects, whether manually or semi-automatically.
Organizational Memory and Knowledge Management
SCA's Context (Cold) layer relates to long-standing concepts in knowledge management, where organizations strive to build a "single source of truth" for policies, specs, and historical decisions. Corporate wikis, intranets, or document repositories (SharePoint, Confluence, Notion, etc.) serve as institutional memory, but they often grow disorganized and outdated. Search engines and enterprise search appliances have been the traditional tool for retrieving information from these repositories. Modern semantic search using vector embeddings has greatly improved findability, even for unstructured text. Yet, relevance in enterprises is multifaceted: beyond keyword or semantic similarity, factors like document recency, authoritativeness, and usage frequency are important. SCA's Knowledge Layer incorporates these factors by design. It performs retrieval with semantic chunking and authority weighting, meaning that when searching context documents, it boosts content that is recent or authored by domain experts or frequently referenced. This approach is aligned with common practices in enterprise search ranking (e.g., boosting pages with recent edits or certain access patterns) and ensures that, say, the official "Security Policy" authored by the security team ranks above a random engineer's notes on security. By weighting for recency, SCA prioritizes fresh information when relevant, consistent with the notion that updated indexes yield better answers to time-sensitive queries. By weighting author credibility, SCA implements a trust model: not all sources are equal, and an executive answer should draw on validated, high-quality documents whenever possible. Finally, by tracking usage signals (how often a document is linked or viewed), the system infers which documents are considered important by the organization. 
All retrieved context fragments in SCA carry a citation (link to the source and metadata), so when the Deep Research Agent composes an answer, every claim can be traced back to its origin.
In summary, SCA synthesizes ideas from these areas: it is inspired by RAG's combination of retrieval and generation (but extends it with structured intermediates and temporal handling), it leverages advances in meeting AI to structure conversations into decisions/actions, and it employs enterprise search best practices to maintain a high-quality knowledge base. We next describe the architecture of SCA in detail.
SCA Architecture and Components
Dual-stream pipeline: signals, context, and fusion
At a high level, Signal–Context Architecture (SCA) implements a dual-stream pipeline with a subsequent fusion step (Figure 1). This approach is reminiscent of the Lambda architecture in big data systems, which separates a batch layer (cold path) for comprehensive but slow processing and a speed layer (hot path) for low-latency updates. In SCA, the "cold path" corresponds to the Context stream (institutional knowledge base), and the "hot path" corresponds to the Signal stream (live inputs). Both streams feed into a fusion layer where responses are generated. By separating to conquer and then unifying to understand, SCA ensures each part of the problem is handled by specialized mechanisms.
Figure 1 — SCA Dual-Stream Pipeline
3.1 Signals (Hot Stream) → Insights Agent → Decision Ledger
The Signals pipeline ingests high-velocity, high-volume data from live communication channels. Typical inputs include:
- Meetings: real-time audio/video meeting transcripts (from platforms like Zoom, Google Meet, Teams), potentially with speaker identification and timestamps.
- Chat and Email: messages from Slack/Teams channels, email threads, etc., often noisy with colloquialisms, reactions, or tangential comments.
- Project trackers: updates from issue trackers like Linear or Jira (tickets created/closed), which reflect decisions in project scope and bug triage.
These sources are rich in content about what just happened, but are unstructured and intermingled with noise. The Insights Agent is a specialized component (powered by one or multiple NLP/LLM models) that continuously or periodically parses the signal stream and extracts structured nuggets of information. The output of this agent is the Decision Ledger, a living structured record that organizes events into three key categories:
- Decisions: Records of decisions made, capturing who decided what, when, and optionally why (the rationale). A decision entry may point to the exact moment in a meeting transcript or the specific message where the decision was made (for traceability, e.g. a deep link to the video timestamp or Slack permalink). It also links to any available rationale or discussion context. Each decision has metadata: timestamp, decider (person or group), and links to related items (like tasks spawned or documents updated).
- Action Items: Records of tasks or follow-ups identified, including what needs to be done, who owns it, and by when (deadline) if stated. This essentially forms an automatically generated to-do list extracted from discussions ("Alice will do X by next week"). The agent may infer action items even when not explicitly stated ("We should review the design" implies an action item "Review the design"). Each action item is linked back to its source event for context and can be updated when completed.
- Accomplishments: Records of completed work or significant milestones achieved, with what was accomplished and when. For example, if in a meeting someone announces "Project Y was deployed to production last night," the agent can log this as an accomplishment (artifact deployed, timestamp). Accomplishments provide a historical ledger of deliverables and outcomes, which can later be used to answer questions like "What did the team deliver last month?"
Figure 2 — Decision Ledger Structure
The Insights Agent uses a mixture of techniques to populate the Decision Ledger. It employs temporal understanding (identifying references to time, deadlines, ordering of events) to place events correctly. It also builds a decision genealogy – linking related events across time. For instance, a decision made in a planning meeting may be confirmed later in a Slack thread, then implemented via a Pull Request. SCA explicitly links these into a chain: decided-by → confirmed-by → implemented-by relations, forming a graph of how an idea progresses from decision to execution. This provides context for follow-through; if someone asks later "Did we really implement feature X that was approved?", the system can traverse the graph and find if an accomplishment (e.g. PR merged) is linked to that decision. Such cross-references ensure accountability and prevent issues from falling through the cracks or being re-decided due to lost context.
An engineering sync decides to migrate infrastructure from AWS to GCP. The Insights Agent creates a Decision entry: "Decision: Migrate infra to GCP — Decider: Luis Ortega; Date: Aug 12, 2025; Rationale: cost & performance; transcript @00:14:23."
The next day in Slack #infra, a confirmation is linked to the same decision. The RFC document "Infra Migration RFC v3" and a Jira ticket are cross-referenced. The enriched entry now connects people, time, rationale, and artifacts.
Later query: "Which meeting decided to migrate infra to GCP?" — answered instantly with person, date, rationale, transcript link, and Slack confirmation.
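The enriched entry from this scenario can be pictured as a small record with typed links. The sketch below is illustrative only: field names, link relations, and identifiers are hypothetical, not a normative schema.

```python
# Illustrative Decision Ledger entry for the GCP-migration scenario above.
# All field names, link relations, and identifiers are hypothetical.
decision = {
    "id": "dec-2025-0812-infra-gcp",
    "text": "Migrate infra to GCP",
    "decider": "Luis Ortega",
    "timestamp": "2025-08-12T10:14:23Z",
    "rationale": "cost & performance",
    "links": [
        {"rel": "decided-by",     "target": "transcript://eng-sync/2025-08-12#00:14:23"},
        {"rel": "confirmed-by",   "target": "slack://infra/p1755072000"},
        {"rel": "documented-in",  "target": "doc://Infra-Migration-RFC-v3"},
        {"rel": "implemented-by", "target": "jira://INFRA-142"},
    ],
}

def provenance(entry: dict) -> dict:
    """Group an entry's links by relation, ready to render as deep links."""
    grouped: dict = {}
    for link in entry["links"]:
        grouped.setdefault(link["rel"], []).append(link["target"])
    return grouped
```

Answering the "which meeting decided..." query then amounts to reading one record and its `decided-by` link, with no generative inference involved.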
By structuring signals before trying to answer user questions, SCA reduces the reliance on the generative model to "remember" or infer these details. The knowledge of decisions and tasks is explicitly stored and can be retrieved with high precision (often via simple queries or filters on the ledger, rather than semantic search). This "structure before generation" principle ensures that, for pinpoint questions (e.g. who decided X?), the system can respond based on a database query to the ledger, which is far more precise than prompting an LLM to scan raw transcripts. Early anecdotal evidence suggests this leads to near-instant recall of decisions (we measure "decision recall time" in Section 5). Indeed, turning raw meetings into a queryable decision log has been shown to let AI assistants answer questions like "What did we commit to last Monday?" by referencing past meeting records, which is aligned with SCA's goals.
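A minimal sketch of the "structure before generation" principle: a pinpoint question is answered by filtering the ledger rather than prompting an LLM over raw transcripts. The ledger rows and field names below are hypothetical.

```python
from datetime import datetime

# Toy in-memory ledger; in practice this is a database query or graph
# traversal. Rows and field names are illustrative.
ledger = [
    {"kind": "decision", "text": "Migrate infra to GCP",
     "decider": "Luis Ortega", "time": datetime(2025, 8, 12)},
    {"kind": "decision", "text": "Keep Pro plan at $40/month",
     "decider": "Priya N.", "time": datetime(2025, 7, 3)},
]

def who_decided(entries, keyword):
    """Return (decider, time) for decisions whose text mentions the keyword."""
    return [(e["decider"], e["time"]) for e in entries
            if e["kind"] == "decision" and keyword.lower() in e["text"].lower()]
```

A query like `who_decided(ledger, "GCP")` resolves deterministically, which is what makes decision recall near-instant and exact.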
3.2 Context (Cold Stream) → Knowledge Layer → Evidence Store
The Context pipeline handles the organization's institutional memory — documents and knowledge that already exist, which we treat as relatively static (low velocity) but high value. This includes: internal wikis, documented policies, product specs, design docs, engineering runbooks, OKR spreadsheets, past strategy memos, etc. These artifacts are typically stored in systems like Notion, Confluence, Google Drive, SharePoint, GitHub (for code or markdown docs), and so on. The challenge here is not real-time parsing (as with signals) but rather retrieval and validation: given a query or a piece of information from the Decision Ledger, find the most relevant supporting content from this vast repository.
SCA's Knowledge Layer uses a combination of vector-based semantic search and symbolic filtering to retrieve candidate evidence fragments. We break documents into semantically coherent chunks (e.g. paragraphs or sections) and index them with embeddings. When searching, we incorporate not just the query text, but also metadata cues. For example, if the query is related to a known Decision (from the ledger) which has tags or links (like it's a security decision, or it involves Project Atlas), the retriever can constrain or boost results from the relevant project folder or with certain keywords. This ensures we don't retrieve irrelevant context.
Critically, the Knowledge Layer scoring algorithm emphasizes three factors:
- Recency: Recent documents or recent edits are scored higher, under the assumption that for many questions (especially operational or tactical ones), the latest information is more pertinent than stale data. For instance, if a user asks "What is the current pricing for our Pro plan?", a spec updated last week is more relevant than a deck from 2022. This doesn't mean old data is dropped – just down-weighted unless specifically relevant (sometimes older archives matter for historical questions).
- Authority: We incorporate an authority weight based on the source or author. If the query is about a security policy, a page written by the Security team or the CISO is considered more reliable than an unofficial note. Authority can be defined by directory (e.g., official policies folder), by role (executives' docs might carry more weight on strategy questions), or even by crowd signals (documents that many people have labeled as canonical). This approach aligns with how humans trust information – provenance matters. It's akin to ranking official documentation higher in search results, a practice in enterprise search relevance tuning.
- Popularity/Usage: If a document or snippet has been referenced or viewed frequently (especially in contexts related to the query), it's likely useful. For example, if a design decision was discussed in Slack and a particular spec link was shared in that discussion, that spec is very likely relevant to queries about that decision. The Knowledge Layer can use such signals (e.g., the Decision Ledger's link graph itself) to boost content that's "connected" to the current context in the organization's discourse. This dynamic is unique to SCA: by having the Signals and Context pipelines inform each other (the Decision Ledger contains links to documents; the document retrieval can consider those links), we achieve a contextual retrieval that standard vector search would miss.
Figure 3 — Knowledge Layer Scoring
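The three factors above can be combined into a single ranking score. The following is a hedged sketch: the weights, the 90-day recency half-life, and the usage saturation constant are illustrative choices, not the production tuning.

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 90.0  # assumed recency half-life; tuned per deployment

def evidence_score(semantic_sim, last_edited, authority, usage_links,
                   now=None, w_sem=0.6, w_rec=0.2, w_auth=0.1, w_use=0.1):
    """Blend semantic similarity with recency, authority, and usage signals.

    semantic_sim: cosine similarity in [0, 1] from the vector index.
    last_edited:  timezone-aware datetime of the document's last edit.
    authority:    source weight in [0, 1] (e.g. official policy folder = 1.0).
    usage_links:  count of references to this fragment in the ledger's graph.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_edited).total_seconds() / 86400.0
    recency = 0.5 ** (age_days / HALF_LIFE_DAYS)   # exponential decay with age
    usage = 1.0 - math.exp(-usage_links / 5.0)     # saturating usage boost
    return w_sem * semantic_sim + w_rec * recency + w_auth * authority + w_use * usage
```

Under this scheme a spec edited last week outranks an equally similar 2022 deck, matching the Pro-plan pricing example above, while the dominant semantic weight keeps old archives reachable for genuinely historical questions.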
All retrieved context fragments are stored or represented as Evidence objects, which include the fragment text and the source identifier (document name, URL or storage path, author, date). These Evidence objects are what the final agent will cite. They are kept small (e.g., a paragraph each) to ensure the LLM can absorb multiple pieces as needed and to allow fine-grained citation (pointing to the exact section of a doc, not the whole doc). The Knowledge Layer continuously refreshes indexes (for example, if someone creates a new Google Doc or updates a Confluence page, those changes are ingested so that recency ranking remains effective). We describe storage details in Section 4, but note here that we use a hybrid of keyword and vector indices: keywords for precise filters (like "limit to files in the 'Q1 OKR' folder"), vectors for semantic similarity.
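The Evidence objects described above can be sketched as follows. The paragraph split here is deliberately naive (real semantic chunking would use headings and embeddings), and the field names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    """A small, citable fragment of a source document."""
    text: str
    doc_id: str
    section: int   # index of the fragment within the document
    author: str
    date: str

def chunk_document(doc_id, body, author, date):
    """Split a document into paragraph-level Evidence fragments.

    Naive blank-line split as a stand-in for semantic chunking.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return [Evidence(p, doc_id, i, author, date) for i, p in enumerate(paragraphs)]
```

Keeping fragments this small is what allows a citation to point at the exact section of a document rather than the whole file.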
By maintaining this curated, trusted evidence base, SCA ensures that when the generative model composes an answer, it has access to not just the raw organizational knowledge, but the validated and relevant subset of it. This mitigates the "garbage-in" problem of retrieval: if the wrong or low-quality context is retrieved, even a good LLM will produce a faulty answer. Our approach, analogous to multi-step retrieval or LLM-verified retrieval, could allow iterative refinement: the Deep Research Agent (next section) might detect if the evidence is insufficient and trigger a secondary query. For now, the Knowledge Layer's initial ranking tries to get it right on the first pass by using the structured context we have (timestamps, links, etc., from the ledger).
3.3 Fusion (Validation) → Deep Research Agent → Answer Generation
The final stage of SCA is the fusion of signals and context in order to fulfill a user's query or task. The Deep Research Agent is an orchestrator (which can be implemented as an LLM chain or an agentic loop) that spans both the Decision Ledger and the Knowledge Layer. When a question comes in – whether it's a natural language query from an executive or an automated trigger (like a daily briefing request) – the Deep Research Agent orchestrates a two-part strategy:
- Structured Query to Decision Ledger: It first checks if the query pertains to recent events or decisions. For example, if the query asks, "Which meeting decided XYZ?" or "What's the status of Project ABC this week?", these clearly relate to the Signals domain (decisions, actions, or accomplishments). The agent will query the structured Decision Ledger (using SQL-like queries or graph traversals) to fetch the relevant entries. This yields pinpoint data: e.g., a specific Decision entry with all its fields.
- Contextual Retrieval to Knowledge Base: In parallel or subsequent to the above, the agent formulates a retrieval query for the Knowledge Layer to get supporting evidence. If the question is narrow (like the meeting decision example), the supporting evidence might be the transcript snippet of that decision (which is effectively stored as part of the Decision Ledger, or could be fetched from a transcript store) and any document that was referenced in making that decision (e.g. an RFC). If the question is broader (like "Draft a Q1 plan for Enterprise Deals"), the agent will break it down into aspects and retrieve multiple pieces: historical objectives, recent sales commits, pricing decisions, etc., from the ledger and docs. Here the fusion aspect is crucial: the agent uses both the structured results and the unstructured evidence in composing its final answer.
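The two-part strategy above can be sketched as a routing decision. The cue phrases are a stand-in for the query classifier mentioned in Section 4, and the returned plan fields are illustrative.

```python
# Assumed cue phrases for pinpoint (ledger-first) questions; the real
# system uses a trained query classifier rather than string matching.
PINPOINT_CUES = ("which meeting", "who decided", "when did we")

def answer_strategy(question: str) -> dict:
    """Decide which stores to consult and how to combine them."""
    q = question.lower()
    pinpoint = any(cue in q for cue in PINPOINT_CUES)
    return {
        "query_ledger": True,                  # recent events are always checked
        "ledger_mode": "exact" if pinpoint else "broad",
        "retrieve_evidence": True,             # supporting context is always fetched
        "synthesize": not pinpoint,            # broad questions need a narrative
    }
```

Note that both stores are consulted in either mode; the routing only changes whether the ledger query is an exact lookup or a broad sweep feeding synthesis.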
Figure 4 — Query Flow Through Deep Research Agent
The output of the Deep Research Agent is an Answer Package containing: (a) a natural language answer or narrative that addresses the query, (b) citations in-line linking to both the Decision Ledger entries and the context documents used, and (c) if applicable, a list of recommended action items or next steps (with owners and due dates) relevant to the query. The inclusion of action items makes the answer immediately operational. For example, a question "What's blocking the Atlas migration?" might yield an answer: "Three blockers were identified: (1) SSO/SAML test suite failing — owner: Maya, due Friday; (2) IAM policy review pending Security — owner: Aria; (3) Terraform state drift — owner: DevOps. See linked Jira issues for details." Such an answer not only informs the executive of status but also lays out responsibility and encourages follow-up on each item (each blocker is linked perhaps to the tracking ticket or Slack thread where it was discussed). The result moves the business forward, unlike a generic "Atlas migration has some delays" summary.
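The Answer Package's three parts can be written down as a simple structure. Field names here are assumptions about our prototype's internal representation, not a published schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Citation:
    claim: str    # the sentence or claim being supported
    source: str   # ledger entry id or evidence deep link

@dataclass
class NextStep:
    task: str
    owner: str
    due: str

@dataclass
class AnswerPackage:
    """(a) narrative, (b) citations, (c) recommended next steps."""
    narrative: str
    citations: List[Citation] = field(default_factory=list)
    next_steps: List[NextStep] = field(default_factory=list)
```

The Atlas-migration example above would populate all three fields: the blocker summary as `narrative`, Jira deep links as `citations`, and the per-blocker owners as `next_steps`.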
Two broad classes of AI assistance enabled by SCA's Deep Research Agent:
Pinpoint Recall
Specific factual questions about a decision, event, or commitment. SCA retrieves the exact record — who, when, why — with deep links to the source transcript or message.
"Which meeting decided the Pro plan at $40/month?" → Decision entry + transcript timestamp + email approval
Synthesis & Strategy
Comprehensive analysis or draft requiring multiple pieces. The agent queries the ledger for all relevant decisions, retrieves knowledge base docs, and generates a grounded, cited narrative.
"Draft our Q1 plan for Enterprise Deals" → Cited objectives, owner assignments, milestones from ledger
Through these examples, we see that SCA's fusion of structured and unstructured data yields answers that are not only correct, but come with "receipts" – every claim is tied to a meeting record or a document snippet that the user can inspect. This level of transparency is indispensable for executive adoption: a busy decision-maker might skim the answer then click the deep link to verify a key point in the original source (transcript or file). If any detail is off, they can flag it, which brings us to a final piece: human-in-the-loop validation. In our implementation, the answers (especially the complex synthesized ones or any that enact changes) can be routed to an Agent Inbox for a person to review and approve. This ensures that automation doesn't run unchecked; humans provide oversight and can correct any mistakes in the Decision Ledger or Knowledge Layer as well, improving the system over time.
Design Principles and Implementation
Several design principles are baked into SCA to ensure the system is robust, adaptable, and enterprise-ready:
Principle 01
Structure Before Generation
Events are structured into a Decision Ledger before final answer generation, preventing hallucination and enforcing consistency. The model cannot invent facts about who decided what — it must rely on the ledger.
Principle 02
Events + Evidence Binding
Any answer must be grounded in both an event (signal) and evidence (context). If either is missing, the agent surfaces the gap rather than filling it with guesswork.
Principle 03
Receipts by Default
Every response includes citations and deep links to source. This is not optional — it creates an audit trail of AI-generated insights that decision-makers can verify on demand.
Principle 04
Minimal Surfaces, Maximum Integration
Designed to fit existing workflows — Command Palette UI, shareable meeting pages, Agent Inbox. One integrated experience rather than a dozen separate AI helpers.
Principle 05
Governance and Security Ready
SSO/SAML, RBAC data scoping, audit logs, and BYOK (Bring Your Own Key) model support from the ground up. Enterprise IT requirements are foundational constraints, not retrofitted features.
Architecture Implementation Notes
We have implemented SCA's prototype as a combination of cloud services and a desktop client:
Data Model
In our system, the following logical entities are defined:
- Event: A raw event from a signal source (e.g., one utterance in a meeting, or a message in Slack, or a new email). Events carry metadata (timestamp, source channel, speaker/sender).
- Decision: A normalized decision record derived from one or more events. It has fields (decision text, owner/decider, time, rationale, status) and links to the source event(s) and any follow-up events (confirmations, implementations).
- ActionItem: A task record with fields (task description, owner, due date, status) and link to the originating event or decision. Status can be open/closed; if closed, it may link to an Accomplishment event.
- Accomplishment: A record of a completed deliverable or milestone, with fields (description, timestamp, link to artifact if any, e.g., a URL to the deployed feature or merged PR).
- Evidence: A fragment of a document or knowledge base content, with text and a reference (doc id, section, author, date). We also store an "authority score" or source type with it.
- Link: A relationship between two of the above (e.g., Decision decided-by Event, Decision confirmed-by Event, Decision documented-in Evidence, ActionItem implemented-by Accomplishment, etc.). Together, these links form the graph that constitutes the Decision Ledger in its broader sense.
Figure 5 — SCA Data Model
These entities are stored in a hybrid store that allows flexible querying: a graph database for the links and a document store for the record content. We also maintain vector indices over relevant fields (e.g., embeddings of decision text and of evidence text) to assist retrieval.
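For concreteness, the entities can be sketched as plain records. The following is an illustrative rendering in Python dataclasses; the field names are ours, chosen to match the descriptions above, not the production schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Event:
    id: str
    timestamp: datetime
    source: str            # e.g., "meeting", "slack", "email"
    speaker: str
    text: str

@dataclass
class Decision:
    id: str
    text: str
    owner: str
    decided_at: datetime
    rationale: str
    status: str                                  # e.g., "proposed", "confirmed"
    source_event_ids: list = field(default_factory=list)

@dataclass
class ActionItem:
    id: str
    description: str
    owner: str
    due: Optional[datetime]
    status: str = "open"                         # "open" or "closed"
    origin_id: str = ""                          # originating Event or Decision

@dataclass
class Evidence:
    doc_id: str
    section: str
    author: str
    date: datetime
    text: str
    authority: float = 0.5                       # source authority score in [0, 1]

@dataclass
class Link:
    src_id: str
    relation: str                                # e.g., "decided-by", "documented-in"
    dst_id: str
```

In the actual store, Decision, ActionItem, and Evidence records live in the document store, while Link records are materialized as graph edges.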
Pipelines
The Signals pipeline is implemented with a streaming ETL (extract-transform-load) process. We integrate with meeting platforms via APIs or webhooks to get live transcripts, which the Insights Agent (implemented as a service calling an LLM with custom prompts) processes at the end of each meeting or in chunks during the meeting. For chat and emails, we use event listeners (e.g., Slack API, Gmail API) to get new messages, and batch them for analysis periodically. The transform step applies the LLM to identify any new decisions or tasks. We fine-tuned a model for this extraction, and also use heuristics (e.g., keywords like "decided", "FYI" for accomplishments, action verbs for tasks). The output is loaded into the Decision Ledger store.
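The keyword heuristics mentioned above can be sketched as a cheap pre-filter that selects candidate utterances before the more expensive LLM extraction runs; the cue lists below are illustrative examples, not our production rules:

```python
import re

# Cheap keyword heuristics that pre-select candidate utterances; only
# matched lines are forwarded to the (more expensive) LLM extractor.
DECISION_CUES = re.compile(r"\b(decided|we(?:'ll| will) go with|agreed (?:on|to))\b", re.I)
ACTION_CUES = re.compile(r"\b(will|should|need(?:s)? to|by (?:friday|eod|next week))\b", re.I)
ACCOMPLISHMENT_CUES = re.compile(r"\b(fyi|shipped|merged|completed|done with)\b", re.I)

def classify_utterance(text: str) -> list[str]:
    """Return the candidate record types this utterance might yield."""
    labels = []
    if DECISION_CUES.search(text):
        labels.append("decision")
    if ACTION_CUES.search(text):
        labels.append("action_item")
    if ACCOMPLISHMENT_CUES.search(text):
        labels.append("accomplishment")
    return labels
```

An utterance can match more than one cue set, in which case the extractor is asked about each candidate type.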
The Context pipeline combines a crawler (for file repositories) with a retriever. We use open-source vector search tooling (e.g., the FAISS library or the Milvus vector database) to index content. A scheduled job updates the index for new or modified documents (via a change feed or polling). Retrieval is implemented as a service that accepts a complex query (including optional filters for recency or source) and returns the top-N evidence chunks with scores.
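A minimal sketch of how recency and source authority might be folded into retrieval scoring; the weighting scheme, constants, and half-life here are illustrative assumptions, not the tuned production values:

```python
import math
from datetime import datetime

def score_chunk(query_vec, chunk, now=None,
                w_sim=0.7, w_recency=0.2, w_authority=0.1,
                half_life_days=180.0):
    """Combine semantic similarity with recency decay and source authority.

    `chunk` is a dict with an embedding ("vec"), a "date", and an
    "authority" score in [0, 1] (field names are illustrative).
    """
    now = now or datetime.utcnow()
    # Cosine similarity between the query and chunk embeddings.
    dot = sum(a * b for a, b in zip(query_vec, chunk["vec"]))
    norm = (math.sqrt(sum(a * a for a in query_vec))
            * math.sqrt(sum(b * b for b in chunk["vec"])))
    sim = dot / norm if norm else 0.0
    # Exponential decay: evidence loses half its recency weight
    # every half_life_days.
    age_days = (now - chunk["date"]).days
    recency = 0.5 ** (max(age_days, 0) / half_life_days)
    return w_sim * sim + w_recency * recency + w_authority * chunk["authority"]
```

With this scoring, an equally relevant but five-year-old chunk ranks below a recent one, which is exactly the behavior the Context pipeline needs.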
The Fusion/Answering stage uses a multi-step agent implemented in a framework (we experimented with LangChain-style agent loops). It first queries the ledger (via direct DB query) if the question matches certain patterns; we built a classifier to route queries (e.g., contains "which meeting" → likely ledger; contains "plan" or "summary" → needs synthesis). It then calls the retriever for additional context. Finally, it constructs a prompt for a large language model that includes:
- a preamble with instructions to cite sources and include actions if relevant,
- the relevant Decision Ledger entries (formatted as structured text or a summary thereof),
- the evidence snippets (with source tags), and
- the user's question.
We use a "chain-of-thought" style prompt in which the agent is encouraged to first list the relevant facts (with their sources) and then formulate the answer; this resembles few-shot prompting for QA with citations. The LLM (GPT-4 in our prototype) generates an answer, which we post-process to ensure citation formats are correct and that every claim has at least one citation. If a confidence issue arises (e.g., the LLM gave an answer but our system detected no citation for a sentence), we flag it for human review. The answer is then delivered to the user via the desktop app's UI. If it is an action-type command (such as an instruction to "email this to the team"), we route it to the Agent Inbox for the user to approve the actual email send.
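The prompt assembly and the no-citation check can be sketched as follows; the `[S1]`-style source tags and the dict field names are illustrative, not the production format:

```python
import re

def build_prompt(question, ledger_entries, evidence_chunks):
    """Assemble the four-part fusion prompt: preamble, ledger, evidence, question."""
    parts = ["Answer the question. Cite every claim with a source tag like [S1]. "
             "List the relevant facts with their sources before the final answer."]
    parts.append("Decision Ledger:")
    for e in ledger_entries:
        parts.append(f"- [{e['tag']}] {e['decided_at']}: {e['text']} (owner: {e['owner']})")
    parts.append("Evidence:")
    for c in evidence_chunks:
        parts.append(f"- [{c['tag']}] {c['text']} ({c['doc_id']})")
    parts.append(f"Question: {question}")
    return "\n".join(parts)

def uncited_sentences(answer):
    """Return sentences carrying no [S#]-style citation, for human review."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if not re.search(r"\[S\d+\]", s)]
```

Any sentence returned by `uncited_sentences` triggers the human-review flag described above.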
Storage & Search
The Decision Ledger (events, decisions, etc.) is stored in a hybrid manner: we use a document store (NoSQL) for flexible querying of records by various fields, and we maintain both keyword indices (for fast exact match, e.g., find Decisions whose title contains "GCP migration") and vector indices (for semantic search over similar decisions). This hybrid search proved useful – for example, if a user asks about something not directly recorded, like "Have we considered migrating to Azure?", even if no decision explicitly says that, a semantic search might pull up the "migrate to GCP" decision as related (since Azure appears in similar contexts), and the system can then say "We didn't decide on Azure; we chose GCP because…". The context documents are indexed in a vector DB as mentioned, and we also use a lightweight SQL database for metadata (to allow filtering by date or author).
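One standard way to merge the keyword and vector result lists is reciprocal rank fusion; a minimal sketch (k=60 is the conventional default, not a value we tuned):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked result lists: each doc scores the sum of 1/(k + rank)
    over every list it appears in, so items ranked well by both the
    keyword index and the vector index rise to the top."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document surfaced by both indices ("b" below) outranks one surfaced by only one, even at a slightly better single-list rank.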
To keep derived data fresh (the Decision Ledger is derived from raw events), we implemented a change data capture stream. If a message is edited or if a meeting transcript is corrected, we propagate updates to the ledger (with versioning for decisions if needed). This way the ledger isn't static; it evolves, and we can even trace how a decision record changed (audit trail).
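An in-memory sketch of the versioned update path: each upstream correction appends a new version of the decision record rather than overwriting it, preserving the audit trail (class and method names are ours, for illustration):

```python
from datetime import datetime

class VersionedLedger:
    """Toy stand-in for the ledger store: keeps every version of a
    decision record so edits leave a traceable history."""

    def __init__(self):
        self._versions = {}  # decision_id -> list of record dicts

    def upsert(self, decision_id, record):
        """Append a new version triggered by a CDC event (edit/correction)."""
        history = self._versions.setdefault(decision_id, [])
        record = dict(record,
                      version=len(history) + 1,
                      updated_at=datetime.utcnow().isoformat())
        history.append(record)

    def current(self, decision_id):
        """Latest version, used for answering."""
        return self._versions[decision_id][-1]

    def history(self, decision_id):
        """Full version history, used for the audit trail."""
        return list(self._versions[decision_id])
```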
Quality Control Loop
We have a continuous evaluation harness: sample questions are periodically run against the system (some handcrafted, some from real user queries with permission), and the results are checked for correctness. We measure recall and precision on decision queries (does it find the right meeting and decision?), citation accuracy (does each citation actually support the sentence it's attached to?), and action item validity (are suggested tasks actually relevant and not duplicates?). We also measure latency and cost. Based on these metrics, we adjust the system: e.g., if a certain type of question is often answered incorrectly, we might change the prompt or use a larger model for that case. The multi-model routing is an interesting aspect: because SCA is model-agnostic, we can choose cheaper or faster models for straightforward tasks. For instance, extracting action items might be done with a smaller fine-tuned model, whereas a board-level strategy answer might use GPT-4 for quality. We route by quality/cost/latency considerations, an approach that keeps the system efficient without sacrificing quality on important queries.
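The quality/cost/latency routing can be sketched as a lookup from task type to model tier, with a budget fallback; the tier names, task types, and token threshold are placeholders, not our actual configuration:

```python
# Hypothetical model tiers: a cheap fine-tuned extractor vs. a large generalist.
ROUTES = {
    "extract_action_items": "small-finetuned",
    "ledger_lookup": "small-finetuned",
    "synthesis": "large-llm",
}

def route(task_type, est_tokens, budget_tokens=8000):
    """Pick a model tier by task type, degrading to the small model
    when the request would exceed the per-query token budget."""
    model = ROUTES.get(task_type, "large-llm")
    if model == "large-llm" and est_tokens > budget_tokens:
        model = "small-finetuned"  # fall back rather than blow the budget
    return model
```

In practice the routing table is updated from the evaluation harness: task types with high error rates get promoted to a stronger tier.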
SCA vs. Vector Database + RAG Baseline
To concretely highlight the differences, the following table presents a comparison across key capabilities:
| Capability | Vector / RAG Baseline | SCA — Our Approach |
|---|---|---|
| Temporal understanding | Limited. Static snapshots; needs manual re-indexing for new data. | Built-in. Temporal data is first-class: decisions have timestamps, recent info prioritized. |
| Ownership & accountability | Not captured. Linking who/when is ad-hoc or lost. | Explicit. Every decision tied to an owner and time in the Decision Ledger. |
| Trusted sources | Partial. Can cite, but no source authority guarantee. | Enforced. Retrieval ranks by source credibility and recency. Bibliography of vetted sources. |
| Action orientation | Sometimes. Typically answer content only. | Full. Outputs action items with owners. Integrates with email/Slack for follow-through. |
| Model flexibility | High lock-in. Often built on a specific LLM API. | Agnostic. Modular design allows plugging in different LLMs per task. BYOK support. |
Table 1: Qualitative comparison of a typical vector database + RAG pipeline vs. the proposed Signal–Context Architecture. SCA addresses many gaps by design, providing temporal tracking, structured ownership, stronger provenance, action integration, and model-agnostic extensibility.
Durability and Evolution
SCA is not tied to any single model or even a single technology stack. As better transcription models or dialogue understanding models emerge, the Insights Agent can be upgraded. If a new, more powerful LLM appears, it can be integrated into the Deep Research Agent's toolkit. This future-proofing is intentional: the AI field moves fast, and enterprise systems must be able to incorporate advances without a complete overhaul. Similarly, SCA's separation of concerns (signals vs. context) means it can survive changes in data sources. For instance, if tomorrow Slack is replaced by Microsoft Teams, only the signal ingestion connector changes; the core idea of a Decision Ledger remains the same. Over time, the Decision Ledger becomes a compounding asset – it accumulates corporate knowledge that even new employees or new AI models can leverage. What's notable is that the ledger's value appreciates with usage: every week more decisions and outcomes are logged, making the AI's answers richer and more contextually grounded. In contrast, a naive AI assistant that doesn't log or learn from interactions remains static, or even forgets, as its context window slides.
Finally, by keeping a human-in-the-loop for important outputs (the Agent Inbox for approvals, the ability for users to give feedback on answers), SCA ensures that it augments rather than replaces human decision-making. All AI-proposed actions are draft-first, meaning the AI might draft an email or task, but a human sends or assigns it. This prevents errors or overreach from propagating without oversight, a critical safety feature for autonomous agents in the workplace. It's in line with recommendations for responsible AI deployment, where human oversight and auditability are paramount.
Evaluation and Preliminary Results
Evaluating a system like SCA requires measuring both its technical performance on information queries and its impact on human workflows. We outline our evaluation approach and early results:
5.1 Technical Evaluation
We measure classic information retrieval and extraction metrics on tasks derived from real use cases:
- Decision Recall and Precision: We created a benchmark set of 50 questions asking for specific decisions (e.g. "Who decided X, and when?") where the ground truth is known from meeting notes. SCA's Decision Ledger lookup was able to answer a large majority directly. We measure the recall time (how fast the correct answer with a link is produced) – in our tests, it was typically under 2 seconds for queries hitting the ledger, whereas a vector search baseline took several seconds to retrieve and often required the LLM to read through irrelevant text. We also measure the precision/recall of the Insights Agent in correctly extracting decisions in the first place, by manually annotating transcripts. Initial results show high precision (few false decisions logged) but some misses (recall ~0.8) for very implicit decisions; we are refining the extraction prompts and fine-tuning to catch those.
- Action Item Extraction Coverage: To evaluate how well the system captures tasks, we compared the Action Items logged by SCA for a set of meetings to a human-generated list of action items for those meetings. The precision of logged items was around 90% (almost all AI-logged tasks were real tasks), and recall was around 75% (the AI missed some tasks that were phrased vaguely). This is on par with state-of-the-art dialogue act extraction results reported in the literature. The missed tasks often had ambiguous language like "we should probably…", which we plan to handle with more sophisticated context understanding.
- Answer Accuracy and Support: For complex questions (synthesis queries), we performed a qualitative evaluation. We looked at 10 "board-level" questions (e.g. "Summarize Q3 outcomes and what we learned") and had domain experts rate the answers. They checked if all claims were supported by citations (supportiveness) and if any important point was missing (completeness). In all cases, the answers contained only claims that could be traced to a source (by design, since the system includes the citation), but occasionally the wording could be misleading (e.g. conflating two related decisions). Experts rated 8/10 answers as useful and accurate, and 2 as needing minor corrections. We consider this promising, though a larger formal user study is ongoing.
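The extraction metrics above are ordinary set precision/recall against human annotations; for concreteness (the real evaluation matched records to annotations manually, not by exact string comparison):

```python
def precision_recall(predicted, gold):
    """Precision and recall of extracted items against a gold annotation set.

    Items are compared as exact set members here; in practice a human
    judged whether a logged record matched an annotated decision/task.
    """
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)                      # correctly extracted items
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

For example, logging 10 items of which 9 are real yields precision 0.9; missing 3 of 12 annotated items yields recall 0.75, matching the figures reported above.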
We also tested SCA versus a baseline RAG system on a set of temporal and ownership-related queries. For example, query: "What did the marketing team decide in July about the launch plan?" The RAG baseline (with vector search over all transcripts and docs) often returned a generic launch plan document (which didn't have the date context) or a hallucinated summary, whereas SCA was able to list the specific decisions from July meetings with dates and names. This showcases the value of temporal partitioning and the ledger.
5.2 User Adoption and Efficacy
Beyond accuracy, a key measure is whether SCA actually helps users (execs, managers, teams) make decisions faster or with more confidence. While a full longitudinal study is ongoing, we track several proxy metrics in our pilot deployments:
- Decision Recall Usage: How often do users use the system to look up past decisions? We log instances of queries that hit the Decision Ledger. In a pilot with 10 users over 3 weeks, there were ~5 lookups per user per week on average, with queries like "when did we agree on the new pricing?" being common. The click-through rate (CTR) on the provided deep links (e.g., user clicking the transcript link the AI gave) was 60%, indicating users do utilize the citations to verify or get more detail.
- Action Completion Rate: For action items the AI surfaces (e.g., in a weekly "what's blocking" report), we monitor whether those items get completed by the owners by the due date (from task system data). The idea is that if AI highlighting blockers leads to them being resolved, that's a positive outcome. Early anecdotal evidence: one team lead credited the AI's reminder of a forgotten task for getting it done before a deadline. We plan to quantify this more rigorously (perhaps tasks completed vs. not when mentioned by AI vs. not mentioned).
- Narrative Reuse: We examine if the narratives or draft plans the AI produces get used in real documents. For instance, if the AI drafts a Q1 plan, does the user end up copy-pasting large portions of it into the official plan doc? In one case, about 70% of the text from an AI-generated strategy draft (with citations) was incorporated into the final version, with some edits. This suggests that SCA's content can serve as a strong starting point for executive communications. We also see citations from the AI answer being kept in the final doc, which is interesting as it means even final human-edited docs now carry the references the AI provided (improving traceability of those docs too).
- Feedback and Learning: We gather user feedback through the Agent Inbox interface, where users can thumbs-up or down an answer or send corrections. This feedback loop is invaluable to identify errors (e.g., "AI misattributed this decision to Alice, but it was Bob"). We found that explicit errors are rare, but we did get feedback like "the rationale provided for decision X is not the main reason we did it" – which indicates the AI chose one justification from the transcript that the user felt wasn't the key one. We are exploring letting users edit the rationale field in the ledger in such cases, which will propagate to better answers next time.
Overall, these metrics and observations suggest that SCA can significantly improve the efficiency of information retrieval in an organizational setting while maintaining trust through verification. Users particularly appreciated the "one-click to source" aspect, aligning with research showing that users value the ability to drill down into the sources behind AI-provided answers. Performance-wise, the system is fast enough for interactive use (most answers returned in 5-8 seconds, which includes multiple retrievals and LLM calls; pure lookup queries return in under 2 seconds). We note that the quality of the output depends heavily on the quality of the ingested data: if meetings aren't transcribed correctly (ASR errors) or if people have side conversations off-record, the ledger can have gaps. Thus, our evaluation also looks at how robust the system is to imperfect input. Techniques like prompting the model to ignore garbled text, or falling back to asking the user when something was unclear, are being tested.
Limitations and Future Work
While SCA shows promise, there are limitations and open challenges:
- Recommendation and Proactivity: Thus far, SCA focuses on answering questions and generating content on demand. It does not yet proactively recommend decisions or flag issues unless asked. A logical next step is to build a Recommendation Agent on top of the Decision Ledger – for example, suggesting risk mitigations if many action items are overdue, or recommending a decision based on similar past decisions (collaborative filtering of decisions). We plan to explore recommendation features, but carefully: any recommendations will leverage the structured base (so they can cite why the recommendation is made, based on past patterns) and will likely go through human approval.
- Multi-step Workflow Automation: SCA currently can execute single actions in a draft manner (like drafting an email or creating a ticket from an action item), especially via its Command Palette or Agent Inbox. However, more complex multi-step procedures (e.g., "for all decisions made last week, create a Confluence page summary and email it to the team") might require chaining several commands or integrating with workflow automation tools. We are investigating a visual or natural language workflow builder that would allow users to script multi-step routines for the AI agent (some early prototypes use a prompt-based "if-this-then-that" style configuration). Ensuring the agent can follow these reliably and safely is future work.
- Generality vs. Specificity: SCA is tailored to executive intelligence in a single organization. One might ask: can it generalize patterns across organizations, or is each deployment learning only its environment? Cross-organization learning (like fine-tuning the Insights Agent on many companies' data) could improve robustness (e.g., learning general patterns of how decisions are stated). However, data privacy concerns mean we cannot simply mix data from different orgs. A possible future direction is federated learning or privacy-preserving meta-learning, where the model learns common structures of meetings without exposing any raw data externally. This would require careful design to meet enterprise privacy bars, so it's on the roadmap once we have sufficient deployments and a way to abstract patterns.
- Accuracy of Transcription and Extraction: SCA inherits any errors from upstream processes. If speech-to-text transcription has a high error rate (say due to heavy accents or technical jargon), the Insights Agent might log incorrect decisions or miss them. Our current approach relies on having relatively good transcripts (we use top-tier ASR and allow custom vocabulary). In noisy environments or for non-English meetings, performance may degrade. Future work could involve confidence scoring on transcript segments and flagging low-confidence areas for manual correction (perhaps by a meeting assistant who double-checks key points). Additionally, the NLP extraction of decisions/actions is not perfect; we aim to continuously fine-tune it with new data and possibly incorporate user validation (e.g., a meeting facilitator could quickly verify the AI-captured action items at the end of a meeting, which would greatly improve quality).
- Scalability and Latency: As the Decision Ledger grows (potentially tens of thousands of entries over years) and the document corpus is huge, we need to ensure queries remain fast. We already use indices to good effect; further, we might need to archive or summarize older data (though keeping it is valuable for historical questions). The architecture supports sharding by time or project to scale out. Also, using larger models for generation can be slow; we are exploring distilling some capabilities into smaller models to use for quick responses, only falling back to big LLMs for very complex queries. This multi-tier agent approach is an area of active development.
Despite these limitations, we believe the core idea of SCA is durable. It aligns with fundamental needs in organizational decision-making: having a reliable record of what was decided, ensuring everyone has context at their fingertips, and bridging the gap between knowledge and action. The architecture's model-agnostic nature means it should be able to incorporate future advances in AI (e.g., if new multimodal models can analyze video recordings for decisions, or if improved reasoning models can validate plans even better). In implementing SCA, we aimed to provide a blueprint for enterprise AI systems that treat structure and provenance as first-class citizens, rather than an afterthought.
Conclusion
In this work, we presented Signal–Context Architecture (SCA), a novel AI architecture that turns the chaos of daily organizational communications into executive clarity. SCA achieves this by splitting intelligence gathering into two streams: Signals, capturing the live "what just happened" moments and structuring them into a Decision Ledger; and Context, distilling the organization's knowledge into a cited evidence store. A Deep Research Agent then unifies these layers to answer questions or draft narratives with unprecedented specificity and trustworthiness – providing not just answers, but also the origin of those answers (the event) and the justification (the evidence). This design directly addresses the shortcomings of naive LLM applications in the enterprise: it handles temporality, tracks ownership, ensures actions are not lost, and outputs verifiable information.
In one sentence, SCA turns conversation chaos into executive-grade answers by structuring live signals, validating them against trusted context, and delivering responses with full "receipts." It is model-agnostic and future-proof, enabling organizations to plug in the best AI models of today or tomorrow. We demonstrated the SCA concept, architecture, and an initial implementation, and showed through examples and early evaluation how it can answer both pinpoint and broad strategic queries that are difficult for existing methods. As enterprises increasingly seek to leverage AI for decision support, we hope SCA provides a path forward that is practical, auditable, and effective.
Moving ahead, we plan to refine SCA with more automation (recommendations, workflows) while maintaining its core principles of structure and validation. We will also report more comprehensive evaluation results as we gather them. We invite the community to explore similar dual-stream designs in other domains where context and real-time data must be combined (such as intelligence analysis, legal case preparation, or healthcare management). By sharing this work, we aim to spark further research into AI architectures that respect the complexity of real-world data and provide human-centric, trustworthy assistance in organizational settings.
