FAQ

    Everything we get asked about AI engagements.

    107 questions, sorted by topic — services, hiring, agents, evaluations, security, cost, model selection, and industry patterns. Search or jump to a section.

    About AISD

    What does AISD do?

    AISD is an AI-native software development company that builds production AI for mid-market and enterprise teams. Three core services: AI Modernization (embedding AI into existing products — copilots, intelligent search, predictive analytics), AI Agents (autonomous workflows for support, document processing, sales outreach), and AI Workflow Automation (n8n, Zapier, Make, Clay).

    How is AISD different from a typical software development agency?

    Three differences. First, every AISD engineer is senior — minimum 5 years building production software, with shipped AI features. Second, we publish hourly engagement bands and project ranges so you know roughly what an engagement costs before the first call. Third, we take fewer concurrent projects so a partner stays close to delivery.

    Is AISD SOC 2 / GDPR / HIPAA compliant?

    GDPR: yes — we handle EU personal data under standard data-processing agreements and apply data-minimization patterns (redaction at source, retention windows, right-to-erasure tooling). SOC 2: Type II audit in progress. HIPAA: we deliver HIPAA-aligned engagements (BAAs available, PHI handling patterns established) but do not yet hold a third-party HIPAA attestation. We will not claim certifications we don't hold.

    Engagement & contracts

    How does AISD's contract structure work?

    Three engagement models. Fixed-price for AI MVPs and agent builds where scope is well-defined after a discovery sprint — the most common model. Time-and-materials for staff augmentation, billed monthly with a not-to-exceed (NTE) ceiling so spend is predictable. Retainer for ongoing optimization, eval-harness operations, and managed AI services — flat monthly fee for a defined scope of capacity. We pick the model that matches the work, not the one that maximizes our margin.

    What does engagement onboarding look like?

    Week 0: paperwork (MSA + SOW or order form, BAAs / DPAs as needed) and access provisioning to your Slack, GitHub, Linear, and any production systems we'll need read access to. Week 1: discovery sprint — domain interviews, success metrics, the riskiest assumption surfaced and validated with a throwaway prototype. Week 2 onwards: build, with weekly demos and async daily updates in your channel. No surprise onboarding tax — Week 1 produces working artifacts, not slideware.

    Who owns the IP we build together?

    You do. By default our SOWs assign all custom code, prompts, eval datasets, model artifacts, and documentation to you on full payment. AISD retains rights to generic tooling and reusable internal libraries that pre-existed the engagement; we'll call those out specifically in the SOW. If your legal team needs custom IP terms (e.g., joint-development arrangements, exclusivity in a vertical), we negotiate them before the SOW is signed.

    How does AISD handle confidentiality and NDAs?

    Mutual NDA on day one — happy to sign yours or use our short-form. We treat client data as confidential by default; engineers work in client-provisioned environments where possible, and on dedicated AISD infrastructure with project-level isolation otherwise. Customer logos, case studies, and testimonials are opt-in only — we never publish proof of work without written approval.

    Can AISD work with non-US customers?

    Yes. We engage US, Canadian, UK, EU, Australian, and other English-speaking markets directly, billing in USD by default (EUR/GBP available on request). For EU customers we operate under standard contractual clauses (SCCs) and apply data-residency patterns where required. For customers outside English-speaking markets we partner case-by-case based on language and time-zone fit.

    How do you handle change requests mid-engagement?

    Two patterns. For fixed-price engagements: small adjustments (under ~10% of original scope) absorbed without paperwork. Larger changes are quoted as a delta against the original SOW; you sign the delta or descope something to fit. For T&M: change requests are just retasks; we update Linear and continue. We track scope changes weekly so neither side is surprised at month-end.

    What happens if we pause or end an engagement early?

    30-day notice on T&M and retainer engagements. For fixed-price builds, you pay for work completed plus a percentage of remaining scope as a kill fee — typically 20%, negotiable. We do clean handoffs: documentation, runbooks, eval datasets, prompt versions, deployment instructions. No vendor lock-in — your team can take over without us in the room.

    How does AISD handle multi-vendor environments?

    Often. Customers frequently have an existing dev shop, a data team, an ML group, and us. We're explicit about ownership in the SOW (e.g., AISD owns the agent service; your team owns the data pipeline; the existing dev shop owns the frontend). Weekly cross-team sync if needed. We don't try to displace functioning teams — we slot in where AI engineering is the missing capability.

    Cost & pricing

    How do we control LLM inference costs in production?

    Five levers, applied on every AISD engagement. Prompt caching (30–90% reduction when system prompts are stable). Model routing — easy queries to Haiku/mini/Flash, hard reasoning to Opus/GPT-5 (40–70% reduction). Output discipline — schemas constrain output length; output tokens cost 3–5× input tokens (20–50% reduction). Batch APIs for non-realtime workloads (50% off at OpenAI / Anthropic). Embedding-first retrieval instead of long-context loading (60–95% reduction vs. stuffing context every call).

    What does AISD charge for a discovery sprint?

    Discovery sprints are typically 1–2 weeks at $15,000–$30,000. Output: architecture proposal, eval harness design, fixed-price build SOW, working throwaway prototype on the riskiest technical assumption. If you proceed with the build, the discovery fee is credited toward the build cost. If you walk away, you keep the deliverables and a senior engineer's honest take on whether the project should ship at all.

    How does AISD price T&M vs fixed-price work?

    Fixed-price for AI MVPs ($45–120k) and agent builds ($40–150k) where scope is clear after discovery. T&M for staff augmentation: $95–155/hour depending on engineer seniority, with a monthly NTE ceiling. Retainer for ongoing ops: $8–25k/month for a defined capacity scope. We publish indicative bands on the pricing page so buyers can budget before the first call.

    What are the hidden costs of running AI agents at scale?

    Five usually-underestimated cost lines. Inference at peak (not average — autoscaling cost spikes). Eval harness runs (every PR triggers a model-graded test suite). Vector DB hosting (small for 1M docs, large for 100M+). Observability (LangSmith / Langfuse / DataDog seats grow with team). Prompt regression effort (engineering time to keep prompts stable across model upgrades). Budget ~25% of inference cost for these collectively.

    How do AISD's hourly rates compare to traditional dev shops?

    AISD's $95–155/hour band is mid-market for senior US/EMEA engineers — below boutique consultancies (often $180–300/hr), above offshore shops ($25–80/hr), comparable to top US dev shops. The differentiator isn't the hourly rate — it's velocity. AI-native engineers ship in 4–8 weeks what generic dev teams ship in 4–8 months because they don't need to learn the AI patterns on your time and dollar.

    Security & compliance

    What's AISD's security and compliance posture?

    GDPR: yes, we handle EU personal data under standard DPAs with data-minimization patterns (redaction at source, retention windows, right-to-erasure tooling). HIPAA: HIPAA-aligned engagements with BAAs available; we don't yet hold a third-party HIPAA attestation. SOC 2 Type II: audit in progress. We will not claim certifications we don't hold; if a customer's procurement requires something we don't have, we say so before the call ends.

    How does AISD handle customer data during engagements?

    Default pattern: engineers work in customer-provisioned environments (your AWS, GCP, Azure account) so data never leaves your perimeter. When that's not possible, we use AISD-managed infrastructure with project-level isolation, encryption at rest and in transit, no cross-customer data sharing. Engineers are NDA-bound; access is provisioned per-engagement and revoked at engagement end. We do not train models on customer data.

    Does AISD support on-prem or air-gapped deployments?

    Yes. For regulated workloads (healthcare PHI, banking core systems, classified environments) we run open-weight models (Llama, Mistral, Qwen) on dedicated infrastructure inside the customer's perimeter — no data leaves. For air-gapped: we deliver on physical media or via approved internal repositories. Inference, vector DB, and observability stack all run on-prem when required.

    How does AISD secure prompts against injection?

    Four layers. Input sanitization at the boundary — user-controlled text is escaped, prompt-injection signatures are flagged, and untrusted content is wrapped in clear delimiters. Privilege separation — agents see only the tools and data scopes their task requires. Output validation — structured-output schemas enforce shape; downstream side effects only fire if the output passes validation. Adversarial test suite — a corpus of known injection patterns runs in CI on every prompt change.

    What does the audit log capture for production AI?

    Every model call: timestamp, user ID, session ID, full prompt, full response, tool calls and their arguments, model name and version, latency, cost, outcome (success / refusal / error). Logs are searchable, retention-windowed, and exportable for compliance review. For regulated industries we add field-level redaction so PII / PHI / PCI never lands in the audit store.

    How does AISD prevent model hallucinations from causing downstream damage?

    Three patterns. Structured outputs — every model response is parsed against a schema; freeform answers don't drive side effects. Confidence thresholds — actions require not just an answer but evidence (cited document, tool-call result). Human-in-the-loop on irreversible actions — refunds, account changes, large transactions, clinical decisions are AI-suggested, human-confirmed. Plus eval harness regression tests catch hallucination patterns before they reach production.

    Engagement process

    What is AISD's discovery sprint?

    A paid 1–2 week engagement to scope a project with rigor before committing to a full build. Outputs: a written scope doc with success metrics, a technical architecture, a 1-week throwaway prototype that proves the riskiest assumption, and a fixed-price quote for the build. Typical price: $8,000–$18,000. Customers who run a discovery sprint with us are 3× more likely to ship on time and budget than customers who skip it.

    How does pricing work — fixed-price, T&M, or retainer?

    All three. Fixed-price for AI MVPs and agent builds where scope is well-defined after a discovery sprint. Time-and-materials for staff augmentation, billed monthly with a not-to-exceed ceiling. Retainer for ongoing optimization, eval-harness operations, and managed AI services — flat monthly fee for a defined scope of capacity.

    How are deliverables handed off?

    Every engagement ends with a handoff package: production deployment, architecture documentation, eval harness with golden test sets, observability dashboards with documented thresholds, on-call runbook, model upgrade procedure, and a recorded walkthrough. Plus a 30-day post-handoff window for questions and clarifications at no cost.

    How is success measured on an engagement?

    Success metrics are defined in writing during scoping and reviewed monthly. Project engagements measure: feature shipped on date, eval-harness pass rate, target business metric (e.g. 'auto-resolve rate ≥35% on customer-support tickets'). Staff augmentation engagements measure: PR throughput, code-review acceptance, and customer-side satisfaction. We do not measure success in hours billed, lines of code, or generic velocity points.

    AI consulting

    What does AISD's AI consulting include?

    AI consulting at AISD covers four work streams. Strategy: AI roadmap aligned to business outcomes, prioritized by ROI and feasibility. Architecture: model selection, retrieval pattern, agent orchestration, eval design — opinionated based on what actually works in production. Build-vs-buy: clear decisions on which problems to solve with off-the-shelf AI, custom builds, or no AI at all. Audit: review of existing AI workloads for cost, reliability, prompt-injection exposure, and ROI.

    How is AI consulting different from AI development?

    Consulting produces decisions and plans; development produces working software. AISD does both, often in sequence: a consulting engagement scopes the architecture and roadmap, then a build engagement implements it. Consulting alone is right when you're early in the AI journey, evaluating vendors, or auditing existing work. Build alone is right when scope is already clear. Most AISD customers do a 2-week paid discovery sprint first — that's a consulting engagement that produces a fixed-price build proposal.

    What does an AI consulting engagement cost?

    Two formats. Discovery sprint: $8,000–$18,000 for 1–2 weeks, produces a written architecture, throwaway prototype, and fixed-price build quote. Strategic engagement: $25,000–$75,000 for 4–8 weeks, produces a 12-month AI roadmap, prioritized initiative list, and architecture for the top three. Both are paid up front, fixed-scope. We do not run open-ended advisory retainers with no deliverables.

    AI modernization

    What is AI modernization?

    AI modernization is embedding AI capabilities into existing software products — not replacing the product, augmenting it. Common patterns: in-product copilots (chat assistants scoped to user data), intelligent search (replacing keyword search with hybrid semantic + structured retrieval), summarization and digesting at the right surfaces, predictive analytics, and agentic workflows for power users. The product gains AI features without a ground-up rewrite.

    How does AISD approach AI modernization vs. a generic software upgrade?

    Three differences. We pick AI features by metric impact, not by what's technically interesting — every shipped feature has a predeclared metric and a rollback plan. We build an eval harness on day one; AI features that lack an eval harness regress silently. We instrument cost, latency, and outcome from launch — generic upgrades treat observability as Phase 2; AI features can't afford that gap because their cost spirals are silent and fast.

    What does the difference between a copilot and an agent look like in practice?

    A copilot suggests; the user accepts. GitHub Copilot, Notion AI, and Linear's smart assignee are copilots — they propose actions inside an existing user workflow. An agent acts autonomously across multiple steps and tools toward a goal — a customer-support agent that reads a ticket, queries the order API, drafts a refund, and posts back is an agent. Copilots are lower-risk and faster to ship; agents capture more business outcome but require more discipline (eval harness, observability, guardrails).

    What does AI modernization typically replace?

    Five common targets. Keyword search becomes hybrid semantic + structured retrieval. Static onboarding tours become conversational guidance. FAQ articles become in-product copilots. Manual triage queues become AI-routed with human exceptions. Reporting dashboards become natural-language analytics ('what was MRR last quarter by segment?'). The product surface stays mostly the same; the intelligence layer becomes 10× more useful.

    AI MVP

    What's in scope vs out of scope for an AI MVP?

    In scope: one core AI feature (agent, copilot, or workflow), deployed to a beta cohort, with eval harness, basic observability, and a fixed-price build. Out of scope by default: deep custom infrastructure, multi-tenant rearchitecture, novel model training, or production-scale ops (we hand off ops to your team or to an AISD retainer post-launch). Discovery sprint pins the scope before the SOW is signed.

    How does AISD validate MVP traction post-launch?

    Pre-registered metric (set in discovery), instrumented before launch, monitored as the staged rollout proceeds. Beta cohort of 50–500 representative users. Compare to holdout. Decision gate at the end of the MVP build: Continue (the metric moved positively, scale up); Iterate (signal is mixed, run a second short build); Stop (the feature didn't move the metric — this is OK; pre-registered failure is cheaper than a year of wishful thinking).

    What's AISD's approach to MVP architecture decisions?

    Boring stack, opinionated patterns. TypeScript or Python (whichever your team owns). Postgres + pgvector (or your existing vector DB). Anthropic / OpenAI APIs by default with prompt caching. Vercel or your existing cloud. We avoid novel infrastructure unless the MVP specifically requires it — every additional system is an additional thing to operate. The MVP is a vehicle for proving the use case, not for showcasing AISD's architecture taste.

    Does AISD do continued post-MVP work?

    Three options. (1) Hand-off to your team — full documentation, runbooks, prompt versions, eval datasets, and a 2-week handoff sprint. (2) Retainer for ongoing optimization — flat monthly fee for prompt iteration, eval-harness operations, model upgrades. (3) Build the next feature — fixed-price scope of work two on top of the MVP. Most customers pick (2) for the first 3–6 months post-launch.

    AI agents

    What is an AI agent?

    An AI agent is software that uses a language model to plan and take multi-step actions toward a goal, calling tools (APIs, databases, other systems) along the way. The minimal pattern: a model + a set of tools + a control loop. Unlike a chatbot — which responds and waits — an agent acts, observes the result, and decides what to do next, often across dozens of steps.

    What's the difference between an AI agent and a chatbot?

    A chatbot turns user input into a response and stops. An agent turns user input into a plan, executes that plan by calling tools, observes the results, and revises until the goal is met or it asks for help. A chatbot answering 'what's my order status' reads from a knowledge base. An agent handling the same query queries the orders API, checks the shipping system, identifies a delay, drafts a refund request, posts it to the ticket queue, and emails the customer.

    What's the difference between agentic AI and generative AI?

    Generative AI is a capability: producing text, images, code, audio. Agentic AI is an architectural pattern that uses generative AI to drive autonomous, multi-step action with tools. All agentic AI uses generative AI under the hood; not all generative AI is agentic. A summarization endpoint is generative but not agentic. A customer-support agent that reads tickets, looks up orders, and posts replies is both. The agentic pattern is what unlocks measurable business outcomes.

    How long does it take to build a production AI agent?

    Working prototype: 2 weeks. Production-grade agent (with eval harness, guardrails, observability, and a runbook): 6–10 weeks. The prototype-to-production gap is where most projects fail — the prototype handles the happy path; production has to handle the long tail.

    What does it cost to build an AI agent?

    A production AI agent at AISD typically costs $40,000–$150,000 depending on complexity. Drivers: number of integrated systems, evaluation rigor required, compliance overhead, and ongoing operational scope. Prototypes alone are cheaper ($10k–$25k) but rarely worth it without a path to production.

    Where do AI agents fail in production?

    Four predictable failure modes. Tool errors: an API the agent calls is down or returns unexpected data and the agent doesn't recover gracefully. Prompt injection: user-controlled text reaches the agent and overrides its instructions. Cost spirals: an agent that loops without termination conditions burns inference budget. Distribution shift: input patterns change after launch and the agent's prompts no longer match reality. Mitigations: strict tool-call schemas, prompt-injection test suites in CI, cost caps, and weekly eval re-runs.

    How do you evaluate AI agent performance?

    Three layers of measurement. Offline: a golden test set of 50–500 representative inputs scored automatically (model-graded) and by humans on a sample. Run on every PR. Online: per-call metrics — latency, cost, tool-call success rate, schema-validation pass rate, downstream business outcome. Human-in-loop: weekly review of escalated and low-confidence cases, fed back into the test set.

    Should I use n8n, LangGraph, or build from scratch?

    It depends on workflow shape and team. n8n wins when the agent is mostly orchestrating SaaS tools and the control flow is straightforward — deploys faster, easier for non-engineers to maintain. LangGraph wins when the agent has complex branching, multi-agent coordination, or needs tight Python integration with custom code. From scratch wins for simple, high-volume agents where every layer of abstraction is overhead.

    Workflow automation

    When should I use n8n vs Zapier vs Make?

    Zapier wins on simplicity and breadth — 6,000+ integrations, near-zero learning curve, good for marketers and non-engineers. Pricing scales aggressively with volume. Make wins on visual orchestration of medium-complexity flows — better than Zapier on conditional logic, cheaper at volume. n8n wins on engineer-grade workflows, self-hosting, custom code nodes, and AI-native features. Default rule: Zapier for <5 step flows owned by non-engineers, Make for medium complexity, n8n for engineer-owned production workflows.

    How do you secure workflow automations against prompt injection?

    Five layers. Input sanitization — strip or quarantine instruction-like text from user-controlled fields. Privilege separation — agents that read untrusted content cannot directly call high-privilege tools. Tool-call confirmation — high-stakes actions require human approval or a separate verification step. Output validation — every tool call's arguments validated against a strict schema; anomalies fail closed. Adversarial test suite — a CI test set of known prompt-injection attacks runs on every release.

    Legacy modernization

    How does AISD approach legacy modernization differently from a generic dev shop?

    Three differences. AI-accelerated refactor — Claude / Cursor / equivalent tooling drives 30–50% faster code translation than manual rewrites, especially on procedural-to-OO and Cobol/PL/SQL migrations. Eval harness on legacy behavior — we capture current-system inputs and outputs, then run them through the new system as a regression suite, so equivalence is provable, not asserted. Strangler-fig deployment — old and new systems run side-by-side; traffic shifts incrementally so rollback is one-click.

    Can AISD modernize a Cobol / mainframe / legacy system?

    Yes — and we recommend a discovery sprint first. Cobol-to-Java/Python translation is now reliably AI-accelerated, but the hard part isn't translation — it's understanding the implicit business logic encoded in 30 years of patches. We extract that logic via behavioral capture (run the legacy system on a representative input set, snapshot outputs) and turn it into the eval harness for the new system. Translation without that step produces brittle, untestable replacements.

    When does AISD recommend legacy modernization vs. living with the legacy system?

    Three signals that say modernize: (1) the legacy system blocks AI features you need to ship — frontier capabilities can't be bolted onto certain stacks; (2) the SME pool is shrinking faster than the system's remaining useful life — Cobol expertise, certain mainframe stacks; (3) operational cost (licensing + ops) is now larger than rebuild + ongoing ops. Two signals that say live with it: the system works, and the modernization scope is fundamentally a re-platforming exercise without business outcome.

    AI integration

    How does AISD integrate AI with existing identity / SSO / RBAC?

    Standard pattern: AI features inherit the existing auth model. SSO (Okta, Auth0, Azure AD) provides the user identity; the AI agent receives a scoped token, queries the data layer with that token, and the existing RBAC enforces what the user can access. The model never sees data the user couldn't already see. Audit logs propagate the user identity through every model call so usage and policy violations are attributable.

    How does AISD handle data integration for AI features?

    Three layers. Source-of-truth stays in the existing system (Postgres, Snowflake, Salesforce, your CRM). Read-only integration first — we pull data via your existing APIs or replication, never through direct DB access we don't own. Write paths go through your existing service layer with full validation — AI features don't get to bypass business rules. Data minimization at every boundary; AI sees only the fields needed for the task.

    When to pick AI integration vs AI modernization?

    AI integration is connecting AI to systems you already own — making your stack 'AI-aware.' AI modernization is embedding AI features into a product. Picking: if the goal is internal productivity (operations, support, sales), integration first — connect AI to existing tools, prove value, then expand. If the goal is product differentiation (customers see and use the AI), modernization — embed AI into the user-facing surface from day one.

    AI software development

    How long does it take to build an AI MVP?

    Most AI MVPs at AISD ship a usable version in 4–8 weeks. Week 1 is a discovery sprint. Weeks 2–6 are the build, with weekly demos and a working version by week 4. Weeks 7–8 harden, document, and hand off.

    What does an AI MVP cost?

    AISD AI MVPs typically range $45,000–$120,000 depending on scope. Drivers: number of model integrations, complexity of retrieval/data layer, custom UI surface area, and compliance requirements. We publish indicative bands on the pricing page so buyers can budget before the first call.

    What's the difference between RAG, fine-tuning, and agents?

    RAG (retrieval-augmented generation) grounds a model's response in external data — used when answers must be current or proprietary. Fine-tuning changes model weights to teach a specific style or domain — used when prompts can't reliably elicit the behavior. Agents wrap a model with tools and a control loop so it can take multi-step action — used when the task involves decisions and side-effects, not just generation.

    How do you ensure AI features are reliable in production?

    Five layers: an offline eval harness with golden test sets run on every PR; confidence thresholds and structured-output validation that gate downstream side effects; runtime observability — every model call logged with inputs, outputs, latency, cost; circuit breakers and deterministic fallbacks for every model dependency; and a weekly review ritual where prompt regressions get caught before they become incidents.

    How do you handle hallucinations in production AI?

    Hallucinations are the wrong mental model — the issue is ungrounded generation. Mitigations applied in layers: ground every factual claim in retrieved sources, returned alongside the answer; structured outputs with schema validation; confidence scoring with thresholds — low-confidence answers are escalated, not surfaced; human-in-the-loop checkpoints for high-stakes actions; continuous eval against a golden set.

    Model selection

    Which LLM does AISD pick for production agents?

    Default for the agent control loop: Claude Sonnet 4.6 with prompt caching. Default for classification, routing, short responses: Claude Haiku 4 or Gemini 2.5 Flash. Reserved for hard reasoning steps: Claude Opus 4 or GPT-5 frontier. Open-weight models (Llama, Mistral, Qwen) earn their place when volume, latency, or data sovereignty drives the decision. We pick per-step, not per-product — different parts of the same agent often run different models.

    When does AISD recommend self-hosting open-weight models?

    Three triggers, only one needs to be true. Volume: above 5–10M tokens/day on a single model, dedicated GPU economics start beating hosted APIs. Latency: dedicated infrastructure gives stable first-token latency under 200ms; hosted APIs vary. Data sovereignty: regulated workloads (PHI, banking, classified) where data cannot leave the perimeter. Below those thresholds, frontier APIs (Claude, GPT, Gemini) win on quality, ecosystem, and developer velocity.

    Does AISD work with Anthropic, OpenAI, Google, or open-source?

    All four. We're model-agnostic by design — no exclusive partnerships that bias the recommendation. Our default stack uses Anthropic (best tool-use reliability, prompt caching), OpenAI (broadest ecosystem, structured outputs), and Google (long context, cheapest at volume). Open-weight (Llama, Mistral, Qwen) for self-hosted scenarios. We pick per workload, document the rationale, and design for portability so model lock-in is bounded.

    How does AISD handle model deprecation?

    Three patterns. Eval harness covers prompts model-by-model — when a new model ships, we re-run the harness and quantify quality / cost / latency delta before any cutover. Abstraction layer in code — model identifiers are config, not hardcoded; switching providers is a config change, not a refactor. Prompt portability — we avoid model-specific prompt syntax where reasonable. Net effect: model upgrades are weeks of engineering, not quarters of replatforming.

    What's AISD's approach to multi-model routing?

    Two-layer router. Upstream classifier (small, fast, cheap — often a fine-tuned BERT or a Haiku/Flash call) decides which downstream model handles the query. Downstream models specialize: Haiku for FAQ deflection, Sonnet for tool-using agents, Opus for complex reasoning, code-tuned models for code workflows. Cost savings typically 40–70% vs. naively running every query through the most capable model. Eval harness covers each route end-to-end.

    Evals & measurement

    What does AISD's eval harness look like?

    Three layers. Golden test set: 50–500 representative inputs with expected outputs (or scoring rubrics for open-ended responses). Automatic scoring: exact match, embedding similarity, LLM-as-judge with a calibrated rubric, custom assertions for domain logic. CI integration: every PR runs the harness and blocks merges that degrade aggregate quality on key metrics. Plus a manual review sample (5–10% of evals) reviewed weekly to catch drift the automatic scorers miss.

    How do you evaluate AI features without ground truth?

    Four techniques, used in combination. LLM-as-judge with explicit rubrics (calibrated against human-rated samples). Pairwise comparison between model outputs (judge picks the better answer). User signal collection (thumbs up/down, did-the-action-succeed metrics). Synthetic ground truth: a senior engineer rates a sample, we use those ratings to train a lightweight classifier, the classifier scales to thousands of evals. None of these are perfect; using them together gets you 90% of the way to robust eval.

    How do you catch prompt regressions in CI?

    Every prompt change is a PR. CI runs the eval harness against the new prompt vs the previous one and blocks merge if any guarded metric drops below threshold. Guarded metrics include: pass rate on golden set, per-category accuracy (especially edge cases), cost per call, p95 latency, refusal rate (false-positive refusals are surprisingly common after prompt edits). Alerting in production catches anything CI missed.

    How does AISD measure ROI of an AI feature?

    Pre-register the metric in the discovery sprint — the actual business outcome the feature should move (handle time, conversion rate, NPS, cost-per-resolution, throughput per FTE). Instrument it before launch. Roll out staged (1% → 10% → 50% → 100%) with the metric as the rollout gate. Compare to a holdout cohort. Negative results are accepted and rolled back. We don't ship 'AI features' that don't move a metric — that's how AI projects rot into vanity demos.

    Hiring engineers

    How much does it cost to hire an AI developer?

    Three pricing paths. AISD staff augmentation (senior, AI-native): $95–175/hour depending on seniority and engagement length. Marketplace contract (Toptal, Turing, Upwork): wide range — $40–200/hour with high variance in quality. Full-time hire (US): total comp typically $200k–$450k for a senior AI engineer. The hidden cost in hires is recruiting (3–6 months) and ramp (2–3 months). Staff augmentation pays off when you need impact in <90 days.

    What's AISD's vetting process?

    Every AISD engineer passes four gates. Technical screen — live problem-solving on AI engineering tasks (not generic LeetCode). System design — they design a production AI system end-to-end with one of our principals. Reference check — past clients confirm shipped production work. Paid trial sprint — a real, scoped piece of work with our team before the engineer faces a customer. Roughly 3% of applicants pass all four.

    Insurance

    What AI use cases work for insurance companies today?

    Five proven use cases for P&C and life carriers. FNOL (first notice of loss) triage agents — categorize and route incoming claims, cut handle time 30–60%. Claims document processing — extract structured data from policy docs, medical records, and adjuster notes. Underwriting copilots — surface risk signals and policy precedents during quote review. Customer-service deflection — agents that resolve policy and billing questions without escalation. Fraud-signal scoring — combine claim narratives with structured data to flag suspicious claims for manual review.

    How does AISD handle PHI / PII / regulatory requirements in insurance?

    Three patterns. Data minimization: agents see only the fields they need; everything else is redacted at the boundary. Audit logging: every model call is logged with inputs, outputs, and decision rationale — searchable and exportable for regulatory review. On-prem / VPC deployment: where state regulations or carrier policy requires, we run open-weight models on dedicated infrastructure with no data leaving the customer's perimeter. We deliver HIPAA-aligned engagements; SOC 2 Type II audit is in progress.

    What's the typical ROI for an AI agent in insurance?

    Outcomes vary by use case. Typical AISD insurance customer outcomes: FNOL triage reduces processing from 45 minutes to 8 minutes per claim (~80% reduction). Document extraction cuts adjuster review time 30–50%. Customer-service deflection resolves 25–40% of inbound queries without human handoff. Payback period: 4–9 months on the build cost. ROI is highest when the workflow has high volume, structured input/output requirements, and a measurable downstream metric (claims-cycle time, NPS, loss ratio).

    Fintech

    What AI use cases work for fintechs today?

    Five proven patterns. Fraud-signal scoring (combine transaction features + narratives, output explainable risk scores). Document processing (KYC, loan docs, ID verification → structured fields with audit trails). Customer-service deflection (resolve balance / transfer / dispute queries without human handoff). Underwriting copilots (surface comparable cases and risk signals to underwriters during review). Compliance monitoring (flag anomalies, auto-draft SARs). Highest-ROI is whichever has the highest volume + structured outputs.

    How does AISD handle PCI / SOC 2 / banking-regulator constraints?

    Three patterns. PII / PCI redaction at the boundary — agents see only the fields they need; PANs and SSNs never reach the model. Audit-grade logging — every decision logged with rationale, exportable for examiners. VPC / on-prem deployment when state regulators or bank policy require it; we run open-weight models on dedicated infra with no data leaving your perimeter. SOC 2 Type II audit in progress; HIPAA-aligned engagements available.

    What's typical ROI for an AI build at a fintech?

    Outcomes by use case. Fraud scoring: 30–50% reduction in false positives + faster reviewer disposition. Document processing: 40–70% reduction in operations review time. Customer-service deflection: 25–40% auto-resolve. Build cost typically pays back in 4–9 months on the volume use cases. We recommend starting with a 2-week discovery sprint scoped to a single use case before committing to a full build.

    Healthcare

    What healthcare AI use cases does AISD ship?

    Six patterns shipped to production. Clinical document agents (ambient scribe-style structured documentation from encounters). Patient onboarding + scheduling agents (24/7 conversational intake). RCM agents (eligibility, prior auth, claims appeals). Provider copilots (surface relevant prior cases, drug interactions, guidelines). Quality-of-care monitoring (parse notes for adherence to care pathways). Customer-service deflection for member services.

    How does AISD handle HIPAA / PHI?

    BAAs available; PHI handling patterns established. Data-minimization at every boundary — agents see only the fields needed. Field-level audit logging. On-prem / VPC deployment for the most regulated workloads, with open-weight models on dedicated infrastructure. SOC 2 Type II audit in progress; we deliver HIPAA-aligned engagements but do not yet hold a third-party HIPAA attestation.

    What outcomes have AISD healthcare customers seen?

    Documented outcomes from our healthcare engagements: 47% reduction in doctor documentation time on a clinical document agent. 10–15% increase in patients served via AI-driven scheduling. 35% reduction in no-shows. 85% reduction in scheduling wait times. 40% reduction in admin staff burden. Read /case-studies/healthcare-ai-agent for the full story.

    SaaS

    What AI features should a SaaS company ship first?

    Highest-ROI patterns: in-product copilot (context-aware help, scoped to user's data), intelligent search (replace keyword with hybrid retrieval and citations), AI-powered onboarding (reduce time-to-first-value), summarization at edges (long docs, thread digests, change summaries), agentic workflows for power users (multi-step tasks the user describes in natural language). Pick one with measurable user-engagement impact, ship behind a feature flag, measure via cohort retention and engagement.

    How do you build AI features without breaking the existing app?

    Layered approach. Add the AI capability behind a feature flag, scoped to a beta cohort. Deploy the model behind a service boundary with cost caps and rate limits. Validate via offline eval first (golden test set on representative inputs), then online metrics. Roll out percentile by percentile — 1%, 10%, 50%, 100% — watching for cost, latency, satisfaction. Roll back if any metric goes the wrong way. Standard staged-deployment hygiene applied to AI.

    Should we use frontier APIs or self-host an open model?

    Default to frontier APIs (Claude, GPT, Gemini) until you have evidence to justify the operational cost of self-hosting. Frontier wins on reasoning, tool use, and continuous improvement; self-hosted wins on per-call cost at very high volume, latency control, and data sovereignty. Most SaaS workloads — copilots, search, summarization — are better served by frontier APIs with prompt caching and model routing.

    E-commerce

    What AI use cases work for e-commerce?

    Five proven patterns. Product-catalog RAG (natural-language search that understands attributes, occasions, and intent). Personalized recommendations (collaborative filtering + LLM re-ranking on context). Customer-service deflection (returns, order status, sizing, simple disputes). Catalog enrichment (LLM-generated descriptions, attribute extraction from supplier feeds). Voice-of-customer analysis (synthesize reviews into product-team-ready insight). Highest impact tends to be search + recommendation when SKU count is high.

    How do you measure AI impact on e-commerce metrics?

    Three layers. Product metrics: search-result CTR, add-to-cart rate, conversion rate, AOV. Quality metrics: relevance ratings on a labeled sample, hallucination rate on facts. System metrics: latency p95, cost per session. Experimentation discipline: pre-register hypotheses, run A/B with sufficient power, accept negative results. We bake all three into the eval harness before launch.

    Real estate

    What AI use cases work for real estate brokerages and property managers?

    Six proven use cases. Listing description agents generate MLS-compliant copy from photos + facts in under a minute (70% time saved). Lead-qualification bots run 24/7 on web + SMS, capturing intent and routing to the right agent. Lease + offer document processing extracts structured fields from applications and addenda. Comparable market analysis (CMA) generation produces first drafts from a property address. Tenant communication agents handle maintenance, billing, and lease questions at 25–40% auto-resolution. Investment portfolio analytics surface NOI and occupancy trends across properties.

    How do you keep AI-generated listing copy fair-housing compliant?

    Three layers. Prompts include explicit fair-housing constraints and protected-class rules — configurable per jurisdiction. Output validators run discriminatory-language classifiers before any copy is published; flagged outputs are returned to the agent for regeneration. Audit logs capture every generation with the input, output, and validator results — exportable for state DRE or DOJ review. Real-estate AI without fair-housing engineering is a compliance risk; we engineer for it from day one.

    What's the typical ROI for AI in real estate?

    Outcomes vary by use case. Listing description automation saves 70% of agent time per listing. Lead-qualification bots increase qualified-appointment volume 30–50% by handling 24/7 intake. CMA agents cut prep time from hours to minutes — proposal turnaround drops dramatically. Tenant communication deflection reduces inbound support volume 25–40%. Payback: typically 3–6 months on the build cost.

    Can AI replace real estate agents?

    No, and we don't recommend trying. Real estate is a relationship business; the AI's job is to give agents 70%+ of their time back so they can focus on relationships, negotiations, and complex transactions. We build agents that draft listings, qualify leads, process documents, and answer routine tenant questions — leaving the high-judgment work to human agents who close deals.

    Manufacturing

    What AI use cases work for manufacturing today?

    Six proven use cases. Predictive maintenance — sensor data + LLM-augmented anomaly detection cuts unplanned downtime 20–40%. Vision-based quality inspection — replaces or augments human inspectors, especially on high-volume lines. Production scheduling agents — optimize line balance and changeovers across SKUs. Demand forecasting with macro signals (weather, news, commodity prices). Document processing for SOPs, work instructions, supplier contracts. Operator copilots — natural-language access to MES / ERP data on the factory floor.

    Does AISD support edge / air-gapped manufacturing deployments?

    Yes. Many manufacturing environments are air-gapped or have strict OT/IT separation. We deploy open-weight models on dedicated edge hardware (NVIDIA IGX, on-prem GPUs) with no data leaving the plant. Eval harness, vector DB, and observability stack all run inside the perimeter. Connectivity to cloud frontier APIs is optional — used only for non-sensitive use cases or batch retraining windows.

    How does AISD integrate with MES / SCADA / ERP?

    Read-only first via standard interfaces (OPC-UA, MQTT, REST). We respect plant-floor protocols — AI features never write directly to PLCs or safety systems. Write-back goes through the existing MES/ERP layer with full validation. Identity and audit propagate through; if the operator who initiated an action is in the ERP audit log, the AI's contribution is too. We've integrated with SAP, Oracle, AVEVA, Rockwell, and homegrown ERPs.

    What's the typical ROI for manufacturing AI?

    Outcomes vary by use case. Predictive maintenance: 20–40% reduction in unplanned downtime, 6–12 month payback on retrofitting an existing line. Vision QA: 30–60% reduction in escapes (defects reaching customer), payback in 3–9 months on volume lines. Document processing: 40–60% reduction in operations time on SOPs, work orders, supplier docs. Demand forecasting: 5–15% reduction in inventory carrying cost on volatile SKUs.

    Logistics & supply chain

    What AI use cases work for logistics and supply chain?

    Five proven use cases. Route optimization with real-time signals (traffic, weather, customs delays). Demand forecasting at SKU × node level. Warehouse picking optimization — agent-driven path planning and slot recommendations. Document processing — bills of lading, customs declarations, freight invoices, proofs of delivery. Shipment-visibility agents that aggregate data from carriers and surface exceptions before they become customer issues. Carrier rate optimization across spot and contract.

    How does AISD handle logistics' fragmented tech stack?

    We assume fragmentation. Most logistics customers run a TMS, a WMS, multiple carrier portals, customs brokers, and homegrown spreadsheets. Pattern: build an integration layer that pulls from each system through whatever APIs / file drops / EDI feeds exist, normalize into a unified schema, and expose AI features on top. The AI agent never tries to be the system of record — it surfaces signals across systems and surfaces exceptions back into the existing tools.

    What's the typical ROI for AI in logistics?

    Outcomes vary by use case. Route optimization: 5–15% mileage reduction, fuel + time savings. Document processing: 40–70% reduction in clearance / processing time per shipment. Carrier rate optimization: 3–8% on freight spend at scale. Shipment visibility: 30–50% reduction in customer-service inbound on 'where's my order' queries. Payback periods: 4–12 months depending on volume and integration complexity.

    Professional services

    What AI use cases work for professional services firms (consulting, accounting, advisory)?

    Six proven use cases. Proposal generation from RFPs + firm precedent — 40–60% reduction in proposal time. Knowledge management — engagement memory across consultants, projects, and methodologies. Contract review and negotiation copilots. Client communication agents that draft updates from project status data. Resource allocation optimization across staff and engagements. Due diligence acceleration — document review at 3–5× human speed.

    How does AISD handle client confidentiality in professional services AI?

    Hard-walled per-client data isolation. Each engagement's data lives in a separate namespace; agents working on Client A's matter cannot retrieve, summarize, or be influenced by Client B's data. Audit logs make this verifiable. For firms with strict matter-management requirements (legal, M&A advisory), we deploy with on-prem or VPC isolation. Confidentiality clauses propagate through every model call — confidential markers in source documents flow to outputs.

    What's the typical ROI for AI in professional services?

    Two outcomes dominate. Realization rate — AI accelerates deliverables, so the same engagement scope is completed in less consultant hours, lifting realized rate per hour 15–30%. Win rate — AI-accelerated proposals are higher quality and faster to send, lifting RFP win rates 5–15% in competitive situations. Plus knowledge-management value: previously-tribal expertise becomes searchable, accelerating new staff ramp-up.

    Concepts & learn

    What's the simplest way to start building an AI agent?

    Pick one workflow with high volume, structured input/output, and a clear success metric. Map the tools the agent will need (each is an integration). Build the eval harness on day 1 — 50–500 representative inputs scored automatically. Pick orchestration based on workload, not novelty (single-loop ReAct or plan-and-execute beats multi-agent for most cases). Ship a prototype in 2 weeks, productionize in 6–10. Read /learn/what-is-agentic-ai for the full primer.

    What do AI agents do?

    An AI agent uses a language model to plan and take multi-step actions toward a goal, calling tools (APIs, databases, other systems) along the way. The minimal pattern: a model + a set of tools + a control loop. Unlike a chatbot — which responds and waits — an agent acts, observes the result, and decides what to do next, often across dozens of steps.

    What is the difference between agentic AI and generative AI?

    Generative AI is a capability: producing text, code, images, or audio from a prompt in a single inference pass. Agentic AI is an architectural pattern: a system that uses a language model to plan and take multi-step actions toward a goal, calling tools (APIs, databases, other systems) along the way. Every agentic system uses generative AI as its reasoning engine. Not every generative system is agentic — a summarizer is generative, not agentic; a customer-support agent that reads tickets, queries order APIs, and issues refunds is both.

    Is ChatGPT agentic AI or generative AI?

    ChatGPT in its base form is generative: you prompt it, it responds, it stops. When ChatGPT uses plugins or function calling — searching the web, running code, reading files — it becomes agentic: it's taking actions with tools in a loop. The underlying LLM is the same; the architecture around it is what changes it from generative to agentic.

    What can agentic AI do that generative AI can't?

    Agentic AI can take actions across external systems, not just generate content. It can query databases, call APIs, write to CRMs, send emails, run code, and loop across those actions until a goal is met. Generative AI alone produces text or other content and stops — a human has to take the output and do something with it. The ROI gap is significant: generative AI improves individual productivity; agentic AI automates entire workflows.

    Should I start with generative AI or agentic AI?

    Start with generative if the workflow is new and your team isn't ready to own a long-running agent in production. Generative systems are easier to deploy, cheaper to run, and simpler to evaluate. Graduate to agentic once you understand the inputs, success metrics, and failure modes. Most production ROI lives in the agentic pattern — but a generative prototype is often the fastest way to prove the use case before committing to agent infrastructure.

    Question not answered?

    Talk to a senior AI engineer.

    A 30-minute call gets you direct answers on your specific use case — and an honest take on whether AISD is the right partner.