What is the AI-102 certification, what six skill areas does the current (Dec 2025+) exam measure, and what is its retirement status as of mid-2026?
AI-102 (Designing and Implementing a Microsoft Azure AI Solution) earns the Azure AI Engineer Associate credential. The Dec 23 2025 update reorganized it into six functional groups: Plan and manage an Azure AI solution; Implement generative AI solutions; Implement an agentic solution (new, 5-10%); Implement computer vision solutions; Implement natural language processing solutions; and Implement knowledge mining and information extraction solutions (now includes Content Understanding). The exam is proctored, 100 minutes, passing score 700/1000, with case-study/interactive items. Critically: AI-102 retires June 30 2026; its successor is AI-103 (Developing AI Apps and Agents on Azure), which deepens agentic/Foundry content.
Trace the naming history: what were 'Cognitive Services' and 'Azure AI Studio' renamed to, and what is the platform called as of 2026?
Azure AI Studio became Azure AI Foundry at Ignite 2024, then Microsoft Foundry at Ignite 2025 (formalized in the January 2026 Product Terms), positioned as the cross-product 'AI app and agent factory' spanning Azure, Microsoft 365, and Dynamics, not just Azure. Separately, the old Cognitive Services / Azure AI Services capabilities were rebranded Foundry Tools at Ignite 2025. The underlying capabilities (multi-service resources, single endpoint/key, projects, model catalog) are continuous; only branding and the unified resource provider changed. AI-102 docs and study materials lag the names, so expect mixed terminology on the exam.
A team wants one billing/management surface and a single key to call Vision, Language, and Speech. Which resource type should they provision, and what's the tradeoff versus single-service resources?
Provision a multi-service Azure AI Services account. It exposes one endpoint, one key, and one billing line across Vision, Language, Speech, Document Intelligence, etc., simplifying management and RBAC. The tradeoff: a single-service resource (e.g., a standalone Speech resource) isolates cost tracking, regional placement, throttling/quota, and access per capability, and some features or free tiers exist only on single-service resources. Choose multi-service for convenience and consolidation; single-service for granular cost, security, and quota isolation.
Explain the difference between semantic ranking, vector search, and keyword/hybrid search in Azure AI Search, and when each is appropriate.
Vector search retrieves by embedding similarity (e.g., cosine/HNSW over a Collection(Edm.Single) field), capturing meaning beyond keywords — good for paraphrase and multilingual recall. Keyword (BM25) search matches terms precisely — good for exact tokens, codes, names. Hybrid runs both and fuses results with Reciprocal Rank Fusion (RRF), giving the best recall in practice. Semantic ranking is a second-stage (L2) re-ranker — a Microsoft language model that re-scores the top BM25/hybrid candidates and emits captions/answers. Best RAG pattern: hybrid retrieval plus semantic ranker on top. Semantic ranking is a separately billable feature.
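The fusion step in hybrid search can be sketched in a few lines. This is a minimal illustration of Reciprocal Rank Fusion, not Azure AI Search's internal implementation; the document IDs are made up, and the constant k=60 is an assumed, conventional value.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results for one query from the two first-stage retrievers.
keyword_hits = ["doc_a", "doc_b", "doc_c"]   # BM25 order
vector_hits = ["doc_c", "doc_a", "doc_d"]    # embedding-similarity order
fused = rrf_fuse([keyword_hits, vector_hits])
```

Documents that rank well in both lists (doc_a, doc_c) rise above documents found by only one retriever. RRF uses ranks only, so the incomparable BM25 and cosine scores never need to be normalized.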
Document Intelligence: distinguish prebuilt, layout, and custom models, and explain when custom neural vs custom template is correct.
Prebuilt models (invoice, receipt, ID, W-2, etc.) extract known fields with no training. Layout extracts text, tables, selection marks, and structure but no field semantics. Custom models extract fields you define from your own forms. Custom template (formerly 'form') models suit highly consistent, fixed-layout forms, train on as few as ~5 documents, and are fast/cheap. Custom neural models handle structured, semi-structured, and varying layouts, need more samples and training time, but generalize far better. Choose template for rigid identical layouts; neural when layouts vary or documents are semi-structured. Composed models route across multiple custom models.
What does Azure AI Content Safety provide, and how do its severity levels and thresholds interact with Azure OpenAI's built-in content filter?
Content Safety detects harmful content across four categories (hate, sexual, violence, self-harm) and returns a severity level per category: text can use the full 0-7 scale, images return only the trimmed levels 0, 2, 4, and 6, and both are surfaced in buckets like safe/low/medium/high. It also provides Prompt Shields (jailbreak/indirect-injection detection), groundedness detection, and protected-material detection. Azure OpenAI deployments include a content filter built on the same engine; you create configurable policies setting a threshold (low/medium/high) per category for both prompt and completion, and can request modified filtering or abuse-monitoring exceptions. Standalone Content Safety moderates non-OpenAI content or applies policy before/after any model. A higher threshold is more permissive (blocks only higher severity).
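The threshold rule can be made concrete with a small sketch. It assumes the 0-7 severity scale maps to buckets in pairs (0-1 safe, 2-3 low, 4-5 medium, 6-7 high); the function names are hypothetical and this is not the service's own code.

```python
# Assumed mapping of the 0-7 scale: 0-1 safe, 2-3 low, 4-5 medium, 6-7 high.
BUCKETS = ["safe", "low", "medium", "high"]
THRESHOLDS = ("low", "medium", "high")

def bucket(severity: int) -> str:
    """Map a 0-7 severity level to its bucket name."""
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return BUCKETS[severity // 2]

def is_blocked(severity: int, threshold: str) -> bool:
    """Block content whose bucket is at or above the configured threshold."""
    if threshold not in THRESHOLDS:
        raise ValueError("threshold must be low, medium, or high")
    return BUCKETS.index(bucket(severity)) >= BUCKETS.index(threshold)

# The same medium-severity item under three policies:
decisions = {t: is_blocked(4, t) for t in THRESHOLDS}
```

With severity 4 (medium), the low and medium thresholds block the content while the high threshold lets it through, which is what "a higher threshold is more permissive" means.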
Scenario MCQ: you must extract a non-standard 'PO Reference' field, not in the prebuilt invoice model, from vendor invoices. Cheapest correct approach? (A) Train a custom neural model from scratch on all fields (B) Use prebuilt-invoice and add a small custom model only for the missing field (C) Use Layout and regex (D) Fine-tune Azure OpenAI
Best answer: (B). Use the prebuilt invoice model for its many standard fields and train a small custom extraction (neural) model for the non-standard PO Reference, merging the two results — so you don't re-label every field. (A) wastes labeling effort and loses prebuilt accuracy. (C) Layout returns geometry/tables but no field semantics, so regex is brittle. (D) Azure OpenAI fine-tuning is the wrong, far costlier tool for structured field extraction. Document Intelligence is purpose-built and cheaper here.
Compare Azure AI Vision Image Analysis, Custom Vision, and Face: which fits 'detect manufacturing defects on a proprietary part', 'caption arbitrary photos', and 'verify a returning customer's identity', and what governance gate applies?
Proprietary defect detection → Custom Vision (or a Foundry custom image classification/object-detection model): defects aren't in any general taxonomy, so train on your own labeled images. Caption arbitrary photos → AI Vision Image Analysis 4.0 (dense captions, tags, OCR/Read, smart crops) — pretrained, no training. Identity verification → the Face service (detection, verification, identification, liveness). Governance gate: Face identification/verification and liveness are Limited Access features requiring an approved Microsoft application and use-case attestation under the Responsible AI standard; some capabilities (e.g., emotion/face attributes inference) were retired for fairness reasons.
For Azure AI Speech: contrast batch transcription, real-time speech-to-text, and Custom Speech, and how would you improve accuracy for a medical dictation product?
Real-time STT streams audio over the SDK/WebSocket for low-latency live captions and commands. Batch transcription submits stored audio files (via the Batch API plus storage) for asynchronous, high-throughput transcription of large volumes — no live interaction. Custom Speech adapts the base model to your domain using domain audio + transcripts and/or plain-text and structured (pronunciation, phrase) data, deployed to a custom endpoint. For medical dictation: build a Custom Speech model with a domain dataset of medical terms and drug names plus a pronunciation file and representative audio+transcripts; deploy a custom endpoint; optionally add phrase lists at request time for short-term biasing. This sharply raises recognition of specialized vocabulary.
#speech #custom-speech #batch-transcription #stt
Advanced · concept
In Azure AI Search RAG, what is the push-vs-pull indexing distinction and what does integrated vectorization add?
Pull (indexer-based) indexing has AI Search crawl a data source (Blob, ADLS, SQL, Cosmos) on a schedule, running a skillset to chunk, OCR, and enrich. Push indexing means your app sends documents directly via the REST/SDK index API — more control, no built-in scheduling. Integrated vectorization extends the pull pipeline: AI Search chunks text (Split skill) and calls an embedding model (AzureOpenAIEmbedding skill) during ingestion, and vectorizes the query at search time via a configured vectorizer — so you never manage embeddings yourself. Use it to remove custom embedding/chunking code and keep query- and index-time embedding models consistent.
A bank must run an Azure OpenAI workload with no data traversing the public internet and no inbound public access. Outline the networking controls and what each accomplishes.
Set public network access to Disabled and create a Private Endpoint, which projects the service into your VNet via a private IP; pair it with a Private DNS Zone so the resource FQDN resolves to that private IP. Egress from compute (App Service/AKS/VMs) reaches it over the VNet, optionally through ExpressRoute/VPN — no public internet. Use Managed Identity plus Microsoft Entra (Cognitive Services OpenAI User role) instead of API keys, and disable local key auth. Enforce with Azure Policy. For inter-service calls (e.g., AI Search → OpenAI) use shared private links. Customer-managed keys cover encryption at rest.
Your RAG chatbot occasionally cites facts absent from the retrieved passages. Which Azure responsible-AI tooling addresses this, and how?
This is ungrounded generation (hallucination). Use Content Safety groundedness detection, which checks whether the completion is supported by the supplied grounding sources and flags ungrounded sentences (with optional reasoning). In Foundry, the evaluation framework adds Groundedness, Relevance, Coherence, Retrieval, and Fluency evaluators (LLM-as-judge and NLP metrics) to measure RAG quality offline and via online evaluation in production. Engineering mitigations: constrain the system prompt to answer only from context, return citations, lower temperature, and gate or append a disclaimer when groundedness is low. The model proposes; an independent groundedness check disposes.
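The "gate or append a disclaimer" mitigation might look like the sketch below. It assumes an upstream groundedness check has already produced the fraction of the answer that is unsupported. The function name, both cutoffs, and the message texts are hypothetical and would need tuning per product.

```python
FALLBACK = "I could not find a supported answer in the retrieved sources."
DISCLAIMER = "Note: parts of this answer may not be supported by the cited sources."

def gate_answer(answer: str, ungrounded_fraction: float,
                warn_at: float = 0.1, block_at: float = 0.5) -> str:
    """Pass, annotate, or replace an answer based on how much of it is ungrounded.

    ungrounded_fraction is assumed to come from a separate groundedness check
    (0.0 = fully supported, 1.0 = nothing supported). Cutoffs are illustrative.
    """
    if not 0.0 <= ungrounded_fraction <= 1.0:
        raise ValueError("ungrounded_fraction must be between 0 and 1")
    if ungrounded_fraction >= block_at:
        return FALLBACK                        # too unreliable to show
    if ungrounded_fraction >= warn_at:
        return f"{answer}\n\n{DISCLAIMER}"     # show, but flag it
    return answer                              # well grounded: pass through

shown = gate_answer("Fees are waived for students.", ungrounded_fraction=0.2)
```

Keeping the gate outside the generating model means a confident but unsupported completion cannot approve itself.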
A chat workload averages 30 requests/min at ~1,500 prompt + 500 completion tokens each. How would you size a Standard (PayGo) vs Provisioned Throughput (PTU) deployment, and what does each quota actually limit?
Per minute that's 30×2,000 = 60,000 tokens/min plus 30 RPM. Standard deployments are governed by Tokens-Per-Minute (TPM) and a derived Requests-Per-Minute limit (RPM ≈ 6 per 1,000 TPM), so allocate TPM quota comfortably above 60K (e.g., 90-120K) to absorb bursts; exceeding either limit returns HTTP 429 and the client should back off and retry. Provisioned Throughput reserves dedicated capacity in PTUs, giving predictable latency and a guaranteed throughput floor, billed hourly or via reservation regardless of usage; it is sized from peak TPM, model, and prompt/response shape using Microsoft's capacity calculator. Use PTU for latency-sensitive steady high volume; Standard for spiky or low volume.
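The Standard-deployment arithmetic can be checked with a short sketch. The 1.5x headroom factor is an assumption (it lands on the low end of the 90-120K range above), and the function name is hypothetical. PTU sizing is left out on purpose because it depends on Microsoft's capacity calculator.

```python
import math

def size_standard_deployment(requests_per_min, prompt_tokens, completion_tokens,
                             headroom=1.5, rpm_per_1k_tpm=6):
    """Estimate a TPM quota for a Standard deployment and the RPM it implies.

    headroom is an assumed burst multiplier; rpm_per_1k_tpm follows the
    ~6 RPM per 1,000 TPM ratio quoted above.
    """
    steady_tpm = requests_per_min * (prompt_tokens + completion_tokens)
    # Round the burst-adjusted target up to the next 1,000 TPM.
    target_tpm = math.ceil(steady_tpm * headroom / 1000) * 1000
    derived_rpm = (target_tpm // 1000) * rpm_per_1k_tpm
    return {
        "steady_tpm": steady_tpm,
        "target_tpm": target_tpm,
        "derived_rpm": derived_rpm,
        "rpm_ok": derived_rpm >= requests_per_min,
    }

plan = size_standard_deployment(30, 1500, 500)
```

For this workload the token limit is the binding constraint. A 90K TPM allocation implies 540 RPM, far above the 30 requests per minute actually sent.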
#azure-openai #ptu #tpm #quota #capacity-planning
Advanced · system-design
In Azure AI Language, contrast Conversational Language Understanding (CLU), Custom Question Answering (CQA), and orchestration. How do you build an assistant that answers FAQs and also triggers actions?
CLU is intent + entity classification for utterances (the successor to LUIS) — map 'book a flight' to a BookFlight intent with entities, then your code performs the action. Custom Question Answering builds a knowledge base from docs/URLs/FAQ pairs and returns the best answer with confidence (successor to QnA Maker). For an assistant doing both, use an orchestration workflow project: a top-level orchestrator routes each utterance to the right connected CLU project (actions) or CQA knowledge base (FAQ). You publish the orchestrator as one endpoint, avoiding a monolithic model that conflates chit-chat, FAQ, and commands.
Scenario MCQ: a solution calls Azure OpenAI from Azure Functions and the security review demands no secrets in code/config. Which is correct? (A) Store the key in Key Vault and read at startup (B) Use system-assigned Managed Identity with the Cognitive Services OpenAI User role and Entra token auth (C) Use a SAS token (D) Hardcode and rotate monthly
Best answer: (B). Assign the Function a system-assigned Managed Identity, grant it the Cognitive Services OpenAI User RBAC role on the Azure OpenAI resource, disable local key auth, and acquire Entra tokens via DefaultAzureCredential — no secret exists anywhere. (A) Key Vault removes secrets from code but you still manage/rotate a long-lived key and grant Vault access; better than (C)/(D) but Managed Identity is the recommended keyless pattern. (C) SAS tokens don't apply to Azure OpenAI. (D) violates the requirement. Managed Identity + RBAC is the preferred answer.
Staff-level: an AI Search RAG system has high recall but users complain top answers are subtly wrong. Diagnose chunking, embedding-model mismatch, and re-ranking as root causes, and give the fix for each.
First instrument with Foundry retrieval/groundedness evaluators to separate retrieval failure from generation failure. Chunking: oversized or arbitrarily split chunks dilute embeddings and split answers across boundaries — fix with smaller, semantically coherent chunks, overlap, and enough per-chunk context (titles/section). Embedding mismatch: if index-time and query-time embedding models or dimensions differ (e.g., reindexed with a new model but the vectorizer wasn't updated), vectors live in incompatible spaces — re-embed the corpus and pin one model/version for both. Ranking: pure vector top-k surfaces near-duplicates; enable hybrid plus the semantic ranker so an L2 re-ranker promotes the genuinely relevant passage. Then validate answer quality, not just recall.
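The chunking fix can be sketched as a sliding window with overlap and a title prefix. This is a simplified, word-based illustration with hypothetical names and sizes. A production pipeline would more likely split on tokens or on sentence and section boundaries to keep chunks semantically coherent.

```python
def chunk_words(text, size=200, overlap=40, title=""):
    """Split text into overlapping word windows, each prefixed with a title.

    size and overlap are counted in words (an assumption; real pipelines
    usually count tokens). The overlap keeps an answer that straddles a
    boundary intact in at least one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        body = " ".join(words[start:start + size])
        chunks.append(f"{title}\n{body}" if title else body)
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Tiny demo with made-up words so the overlap is easy to see.
demo = chunk_words(" ".join(f"w{i}" for i in range(10)), size=4, overlap=1)
```

In the demo each chunk repeats the last word of the previous one (w3, then w6), the same way a real overlap repeats the sentences around a boundary.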
#rag #chunking #embeddings #reranking #debugging
Expert · concept
Expert curveball: why does enabling the semantic ranker often improve a hybrid-search RAG pipeline more than just increasing the vector top-k, and what's the retrieval-vs-ranking tradeoff?
Increasing top-k improves recall (the right passage is somewhere in the candidate set) but worsens precision at the positions the LLM actually reads, because embedding cosine similarity is a coarse first-stage signal that conflates topical and answer-bearing relevance, and RRF only fuses ranks — it doesn't understand the query. The semantic ranker is a cross-encoder-style L2 model that jointly attends to query and passage, scoring true relevance and pushing the answer-bearing chunk into the top few — exactly the slots that fit the context budget. So the bottleneck isn't recall, it's ordering: a cheap high-recall first stage feeding a precise but expensive re-ranker over a small candidate set is the classic retrieve-then-rerank design that beats brute-force k.
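A toy version of retrieve-then-rerank shows both halves of the tradeoff. The two scoring tables are invented stand-ins for a cheap first-stage signal and an expensive re-ranker, and the function name and default sizes are hypothetical. Nothing here calls Azure AI Search.

```python
def retrieve_then_rerank(docs, cheap_score, precise_score, k_retrieve=50, k_final=3):
    """Two-stage ranking: wide cheap retrieval, then precise re-ranking.

    cheap_score and precise_score are callables mapping a doc to a number.
    Only the k_retrieve candidates are ever seen by the expensive scorer.
    """
    # Stage 1: cheap signal over the whole corpus, keep a wide candidate set.
    candidates = sorted(docs, key=cheap_score, reverse=True)[:k_retrieve]
    # Stage 2: expensive signal over the candidates only, keep the top few.
    return sorted(candidates, key=precise_score, reverse=True)[:k_final]

# Made-up scores: "c" and "d" answer the question, but stage 1 ranks them low.
docs = ["a", "b", "c", "d"]
cheap = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}
precise = {"a": 0.2, "b": 0.4, "c": 0.95, "d": 0.99}

narrow = retrieve_then_rerank(docs, cheap.get, precise.get, k_retrieve=3, k_final=1)
wide = retrieve_then_rerank(docs, cheap.get, precise.get, k_retrieve=4, k_final=1)
```

With three candidates the re-ranker promotes "c" from third place to first, which raising k alone would not do. The best passage "d" is recovered only when the first stage is wide enough to include it, so recall still bounds what re-ranking can fix.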