AI / LangChain4j interview questions

1. What is LangChain4j and what problem does it solve for Java developers? 2. What are the core modules of LangChain4j? 3. What is the AI Services feature in LangChain4j and how do you define one? 4. How does ChatMemory work in LangChain4j and what types are available? 5. What is Retrieval-Augmented Generation (RAG) in LangChain4j and how do you build a pipeline? 6. What are Tools in LangChain4j and how does tool calling work? 7. How do you integrate LangChain4j with Spring Boot? 8. What is the EmbeddingModel in LangChain4j and which providers are supported? 9. What EmbeddingStores does LangChain4j support and how do you choose one? 10. What is document splitting in LangChain4j and why is it necessary? 11. What is the @SystemMessage and @UserMessage annotation in LangChain4j AI Services? 12. How does streaming work in LangChain4j and when should you use it? 13. What is the ContentRetriever and RetrievalAugmentor in LangChain4j advanced RAG? 14. How does LangChain4j handle structured output from LLMs? 15. What is the PromptTemplate in LangChain4j and how does it differ from @UserMessage? 16. What LLM providers does LangChain4j support and how do you switch between them? 17. What is an Agent in LangChain4j and how does it differ from a simple AI Services call? 18. How do you implement multi-turn conversation with memory per user in a Spring REST API using LangChain4j? 19. What is the ImageModel in LangChain4j and which providers support image generation? 20. How do you handle errors and retries in LangChain4j? 21. How do you test LangChain4j AI Services without making real LLM API calls? 22. What is the DocumentLoader API in LangChain4j and what sources does it support? 23. What is the @Moderate annotation in LangChain4j and how does content moderation work? 24. How does LangChain4j support vision (multi-modal) LLMs that accept images as input? 25. What is the difference between synchronous and asynchronous execution in LangChain4j? 26. What is LangChain4j's support for Quarkus and how does it differ from Spring Boot integration? 27. How does LangChain4j implement the ReAct agent pattern and what are its limitations? 28. What is the ModerationModel interface in LangChain4j and how can you implement a custom one? 29. What is the Tokenizer interface in LangChain4j and why does it matter for memory management? 30. How do you persist ChatMemory across application restarts in LangChain4j? 31. What are the best practices for prompt engineering within LangChain4j AI Services? 32. How does LangChain4j integrate with observability tools like OpenTelemetry? 33. What is the InMemoryEmbeddingStore and when should you migrate to a real vector database? 34. What are common LangChain4j anti-patterns to avoid in production applications? 35. How does LangChain4j support multi-modal input processing for audio or documents beyond text and images? 36. How do you implement a custom Tool with complex parameter types in LangChain4j? 37. What is the HypotheticalDocumentEmbedder (HyDE) technique and how does LangChain4j support it? 38. How do you handle LLM output parsing failures gracefully in LangChain4j? 39. What is LangChain4j's support for graph-based RAG or knowledge graph integration? 40. What is the LangChain4j EvaluationResult API and how do you measure RAG pipeline quality?

Could not find what you were looking for? send us the question and we would be happy to answer your question.

1. What is LangChain4j and what problem does it solve for Java developers?

LangChain4j is a Java library that brings the capabilities of large language models (LLMs) into the Java ecosystem in a structured, type-safe, and production-friendly way. Before LangChain4j, Java developers who wanted to integrate GPT, Gemini, Mistral, or any other LLM into their applications had to write HTTP clients, manage JSON serialization manually, build prompt templates from scratch, and figure out how to chain multiple AI calls together — all without any standardized pattern.

LangChain4j solves this by providing a unified abstraction layer over dozens of LLM providers (OpenAI, Azure OpenAI, Anthropic, Google Vertex AI, Ollama, Mistral, HuggingFace, and more), a clean interface for building conversational memory, RAG pipelines, and tool-calling agents, and — most distinctively — an AI Services pattern that lets you declare AI behavior as a plain Java interface, completely eliminating boilerplate prompt construction code.

It is the Java equivalent of Python's LangChain/LlamaIndex ecosystems, but built idiomatically for the JVM: strongly typed, annotation-driven, Spring Boot and Quarkus compatible, and deeply integrated with Java's dependency injection patterns. The library is actively maintained and has become the de facto standard for enterprise Java teams embedding LLM capabilities into existing Spring applications.

Which LangChain4j feature lets you declare AI behavior as a plain Java interface without writing prompt construction boilerplate?AI Services

✓ Well done — AI Services is LangChain4j's signature feature: annotate a Java interface and the library generates the implementation.

ChatMemory

✗ Try again — ChatMemory manages conversation history, not the interface-as-AI-service pattern.

EmbeddingStore

✗ Try again — EmbeddingStore is for vector similarity search in RAG pipelines, not AI service declaration.

PromptTemplate

✗ Try again — PromptTemplate builds parameterized prompts but still requires wiring code. AI Services removes even that.

What makes LangChain4j different from simply writing HTTP calls to OpenAI's REST API directly in Java?It only supports OpenAI and no other providers

✗ Try again — LangChain4j actually supports dozens of providers. Its value is the unified abstraction, not vendor lock-in.

It provides a unified abstraction over many providers, typed AI Services, memory, RAG, and agent patterns without manual HTTP/JSON handling

✓ Well done — the library replaces all that low-level plumbing with clean, composable abstractions.

It executes AI calls faster than raw HTTP by using binary protocols

✗ Try again — LangChain4j does not use binary protocols; the performance benefit is developer productivity, not raw throughput.

2. What are the core modules of LangChain4j?

LangChain4j is organized into several Maven modules so you only pull in what you actually need. The main ones you will encounter in real projects are:

LangChain4j Core Modules
Module	Artifact ID	Purpose
Core	langchain4j-core	Interfaces and abstractions (ChatLanguageModel, EmbeddingModel, ChatMemory, etc.) — no provider-specific code
Main	langchain4j	High-level features: AI Services, PromptTemplate, RAG pipeline components, chains, tools
Provider starters	langchain4j-open-ai, langchain4j-anthropic, etc.	One module per LLM provider — concrete implementations of core interfaces
Embedding stores	langchain4j-chroma, langchain4j-pgvector, langchain4j-pinecone, etc.	Vector database integrations for RAG
Document loaders	Built into main module	FileSystemDocumentLoader, UrlDocumentLoader, AmazonS3DocumentLoader, etc.
Spring Boot starter	langchain4j-spring-boot-starter	Auto-configuration, bean injection, properties binding for Spring applications
Quarkus extension	quarkus-langchain4j	CDI integration and native compilation support for Quarkus

The design intentionally separates interfaces (core) from implementations (provider modules) so your application code can remain provider-agnostic. If you start with OpenAI and later want to switch to Anthropic or an on-premise Ollama instance, you swap the dependency and update a few configuration properties — the AI Services interface code stays unchanged.

Which LangChain4j module contains only interfaces and abstractions with no provider-specific code?langchain4j

✗ Try again — langchain4j (without -core) is the main module containing high-level features like AI Services and RAG components.

langchain4j-core

✓ Well done — langchain4j-core holds pure interfaces like ChatLanguageModel and EmbeddingModel without any vendor implementation.

langchain4j-open-ai

✗ Try again — langchain4j-open-ai is a provider-specific module implementing the core interfaces for OpenAI.

If you want to switch from OpenAI to Anthropic in a LangChain4j Spring Boot project, what needs to change?Rewrite all AI Services interface methods to use Anthropic-specific annotations

✗ Try again — AI Services interfaces are provider-agnostic; no rewriting is needed for the interface layer.

Swap the provider dependency and update configuration properties — the AI Services interface code remains unchanged

✓ Well done — provider abstraction means only the Maven dependency and application.properties/yaml need updating.

Create a new Spring Bean factory for each Anthropic model class

✗ Try again — Spring Boot auto-configuration handles bean creation; you just configure the new provider's properties.

3. What is the AI Services feature in LangChain4j and how do you define one?

AI Services is the flagship abstraction in LangChain4j. The idea is simple but powerful: you write a plain Java interface, annotate its methods with LangChain4j annotations that describe what each method should do with the LLM, and the library generates a working implementation at runtime using JDK dynamic proxies. You never write prompt-construction or HTTP-calling code — that is all handled by the generated proxy.

A minimal example:

import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;

interface CodeReviewer {

    @SystemMessage("You are a senior Java developer. Review code concisely.")
    @UserMessage("Review this code snippet for bugs and style issues: {{code}}")
    String review(String code);
}

// Wire it up
CodeReviewer reviewer = AiServices.builder(CodeReviewer.class)
    .chatLanguageModel(model)
    .build();

// Use it like any Java object
String feedback = reviewer.review("public void foo() { int x = 1/0; }");

The interface method can return String for raw text, a custom POJO for structured output (LangChain4j adds JSON extraction instructions automatically), TokenStream for streaming, or AiMessage for full response metadata. You can also inject ChatMemory into the service for conversational state, add @Tool-annotated methods to the same class for function calling, and mix multiple retrieval augmentors for RAG — all declared at the builder level, none of it in your interface methods.

How does LangChain4j AI Services generate the working implementation of a Java interface at runtime?It compiles a new Java class using javac at application startup

✗ Try again — LangChain4j uses JDK dynamic proxies at runtime, not compile-time code generation.

It creates a JDK dynamic proxy that intercepts method calls and routes them to the configured LLM

✓ Well done — the proxy intercepts each method call, assembles the prompt from annotations and parameters, calls the LLM, and parses the response.

It uses Spring AOP AspectJ weaving to intercept annotated interface methods

✗ Try again — LangChain4j does not require Spring AOP; it uses standard JDK dynamic proxies.

What return type should an AI Services method use to receive structured Java objects from the LLM response?Always String — structured output must be parsed manually

✗ Try again — LangChain4j supports returning custom POJOs directly; it adds JSON extraction instructions to the prompt automatically.

Only Map<String, Object> for dynamic schema

✗ Try again — you can return strongly typed POJOs, not just Maps, for better type safety.

A custom POJO class — LangChain4j appends JSON format instructions to the prompt and deserializes the response automatically

✓ Well done — returning a POJO triggers automatic JSON schema generation and response deserialization without any manual parsing code.

4. How does ChatMemory work in LangChain4j and what types are available?

ChatMemory in LangChain4j is the component responsible for maintaining conversation history across multiple exchanges with an LLM. Without it, every call to the model is stateless — the model has no knowledge of what was said in previous turns. ChatMemory solves this by accumulating the message history and injecting it into each subsequent LLM request.

LangChain4j ships two built-in ChatMemory implementations:

MessageWindowChatMemory — Keeps the last N messages (by message count). When the window is full, the oldest messages are dropped to make room for new ones. Simple and predictable, but a very long first user message might push out important context.
TokenWindowChatMemory — Keeps messages up to a maximum token count. Requires a tokenizer (model-specific) to count tokens accurately. More precise than message count for managing context window limits of the underlying LLM.

// Message-window memory — keep last 10 messages
ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);

// Token-window memory — stay under 4096 tokens
ChatMemory memory = TokenWindowChatMemory.builder()
    .maxTokens(4096, new OpenAiTokenizer(GPT_3_5_TURBO))
    .build();

// Inject into AI Services for automatic history management
Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model)
    .chatMemory(memory)
    .build();

For multi-user applications where each user needs isolated memory, LangChain4j provides ChatMemoryProvider — a factory that returns a memory instance per memory ID. The memory ID is typically the user session ID or user account ID, passed as an annotated parameter on the AI Services method.

Which ChatMemory implementation is better for precisely managing LLM context window limits?TokenWindowChatMemory — it counts actual tokens using a model-specific tokenizer

✓ Well done — token-based windowing is more accurate for context management since LLM limits are defined in tokens, not message count.

MessageWindowChatMemory — it always fits within the LLM's token limit

✗ Try again — message count is an approximation. A single very long message can still exceed the token limit even with a small message window.

Both are equivalent — they use the same underlying token counting logic

✗ Try again — MessageWindowChatMemory counts messages (not tokens); TokenWindowChatMemory uses a model-specific tokenizer to count tokens precisely.

How do you provide separate, isolated memory for different users in a LangChain4j AI Services application?Create a new AI Services instance per user request

✗ Try again — creating AI Services instances per request is wasteful. ChatMemoryProvider handles per-user isolation cleanly at the memory layer.

Use ChatMemoryProvider with a memory ID (e.g., user session ID) so each user gets their own memory instance

✓ Well done — ChatMemoryProvider is the idiomatic LangChain4j pattern for multi-tenant memory isolation.

Prefix every user message with the user ID to simulate separate memory

✗ Try again — string prefixes contaminate the actual conversation; ChatMemoryProvider manages true per-user memory stores.

5. What is Retrieval-Augmented Generation (RAG) in LangChain4j and how do you build a pipeline?

RAG (Retrieval-Augmented Generation) is the technique of enriching an LLM prompt with relevant external content retrieved from a knowledge base before asking the model to generate a response. It solves the core limitation of LLMs — their knowledge is frozen at training time — by dynamically injecting up-to-date or domain-specific content at inference time.

In LangChain4j, a RAG pipeline has two distinct phases:

Ingestion phase (run once or periodically): Load documents → split into chunks → embed each chunk → store vectors in an EmbeddingStore.

Retrieval phase (at query time): Embed the user query → similarity-search the EmbeddingStore → inject top-K relevant chunks into the prompt → call the LLM.

// --- Ingestion ---
EmbeddingModel embeddingModel = new OpenAiEmbeddingModel.Builder()
    .apiKey(apiKey).modelName("text-embedding-ada-002").build();

EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

List<Document> docs = FileSystemDocumentLoader.loadDocuments("./docs");
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
    .documentSplitter(DocumentSplitters.recursive(500, 50))
    .embeddingModel(embeddingModel)
    .embeddingStore(store)
    .build();
ingestor.ingest(docs);

// --- Retrieval at query time via AI Services ---
interface Assistant {
    String answer(String question);
}

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(chatModel)
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .build();

String answer = assistant.answer("What are our refund policies?");

LangChain4j also supports advanced RAG patterns like query compression, re-ranking with a cross-encoder, and multiple content retrievers that are combined via a DefaultRetrievalAugmentor. These address quality issues in naive RAG implementations where retrieved chunks are too generic or poorly ranked.

In LangChain4j's RAG pipeline, what happens during the ingestion phase?Documents are loaded, split into chunks, embedded, and stored in an EmbeddingStore

✓ Well done — ingestion is the one-time preparation step: load → split → embed → store.

User queries are embedded and matched against pre-stored LLM responses

✗ Try again — query embedding happens at retrieval time, not during ingestion.

The LLM is fine-tuned on the provided documents

✗ Try again — RAG does not modify the LLM's weights. It injects content at inference time through the prompt.

Which LangChain4j class handles the end-to-end ingestion pipeline (splitting, embedding, and storing)?ContentRetriever

✗ Try again — ContentRetriever handles the query-time retrieval side, not ingestion.

EmbeddingStoreIngestor

✓ Well done — EmbeddingStoreIngestor wires together the splitter, embedding model, and store to process documents in one step.

DocumentLoader

✗ Try again — DocumentLoader just reads raw documents from a source. EmbeddingStoreIngestor orchestrates the full pipeline after loading.

6. What are Tools in LangChain4j and how does tool calling work?

Tools (also called function calling) give LLMs the ability to invoke real Java methods during a conversation. Instead of answering entirely from its training knowledge, the model can recognize when a specific capability is needed — fetching live data, running calculations, calling APIs — and request that the application execute a registered tool and return the result to the model for incorporation into its final answer.

In LangChain4j, tools are defined by annotating Java methods with @Tool on a plain Java object. Parameters can be annotated with @P (or @ToolParam) to provide descriptions that help the model understand when and how to use them.

class WeatherTools {

    @Tool("Returns the current weather in a given city in Celsius")
    String currentWeather(@P("City name, e.g. 'London'") String city) {
        return weatherApiService.fetchCurrent(city); // real API call
    }

    @Tool("Returns the 5-day forecast for a city")
    String forecast(@P("City name") String city,
                    @P("Number of days 1-5") int days) {
        return weatherApiService.fetchForecast(city, days);
    }
}

// Register with AI Services
TravelAssistant assistant = AiServices.builder(TravelAssistant.class)
    .chatLanguageModel(model)
    .tools(new WeatherTools())
    .build();

The flow is: user sends a message → LLM decides a tool should be called → LangChain4j intercepts the tool-use response → executes the Java method → appends the result to the conversation → re-calls the LLM with the result → LLM generates the final answer. All of this happens transparently within the assistant.chat() call. The model may call tools multiple times before producing a final answer, and LangChain4j handles those multi-step loops automatically.

What annotation marks a Java method as a callable tool in LangChain4j?@SystemMessage

✗ Try again — @SystemMessage defines a system prompt on an AI Services interface method, not a tool definition.

@Tool

✓ Well done — @Tool on a method in a plain Java object registers it as a callable function that the LLM can request to invoke.

@UserMessage

✗ Try again — @UserMessage defines the user prompt template on an AI Services interface method, not a tool.

After a LLM requests a tool execution, what does LangChain4j do with the result?Returns the tool result directly to the caller without re-calling the LLM

✗ Try again — the tool result is appended to the conversation and the LLM is called again to incorporate it into a final answer.

Appends the tool result to the conversation history and re-calls the LLM to generate the final answer

✓ Well done — this is the tool loop: tool result → re-call LLM → LLM produces a final answer informed by the real data.

Logs the result and presents the tool's raw return value as the final response

✗ Try again — raw tool output is not the final response; the LLM synthesizes the tool result into a natural language answer.

7. How do you integrate LangChain4j with Spring Boot?

LangChain4j provides a dedicated Spring Boot starter (langchain4j-spring-boot-starter) that wires everything up through standard Spring Boot auto-configuration. You add the starter plus the provider-specific starter for your chosen LLM, drop configuration into application.properties, and Spring automatically creates the ChatLanguageModel, EmbeddingModel, and related beans that you can inject anywhere in the application.

<!-- pom.xml -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-spring-boot-starter</artifactId>
    <version>0.32.0</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai-spring-boot-starter</artifactId>
    <version>0.32.0</version>
</dependency>

# application.properties
langchain4j.open-ai.chat-model.api-key=${OPENAI_API_KEY}
langchain4j.open-ai.chat-model.model-name=gpt-4o
langchain4j.open-ai.chat-model.temperature=0.7
langchain4j.open-ai.embedding-model.api-key=${OPENAI_API_KEY}

For AI Services specifically, Spring Boot integration uses the @AiService annotation (or you declare a @Bean manually). LangChain4j detects annotated interfaces during component scan and creates Spring-managed proxy beans — meaning the AI service is injectable like any other Spring component:

@AiService
interface CustomerSupportAgent {
    @SystemMessage("You are a helpful customer support agent.")
    String chat(String userMessage);
}

@RestController
class SupportController {
    private final CustomerSupportAgent agent;
    SupportController(CustomerSupportAgent agent) { this.agent = agent; }

    @PostMapping("/support")
    String support(@RequestBody String message) {
        return agent.chat(message);
    }
}

What annotation in LangChain4j Spring Boot integration marks an interface to be auto-created as a Spring-managed AI Services bean?@Service

✗ Try again — @Service is a standard Spring annotation. LangChain4j uses @AiService to trigger its AI Services proxy bean creation.

@AiService

✓ Well done — @AiService tells LangChain4j's Spring Boot auto-configuration to create a Spring proxy bean for that interface.

@FeignClient

✗ Try again — @FeignClient is from Spring Cloud and creates HTTP clients. @AiService creates LLM-backed service proxies.

Where do you configure the OpenAI API key when using langchain4j-open-ai-spring-boot-starter?In application.properties under langchain4j.open-ai.chat-model.api-key

✓ Well done — the starter binds to the langchain4j.open-ai.* property namespace via Spring Boot's auto-configuration.

As a constructor argument when manually creating OpenAiChatModel beans

✗ Try again — manual bean creation is possible but the starter specifically reads from application.properties automatically.

In a @Configuration class annotated with @EnableLangChain4j

✗ Try again — there is no @EnableLangChain4j annotation. The starter auto-configures everything from properties.

8. What is the EmbeddingModel in LangChain4j and which providers are supported?

An EmbeddingModel in LangChain4j converts text into dense numerical vectors (embeddings) that capture semantic meaning. Texts with similar meanings produce vectors that are geometrically close, enabling similarity search. EmbeddingModels are used during RAG ingestion (to vectorize document chunks) and at query time (to vectorize the user's question so it can be matched against stored chunks).

The core interface is minimal by design:

public interface EmbeddingModel {
    Response<Embedding> embed(String text);
    Response<List<Embedding>> embedAll(List<TextSegment> textSegments);
}

Supported embedding model providers include:

LangChain4j Embedding Model Providers
Provider	Example Model	Notes
OpenAI	text-embedding-3-small / ada-002	Most commonly used; cloud API
Azure OpenAI	text-embedding-ada-002	Enterprise Azure deployments
Google Vertex AI	textembedding-gecko	GCP-based workloads
Ollama	nomic-embed-text, mxbai-embed	Local/on-premise, no API costs
HuggingFace	sentence-transformers models	Open-source models via HF Inference API
In-process (Onnx)	all-MiniLM-L6-v2	Embedded in the JVM — no external calls, fastest

The in-process ONNX option (langchain4j-embeddings module) is particularly useful for offline environments or when minimizing API costs: the model runs entirely within the JVM with no network calls, at the cost of slightly lower embedding quality compared to frontier models.

What is the primary use of an EmbeddingModel in a LangChain4j RAG pipeline?To generate human-readable summaries of documents before storing them

✗ Try again — summarization is a separate LLM task. EmbeddingModel converts text to numerical vectors for similarity search.

To convert text into dense vectors so documents and queries can be compared by semantic similarity

✓ Well done — embeddings enable vector similarity search: the user query is embedded, then matched against document chunk embeddings in the store.

To compress large documents so they fit within the LLM's context window

✗ Try again — document splitting handles size; EmbeddingModel produces vectors for retrieval, not compression.

Which LangChain4j embedding option runs entirely in-process inside the JVM without making any external API calls?OpenAI text-embedding-ada-002 via the langchain4j-open-ai module

✗ Try again — OpenAI embeddings always require an external HTTPS call to OpenAI's API.

The ONNX in-process model from langchain4j-embeddings — runs locally within the JVM

✓ Well done — ONNX-based embedding runs locally in-process, making it suitable for air-gapped environments and zero API cost scenarios.

Ollama models — they always run in the same JVM process as the application

✗ Try again — Ollama runs as a separate local server process; the application still makes HTTP calls to it.

9. What EmbeddingStores does LangChain4j support and how do you choose one?

An EmbeddingStore is the vector database layer in LangChain4j's RAG pipeline — it stores embedding vectors alongside their source text and metadata, and supports approximate nearest-neighbor (ANN) similarity search. LangChain4j implements a unified EmbeddingStore<TextSegment> interface across all backends, so swapping stores requires only a dependency and configuration change.

LangChain4j EmbeddingStore Options
Store	Type	Best For
InMemoryEmbeddingStore	In-memory (no persistence)	Development, unit tests, prototyping
PgVector	PostgreSQL extension	Teams already on Postgres; no separate vector DB infrastructure
Chroma	Open-source vector DB	Local dev/staging, self-hosted deployments
Pinecone	Managed cloud vector DB	Production scale, fully managed
Weaviate	Open-source / cloud	Multi-modal search, built-in vectorization
Qdrant	Open-source / cloud	High-performance filtered search
Milvus / Zilliz	Open-source / cloud	Very large-scale vector workloads
Elasticsearch	Managed / self-hosted	Teams already running ELK stack
Azure AI Search	Managed Azure service	Azure-native deployments
Redis Stack	In-memory + persistence	Low-latency, existing Redis infrastructure

For choosing: start with InMemoryEmbeddingStore during development. For production, use PgVector if you already run PostgreSQL (zero additional infrastructure), or Pinecone/Qdrant if you need a dedicated managed vector database with advanced filtering and scaling controls. The interface is identical across all stores, so the choice is purely operational.

Which EmbeddingStore should you use during unit testing or prototyping in LangChain4j?InMemoryEmbeddingStore — no external infrastructure needed

✓ Well done — InMemoryEmbeddingStore is the right choice for development and tests where persistence is not required.

Pinecone — it has a free tier suitable for prototyping

✗ Try again — while Pinecone has a free tier, requiring a cloud account and network calls makes InMemoryEmbeddingStore more practical for local testing.

PgVector — PostgreSQL is available on any developer machine

✗ Try again — PgVector requires PostgreSQL with the pgvector extension installed. InMemoryEmbeddingStore needs zero infrastructure.

If your team already runs PostgreSQL in production and wants to add RAG without new infrastructure, which EmbeddingStore is the best fit?Chroma

✗ Try again — Chroma is a separate vector DB server. PgVector extends your existing PostgreSQL instance.

PgVector — adds vector search to your existing PostgreSQL with the pgvector extension

✓ Well done — PgVector extends PostgreSQL with vector types and ANN indexes, reusing existing database infrastructure with no new systems to operate.

InMemoryEmbeddingStore — it is the most production-ready option

✗ Try again — InMemoryEmbeddingStore is not production-ready since it has no persistence. PgVector is the right answer for teams on PostgreSQL.

10. What is document splitting in LangChain4j and why is it necessary?

Document splitting (also called chunking) is the process of dividing a large document into smaller, overlapping segments before embedding and storing them in the vector database. It is a necessary step in RAG pipelines because LLMs have a fixed context window (e.g., 8K, 32K, or 128K tokens). You cannot embed an entire 200-page PDF as a single unit — you need to break it into pieces that fit comfortably in the context window while still carrying enough context to be meaningful.

LangChain4j provides several DocumentSplitter implementations:

DocumentSplitters.recursive() — Recursively splits on paragraphs, then sentences, then words, aiming to preserve semantic boundaries. This is the recommended default for most text documents.
DocumentSplitters.byParagraph() — Splits strictly at paragraph boundaries.

DocumentSplitters.bySentence()

DocumentSplitters.byWord(maxTokens) — Splits by word count up to a token limit.

// Recursive splitter: 500 token chunks, 50 token overlap
DocumentSplitter splitter = DocumentSplitters.recursive(500, 50);
List<TextSegment> segments = splitter.split(document);

The overlap parameter is critical: by repeating some tokens at the boundary of adjacent chunks, you ensure that sentences or ideas that span a chunk boundary are not lost in either chunk. Without overlap, a sentence split exactly at a boundary would appear truncated in both chunks, reducing retrieval quality. A 10-20% overlap of the chunk size is a common starting point.

Why is an overlap specified when splitting documents in LangChain4j?To prevent sentences that span chunk boundaries from being lost by repeating context at the edges of adjacent chunks

✓ Well done — overlap ensures boundary-spanning content appears complete in at least one chunk, improving retrieval quality.

To increase embedding accuracy by giving the model more tokens to work with per chunk

✗ Try again — overlap is about preserving boundary context, not increasing chunk size for embedding quality.

To reduce the total number of embeddings stored in the vector database

✗ Try again — overlap actually increases the total number of tokens stored (duplicate content). Its purpose is context preservation at boundaries.

Which DocumentSplitter is generally recommended for splitting diverse text documents in LangChain4j RAG pipelines?DocumentSplitters.byWord() — word count is the most accurate measure

✗ Try again — byWord splits mechanically by word count without respecting semantic structure. DocumentSplitters.recursive() is the recommended default.

DocumentSplitters.recursive() — it tries paragraphs, then sentences, then words to preserve semantic boundaries

✓ Well done — recursive splitting respects natural text structure, producing more coherent chunks than mechanical word-count splitting.

DocumentSplitters.byParagraph() — paragraphs are always the right unit

✗ Try again — byParagraph works well for well-formatted text but fails for dense prose or code. Recursive splitting adapts to the content structure.

11. What is the @SystemMessage and @UserMessage annotation in LangChain4j AI Services?

@SystemMessage and @UserMessage are the two prompt-definition annotations at the core of LangChain4j's AI Services pattern. Together they define what gets sent to the LLM for each method invocation, replacing all manual prompt string assembly.

@SystemMessage defines the system prompt — the persona, context, constraints, and behavioral instructions that frame the entire conversation. It is sent as the role: system message in the API request. It can be a plain string literal, or point to a classpath resource file for longer prompts.

@UserMessage defines the user turn — what gets sent as the role: user message. Method parameters are injected into the template via {{paramName}} placeholders or can be injected automatically when there is only one String parameter. If @UserMessage is omitted, the first String parameter is used as the user message verbatim.

interface Translator {

    @SystemMessage("You are a professional translator. Translate precisely without adding commentary.")
    @UserMessage("Translate the following text to {{targetLanguage}}: {{text}}")
    String translate(String text, @V("targetLanguage") String lang);
}

// Or loading from a classpath template file:
interface LegalReviewer {

    @SystemMessage(fromResource = "prompts/legal-reviewer-system.txt")
    @UserMessage("Review this contract clause: {{clause}}")
    ReviewResult review(String clause);
}

The @V annotation explicitly names a variable for injection when the parameter name differs or when there are multiple parameters. Without @V, LangChain4j uses the Java parameter name (requires compilation with -parameters flag, or the @Param annotation).

What role does the content of @SystemMessage take in the LLM API request?The system role — it sets the model's persona, context, and behavioral constraints for the conversation

✓ Well done — system messages are sent with role: system and establish the model's framing before any user input is processed.

The user role — it is prepended to the user's message automatically

✗ Try again — @SystemMessage maps to role: system, not role: user. @UserMessage maps to role: user.

A configuration setting that increases model temperature for creative responses

✗ Try again — @SystemMessage is a prompt message, not a model parameter. Temperature is configured on the ChatLanguageModel builder.

If a @UserMessage template uses {{text}} and {{targetLanguage}} placeholders, how does LangChain4j resolve them?From application.properties entries named text and targetLanguage

✗ Try again — placeholders are resolved from method parameters, not from configuration properties.

From the Java method parameters, matched by name or by the @V annotation

✓ Well done — method parameters are injected into the template at the matching {{name}} position. @V allows explicit name mapping when parameter names don't match.

By concatenating all String parameters in the order they appear in the method signature

✗ Try again — LangChain4j uses named placeholder substitution, not positional concatenation.

12. How does streaming work in LangChain4j and when should you use it?

Streaming in LangChain4j allows the LLM's response to be delivered token-by-token as it is generated, rather than waiting for the entire response to be produced before returning anything to the caller. For user-facing chat interfaces, this dramatically improves perceived responsiveness — the user sees text appearing progressively instead of staring at a loading spinner for several seconds.

LangChain4j supports streaming through two mechanisms:

1. TokenStream (AI Services) — Declare the return type as TokenStream in your AI Services interface. The caller registers handlers for each token, completion, and errors:

interface StreamingAssistant {
    TokenStream chat(String message);
}

StreamingAssistant assistant = AiServices.builder(StreamingAssistant.class)
    .streamingChatLanguageModel(streamingModel) // note: streaming model
    .build();

assistant.chat("Explain quantum entanglement")
    .onNext(token -> System.out.print(token))
    .onComplete(response -> System.out.println("\nDone. Tokens used: " + response.tokenUsage()))
    .onError(Throwable::printStackTrace)
    .start();

2. Direct StreamingChatLanguageModel — Use the lower-level interface for custom streaming logic without AI Services.

For Spring Boot applications serving a web API, the streaming response is typically connected to an SSE (Server-Sent Events) endpoint or a WebSocket. Spring WebFlux's Flux<String> integrates naturally with LangChain4j's streaming by bridging the onNext callback to a reactive publisher.

Use streaming when: building conversational UIs, generating long-form content where early tokens are already useful, or when you need to display a typing indicator. Avoid streaming for batch jobs, automated pipelines, or API calls where the complete response is needed before any processing begins.

What return type must an AI Services interface method declare to enable streaming in LangChain4j?CompletableFuture<String>

✗ Try again — CompletableFuture delivers the complete response asynchronously, not token-by-token. Streaming uses TokenStream.

TokenStream

✓ Well done — declaring TokenStream as the return type enables token-by-token delivery via onNext/onComplete/onError callbacks.

Flux<String>

✗ Try again — Flux is a Project Reactor type. LangChain4j's native streaming return type is TokenStream (though you can bridge TokenStream to Flux in Spring WebFlux).

When is streaming NOT the right choice for LangChain4j?When building a chatbot UI where the user should see text appear progressively

✗ Try again — that is exactly the right use case for streaming. It should be avoided for batch processing scenarios.

In automated batch pipelines where the complete response must be available before any further processing

✓ Well done — streaming adds callback complexity without benefit when the whole response is needed upfront for subsequent processing steps.

When generating responses longer than 1,000 tokens

✗ Try again — longer responses are actually a stronger argument for streaming (less wait time), not against it.

13. What is the ContentRetriever and RetrievalAugmentor in LangChain4j advanced RAG?

LangChain4j's advanced RAG API introduces a cleaner abstraction hierarchy above the basic EmbeddingStoreContentRetriever. The two key interfaces are ContentRetriever and RetrievalAugmentor.

ContentRetriever is the interface responsible for fetching relevant content given a query. Multiple implementations are available:

EmbeddingStoreContentRetriever — retrieves via vector similarity from an EmbeddingStore
WebSearchContentRetriever — fetches live web results (e.g., via Tavily, Google) for up-to-date information
SqlDatabaseContentRetriever — generates and executes SQL to retrieve structured data (text-to-SQL RAG)

RetrievalAugmentor is the higher-level orchestrator that sits between the user query and the LLM call. The default implementation, DefaultRetrievalAugmentor, exposes a full pipeline with configurable stages:

Query transformer — Rewrites or decomposes the original query (e.g., query compression using conversation history, or HyDE — Hypothetical Document Embeddings)
Query router — Routes queries to one or more ContentRetrievers based on the query type
Content aggregator — Merges results from multiple retrievers
Content injector — Formats retrieved content for injection into the prompt

RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .queryTransformer(new CompressingQueryTransformer(chatModel))
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .contentInjector(DefaultContentInjector.builder()
        .promptTemplate(PromptTemplate.from("Context:\n{{contents}}\n\nQuestion: {{userMessage}}"))
        .build())
    .build();

What does a QueryTransformer do in LangChain4j's DefaultRetrievalAugmentor pipeline?Rewrites or decomposes the user query before retrieval — e.g., compressing conversation context into a standalone query

✓ Well done — query transformation improves retrieval quality by reformulating ambiguous or context-dependent queries into precise standalone search terms.

Translates the query to a different language for multilingual retrieval

✗ Try again — translation is not the primary purpose. QueryTransformer rewrites or decomposes queries for better retrieval accuracy.

Validates that the query complies with safety guidelines before retrieval

✗ Try again — safety filtering is a separate concern. QueryTransformer is specifically about reformulating the query for improved retrieval.

Which ContentRetriever lets you query a relational database using natural language (text-to-SQL)?EmbeddingStoreContentRetriever

✗ Try again — EmbeddingStoreContentRetriever does vector similarity search, not SQL generation.

WebSearchContentRetriever

✗ Try again — WebSearchContentRetriever fetches live web results, not database records.

SqlDatabaseContentRetriever — generates SQL from the natural language query and executes it

✓ Well done — SqlDatabaseContentRetriever uses the LLM to generate SQL from the user's question, then runs it against the database.

14. How does LangChain4j handle structured output from LLMs?

Structured output means getting the LLM to return data that maps directly to a Java object — a POJO, record, enum, or collection — rather than free-form text that you parse yourself. LangChain4j makes this transparent: declare the return type of your AI Services method as the desired Java type, and the library handles everything else.

Internally, LangChain4j uses one of two strategies depending on the provider:

JSON schema injection — For models that do not natively support constrained output, LangChain4j generates a JSON schema from the return type and appends it to the prompt as instructions (e.g., "respond only in this JSON format"). The response is then deserialized using Jackson.
Native JSON mode / response format — For providers that support constrained JSON output (OpenAI's response_format: { type: json_object } or Anthropic's tool-use-for-structured-output), LangChain4j activates the native mode for more reliable output.

record ProductReview(
    String productName,
    int ratingOutOf5,
    List<String> pros,
    List<String> cons
) {}

interface ReviewAnalyzer {
    @UserMessage("Analyze this customer review and extract key information: {{review}}")
    ProductReview analyze(String review);
}

// Returns a fully populated ProductReview object
ProductReview result = analyzer.analyze("Great laptop, very fast but battery life is poor");
System.out.println(result.ratingOutOf5()); // e.g., 4

Enums work too: if you return an enum Sentiment { POSITIVE, NEUTRAL, NEGATIVE }, LangChain4j instructs the model to return exactly one of those values and maps the response to the correct enum constant. For complex nested objects and lists, Jackson handles the deserialization as long as the model produces valid JSON matching the schema.

What does LangChain4j do when you declare a custom POJO as the return type of an AI Services method?Generates a JSON schema from the POJO, appends format instructions to the prompt, and deserializes the LLM response using Jackson

✓ Well done — LangChain4j automates the full structured output pipeline: schema generation → prompt injection → response deserialization.

Throws a compile error because only String return types are supported

✗ Try again — custom POJO return types are a first-class feature of LangChain4j AI Services.

Sends the POJO class bytecode to the LLM for it to populate the fields

✗ Try again — LLMs receive text prompts. LangChain4j converts the POJO structure to a JSON schema description in the prompt.

What happens in LangChain4j when the AI Services method return type is an enum?The LLM must return a raw number corresponding to the enum ordinal

✗ Try again — LangChain4j instructs the model to return the enum constant name as a string, then maps it to the Java enum.

LangChain4j instructs the model to return one of the enum constant names and maps it to the correct Java enum constant

✓ Well done — enum return types work out of the box; the model is constrained to valid enum values and the result is mapped automatically.

Enum return types are not supported — only String and POJO types

✗ Try again — LangChain4j supports enum return types natively in AI Services.

15. What is the PromptTemplate in LangChain4j and how does it differ from @UserMessage?

PromptTemplate is the lower-level prompt construction API in LangChain4j, used when you are working directly with ChatLanguageModel or building custom chains without the AI Services abstraction. It lets you define a reusable template string with {{variable}} placeholders and fill them in programmatically at runtime.

PromptTemplate template = PromptTemplate.from(
    "You are translating from English to {{language}}. Translate: {{text}}"
);

Prompt prompt = template.apply(Map.of(
    "language", "French",
    "text", "The quick brown fox jumps over the lazy dog"
));

// Generates a Prompt object containing the filled-in text
String result = chatModel.generate(prompt.toUserMessage())
    .content().text();

The key difference from @UserMessage is the level of abstraction and who drives the execution:

PromptTemplate vs @UserMessage
Aspect	PromptTemplate	@UserMessage
Usage context	Direct ChatLanguageModel calls, custom chains	AI Services interface methods only
Variable injection	Manual Map.of(...) call	Automatic from method parameters
Code required	Template creation, apply(), generate()	Just annotation — no code
Best for	Dynamic, programmatically constructed prompts	Declarative, fixed-structure interactions

Use PromptTemplate when you need to dynamically compose different prompt templates at runtime, when you are building low-level chains, or when the fixed annotation approach of AI Services is too rigid for a particular use case.

When would you choose PromptTemplate over @UserMessage in LangChain4j?When you need better performance — PromptTemplate is faster than @UserMessage

✗ Try again — there is no meaningful performance difference. PromptTemplate is chosen for flexibility and direct model access, not speed.

When dynamically constructing prompts at runtime or working directly with ChatLanguageModel outside AI Services

✓ Well done — PromptTemplate suits programmatic prompt construction; @UserMessage is declarative and tied to AI Services interfaces.

When the prompt is longer than 1,000 characters

✗ Try again — prompt length is not the deciding factor. The choice depends on whether you are in the AI Services abstraction or working at the ChatLanguageModel level.

How are variables injected into a LangChain4j PromptTemplate?Via @Param annotations on method parameters in an AI Services interface

✗ Try again — @Param annotations are for AI Services. PromptTemplate uses the .apply(Map<String, Object>) method.

By calling template.apply(Map.of("variableName", value)) which returns a filled-in Prompt object

✓ Well done — PromptTemplate.apply() takes a Map of variable name to value and produces the resolved Prompt.

Via Spring @Value injection directly into {{spring.property}} placeholders

✗ Try again — LangChain4j's PromptTemplate is not aware of Spring's @Value mechanism. Variables are supplied programmatically via Map.

16. What LLM providers does LangChain4j support and how do you switch between them?

LangChain4j supports a wide range of LLM providers, both cloud-based and local, through its modular dependency design. Each provider is a separate Maven module that implements the core ChatLanguageModel and optionally EmbeddingModel, StreamingChatLanguageModel, and ImageModel interfaces.

LangChain4j LLM Provider Support
Provider	Artifact	Notable Models
OpenAI	langchain4j-open-ai	GPT-4o, GPT-4 Turbo, o1, o1-mini
Azure OpenAI	langchain4j-azure-open-ai	OpenAI models on Azure endpoints
Anthropic	langchain4j-anthropic	Claude 3.5, Claude 3 Opus/Sonnet/Haiku
Google Vertex AI	langchain4j-vertex-ai-gemini	Gemini 1.5 Pro, Gemini 1.5 Flash
Mistral AI	langchain4j-mistral-ai	Mistral Large, Codestral
Ollama	langchain4j-ollama	Llama 3, Mistral, Phi-3 (local)
HuggingFace	langchain4j-hugging-face	Open-source models via HF Inference
Amazon Bedrock	langchain4j-bedrock	Claude, Llama on Bedrock
Groq	langchain4j-open-ai (compatible)	OpenAI-compatible fast inference

Switching providers is entirely a configuration concern — your AI Services interface and application logic do not change:

// OpenAI
ChatLanguageModel model = OpenAiChatModel.builder()
    .apiKey("sk-...").modelName("gpt-4o").build();

// Switch to Anthropic — same interface, different builder
ChatLanguageModel model = AnthropicChatModel.builder()
    .apiKey("sk-ant-...").modelName("claude-3-5-sonnet-20241022").build();

// Same AI Services usage for both
Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(model).build();

What needs to change in your AI Services interface code when switching from OpenAI to Anthropic in LangChain4j?Nothing — the interface code is provider-agnostic; only the ChatLanguageModel builder and Maven dependency change

✓ Well done — this provider abstraction is a core LangChain4j design goal: swap the model implementation without touching application code.

Rename @SystemMessage to @AnthropicSystemMessage

✗ Try again — there are no provider-specific message annotations. @SystemMessage works with all providers.

Rewrite tool definitions because Anthropic uses a different @Tool annotation format

✗ Try again — @Tool is provider-neutral. LangChain4j translates tool definitions to each provider's native format automatically.

Which provider allows you to run LLMs locally on your own machine with no API costs or internet access?Mistral AI

✗ Try again — Mistral AI is a cloud API provider. Ollama enables fully local LLM execution.

Ollama — runs models like Llama 3 and Phi-3 on local hardware with no API fees

✓ Well done — Ollama is the LangChain4j-supported local inference backend for running open-source models privately.

HuggingFace — it downloads and runs models inside the JVM

✗ Try again — the HuggingFace integration in LangChain4j calls HuggingFace's Inference API remotely, not in-process.

17. What is an Agent in LangChain4j and how does it differ from a simple AI Services call?

In LangChain4j, an Agent is an AI Services instance that has been equipped with a set of Tools and operates in an autonomous reasoning loop. Instead of a single-shot prompt-and-respond interaction, an agent decides at each step whether to answer directly from its knowledge or to invoke one of the available tools to gather more information, then loops until it has enough to produce a final answer.

A simple AI Services call is a single round-trip: user message in → LLM response out. An agent uses a ReAct-style (Reasoning + Acting) loop:

User message is sent to the LLM along with tool descriptions
LLM reasons: "I need current stock prices" → requests a getStockPrice("AAPL") tool call
LangChain4j executes the tool and appends the result to the conversation
LLM reasons with the result: maybe needs another tool call, or produces the final answer
Loop ends when the LLM generates a final text response (no more tool requests)

class FinanceTools {
    @Tool("Gets the current stock price for a ticker symbol")
    double getStockPrice(@P("Ticker symbol like AAPL") String ticker) {
        return marketDataService.getPrice(ticker);
    }

    @Tool("Gets the P/E ratio for a company ticker")
    double getPERatio(@P("Ticker symbol") String ticker) {
        return fundamentalsService.getPERatio(ticker);
    }
}

FinancialAnalyst analyst = AiServices.builder(FinancialAnalyst.class)
    .chatLanguageModel(model)
    .tools(new FinanceTools())
    .chatMemory(MessageWindowChatMemory.withMaxMessages(20))
    .build();

// The agent may call both tools before answering
String answer = analyst.analyze("Is Apple stock overvalued relative to its P/E ratio?");

The critical difference: a simple AI Services call completes in one LLM round-trip with no tool access. An agent orchestrates multiple LLM calls and tool executions autonomously to answer questions that require real data.

What terminates the reasoning loop in a LangChain4j agent?A timeout configured on the ChatLanguageModel builder

✗ Try again — while timeouts can interrupt a hung call, the normal termination is when the LLM decides to produce a final text answer with no more tool calls.

When the LLM generates a final text response without requesting any more tool calls

✓ Well done — the agent loop continues as long as the LLM requests tools. It ends when the model decides it has enough information to answer.

After exactly three tool calls regardless of whether the LLM needs more

✗ Try again — there is no fixed tool call limit by default. The loop ends when the LLM stops requesting tools.

What reasoning pattern does LangChain4j agents follow when deciding whether to use a tool?Chain-of-Thought (CoT) — the model outputs its reasoning before the final answer

✗ Try again — CoT produces reasoning text but doesn't intrinsically involve tool invocation loops. LangChain4j agents follow the ReAct pattern.

ReAct (Reasoning + Acting) — the model alternates between reasoning about what to do and acting via tool calls

✓ Well done — ReAct is the standard pattern: Reason (what information do I need?) → Act (call the tool) → Observe (tool result) → loop until done.

Tree-of-Thought (ToT) — the model explores multiple reasoning branches in parallel

✗ Try again — Tree-of-Thought is a separate prompting technique. LangChain4j agents use the linear ReAct loop.

18. How do you implement multi-turn conversation with memory per user in a Spring REST API using LangChain4j?

Implementing per-user conversational memory in a Spring REST API requires three things: an AI Services interface with a memory-id parameter, a ChatMemoryProvider that returns isolated memory per ID, and a backing store to persist conversations across requests (or restarts).

// 1. AI Services interface with per-user memory
interface ChatAssistant {
    String chat(@MemoryId String userId, @UserMessage String message);
}

// 2. In-memory store for development (switch to Redis/DB for production)
Map<String, ChatMemory> memoryMap = new ConcurrentHashMap<>();

ChatMemoryProvider memoryProvider = memoryId ->
    memoryMap.computeIfAbsent(memoryId.toString(), id ->
        MessageWindowChatMemory.withMaxMessages(20));

// 3. Build the AI Service with the provider
ChatAssistant assistant = AiServices.builder(ChatAssistant.class)
    .chatLanguageModel(model)
    .chatMemoryProvider(memoryProvider)
    .build();

// 4. Spring REST controller
@RestController
@RequestMapping("/api/chat")
class ChatController {
    private final ChatAssistant assistant;

    ChatController(ChatAssistant assistant) {
        this.assistant = assistant;
    }

    @PostMapping("/{userId}")
    String chat(@PathVariable String userId, @RequestBody String message) {
        return assistant.chat(userId, message); // each user gets isolated memory
    }
}

The @MemoryId annotation tells LangChain4j which parameter is the memory key. The ChatMemoryProvider lambda receives this key and returns the appropriate memory store for that user. For production, replace the ConcurrentHashMap with a Redis-backed or JDBC-backed memory store so conversations survive application restarts and work across multiple pods.

Which annotation marks the memory isolation key parameter in a LangChain4j AI Services multi-user interface?@UserId

✗ Try again — @UserId is not a LangChain4j annotation. @MemoryId is the correct annotation for the memory isolation key parameter.

@MemoryId

✓ Well done — @MemoryId tells LangChain4j which method parameter to use as the key for ChatMemoryProvider lookup.

@SessionId

✗ Try again — @SessionId is not a LangChain4j annotation. The correct one is @MemoryId.

Why replace the ConcurrentHashMap memory store with Redis or a database in production?Because ConcurrentHashMap cannot store more than 10,000 conversations

✗ Try again — ConcurrentHashMap has no built-in size limit. The production issue is that in-memory state is lost on restart and not shared across multiple pods.

Because in-memory state is lost on application restart and is not shared across horizontally scaled pods

✓ Well done — persistence and horizontal scaling are the two production requirements that in-memory maps cannot satisfy.

Because Spring Boot auto-configuration requires a persistent store for ChatMemoryProvider

✗ Try again — Spring Boot auto-configuration works with any ChatMemoryProvider. The persistence requirement is a production reliability concern, not a Spring Boot constraint.

19. What is the ImageModel in LangChain4j and which providers support image generation?

ImageModel is the LangChain4j interface for text-to-image generation — sending a text prompt and receiving a generated image in return. It follows the same provider-abstraction pattern as ChatLanguageModel: your code works against the interface, and the actual generation is delegated to whichever provider you configure.

public interface ImageModel {
    Response<Image> generate(String prompt);
    Response<List<Image>> generate(String prompt, int n);
    Response<Image> edit(Image image, String prompt);  // inpainting
    Response<Image> edit(Image image, Image mask, String prompt);
}

The Image response object contains either a URL to the generated image (hosted by the provider) or a Base64-encoded data URI, depending on the provider and configuration.

ImageModel model = OpenAiImageModel.builder()
    .apiKey(apiKey)
    .modelName(DALL_E_3)
    .size("1024x1024")
    .quality("standard")
    .build();

Response<Image> response = model.generate(
    "A serene Japanese zen garden at dawn, photorealistic");

String imageUrl = response.content().url().toString();
// or Base64: response.content().base64Data()

Supported image generation providers in LangChain4j:

OpenAI DALL-E 2 / DALL-E 3 — Available via langchain4j-open-ai; DALL-E 3 supports higher quality and natural language understanding
Azure OpenAI DALL-E — Via langchain4j-azure-open-ai for enterprise Azure deployments
Stability AI — Via langchain4j-stability-ai for Stable Diffusion models

Note that ImageModel is a distinct interface from ChatLanguageModel. Multi-modal vision models that accept images as input (GPT-4V, Claude 3) are handled through ChatLanguageModel's message API using ImageContent, not through ImageModel.

What is the correct LangChain4j interface for generating images from text prompts?MultiModalChatLanguageModel

✗ Try again — multi-modal models that receive images as input use ChatLanguageModel. The interface for text-to-image generation is ImageModel.

ImageModel

✓ Well done — ImageModel is the dedicated LangChain4j interface for text-to-image generation workflows.

DiffusionModel

✗ Try again — DiffusionModel is not a LangChain4j interface name. The correct interface is ImageModel.

How is an LLM's vision capability (analyzing images as input) different from ImageModel in LangChain4j?They are the same interface — ImageModel handles both generation and analysis

✗ Try again — ImageModel is for text-to-image generation only. Vision (image-as-input) uses ChatLanguageModel with ImageContent in the user message.

Vision uses ChatLanguageModel with ImageContent in the message; ImageModel is exclusively for text-to-image generation

✓ Well done — these are separate capabilities with different interfaces: ChatLanguageModel for analysis/vision, ImageModel for generation.

Vision models require a separate VisionLanguageModel interface not yet in LangChain4j

✗ Try again — vision is supported through the existing ChatLanguageModel interface using ImageContent in user messages.

20. How do you handle errors and retries in LangChain4j?

LangChain4j itself does not provide a built-in retry framework — it intentionally delegates retry logic to the infrastructure layer. However, there are several natural integration points for error handling depending on your deployment context.

Rate limit handling (HTTP 429) — Most provider implementations in LangChain4j throw a dev.langchain4j.exception.RateLimitException when the LLM provider returns a 429. You handle this at the call site or through Spring's @Retryable mechanism:

// Using Spring Retry with @Retryable
@Service
class AiService {
    private final ChatAssistant assistant;

    @Retryable(
        retryFor = RateLimitException.class,
        maxAttempts = 3,
        backoff = @Backoff(delay = 2000, multiplier = 2)
    )
    public String chat(String userId, String message) {
        return assistant.chat(userId, message);
    }

    @Recover
    public String fallback(RateLimitException ex, String userId, String message) {
        return "Service is temporarily busy. Please try again in a moment.";
    }
}

Timeout handling — Configure timeouts directly on the ChatLanguageModel builder:

OpenAiChatModel model = OpenAiChatModel.builder()
    .apiKey(apiKey)
    .timeout(Duration.ofSeconds(30))
    .maxRetries(2) // some providers support built-in retries in the client
    .build();

The OpenAI and some other provider clients support a maxRetries parameter that enables automatic retries with exponential backoff inside the HTTP client before the exception propagates to your code. For structured error handling across all exceptions, wrapping the AI Services call in a try-catch and mapping to application-specific error responses is standard practice. Resilience4j's circuit breaker is another option for preventing cascading failures when an LLM provider is degraded.

What exception does LangChain4j throw when the LLM provider returns an HTTP 429 rate limit response?RateLimitException from the dev.langchain4j.exception package

✓ Well done — RateLimitException is LangChain4j's typed exception for provider 429 responses, enabling targeted retry logic.

HttpStatusException with status 429

✗ Try again — LangChain4j wraps HTTP errors into typed exceptions. The rate limit case throws RateLimitException, not a generic HTTP exception.

LangChain4jException with message "rate_limited"

✗ Try again — there is no generic LangChain4jException. Rate limiting specifically maps to RateLimitException.

Where do you configure request timeouts for a LangChain4j OpenAI model?In application.properties via spring.ai.openai.timeout

✗ Try again — that is a Spring AI property path, not LangChain4j. LangChain4j configures timeouts directly on the model builder.

On the OpenAiChatModel builder using .timeout(Duration.ofSeconds(N))

✓ Well done — LangChain4j configures timeout at the model builder level, where it is applied to all HTTP calls for that model instance.

Via a Spring @Bean of type HttpClientConfiguration injected into the model

✗ Try again — LangChain4j does not use Spring HTTP configuration. Timeout is set directly on the builder.

21. How do you test LangChain4j AI Services without making real LLM API calls?

Testing AI Services without hitting real LLM endpoints is essential for fast, cost-free, deterministic unit tests. LangChain4j supports this through mock model implementations and the AiServices builder accepting any ChatLanguageModel — including test doubles you create yourself.

The most direct approach is to implement a simple mock that returns predetermined responses:

// Simple lambda mock
ChatLanguageModel mockModel = (messages, toolSpecifications) ->
    new AiMessage("The capital of France is Paris.");

GeographyAssistant assistant = AiServices.builder(GeographyAssistant.class)
    .chatLanguageModel(mockModel)
    .build();

String answer = assistant.ask("What is the capital of France?");
assertThat(answer).isEqualTo("The capital of France is Paris.");

For more complex scenarios, Mockito works naturally since ChatLanguageModel is an interface:

@ExtendWith(MockitoExtension.class)
class TranslatorTest {

    @Mock
    ChatLanguageModel mockModel;

    @Test
    void translatesText() {
        AiMessage fakeResponse = new AiMessage("Bonjour le monde");
        when(mockModel.generate(anyList())).thenReturn(new Response<>(fakeResponse));

        Translator translator = AiServices.builder(Translator.class)
            .chatLanguageModel(mockModel).build();

        assertThat(translator.translate("Hello world", "French"))
            .isEqualTo("Bonjour le monde");
    }
}

For integration tests that require a real LLM but want cost control, use Ollama with a small local model (e.g., tinyllama) via Testcontainers. This gives you real model behavior without OpenAI billing and can run in CI pipelines. The langchain4j-ollama module combined with the Testcontainers Ollama image enables fully automated integration test suites with no API keys required.

What is the simplest way to create a mock ChatLanguageModel for LangChain4j unit tests?Implement a lambda that satisfies the ChatLanguageModel interface and returns a predetermined AiMessage

✓ Well done — since ChatLanguageModel is a functional interface, a lambda mock is the most concise test double option.

Use the @MockLLM Spring annotation to auto-inject a test model

✗ Try again — @MockLLM does not exist in LangChain4j. You create mock models via lambda or Mockito.

Set langchain4j.test-mode=true in application-test.properties

✗ Try again — there is no test-mode property. Mocking is done at the model object level in your test code.

Which tool lets you run real LLM integration tests in CI without OpenAI API keys or billing?WireMock — record and replay OpenAI HTTP interactions

✗ Try again — WireMock works for HTTP replay but requires recording real API calls first. Ollama + Testcontainers provides a fully local alternative.

Ollama with Testcontainers — runs a local model in a Docker container during tests

✓ Well done — Ollama via Testcontainers starts a real LLM locally in CI, enabling genuine integration tests without cloud API dependencies.

The LangChain4j in-memory replay adapter that captures prod responses

✗ Try again — no such built-in replay adapter exists in LangChain4j. Testcontainers + Ollama is the idiomatic solution.

22. What is the DocumentLoader API in LangChain4j and what sources does it support?

Document loaders are the entry point of any RAG ingestion pipeline — they read raw content from a source and return it as a list of Document objects, each containing the text content and source metadata. LangChain4j's loaders all implement the DocumentLoader interface and populate the Document.metadata() map with source-specific information like file path, URL, or S3 key.

Built-in document loader sources include:

LangChain4j Built-in Document Loaders
Loader	Source	Notes
FileSystemDocumentLoader	Local files and directories	Supports glob patterns; auto-detects parser by extension
UrlDocumentLoader	HTTP/HTTPS URLs	Fetches and parses web pages
ClassPathDocumentLoader	Classpath resources	Good for embedded documentation in JARs
AmazonS3DocumentLoader	AWS S3 buckets	Via langchain4j-document-loader-amazon-s3
AzureBlobStorageDocumentLoader	Azure Blob Storage	Via langchain4j-document-loader-azure-storage-blob
GitHubDocumentLoader	GitHub repositories	Loads files from a repo branch

Document parsers handle format-specific extraction: TextDocumentParser for plain text, ApachePdfBoxDocumentParser for PDFs, ApacheTikaDocumentParser for Word/Excel/PowerPoint and 100+ other formats. Parsers are composable with loaders:

// Load all PDFs from a directory
List<Document> docs = FileSystemDocumentLoader.loadDocuments(
    "./knowledge-base",
    PathMatcher.of("glob:**.pdf"),
    new ApachePdfBoxDocumentParser()
);

Which LangChain4j document parser handles Word, Excel, and PowerPoint files along with 100+ other formats?ApacheTikaDocumentParser

✓ Well done — Apache Tika is a content detection and extraction library that handles an enormous range of formats, making it the go-to parser for heterogeneous document collections.

ApachePdfBoxDocumentParser

✗ Try again — PdfBox handles PDFs specifically. For Office documents and other formats, ApacheTikaDocumentParser is the correct choice.

TextDocumentParser

✗ Try again — TextDocumentParser handles plain text files only. Tika covers the rich format variety.

What metadata does LangChain4j automatically populate in a Document loaded by FileSystemDocumentLoader?The file's MD5 checksum for deduplication

✗ Try again — checksums are not auto-populated. The loader adds source metadata like the absolute file path.

Source metadata including the file path, which is stored in Document.metadata() for traceability

✓ Well done — source metadata lets you trace retrieved chunks back to their origin file, which is useful for citing sources in RAG responses.

A timestamp of when the document was last modified

✗ Try again — file modification time is not automatically included in the metadata by default. The file path is the primary auto-populated metadata.

23. What is the @Moderate annotation in LangChain4j and how does content moderation work?

The @Moderate annotation integrates content moderation directly into the AI Services pipeline. When placed on an AI Services method, LangChain4j automatically runs the user message through OpenAI's Moderation API before passing it to the language model. If the content is flagged as violating content policies, a ModerationException is thrown before the LLM is ever called — protecting you from sending inappropriate content upstream and from generating harmful responses.

interface SafeAssistant {

    @Moderate   // automatic moderation check on every call
    @SystemMessage("You are a helpful customer service assistant.")
    String chat(String userMessage);
}

// Build with a moderation model configured
SafeAssistant assistant = AiServices.builder(SafeAssistant.class)
    .chatLanguageModel(chatModel)
    .moderationModel(OpenAiModerationModel.builder()
        .apiKey(apiKey)
        .build())
    .build();

// Usage
try {
    String response = assistant.chat(userInput);
} catch (ModerationException e) {
    // Input was flagged — respond with a rejection message
    return "I cannot process that request.";
}

The moderation check happens before the main LLM call, which means: no tokens wasted on the primary model, no risk of the LLM processing harmful prompts, and your system gets an automatic first line of defense. The moderation model (currently only OpenAI's text-moderation-latest is supported natively) returns categories and confidence scores for hate, harassment, self-harm, violence, and sexual content.

For applications where OpenAI's moderation is not suitable (on-premise deployments, or different moderation criteria), you can implement the ModerationModel interface with custom logic and plug it in identically.

When does the @Moderate check run relative to the main LLM call in LangChain4j?Before the main LLM call — flagged content throws ModerationException without ever reaching the language model

✓ Well done — pre-call moderation is the correct and cost-efficient design: no primary model tokens are consumed for flagged requests.

After the main LLM call — the LLM response is moderated before returning to the caller

✗ Try again — @Moderate checks the user input before the LLM is called, not the LLM response afterward.

Concurrently — moderation and LLM calls run in parallel for speed

✗ Try again — parallel execution would defeat the purpose; if moderation flags the input, the LLM call should be prevented entirely.

What exception does LangChain4j throw when @Moderate flags a user message?SecurityException

✗ Try again — Java's SecurityException is a JVM security concern. LangChain4j throws its own ModerationException for content policy violations.

IllegalArgumentException with message "content_policy_violation"

✗ Try again — LangChain4j uses a typed exception specifically for this case: ModerationException.

ModerationException — a typed exception that signals the moderation check failed

✓ Well done — catching ModerationException specifically lets you distinguish content policy violations from other errors and respond appropriately.

24. How does LangChain4j support vision (multi-modal) LLMs that accept images as input?

Multi-modal LLMs like GPT-4o, Claude 3, and Gemini can process images alongside text. In LangChain4j, image input is handled through the UserMessage content builder, which accepts a list of Content objects — combining TextContent and ImageContent in a single user turn.

// Pass an image URL
UserMessage message = UserMessage.from(
    TextContent.from("What defects do you see in this product image?"),
    ImageContent.from("https://cdn.example.com/product-photo.jpg")
);

AiMessage response = chatModel.generate(List.of(message)).content();

// Or pass Base64-encoded image data (for local files)
byte[] imageBytes = Files.readAllBytes(Path.of("screenshot.png"));
String base64 = Base64.getEncoder().encodeToString(imageBytes);

UserMessage visionMessage = UserMessage.from(
    TextContent.from("Describe any errors shown in this screenshot"),
    ImageContent.from(base64, "image/png")
);

Vision capabilities also work with AI Services. You can define a method that accepts a UserMessage directly, or use @UserMessage with image parameters:

interface ImageAnalyzer {
    @UserMessage("Analyze this image and describe what you see.")
    String analyze(UserMessage messageWithImage);
}

Important considerations: not all models in the same provider family support vision (e.g., GPT-3.5 cannot process images; GPT-4o can). Check that your configured model name is a vision-capable variant. Image inputs consume significantly more tokens than text, which affects both cost and context window usage — high-resolution images can consume thousands of tokens depending on the model's tile-based processing strategy.

How do you include an image alongside a text prompt in a LangChain4j multi-modal request?Construct a UserMessage with both TextContent and ImageContent items in the content list

✓ Well done — multi-modal input uses the UserMessage.from(TextContent, ImageContent) builder to include both text and image in one user turn.

Use the ImageModel.analyze(image, prompt) method

✗ Try again — ImageModel is for text-to-image generation. Vision analysis uses ChatLanguageModel with ImageContent in the UserMessage.

Encode the image as a Base64 string and embed it directly in the @UserMessage annotation template

✗ Try again — embedding Base64 in annotations is not the intended pattern. ImageContent handles Base64 through the UserMessage builder API.

What should you verify when configuring a vision-capable model in LangChain4j?That the EmbeddingModel is also configured to process images

✗ Try again — EmbeddingModel handles text vectors, not vision. The key check is that the ChatLanguageModel model name is a vision-capable variant.

That the configured model name supports vision — not all models in a provider family do (e.g., GPT-3.5 cannot process images)

✓ Well done — vision capability is model-specific. Using a non-vision model with image inputs will result in an API error from the provider.

That the image file is stored in the same region as the Azure OpenAI deployment

✗ Try again — image storage region is not a LangChain4j concern. The critical check is the model name's vision support.

25. What is the difference between synchronous and asynchronous execution in LangChain4j?

LangChain4j supports both synchronous and asynchronous execution models for LLM calls. The choice affects how your application thread behaves while waiting for the (potentially slow) LLM response.

Synchronous — The calling thread blocks until the complete response is received. This is the default and simplest mode, appropriate for batch jobs, background tasks, and thread-per-request servers where thread blocking is acceptable.

// Sync: thread blocks until response arrives (may take 5-30 seconds)
String answer = assistant.chat("What is quantum computing?");

Asynchronous (CompletableFuture) — Declare the return type as CompletableFuture<String> (or any other response type) in your AI Services interface. LangChain4j submits the call on a separate thread and returns immediately with a future:

interface AsyncAssistant {
    CompletableFuture<String> chat(String message);
    CompletableFuture<ProductReview> analyze(String review); // works with POJOs too
}

// Non-blocking: returns immediately, response arrives later
CompletableFuture<String> future = assistant.chat("Explain blockchain");
future.thenAccept(answer -> System.out.println("Got answer: " + answer));
// ... continue doing other work ...

Streaming (TokenStream) — Token-by-token delivery. Neither sync nor truly async — it is event-driven and provides progressive output rather than waiting for the full response or getting it all at once later. Best for UI responsiveness.

For Spring WebFlux applications, the recommended pattern is returning Flux<String> by bridging LangChain4j's TokenStream to a reactive publisher via Sinks.Many or a FluxSink. Pure CompletableFuture works for non-streaming Spring MVC async (DeferredResult) or WebFlux scenarios.

What return type do you declare in a LangChain4j AI Services method to make it non-blocking?Future<String> from java.util.concurrent

✗ Try again — LangChain4j specifically uses CompletableFuture for async AI Services. Plain Future is not the supported return type.

CompletableFuture<String> — LangChain4j detects this return type and executes the LLM call on a separate thread

✓ Well done — CompletableFuture is LangChain4j's async return type for complete-response delivery without blocking the calling thread.

Mono<String> from Project Reactor

✗ Try again — Mono is a reactive type. LangChain4j uses CompletableFuture for async; bridging to Mono requires additional conversion code.

What is the key difference between CompletableFuture and TokenStream as return types in LangChain4j?CompletableFuture is for streaming; TokenStream is for complete responses

✗ Try again — this is backwards. CompletableFuture delivers the complete response asynchronously; TokenStream delivers tokens progressively.

CompletableFuture delivers the complete response after LLM finishes; TokenStream delivers tokens one-by-one as they are generated

✓ Well done — CompletableFuture is async-complete; TokenStream is streaming/progressive. Different use cases: background tasks vs. real-time UIs.

They are equivalent — LangChain4j uses them interchangeably

✗ Try again — CompletableFuture and TokenStream have fundamentally different delivery semantics and are not interchangeable.

26. What is LangChain4j's support for Quarkus and how does it differ from Spring Boot integration?

LangChain4j has a dedicated Quarkus extension (quarkus-langchain4j) maintained under the Quarkiverse umbrella. It provides CDI-based injection, Quarkus-native configuration, and — critically — native compilation support through GraalVM, enabling LangChain4j applications to be compiled to native executables with sub-second startup times.

The main differences from the Spring Boot integration:

LangChain4j: Spring Boot vs Quarkus Integration
Aspect	Spring Boot Integration	Quarkus Integration
Dependency injection	Spring IoC / @Autowired	CDI / @Inject
Configuration	application.properties (spring.langchain4j.*)	application.properties (quarkus.langchain4j.*)
AI Service registration	@AiService (or @Bean)	@RegisterAiService
Native image support	Spring Native (experimental for AI libs)	First-class via GraalVM — officially supported
Dev mode	Spring DevTools hot reload	Quarkus Dev mode live reload + Dev UI panel for AI
Observability	Spring Actuator + Micrometer	Quarkus OpenTelemetry auto-instrumentation

// Quarkus - register an AI service with @RegisterAiService
@RegisterAiService(tools = WeatherTools.class)
interface WeatherAssistant {
    @SystemMessage("You are a weather assistant.")
    String chat(String userMessage);
}

// Inject it as a CDI bean
@ApplicationScoped
class WeatherEndpoint {
    @Inject WeatherAssistant assistant;
}

Quarkus Dev mode provides a visual Dev UI panel specifically for LangChain4j where you can inspect registered AI services, test prompts interactively, and view conversation history — a significant developer experience advantage over the Spring Boot approach for teams working on Quarkus applications.

What annotation is used in Quarkus to declare a LangChain4j AI service (equivalent to @AiService in Spring Boot)?@AiService

✗ Try again — @AiService is the Spring Boot annotation. The Quarkus equivalent is @RegisterAiService.

@RegisterAiService

✓ Well done — @RegisterAiService is the Quarkus LangChain4j annotation that triggers CDI bean creation for the AI service interface.

@LangChainService

✗ Try again — @LangChainService is not a LangChain4j annotation. Quarkus uses @RegisterAiService.

What is the key advantage Quarkus LangChain4j offers over the Spring Boot integration for production deployments?Quarkus supports more LLM providers than Spring Boot

✗ Try again — provider support is equivalent. The Quarkus advantage is first-class GraalVM native image compilation for sub-second startup.

First-class GraalVM native image compilation enabling sub-second startup — officially supported unlike Spring Native for AI libraries

✓ Well done — native compilation is the production differentiator for Quarkus: cold start in milliseconds, low memory footprint, ideal for serverless deployments.

Quarkus processes LLM requests faster due to JVM tuning at compile time

✗ Try again — the JVM-level performance for LLM HTTP calls is similar. The startup time and memory footprint benefits of native compilation are the real advantage.

27. How does LangChain4j implement the ReAct agent pattern and what are its limitations?

The ReAct (Reasoning + Acting) pattern in LangChain4j is implemented automatically by the AI Services framework whenever you register tools with a chat language model. There is no explicit ReAct class to instantiate — the pattern emerges from the interaction between the tool-equipped LLM and LangChain4j's tool execution loop.

The concrete mechanics inside LangChain4j's AI Services when tools are present:

Tool schemas (name, description, parameter types) are serialized from your @Tool annotated methods and included in every LLM request
If the LLM returns a tool call in its response, LangChain4j intercepts it, looks up the corresponding method, deserializes the arguments, and invokes the Java method via reflection
The tool result is appended as a ToolExecutionResultMessage to the conversation history
The LLM is called again with the updated history — it can reason about the result and either call another tool or produce a final text answer
This loop continues until the LLM stops requesting tools (step 4's output is not a tool call)

Known limitations of the current implementation:

No parallel tool execution — When the LLM requests multiple tools simultaneously (some models support this), LangChain4j executes them sequentially, not in parallel, which increases latency for multi-tool queries
No configurable max iterations — There is no built-in loop guard. A misbehaving model or misconfigured tool could theoretically loop indefinitely. You must add your own application-level timeout
Single agent only — LangChain4j does not natively orchestrate multi-agent workflows where agents delegate subtasks to other agents. Custom code is required for that pattern
Tool schemas depend on model support — Tool calling requires a model that supports the function calling protocol. Older or smaller models may produce unreliable tool call JSON

What happens in LangChain4j when a tool-equipped LLM requests multiple tools at once?LangChain4j executes all requested tools in parallel for maximum speed

✗ Try again — parallel tool execution is a known LangChain4j limitation. Tools are currently executed sequentially even when multiple are requested simultaneously.

LangChain4j executes the tools sequentially — parallel execution is not yet supported, adding latency

✓ Well done — sequential execution of parallel tool requests is a documented LangChain4j limitation that increases response time for multi-tool queries.

Only the first requested tool is executed; the rest are ignored

✗ Try again — all tool requests are eventually processed; they just run sequentially rather than in parallel.

What risk exists if there is no max-iteration limit in a LangChain4j agent tool loop?The conversation memory fills up causing an OutOfMemoryError

✗ Try again — memory overflow is a secondary concern. The primary risk is infinite looping consuming API credits and blocking the thread indefinitely.

A misbehaving model or misconfigured tool could cause infinite tool-call loops, exhausting API credits and blocking threads

✓ Well done — without a loop guard, a stuck agent burns through your API quota and holds a thread indefinitely. Application-level timeouts are needed.

LangChain4j automatically terminates after 10 tool calls by default

✗ Try again — there is no built-in 10-call limit in LangChain4j. You must implement your own safeguard.

28. What is the ModerationModel interface in LangChain4j and how can you implement a custom one?

The ModerationModel interface in LangChain4j defines the contract for content moderation checks. It takes a String input and returns a Response<Moderation> — where Moderation contains a boolean flagged() result and optionally category-level scores. LangChain4j's @Moderate AI Services annotation uses whichever ModerationModel you register on the builder.

The built-in implementation is OpenAiModerationModel, which calls OpenAI's text-moderation-latest API. But for custom moderation logic — rule-based keyword filtering, an internal ML model, or a different provider's moderation API — you implement the interface directly:

public class KeywordModerationModel implements ModerationModel {

    private static final Set<String> BLOCKED = Set.of(
        "badword1", "badword2", "competitor-brand"
    );

    @Override
    public Response<Moderation> moderate(String text) {
        boolean flagged = BLOCKED.stream()
            .anyMatch(word -> text.toLowerCase().contains(word));

        return Response.from(
            flagged ? Moderation.flagged(text) : Moderation.notFlagged()
        );
    }
}

// Plug in as the moderation model
SafeAssistant assistant = AiServices.builder(SafeAssistant.class)
    .chatLanguageModel(chatModel)
    .moderationModel(new KeywordModerationModel())
    .build();

Custom implementations are particularly useful for on-premise deployments that cannot use external APIs, for organizations with specific terminology blocklists, or for domain-specific moderation where generic toxicity models produce too many false positives. The interface is small and straightforward — moderate(String) is the only method you must implement.

What is the single method you must implement to create a custom LangChain4j ModerationModel?moderate(String text) returning Response<Moderation>

✓ Well done — ModerationModel has one required method. Return Moderation.flagged() to block or Moderation.notFlagged() to allow the content.

classify(String input) returning ModerationCategory

✗ Try again — classify() is not the ModerationModel method. The interface defines moderate(String) returning Response<Moderation>.

check(List<ChatMessage> messages) returning boolean

✗ Try again — ModerationModel operates on a single text string, not a full message list. The method signature is moderate(String).

When would you implement a custom ModerationModel instead of using OpenAiModerationModel?When you want to moderate LLM responses instead of user inputs

✗ Try again — response moderation is a valid use case but not specific to custom implementations; OpenAiModerationModel can also moderate responses. Custom models are for air-gapped environments or domain-specific rules.

For air-gapped environments, organization-specific blocklists, or when the generic toxicity model produces too many false positives for your domain

✓ Well done — these are the practical scenarios where a custom ModerationModel outperforms the generic OpenAI approach.

When you need moderation to run asynchronously on a background thread

✗ Try again — the execution thread model is not the reason to implement a custom ModerationModel. Air-gapping and domain specificity drive that decision.

29. What is the Tokenizer interface in LangChain4j and why does it matter for memory management?

The Tokenizer interface in LangChain4j counts the number of tokens in a given string or list of messages using the specific tokenization algorithm of a target model. This is necessary because LLMs do not process raw characters or words — they operate on tokens, which are sub-word units that vary in count depending on the model's vocabulary. The same sentence can produce different token counts in GPT-4 vs Claude vs Llama.

Token counting matters for two concrete reasons in LangChain4j:

TokenWindowChatMemory — Uses a Tokenizer to ensure the accumulated conversation history never exceeds the model's context window limit. Without accurate token counting, you either truncate valid context too early or exceed the limit and get API errors.
Cost estimation — Before sending a request, counting tokens lets you estimate API cost (most providers charge per input/output token) and set guardrails on expensive queries.

// Count tokens for OpenAI GPT-4
Tokenizer tokenizer = new OpenAiTokenizer(GPT_4);
int tokensInPrompt = tokenizer.estimateTokenCountInMessage(
    SystemMessage.from("You are a helpful assistant.")
);

// Use with TokenWindowChatMemory for precise context management
ChatMemory memory = TokenWindowChatMemory.builder()
    .maxTokens(8192, new OpenAiTokenizer(GPT_4))
    .build();

LangChain4j ships tokenizers for OpenAI models (using the jtokkit library, which implements the BPE tokenization algorithm used by OpenAI), and approximate tokenizers for other models. For models without exact tokenizer support, the approximate tokenizer estimates based on average characters-per-token ratios — less precise but sufficient for rough context management.

Why is exact token counting more important than character counting for context window management?LLMs process tokens, not characters — context window limits are defined in tokens, and the character-to-token ratio varies significantly by content

✓ Well done — code, numbers, and non-English text have very different character-to-token ratios. Accurate token counting prevents both under-utilization and context overflow.

Character counting is slower than token counting for large documents

✗ Try again — performance is not the issue. The accuracy of context window management is why token counting is essential.

API providers bill per character, so character counting is needed for billing

✗ Try again — providers bill per token, not per character. Token counting is needed both for context management and cost estimation.

What Java library does LangChain4j use under the hood for OpenAI-compatible token counting?Apache Commons Text tokenizer

✗ Try again — Apache Commons Text is for general string operations. LangChain4j uses jtokkit for OpenAI BPE tokenization.

jtokkit — a Java implementation of OpenAI's BPE tokenization algorithm

✓ Well done — jtokkit provides exact token counts matching OpenAI's cl100k_base and o200k_base encodings used by GPT models.

SentencePiece Java wrapper

✗ Try again — SentencePiece is used in models like T5 and LLaMA. OpenAI uses BPE tokenization, implemented in LangChain4j via jtokkit.

30. How do you persist ChatMemory across application restarts in LangChain4j?

LangChain4j's built-in MessageWindowChatMemory and TokenWindowChatMemory use in-memory storage — conversations vanish when the application restarts or when a new pod starts in a Kubernetes cluster. For production persistence you need a persistent ChatMemoryStore implementation.

LangChain4j defines the ChatMemoryStore interface with three methods:

public interface ChatMemoryStore {
    List<ChatMessage> getMessages(Object memoryId);
    void updateMessages(Object memoryId, List<ChatMessage> messages);
    void deleteMessages(Object memoryId);
}

You implement this against any persistence backend and plug it into the memory configuration:

// Redis-backed implementation example
@Component
class RedisChatMemoryStore implements ChatMemoryStore {
    private final RedisTemplate<String, String> redis;
    private final ObjectMapper mapper;

    @Override
    public List<ChatMessage> getMessages(Object memoryId) {
        String json = redis.opsForValue().get("chat:" + memoryId);
        if (json == null) return new ArrayList<>();
        return mapper.readValue(json, new TypeReference<>(){});
    }

    @Override
    public void updateMessages(Object memoryId, List<ChatMessage> messages) {
        redis.opsForValue().set("chat:" + memoryId,
            mapper.writeValueAsString(messages),
            Duration.ofHours(24));
    }

    @Override
    public void deleteMessages(Object memoryId) {
        redis.delete("chat:" + memoryId);
    }
}

// Wire into memory
ChatMemoryProvider provider = memoryId ->
    MessageWindowChatMemory.builder()
        .id(memoryId)
        .maxMessages(20)
        .chatMemoryStore(redisChatMemoryStore)
        .build();

What interface must you implement to add a custom persistent backend for LangChain4j ChatMemory?ChatMemoryStore — providing getMessages, updateMessages, and deleteMessages methods

✓ Well done — ChatMemoryStore is the persistence contract. Implement it against Redis, JDBC, or any store and plug it into ChatMemory.builder().

PersistentChatMemory — a subclass of MessageWindowChatMemory

✗ Try again — PersistentChatMemory is not a LangChain4j class. You implement the ChatMemoryStore interface and inject it into the existing memory types.

MessageRepository extending JpaRepository<ChatMessage, String>

✗ Try again — a Spring Data JPA repository could back the implementation, but the interface LangChain4j requires is ChatMemoryStore, not Spring Data's Repository.

Which parameter on ChatMemory.builder() associates the memory instance with a specific persistence record?.chatMemoryStore(store) — the store itself is the identifier

✗ Try again — .chatMemoryStore() provides the storage backend, but .id() is what identifies which record within that store to use.

.id(memoryId) — sets the key used to look up messages in the ChatMemoryStore

✓ Well done — the memory ID (typically the user or session ID) is the key that ChatMemoryStore uses to retrieve and save the correct conversation history.

.partitionKey(memoryId) — partitions messages for distributed storage

✗ Try again — partitionKey is not a ChatMemory builder method. Use .id() to set the lookup key for the ChatMemoryStore.

31. What are the best practices for prompt engineering within LangChain4j AI Services?

Prompt engineering in LangChain4j is about designing the @SystemMessage and @UserMessage content so the LLM reliably produces what you need. Several practices have proven effective in production LangChain4j applications:

1. Keep system messages focused and specific. A system message that tries to do too many things (act as a customer service agent AND a code reviewer AND limit to company topics) produces mediocre results for all of them. One interface, one clear role.

2. Use explicit output format instructions for structured responses. When returning POJOs, the auto-generated JSON schema is usually sufficient, but for edge cases add explicit instructions: "Always respond in valid JSON. Do not add explanation text outside the JSON."

3. Load long prompts from classpath resources, not annotations. Multi-paragraph system prompts inlined in annotations are hard to read, test, and update without a recompile:

// Hard to maintain
@SystemMessage("You are a... (200 words here)...")

// Better — load from file
@SystemMessage(fromResource = "prompts/customer-service-system.txt")

4. Use few-shot examples for consistent formatting. Include 1-3 examples of ideal input → output pairs in the system message when the output format is non-trivial. This dramatically reduces malformed JSON or incorrect tone.

5. Version prompt files in source control separately from code. Treat src/main/resources/prompts/ as a versioned artifact. Prompt changes should go through review since they affect model behavior as much as code changes do.

6. Test with multiple inputs before deploying. LLM outputs are non-deterministic. Write parameterized tests covering edge cases: empty input, very long input, input in a non-English language, adversarial prompt injection attempts.

Why should long system prompts be loaded from classpath resources rather than inlined as annotation strings in LangChain4j?Annotations have a 255-character limit in Java

✗ Try again — Java annotations do not have a 255-character limit. The reason is maintainability: file-based prompts are easier to read, review, and update.

File-based prompts are easier to read, test, and update independently of code recompilation

✓ Well done — separating prompts into resource files enables prompt engineers to iterate without touching Java source files, and makes diffs clearer in code review.

LangChain4j cannot process annotation strings longer than 500 characters

✗ Try again — there is no such LangChain4j limitation. The best practice is about maintainability, not technical constraints.

What prompt technique helps ensure consistent output formatting when LangChain4j's auto-generated JSON schema is not reliable enough?Increasing the model's temperature to generate more varied outputs

✗ Try again — higher temperature increases variation, which is the opposite of what you want for consistent formatting.

Adding few-shot examples of ideal input-output pairs to the system message

✓ Well done — few-shot examples in the system message guide the model toward the exact format you want, reducing malformed or inconsistently formatted responses.

Setting maxOutputTokens to exactly match the expected response length

✗ Try again — token limits truncate output but don't enforce structure. Few-shot examples teach the model the desired format.

32. How does LangChain4j integrate with observability tools like OpenTelemetry?

LangChain4j 0.31+ introduced native OpenTelemetry instrumentation for tracing LLM calls. When the langchain4j-open-telemetry module is on the classpath alongside an OTel SDK, LangChain4j automatically creates spans for each LLM call, embedding attributes from the OpenTelemetry Semantic Conventions for Generative AI Systems (draft spec).

Each span captures:

gen_ai.system — The LLM provider (e.g., openai)
gen_ai.request.model — The model name used
gen_ai.request.max_tokens — Max tokens configured
gen_ai.usage.input_tokens — Actual input tokens consumed
gen_ai.usage.output_tokens — Actual output tokens generated
gen_ai.request.temperature — Temperature setting

For Spring Boot, adding the OTel Spring Boot starter alongside the LangChain4j OTel module is sufficient for automatic instrumentation:

<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-telemetry</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>

With these in place, every LLM call appears as a span in your Jaeger, Zipkin, Grafana Tempo, or any OTLP-compatible backend — showing latency distribution across providers and models, token usage trends, and which AI services are called in which order within a user request. This is critical for diagnosing slow AI paths in production without guessing.

Which OpenTelemetry attribute records the number of output tokens consumed in a LangChain4j LLM span?gen_ai.tokens.output

✗ Try again — the correct attribute name follows the OTel semantic conventions: gen_ai.usage.output_tokens.

gen_ai.usage.output_tokens

✓ Well done — gen_ai.usage.output_tokens is the OTel Semantic Conventions attribute for tracking LLM output token consumption per request.

langchain4j.response.tokens

✗ Try again — LangChain4j does not use a custom attribute namespace. It follows the OTel gen_ai.* semantic conventions.

What is needed beyond adding langchain4j-open-telemetry to enable automatic LLM tracing in a Spring Boot application?Manual span creation around each AI Services method call

✗ Try again — that is exactly what the OTel integration avoids. Automatic instrumentation creates spans without any manual code.

Adding the OpenTelemetry Spring Boot starter alongside the LangChain4j OTel module

✓ Well done — both the LangChain4j OTel module and the OTel Spring Boot starter are needed for fully automatic zero-code LLM call tracing.

Implementing a custom ChatLanguageModel wrapper that records spans

✗ Try again — custom wrappers were the pre-OTel approach. The langchain4j-open-telemetry module handles this automatically.

33. What is the InMemoryEmbeddingStore and when should you migrate to a real vector database?

InMemoryEmbeddingStore is LangChain4j's simplest EmbeddingStore implementation: it holds all embeddings in a Java List in heap memory, performs linear scan (brute-force cosine similarity) for similarity search, and has zero external dependencies. It ships in the core module with no additional Maven dependency.

// Zero setup — ready to use in any test or prototype
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();

// Serialize to JSON file for lightweight persistence
String json = store.serializeToJson();
Files.writeString(Path.of("embeddings.json"), json);

// Deserialize on next startup
EmbeddingStore<TextSegment> restored =
    InMemoryEmbeddingStore.fromJson(Files.readString(Path.of("embeddings.json")));

It does support basic JSON file persistence via serializeToJson() and fromJson(), so for truly small corpora it can survive restarts — but it is still a single-file, single-node solution.

You should migrate to a real vector database (PgVector, Qdrant, Pinecone, etc.) when any of these conditions are true:

Scale — More than ~50,000 document chunks. Linear scan becomes visibly slow (~100ms+) at this scale versus ANN index millisecond queries
Filtering — You need metadata-filtered similarity search (find documents by author AND semantic similarity). InMemoryEmbeddingStore has no filtering support
Persistence — Multiple pods that need to share the same embeddings. A JSON file cannot serve multiple instances
Updates — Frequent document additions or deletions. Rebuilding the in-memory store from scratch is expensive for large corpora
Disaster recovery — If re-embedding your entire corpus on every restart takes more than seconds, the file-based approach is too fragile

What search algorithm does InMemoryEmbeddingStore use for similarity queries?Linear scan — it compares the query vector against every stored embedding using cosine similarity

✓ Well done — brute-force linear scan is simple and exact but O(n) per query, making it impractical for large corpora.

HNSW (Hierarchical Navigable Small World) approximate nearest neighbor index

✗ Try again — HNSW is used by dedicated vector databases like Qdrant and Weaviate. InMemoryEmbeddingStore uses simple linear scan.

Inverted index with TF-IDF weighting

✗ Try again — TF-IDF is a keyword search technique. EmbeddingStore uses vector cosine similarity, not keyword matching.

Which InMemoryEmbeddingStore feature provides basic survival across restarts without migrating to a real vector database?Automatic periodic snapshotting to an S3 bucket

✗ Try again — InMemoryEmbeddingStore has no S3 integration. The built-in persistence method is serializeToJson()/fromJson().

serializeToJson() / fromJson() — saves and loads the entire store as a JSON file

✓ Well done — the JSON serialization methods provide lightweight file-based persistence for small corpora that need to survive restarts.

JPA integration via @Entity annotation on the TextSegment class

✗ Try again — InMemoryEmbeddingStore has no JPA integration. JSON serialization is the only built-in persistence mechanism.

34. What are common LangChain4j anti-patterns to avoid in production applications?

As LangChain4j adoption has grown, several recurring mistakes in production deployments have emerged. Knowing these saves debugging time and prevents costly incidents.

1. Creating ChatLanguageModel or AI Services as request-scoped beans. These are expensive to initialize (TCP connections, key validation, token counting setup). They must be singletons — one instance per application lifecycle, not one per request.

2. Using InMemoryEmbeddingStore in production. Linear scan becomes unacceptably slow above ~50,000 chunks, there is no filtering support, and multiple pods cannot share it. Switch to PgVector or a managed vector DB before going live.

3. Not configuring timeouts. LLM API calls can stall for 60+ seconds. Without a .timeout(Duration.ofSeconds(30)) on the model builder, a hung upstream provider will exhaust your thread pool in a synchronous Spring MVC application.

4. Logging full prompts in production. System messages often contain proprietary business logic. User messages may contain PII. Log only token counts and model names by default; log full prompts only at DEBUG with PII scrubbing.

5. Ignoring ModerationException in safety-critical applications. If you enable @Moderate, surround every AI Services call with a try-catch for ModerationException and return a safe fallback. Uncaught exceptions surface as 500 errors.

6. Embedding the same corpus on every application startup. The ingestion pipeline (load → split → embed → store) should run once and persist results. Re-embedding on startup wastes API budget and delays readiness for large corpora.

7. Hardcoding model names as String literals. Use the constants provided by each provider module (e.g., OpenAiChatModelName.GPT_4_O) so model upgrades are refactor-friendly and typos are caught at compile time.

What is the consequence of creating ChatLanguageModel as a Spring request-scoped bean?A new instance with its own connection overhead is created on every HTTP request, causing severe performance degradation

✓ Well done — ChatLanguageModel initialization is expensive. Creating it per-request is a classic resource leak anti-pattern in LangChain4j applications.

Spring will refuse to inject it because ChatLanguageModel is not serializable

✗ Try again — Spring does not require beans to be serializable. The issue is performance overhead from per-request initialization.

Request-scoped beans are actually preferred for thread safety — singleton LLM clients are not thread-safe

✗ Try again — LangChain4j model clients are thread-safe singletons by design. Request scope adds unnecessary overhead without any safety benefit.

What is the recommended way to reference model names in LangChain4j to avoid typos and ease future upgrades?Load model names from application.properties so they can be changed without recompiling

✗ Try again — externalized configuration is valid for operator-configurable settings, but using typed constants provides compile-time safety that strings cannot.

Use provider module constants like OpenAiChatModelName.GPT_4_O for compile-time safety

✓ Well done — typed constants eliminate silent typos and make model upgrades refactor-friendly with IDE tooling support.

Store model names in a database table so they can be updated at runtime without redeployment

✗ Try again — dynamic model switching at runtime is an edge-case pattern. Typed constants in code are the baseline best practice for safety and maintainability.

35. How does LangChain4j support multi-modal input processing for audio or documents beyond text and images?

Beyond text and image inputs, some LLM providers support audio transcription and document (PDF) understanding as native model inputs. LangChain4j exposes these through additional Content types in the UserMessage builder, following the same pattern as ImageContent.

Audio input — For providers that support audio understanding (like OpenAI GPT-4o Audio or Google Gemini), AudioContent wraps a base64-encoded audio clip with a MIME type:

byte[] audioBytes = Files.readAllBytes(Path.of("customer-call.mp3"));
String base64Audio = Base64.getEncoder().encodeToString(audioBytes);

UserMessage message = UserMessage.from(
    AudioContent.from(base64Audio, "audio/mp3"),
    TextContent.from("Summarize the key complaints in this customer call recording.")
);

PDF / document input — Some providers (Anthropic Claude, Gemini) accept raw PDF bytes as input, allowing the model to read and understand the document structure natively rather than extracting text first:

byte[] pdfBytes = Files.readAllBytes(Path.of("contract.pdf"));
String base64Pdf = Base64.getEncoder().encodeToString(pdfBytes);

UserMessage message = UserMessage.from(
    TextContent.from("Identify all payment terms in this contract."),
    PdfFileContent.from(base64Pdf)  // provider-specific support required
);

Important caveat: multi-modal support beyond text and images is provider-specific. Before using AudioContent or PdfFileContent, verify that your configured model and LangChain4j provider module version support it. Using these content types with a model that does not support them results in an API error from the provider. Always check the LangChain4j integration page for your provider for the current supported content types.

What Java class wraps a Base64-encoded audio clip for inclusion in a LangChain4j multi-modal UserMessage?AudioContent

✓ Well done — AudioContent.from(base64, mimeType) is the LangChain4j content type for including audio data alongside text in a user message.

MediaContent with type=AUDIO

✗ Try again — there is no generic MediaContent class in LangChain4j. Audio uses the specific AudioContent type.

BinaryContent with contentType set to audio/mp3

✗ Try again — BinaryContent is not a LangChain4j content type. Use AudioContent for audio data.

What happens if you use PdfFileContent with a provider that doesn't natively support PDF document input?LangChain4j automatically falls back to text extraction from the PDF

✗ Try again — LangChain4j does not silently fall back. Unsupported content types result in an API error from the provider.

The provider returns an API error — you must verify provider support before using provider-specific content types

✓ Well done — multi-modal content types are provider-specific. No automatic fallback exists; the provider simply rejects unsupported content types.

LangChain4j throws a compile-time error indicating the model does not support PDF

✗ Try again — there is no compile-time model capability check. The error surfaces at runtime when the provider API rejects the request.

36. How do you implement a custom Tool with complex parameter types in LangChain4j?

LangChain4j tools support complex parameter types beyond simple strings and primitives. When a tool method accepts a custom POJO, enum, or collection, LangChain4j automatically generates a JSON schema from the parameter type and includes it in the tool specification sent to the LLM. The model uses this schema to understand what JSON structure it should produce for the tool call arguments, and LangChain4j deserializes them via Jackson before invoking the method.

// Enum parameter
enum Priority { LOW, MEDIUM, HIGH, CRITICAL }

// Complex POJO parameter
record TaskFilter(
    String assignee,
    Priority minPriority,
    @P("Filter to tasks due before this date (ISO-8601)") String dueBefore,
    boolean includeCompleted
) {}

class ProjectTools {

    @Tool("Search project tasks by multiple filter criteria")
    List<Task> searchTasks(
        @P("Filter criteria for the task search") TaskFilter filter
    ) {
        return taskRepository.search(
            filter.assignee(),
            filter.minPriority(),
            LocalDate.parse(filter.dueBefore()),
            filter.includeCompleted()
        );
    }

    @Tool("Update the priority of a specific task")
    void updatePriority(
        @P("Task ID to update") String taskId,
        @P("New priority level") Priority newPriority
    ) {
        taskRepository.updatePriority(taskId, newPriority);
    }
}

The LLM sees the fully expanded JSON schema for TaskFilter including field types and the @P descriptions. Good @P descriptions on nested fields are critical — without them the model may misinterpret the date format, the priority semantics, or which fields are required vs. optional. The return type of tool methods is also automatically serialized to JSON before being added to the conversation as a tool result.

How does LangChain4j communicate complex POJO tool parameter schemas to the LLM?It automatically generates a JSON schema from the Java type and includes it in the tool specification sent with every LLM request

✓ Well done — automatic JSON schema generation from Java types is how LangChain4j communicates complex tool parameter structures to the model.

It sends the POJO's fully qualified class name and the LLM generates matching JSON from its training knowledge

✗ Try again — the LLM has no knowledge of your custom classes. LangChain4j sends an explicit JSON schema derived from the Java type.

Complex POJOs are not supported — only String and primitive parameters are allowed in @Tool methods

✗ Try again — LangChain4j fully supports complex types including nested POJOs, enums, and collections as tool parameters.

Why are detailed @P descriptions on nested POJO fields particularly important in complex tool definitions?@P descriptions replace Jackson deserialization and directly map JSON to Java fields

✗ Try again — @P provides natural language descriptions to the LLM; Jackson still handles the actual JSON-to-Java deserialization.

Without them the LLM may misinterpret field semantics — e.g., date formats, enum meanings, or which fields are required vs optional

✓ Well done — @P descriptions are the LLM's only guide to field semantics beyond the field name and type. Poor descriptions lead to incorrect tool invocation arguments.

@P descriptions are only used for Javadoc generation; they do not affect LLM behavior

✗ Try again — @P descriptions are included in the tool schema sent to the LLM and directly influence how the model populates tool arguments.

37. What is the HypotheticalDocumentEmbedder (HyDE) technique and how does LangChain4j support it?

HyDE (Hypothetical Document Embedder) is a query enhancement technique for RAG that improves retrieval quality by addressing a fundamental mismatch: the user's question is short and query-like, while the stored documents are long and answer-like. Embedding a question and a document paragraph in the same vector space often produces sub-optimal similarity scores because their styles differ.

The HyDE solution: before embedding the user's query, ask the LLM to generate a hypothetical document that would answer the question — essentially a plausible answer written in the style of the stored documents. Then embed this hypothetical document instead of the question. The resulting vector is much more similar to real matching documents.

// Custom HyDE QueryTransformer
class HydeQueryTransformer implements QueryTransformer {
    private final ChatLanguageModel languageModel;

    @Override
    public Collection<Query> transform(Query originalQuery) {
        String hypothetical = languageModel.generate(
            "Write a short paragraph that would answer this question: "
            + originalQuery.text()
        );
        return List.of(Query.from(hypothetical));
    }
}

// Wire into the RAG pipeline
RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .queryTransformer(new HydeQueryTransformer(chatModel))
    .contentRetriever(EmbeddingStoreContentRetriever.from(store))
    .build();

Assistant assistant = AiServices.builder(Assistant.class)
    .chatLanguageModel(chatModel)
    .retrievalAugmentor(augmentor)
    .build();

HyDE adds one additional LLM call per user query (to generate the hypothetical), which increases latency and cost. It is most effective for complex technical queries against document corpora where direct question embedding produces poor recall. Simpler query rewriting (compressing conversation context into a standalone question) is usually the better default trade-off.

What does the HyDE technique embed when performing similarity search instead of the original user query?A summarized version of the user query compressed to 10 words

✗ Try again — HyDE does not compress the query. It generates a full hypothetical answer paragraph and embeds that instead of the question.

An LLM-generated hypothetical document that would answer the question — its embedding is more similar to real matching documents

✓ Well done — HyDE bridges the style gap between questions and documents by embedding a document-style hypothetical answer rather than the question itself.

A keyword extraction of the query passed through a TF-IDF vectorizer

✗ Try again — HyDE uses semantic embedding of an LLM-generated text, not keyword extraction.

What is the primary cost trade-off of using HyDE in a LangChain4j RAG pipeline?HyDE doubles the vector store size by storing two embeddings per chunk

✗ Try again — HyDE affects the query path only, not the stored embeddings. The cost is one additional LLM call per user query.

One additional LLM call per user query to generate the hypothetical document — increasing latency and cost

✓ Well done — each user query triggers an extra LLM call for hypothesis generation before retrieval, which is the trade-off for improved retrieval quality.

HyDE requires a separate fine-tuned model that must be deployed alongside the main LLM

✗ Try again — HyDE uses the same existing ChatLanguageModel to generate the hypothetical. No separate fine-tuned model is needed.

38. How do you handle LLM output parsing failures gracefully in LangChain4j?

When LangChain4j requests structured output (returning a POJO from an AI Services method), the LLM occasionally produces malformed JSON despite format instructions — especially with smaller models or complex schemas. Without explicit error handling, this surfaces as a OutputParsingException or JsonParseException from Jackson. Graceful handling is critical for production reliability.

There are three layers where you can handle parsing failures:

1. Return Optional to signal missing/failed results:

interface ReviewExtractor {
    Optional<ProductReview> extractReview(String rawText);
}
// Returns Optional.empty() if parsing fails (safer than exception-based control flow)

2. Catch OutputParsingException at the call site and fall back:

try {
    ProductReview review = extractor.extractReview(text);
    return review;
} catch (OutputParsingException e) {
    log.warn("Failed to parse review structure: {}. Falling back to raw text.", e.getMessage());
    return ProductReview.unparsed(text); // your fallback model
}

3. Retry with an explicit correction prompt:

@Retryable(retryFor = OutputParsingException.class, maxAttempts = 2)
ProductReview extractWithRetry(String text) {
    return extractor.extractReview(text);
}

Reducing parsing failures proactively:

Use providers with native JSON mode (OpenAI's response_format: json_object) — configure via OpenAiChatModelName and set responseFormat on the model builder
Add few-shot examples of correct JSON structure in the system message
Use simpler schemas — fewer fields, no deeply nested objects, enums instead of free-text strings for constrained values
Use a more capable model for extraction tasks where schema adherence is critical

Which return type change makes a LangChain4j AI Services extraction method signal parsing failure without throwing an exception?Change the return type from ProductReview to Optional<ProductReview>

✓ Well done — Optional return types signal absence without exception-based flow control, which is cleaner for partial success scenarios.

Change the return type to String and parse manually

✗ Try again — returning String reverts to manual parsing, eliminating LangChain4j's auto-extraction benefit.

Annotate the method with @SuppressOutputErrors

✗ Try again — @SuppressOutputErrors is not a LangChain4j annotation. Optional is the idiomatic signal for potentially absent results.

Which provider-level configuration reduces structured output parsing failures in LangChain4j?Setting maxTokens to match the expected JSON response size exactly

✗ Try again — truncating tokens can actually cause more malformed JSON. Provider JSON mode is the effective solution.

Enabling OpenAI's native JSON mode via responseFormat on the model builder, which constrains the model to valid JSON

✓ Well done — native JSON mode forces the model to produce syntactically valid JSON, dramatically reducing parsing failures compared to prompt-instruction-only approaches.

Using a lower temperature (0.0) which always produces valid JSON

✗ Try again — temperature 0.0 reduces randomness but does not guarantee JSON validity. Native JSON mode is the reliable solution.

39. What is LangChain4j's support for graph-based RAG or knowledge graph integration?

Standard vector similarity RAG retrieves semantically similar text chunks, but it struggles with multi-hop reasoning — questions like "What are all the direct reports of the manager of the product that had the most returns in Q3?" require traversing multiple relationships, not just finding similar text. Graph-based RAG addresses this by integrating a knowledge graph (like Neo4j) as a content retriever alongside or instead of a vector store.

LangChain4j supports this through the ContentRetriever abstraction. You can implement a Neo4jContentRetriever (or similar) that translates the user's natural language query into a Cypher query using the LLM, executes it against Neo4j, and returns the structured results as text for context injection:

class Neo4jContentRetriever implements ContentRetriever {
    private final Driver neo4jDriver;
    private final ChatLanguageModel queryGeneratorModel;

    @Override
    public List<Content> retrieve(Query query) {
        // Step 1: LLM generates Cypher from natural language
        String cypher = queryGeneratorModel.generate(
            "Convert this to a Cypher query: " + query.text()
        );

        // Step 2: Execute against Neo4j
        try (Session session = neo4jDriver.session()) {
            Result result = session.run(cypher);
            String resultText = result.list().toString();
            return List.of(Content.from(resultText));
        }
    }
}

// Use alongside vector retrieval
RetrievalAugmentor augmentor = DefaultRetrievalAugmentor.builder()
    .contentRetriever(new Neo4jContentRetriever(driver, chatModel))
    .build();

The pattern is often called "GraphRAG" or "Text2Cypher RAG". For production, add query validation (reject Cypher that includes WRITE operations), result size limits, and retry logic for LLM-generated invalid Cypher. LangChain4j's modular ContentRetriever design makes this a clean extension point — no framework modification required.

What problem does graph-based RAG solve that standard vector similarity RAG cannot handle well?Graph RAG stores embeddings faster than vector databases

✗ Try again — storage speed is not the issue. Graph RAG addresses multi-hop reasoning across entity relationships, which vector similarity cannot do.

Multi-hop reasoning queries that require traversing multiple relationships between entities

✓ Well done — graph traversal naturally handles questions like "find all direct reports of the manager of X" which require multi-step relationship following.

Semantic search across documents written in multiple languages simultaneously

✗ Try again — multilingual search is a vector embedding concern. Graph RAG is about traversing entity relationships, not language handling.

What security risk must you guard against in an LLM-to-Cypher RAG implementation?The LLM may generate embeddings with incorrect dimensionality

✗ Try again — dimensionality is an embedding concern. The security risk in Text2Cypher is the LLM generating WRITE operations (CREATE, DELETE) instead of read-only MATCH queries.

The LLM may generate Cypher WRITE operations (CREATE, MERGE, DELETE) instead of read-only MATCH queries, allowing data modification

✓ Well done — always validate generated Cypher to reject write operations. A compromised prompt could instruct the model to produce database-modifying queries.

Neo4j driver credentials could be exposed in the generated Cypher output

✗ Try again — Cypher queries do not contain credentials. The risk is write-operation injection via adversarial prompt engineering.

40. What is the LangChain4j EvaluationResult API and how do you measure RAG pipeline quality?

RAG pipeline quality is notoriously hard to measure because "good retrieval" and "good answers" are context-dependent and partially subjective. LangChain4j does not provide a built-in RAG evaluation framework, but the ecosystem approach involves using LLMs themselves as evaluators (LLM-as-judge) combined with ground-truth question-answer test sets.

The standard evaluation dimensions for RAG systems are:

RAG Evaluation Metrics
Metric	What It Measures	How to Compute
Context Recall	Were the relevant documents retrieved?	Compare retrieved chunks vs. ground-truth relevant docs
Context Precision	What fraction of retrieved docs are actually relevant?	LLM-as-judge scores each retrieved chunk for relevance
Answer Faithfulness	Is the answer grounded in the retrieved context?	LLM judge checks if every claim in answer appears in context
Answer Relevance	Does the answer address the question?	LLM judge rates how directly the answer responds to the query

A practical evaluation approach in LangChain4j:

record EvalCase(String question, String groundTruthAnswer, List<String> relevantDocIds) {}

interface RagEvaluator {
    @SystemMessage("You are a factual accuracy judge. Rate 0-10.")
    @UserMessage("Question: {{question}}\nGenerated Answer: {{answer}}\nContext: {{context}}")
    int rateAnswerFaithfulness(String question, String answer, String context);
}

// Run evaluation on a test set
for (EvalCase testCase : testCases) {
    String generatedAnswer = ragAssistant.answer(testCase.question());
    List<Content> retrieved = contentRetriever.retrieve(Query.from(testCase.question()));
    int score = evaluator.rateAnswerFaithfulness(testCase.question(), generatedAnswer, retrieved.toString());
    // Aggregate scores across test cases
}

For more comprehensive RAG evaluation, integrate LangChain4j with Python-based frameworks like RAGAS or DeepEval via their REST APIs, or use Azure AI Studio's evaluation workflows which support Java-generated answer datasets.

What does Answer Faithfulness measure in a RAG pipeline evaluation?Whether the answer is grammatically correct and well-formed

✗ Try again — grammar quality is a different concern. Faithfulness measures whether every claim in the answer is grounded in the retrieved context (no hallucination).

Whether every factual claim in the generated answer is supported by the retrieved context documents

✓ Well done — faithfulness specifically detects hallucination: claims in the answer that have no basis in the retrieved context are faithfulness failures.

Whether the user's question was correctly understood by the retrieval component

✗ Try again — query understanding is part of context recall/precision. Faithfulness evaluates answer-to-context alignment specifically.

What evaluation approach does LangChain4j enable for RAG pipelines without a separate evaluation framework?Automatic unit tests via @RagTest annotations on AI Services interfaces

✗ Try again — @RagTest is not a LangChain4j annotation. LLM-as-judge using an AI Services evaluator interface is the practical approach.

LLM-as-judge: define a separate AI Services interface that rates answer quality on a test set, using the LLM itself as the evaluator

✓ Well done — LLM-as-judge is pragmatic and fits LangChain4j's AI Services model naturally: a rating interface with judge-role system prompts scores generated answers.

BLEU/ROUGE score comparison using LangChain4j's built-in TextSimilarity utility

✗ Try again — BLEU/ROUGE are n-gram overlap metrics poorly suited for open-ended LLM evaluation. LangChain4j does not have a built-in TextSimilarity utility.

Database

	Interviews Questions Java Spring Hibernate Maven Testing API BigData Web DataStructures AI Database MuleESB Cloud Scala Tools	About Javapedia.net Javapedia.net is for Java and J2EE developers, technologist and college students who prepare of interview. Also this site includes many practical examples. This site is developed using J2EE technologies by Steve Antony, a senior Developer/lead at one of the logistics based company.
	contact: javatutorials2016[at]gmail[dot]com
Kindly consider donating for maintaining this website. Thanks.
	Copyright © 2026, javapedia.net, all rights reserved. privacy policy.

AI / LangChain4j interview questions

Comments & Discussions

Recently added...