In the latest Dataiku GenAI Bootcamp webinar on “How to Get 100% Well-Formatted LLM Responses,” Vivien Tran-Thien, Director of AI Consulting at Dataiku, and Christina Hsiao, Director of Product Marketing, showed how to achieve well-formatted LLM responses and addressed the challenges that can arise along the way.
Implementing Formatting Constraints in LLMs
Vivien began by explaining four main categories of formatting constraints that are relevant for anyone who wants to generate well-formed, predictable content with LLMs (two of these are sketched in code after the list):
- A predefined set of output options (e.g. “1”, “2”, “3”, “4”, and “5”)
- Regular expressions or structured formats such as a specific JSON schema
- Formal grammars (such as syntactically-correct code)
- “Fill in the blank”-style templates
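To make this concrete, here is a minimal, illustrative sketch of the first two kinds of constraints: a hypothetical JSON schema for a scored review and a regular expression for a bare 1-to-5 score (both are examples for this post, not artifacts from the webinar).

```python
import re

# Illustrative example only: a JSON schema constraint requiring an object with
# a numeric score (1-5) and a short free-text justification.
review_schema = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 1, "maximum": 5},
        "justification": {"type": "string"},
    },
    "required": ["score", "justification"],
}

# A regular-expression constraint for the simpler case of a bare score.
score_pattern = re.compile(r"^[1-5]$")

assert score_pattern.match("3")          # a valid output
assert not score_pattern.match("great")  # an out-of-scope output
```

Formal grammars and “fill in the blank” templates follow the same logic, just with richer machinery for describing what counts as a valid output.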
For AI engineers, the real-world benefits of enforcing these constraints include guaranteed validity of responses, effortless parsing, and enhanced reasoning.
He then unveiled the science behind formatting constraints and how they can amplify the reasoning abilities of LLMs, and outlined three primary approaches for structured text generation:
- Prompt engineering
- Supervised fine-tuning
- Constrained decoding
Vivien walked through the pros and cons of each approach. Prompt engineering is the most straightforward to implement: you simply state the formatting constraints in the prompt. However, it cannot guarantee that the generated outputs will actually comply.
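For example, a prompt might state the constraint explicitly, with compliance only verifiable after the fact (the prompt and validation helper below are illustrative, not from the webinar):

```python
import json

# Hypothetical prompt that states the formatting constraint explicitly.
prompt = (
    "Rate the following review from 1 to 5.\n"
    'Respond ONLY with a JSON object of the form {"score": <integer from 1 to 5>} '
    "and nothing else.\n\n"
    "Review: The product arrived late and was damaged."
)

def is_valid(raw_response: str) -> bool:
    # Compliance can only be checked after the fact; the prompt alone cannot enforce it.
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(parsed, dict)
        and isinstance(parsed.get("score"), int)
        and 1 <= parsed["score"] <= 5
    )

print(is_valid('{"score": 2}'))                   # True
print(is_valid("Sure! I'd rate it 2 out of 5."))  # False: the model ignored the format
```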
By contrast, supervised fine-tuning retrains an LLM on properly formatted examples, yielding better compliance for a given task without compromising quality. However, fine-tuning requires a considerable amount of high-quality labeled training data, as well as specialized expertise and computational resources, and a fine-tuned model doesn’t generalize well beyond the formats and tasks it was trained on.
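For intuition, a fine-tuning dataset for this kind of task might look something like the sketch below: prompt/completion pairs whose completions are already written in the target format (the records and the JSONL layout are illustrative assumptions, not a specific provider’s format).

```python
import json

# Illustrative fine-tuning records: each prompt is paired with a completion
# that is already written in the required JSON format.
training_examples = [
    {
        "prompt": "Rate this review from 1 to 5: 'Fast delivery, great quality.'",
        "completion": '{"score": 5}',
    },
    {
        "prompt": "Rate this review from 1 to 5: 'Broke after one day.'",
        "completion": '{"score": 1}',
    },
]

# Write the dataset as JSONL, a common interchange format for fine-tuning pipelines
# (the exact field names expected by a given provider may differ).
with open("formatted_examples.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```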
Constrained decoding, arguably the most powerful of the three approaches, guarantees 100% compliance even with complex constraints, and without any fine-tuning.
Delving Deeper: Constrained Decoding
Constrained decoding works by creating a filter, called a "mask," that blocks any tokens in the model's next-token probability distribution that don’t fit the required criteria. For instance, if we need an LLM to generate a score from 1 to 5, constrained decoding applies a mask that allows only those five digits and excludes every other token. This guarantees that the model’s response complies 100% with the requirements, eliminating any chance of an out-of-scope response.
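The toy example below, with a made-up vocabulary and made-up logits, illustrates the idea: disallowed tokens are masked to minus infinity before the softmax, so only the permitted digits can ever be chosen.

```python
import math

# Toy vocabulary and made-up next-token scores (logits) from the model.
vocab = ["1", "2", "3", "4", "5", "great", "terrible", "maybe"]
logits = [1.2, 0.4, 2.1, 0.3, 0.9, 3.5, 2.8, 1.0]

# The mask: only tokens that satisfy the constraint (a score from 1 to 5) are allowed.
allowed = {"1", "2", "3", "4", "5"}
masked_logits = [
    logit if token in allowed else float("-inf")
    for token, logit in zip(vocab, logits)
]

# Softmax over the masked logits: disallowed tokens end up with probability 0.
exps = [math.exp(logit) for logit in masked_logits]
total = sum(exps)
probs = [e / total for e in exps]

# The most likely *allowed* token wins, even though "great" scored higher overall.
best_token, best_prob = max(zip(vocab, probs), key=lambda pair: pair[1])
print(best_token)  # "3"
```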
While constrained decoding requires some setup (especially with more complex rules), it doesn’t need additional training data and can even speed up generation, since it narrows down the options for each token the model produces. The main limitation is that it relies on access to an LLM provider that supports this feature through its API.
Potential Pitfalls and How to Overcome Them
Vivien also discussed potential pitfalls to avoid when using constrained decoding.
Pitfall 1: Independent Computation of Probability and Mask
LLMs operate by computing probabilities for the tokens that may come next in a sequence. One challenge is that this probability computation and the mask computation happen independently: the model assigns its probabilities without any awareness of the constraint, so the mask can distort the answer the model would otherwise have given and produce an incorrect response.
To navigate this obstacle, Vivien suggests also specifying the constraint in the prompt. This touch of prompt engineering increases the chances that the model's probabilities already favor outputs that satisfy the constraint.
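Continuing the earlier toy example with purely illustrative numbers, the sketch below shows why this matters: the same mask yields a near-arbitrary digit when the prompt says nothing about the expected format, but a meaningful one when the constraint is spelled out.

```python
# Purely illustrative numbers: the same mask, two different prompts.
vocab = ["1", "2", "3", "4", "5", "great", "terrible"]
allowed = {"1", "2", "3", "4", "5"}

# Prompt does NOT mention the constraint: the model "wants" to answer in free text,
# so its preferences among the allowed digits are nearly arbitrary.
logits_without_hint = [0.11, 0.12, 0.10, 0.13, 0.09, 4.0, 3.5]

# Prompt explicitly asks for a single digit from 1 to 5: the probability mass
# already sits on tokens the mask will allow.
logits_with_hint = [0.5, 0.7, 3.2, 0.4, 1.1, 0.2, 0.1]

def best_allowed(logits):
    # Pick the highest-scoring token that survives the mask.
    candidates = [(tok, lg) for tok, lg in zip(vocab, logits) if tok in allowed]
    return max(candidates, key=lambda pair: pair[1])[0]

print(best_allowed(logits_without_hint))  # "4", an essentially arbitrary digit
print(best_allowed(logits_with_hint))     # "3", a digit the model genuinely prefers
```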
Pitfall 2: Constraints Impacting Response Quality
Another pitfall identified by Vivien is the trade-off between enforcing constraints and maintaining response quality: forcing the model into a strict format can sometimes degrade the content of its answers. In such cases, Vivien advises splitting the task into two steps: first generate the response without any formatting constraints, then restructure it into the expected format in a second, constrained call.
This two-step approach lets the model focus on one job at a time, answering first and formatting second, rather than compromising quality by trying to do both at once.
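A minimal sketch of this pattern might look as follows, assuming a hypothetical `call_llm` helper that stands in for whatever LLM client you use and accepts an optional JSON schema for the constrained second call.

```python
from typing import Callable

# Target format for the final, constrained call.
answer_schema = {
    "type": "object",
    "properties": {
        "score": {"type": "integer", "minimum": 1, "maximum": 5},
        "summary": {"type": "string"},
    },
    "required": ["score", "summary"],
}

def answer_then_format(question: str, call_llm: Callable[..., str]) -> str:
    # Step 1: unconstrained call, so the model can focus entirely on the answer.
    free_text = call_llm(prompt=f"Answer the following question:\n{question}")

    # Step 2: a second, constrained call only has to reformat the existing answer.
    formatting_prompt = (
        "Rewrite the answer below as a JSON object matching the target schema.\n"
        f"Answer: {free_text}"
    )
    return call_llm(prompt=formatting_prompt, json_schema=answer_schema)
```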
Pitfall 3: Overzealous Constraints Limiting Useful Information
While constraints are essential for guiding LLMs toward accurate and meaningful content, overly restrictive constraints can prevent the model from providing useful information. To handle this issue, Vivien recommends temporarily turning off the constraints. Combined with a small tweak to the prompt, this gives the model room to generate richer and more insightful responses.
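One way this might look in practice (the schema and prompt wording below are assumptions for illustration): contrast an overly strict schema with an unconstrained call whose prompt has been tweaked to ask for the missing context.

```python
# Illustrative example: an overly strict schema that only leaves room for a bare score.
strict_schema = {
    "type": "object",
    "properties": {"score": {"type": "integer", "minimum": 1, "maximum": 5}},
    "required": ["score"],
    "additionalProperties": False,
}

# Turning the constraint off and tweaking the prompt instead lets the model add the
# useful context back in; the constrained call can be reintroduced once the prompt works.
tweaked_prompt = (
    "Rate the following review from 1 to 5 and briefly explain what drove your rating.\n"
    "Review: 'The product arrived late and was damaged.'"
)
```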
Pitfall 4: Limitations of Constrained Decoding Solutions
Not all structured generation services are created equal. For example, each LLM provider covers a different subset of the JSON Schema specification in its service, and if a model doesn't support a feature your schema relies on, the results can be unsatisfactory. Vivien emphasizes the importance of thoroughly checking the documentation of the LLM provider or open-source solution you plan to use. Doing so helps you avoid the unexpected failures and incompatibilities that can arise from gaps in schema coverage.
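One practical habit, sketched below with the open-source jsonschema package and an illustrative schema, is to validate outputs locally against the exact schema you intend to send, so that keywords a provider silently ignores (such as pattern or oneOf) still get checked on your side.

```python
from jsonschema import validate  # pip install jsonschema

# Illustrative schema mixing widely supported keywords with ones (pattern, oneOf,
# format) that some structured-output services may ignore or reject outright.
ticket_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "pattern": "^TCK-[0-9]{4}$"},
        "due_date": {"type": "string", "format": "date"},
        "priority": {"oneOf": [{"const": "low"}, {"const": "high"}]},
    },
    "required": ["id", "priority"],
}

# Validating locally catches gaps early: a provider might happily return an object
# that your own schema rejects. (Note: "format" is only verified if you also pass
# a FormatChecker to jsonschema.)
validate(
    instance={"id": "TCK-0042", "due_date": "2024-06-01", "priority": "high"},
    schema=ticket_schema,
)
```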
Do It Now in Dataiku!
Vivien clearly underlined the importance of structured text generation techniques. Good news — you can get started right now in Dataiku! Dataiku provides seamless access to both top LLM services and self-hosted models, along with tools like function calling, JSON mode, and soon, structured output. Whether leveraging open-source tools or the latest APIs, these features allow you to turn LLM potential into structured, impactful content ready for enterprise-grade applications.