Using RAG to Talk to Drupal Site Content

What is RAG?

RAG stands for Retrieval-Augmented Generation, a technique for improving the answers of large language models (LLMs). LLMs generate text from an input, and because most models are trained on a general dataset, an LLM can never really know anything beyond what it was trained on. There are two main solutions to this: train on top of the model with new data (fine-tuning), or feed the knowledge to the LLM as part of its input (RAG). In RAG, the LLM (the generator) is augmented by retrieving relevant data from a knowledge base.

RAG can be summarized in 4 steps (a minimal sketch follows the list):

  1. Vector embed the content
  2. Vector embed the user query
  3. Get the closest vectors to the user query
  4. Feed the LLM the related content with the user's query
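
As a rough illustration, here is a minimal Python sketch of those four steps using the ChromaDB client and Ollama's REST API. The collection name, model tag, and sample document are placeholders, not the exact values used in this project:

```python
# Minimal RAG sketch: ChromaDB for retrieval, Ollama for generation.
# Collection name, model tag, and document text are illustrative only.
import chromadb
import requests

client = chromadb.Client()
collection = client.get_or_create_collection("site_content")

# Step 1: vector embed the content. ChromaDB embeds documents on add
# using its default embedding function.
collection.add(
    ids=["node-1"],
    documents=["Example page text about insurance policies..."],
)

# Steps 2 and 3: embed the user query and retrieve the closest documents.
query = "What types of insurance policies do you offer?"
results = collection.query(query_texts=[query], n_results=5)
context = "\n\n".join(results["documents"][0])

# Step 4: feed the LLM the related content along with the user's query.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3:8b", "prompt": prompt, "stream": False},
)
print(response.json()["response"])
```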

We will be implementing RAG in Drupal in order to "talk" to the content inside. This can be very helpful for LLM site assistants, which need lots of context to be useful. It is also useful for general semantic search, which is essentially the first half of RAG (steps 1-3).

Limitations

  • Since we have opted for a fully self-hosted implementation, we are limited in our choice of LLM. The GPU in our current server can support at most an 8B (8 billion parameter) model.
  • To properly timebox ourselves, we stuck to a simplistic implementation of content embedding without any form of chunking (breaking the content into semantically related pieces).

Implementation

To achieve full RAG, this project implements:

  • LLM Host (Ollama)
    • I have self-hosted Ollama on a server with a dedicated GPU (Nvidia RTX 3080 10GB)
  • ChromaDB (ChromaDB)
    • There is no official PHP API for accessing ChromaDB, so I will be using their recommended Python API and giving Drupal access through a REST API built with FastAPI (a Python library for building REST APIs); a sketch of this bridge follows the list.
  • Drupal AI Contrib Module (Drupal AI)
    • This is the official Drupal module combining several important AI modules for other plugins to use.
  • Custom Drupal Module
    • Our custom Drupal module ties the above items together to fully implement RAG:
    • Each page's text gets sent to ChromaDB to be embedded and stored.
    • User queries get embedded and compared to the stored documents in ChromaDB to find related content.
    • The user's query is then fed to the LLM along with those related documents as context.
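
Since the custom module talks to ChromaDB over REST, the Python bridge can be as small as two FastAPI endpoints: one the module calls when a page is saved, and one it calls to retrieve related content for a query. This is a hedged sketch; the endpoint names and payload fields are assumptions, not the exact API used in this project:

```python
# Sketch of the FastAPI bridge between Drupal and ChromaDB.
# Endpoint names and payload fields are illustrative assumptions.
import chromadb
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("drupal_pages")

class Page(BaseModel):
    nid: str   # Drupal node ID
    text: str  # full rendered page text

class Query(BaseModel):
    text: str
    n_results: int = 5

@app.post("/index")
def index_page(page: Page):
    # Upsert so re-saving a node replaces its old embedding.
    collection.upsert(ids=[page.nid], documents=[page.text])
    return {"status": "indexed", "nid": page.nid}

@app.post("/query")
def query_pages(query: Query):
    results = collection.query(query_texts=[query.text], n_results=query.n_results)
    # Results come back sorted by distance, lowest (most relevant) first.
    return {
        "documents": results["documents"][0],
        "distances": results["distances"][0],
    }
```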

Testing

Our implementation of RAG performs surprisingly well given the limitations of our system. To judge the quality of our implementation, I will rate the LLM's answers on 2 criteria, each on a scale of 0 to 5:

  • Quality: Does the response clearly answer the question?
  • Hallucination: Does the response make up information not found in the content? (Higher scores mean more hallucination.)

QUESTION #1: What types of insurance policies do you offer?

  • This is a very broad question where several pages could be relevant context.

ANSWER: Main Street America insurance provides a variety of insurance policies to meet the needs of our clients. In addition to personal auto coverage, we also offer homeowners, renters, commercial, and life insurance options. Our independent insurance agents can assist you in choosing the best combination of coverage and service for your unique situation. Some of the coverages we provide include property, liability, workers' compensation, umbrella, disability income, long-term care, health, dental, vision, and annuities. Contact us today to learn more about our policies and find an agent near you.
RATINGS:

  • Quality: 4
    • We are immediately given the list of possible insurance policies.
    • It reads a bit like an advertisement instead of an answer.
  • Hallucination: 2
    • Toward the end of the response we get some made-up coverages; I cannot find annuities in any of the context documents.

QUESTION #2: What types of surety bonds are there?

  • This question dives into specific content on a page, meant to see how well the LLM parses through contexts.

ANSWER: 1. Notary Public Bond: Required for notaries to legally notarize documents in some states. 2. Business Services Bond: Also known as a third-party dishonesty bond, this is necessary for businesses that perform services inside other people's properties or premises. 3. ERISA Bond: Mandatory for all employee benefit plans under the Employee Retirement Income Security Act (ERISA). Protects against losses caused by fraud or dishonesty on part of plan officials or fiduciaries. 4. License and Permit Bonds: Required for various types of licenses or permits in some cities, counties or states. 5. Contractor Bonds: Usually needed for contractors working on a new job in a different city or state to ensure compliance with the local laws. 6. Probate Bond: May be required by a court for individuals appointed as an administrator, conservator, or guardian of minors or incapacitated adults.
RATINGS:

  • Quality: 5
    • No faults here: a direct answer describing each bond.
  • Hallucination: 3
    • The response was pretty hit-and-miss here, with some bonds either made up or unrelated. There was a specific document in mind for this question (https://drupalai.office.spinspire.com/node/536/), and while it is always the most relevant document retrieved, a few other bond-related pages always mix into the context given to the LLM.

QUESTION #3: Are drunk workers covered by my workers comp?

ANSWER #1: Workers compensation insurance typically covers employees who are injured on the job, including those who become impaired by alcohol or drugs. If a worker gets into an accident or suffers injuries while under the influence of alcohol or other substances, workers comp should cover their medical expenses and lost wages. However, it's important to note that policies vary by state and coverage limits, so it's always best to consult with your insurance agent to understand the specific details of your policy. In some cases, certain industries or job types may have additional restrictions or exemptions related to intoxication while on the job, so it's important to fully understand your policy and any potential limitations or exclusions.
ANSWER #2: No, drunk employees may not be covered under workers compensation insurance for injuries sustained as a result of being intoxicated at work. While workers comp typically covers on-the-job accidents and injuries, intoxication is generally considered a voluntary act by the employee and is therefore excluded from coverage. However, in certain circumstances, such as when an employer knowingly served or allowed an employee to become intoxicated, there may be exceptions. It's best to consult with a legal or insurance professional for specific guidance regarding your company's policy and liability in these situations.
RATINGS (answer #1):

  • Quality: 2
    • Was very general, as if it didn't know (and was completely wrong).
  • Hallucination: 5
    • I have left an alternate response here to show that sometimes (roughly 50/50) it will give the correct answer.
    • Even when correct, it seems to make up its justification for why drunk workers are not covered.

QUESTION #4: My house was marked as a flood zone, should I get flood insurance?

  • I have left this one a bit open-ended to see how much detail the LLM will fill in from other documents.

ANSWER: Yes, if your house has been designated as being in a flood zone, you should definitely consider getting flood insurance. While standard homeowners' insurance policies typically don't cover damage caused by flooding, flood insurance is designed to provide protection against flood-related losses. This supplemental policy can help cover the costs of repairing or rebuilding your home, as well as replacing personal belongings and furnishings that may have been damaged by flooding. Your local independent insurance agent can provide you with more information about flood insurance and help you determine if it's the right coverage for your needs based on your specific location and risk level. It's also important to note that flood insurance is typically available through the National Flood Insurance Program (NFIP), which is administered by the Federal Emergency Management Agency (FEMA). As an NFIP policyholder, you'll have access to a network of qualified contractors and resources to help you recover from a flood-related loss. So if your house falls within a flood zone, taking out flood insurance should be a top priority in order to protect both your property and your peace of mind.
RATINGS:

  • Quality: 5
    • Bravo! An immediate answer followed up by justification and information from the content.
  • Hallucination: 0
    • Answers have been quite consistent.

Conclusion

Embedding is critical

How content is embedded plays a critical role in good RAG. In our current implementation, each page's entire content is a single document sent to ChromaDB. This is dead simple to implement, but it degrades relevance: a whole page likely covers several topics, some of which are unrelated to a user's query about one specific topic. The solution is to break the page contents into semantically related pieces (called chunking), as sketched below. Chunking also helps with the information-overload issue discussed next, since there is less context for the LLM to parse.
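
A hedged sketch of what a first pass at chunking could look like, splitting on paragraph boundaries with an assumed size cap (real semantic chunking would also account for headings and sentence boundaries):

```python
# Naive paragraph-based chunking sketch. The max_chars cap is an
# assumed value, not a tuned one.
def chunk_page(text: str, max_chars: int = 1000) -> list[str]:
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk when adding this paragraph would exceed the cap.
        if current and len(current) + len(paragraph) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

# Each chunk would then be stored as its own ChromaDB document, e.g.
# with IDs like "node-536-chunk-2", so retrieval returns focused
# passages instead of whole pages.
```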

Context amount needs to be tailored to the LLM

In our implementation I have limited the context fed to the LLM to 5 documents. That is, I have limited it to 5 WHOLE pages. These pages have high relevancy, but the LLM still appears to have difficulty extracting good information from all that content. When experimenting with question #2 (surety bonds), I found that when I tightened the relevancy cutoff until the surety bonds page was the only context, answers were consistently perfect with no hallucination. I believe the solution to this (aside from chunking) is something like a relevance-gap filter. Say we get a very related document (a distance of 0.7, where lower means more relevant) and the next document is noticeably less related (a distance of 1.1 or greater); it is highly likely the information we really want is in the former document. Removing the less related documents from the context should prevent some of this hallucination and mixing of information.
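
Here is a sketch of that filter, assuming ChromaDB's default behavior where each result carries a distance score and lower means more relevant; the 0.3 gap threshold is an illustrative value that would need tuning:

```python
# Relevance-gap filter sketch: drop documents whose distance jumps
# well past the best match. The max_gap threshold is an assumed
# value, not a tested one.
def filter_by_relevance_gap(documents: list[str],
                            distances: list[float],
                            max_gap: float = 0.3) -> list[str]:
    if not documents:
        return []
    best = distances[0]  # results arrive sorted by distance, best first
    return [doc for doc, dist in zip(documents, distances)
            if dist - best <= max_gap]

# With distances like [0.7, 1.1, 1.2], only the 0.7 document survives,
# so the LLM sees one focused context instead of a noisy mix.
```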

Quality budget RAG is very possible!

The current (very basic) implementation already works quite well with a mid-range LLM (8 billion parameters) on a relatively inexpensive machine. With the changes above implemented, I believe we can get close to perfection!

Extra Notes

In future iterations I would like to switch from Ollama to LM Studio (https://lmstudio.ai/) for its ability to use any Hugging Face model. There are several models that seem to be fine-tuned for RAG purposes, none of which are available in Ollama.