[Discussion] Anyone using RAG setups to ground LLMs in trusted data?
By: Noah Chen
Posted: 27/04/2025
Tags: RAG, Vector Databases, Knowledge Base
We’re looking into retrieval-augmented generation to make answers more accurate. Curious who’s done this and if it actually works well in prod?
Upvotes: 60
Downvotes: 0
Comments: 15
Comments
I've been using Pinecone with great success. The hybrid search capability lets us combine semantic search with metadata filtering, which has been perfect for our document retrieval needs.
By: User #13
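For anyone wanting to try the same thing, here's roughly what such a query looks like with the Pinecone Python client. This is a minimal sketch: the index name, filter fields, and placeholder vector are invented for illustration, so swap in your own.

```python
# Minimal sketch: dense query plus a metadata filter with the Pinecone
# Python client. Index name, filter fields, and the placeholder vector
# are hypothetical; replace with your own setup.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-index")  # hypothetical index name

# Stand-in for a real query embedding (use your embedding model here).
query_embedding = [0.0] * 1536

results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"doc_type": {"$eq": "runbook"}, "year": {"$gte": 2024}},
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score, match.metadata)
```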
That's an interesting approach. What framework are you using for the LLM abstraction layer? I'd love to take a look at something similar for our projects.
By: User #11
We batch updates every 30 seconds for efficiency, but critical updates can be flagged for immediate processing. It's a good balance of freshness and cost control.
By: User #11
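The scheme is simple enough to sketch in a few lines of Python. Everything here (the queue, the 30-second window, the critical flag, the upsert stub) is illustrative rather than our production code:

```python
# Sketch of the batching scheme described above: updates accumulate in a
# queue and are flushed every 30 seconds, but items flagged as critical
# trigger an immediate flush. All names are illustrative.
import queue
import threading
import time

BATCH_INTERVAL = 30.0  # seconds between routine flushes

updates = queue.Queue()

def embed_and_upsert(batch):
    # Stand-in for the real embed + vector DB upsert step.
    print(f"embedding {len(batch)} update(s)")

def worker():
    batch = []
    deadline = time.monotonic() + BATCH_INTERVAL
    while True:
        timeout = max(0.0, deadline - time.monotonic())
        try:
            doc, critical = updates.get(timeout=timeout)
            batch.append(doc)
        except queue.Empty:
            critical = False
        # Flush on the 30s timer, or immediately for flagged updates.
        if critical or time.monotonic() >= deadline:
            if batch:
                embed_and_upsert(batch)
                batch = []
            deadline = time.monotonic() + BATCH_INTERVAL

threading.Thread(target=worker, daemon=True).start()
updates.put(("routine doc edit", False))   # waits for the next window
updates.put(("security advisory", True))   # flushes immediately
time.sleep(1)  # give the worker a moment in this demo
```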
Do you have any issues with embedding latency for real-time updates? Or are you queueing and processing asynchronously?
By: User #7
We started with their cloud but moved to self-hosted for cost reasons as we scaled. Their Docker setup made it relatively painless.
By: User #9
One challenge we're facing with RAG is keeping the retrieved context up-to-date. Has anyone solved the real-time ingestion problem elegantly?
By: User #7
Thanks for the suggestion! Did you self-host or use their cloud offering?
By: User #11
We're using Qdrant and it's handling our load well (about 10M documents). Worth checking out if you hit scaling issues.
By: User #6
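If it helps anyone evaluate it, here's a minimal sketch of querying a self-hosted Qdrant instance with its Python client; the collection name and query vector are placeholders:

```python
# Minimal sketch of a similarity search against a local Qdrant instance.
# Collection name and vector are hypothetical placeholders.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",        # hypothetical collection
    query_vector=[0.0] * 384,      # replace with a real query embedding
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```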
We include 3-5 chunks depending on their size, and yes, we do a second-stage reranking using a cross-encoder. The two-stage approach really improved relevance.
By: User #5
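For anyone unfamiliar with the pattern, a small sketch of that second stage using a sentence-transformers cross-encoder. The model name is one public checkpoint, and first_stage_hits stands in for your vector search results:

```python
# Two-stage retrieve-then-rerank sketch: a cross-encoder scores each
# (query, chunk) pair jointly, then we keep the top few chunks.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate API keys?"
first_stage_hits = [  # stand-ins for the initial vector search results
    "Rotate keys from the admin console under Settings > API.",
    "Our office hours are 9am to 5pm.",
    "API keys expire after 90 days and can be regenerated via the CLI.",
]

scores = reranker.predict([(query, chunk) for chunk in first_stage_hits])
reranked = sorted(zip(scores, first_stage_hits), reverse=True)
top_chunks = [chunk for _, chunk in reranked[:3]]
print(top_chunks)
```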
How many chunks do you typically include in a prompt context? Are you doing any reranking after the initial vector search?
By: User #10
What vector database are you using? We tried Pinecone but ran into scaling issues with large document collections.
By: User #12
We experimented a lot with chunking and found that semantic chunking works better than fixed size. We use section headers and paragraph boundaries as natural break points.
By: User #18
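A rough illustration of that idea in Python: split on section headers first, then pack paragraphs up to a size budget. The regex and the character budget are invented for the example, not a recommendation:

```python
# Illustrative semantic chunker: break markdown-ish text on headers,
# then on blank-line paragraph boundaries, packing paragraphs until a
# size budget is reached. Thresholds here are arbitrary.
import re

MAX_CHUNK_CHARS = 1200  # hypothetical budget

def semantic_chunks(text: str):
    chunks = []
    # Split before header lines (e.g. "# Title"), keeping the header.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    for section in sections:
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) > MAX_CHUNK_CHARS:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks
```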
That's impressive! How are you handling the chunking strategy for your documents? We're struggling to find the right balance between context size and relevance.
By: User #22
We implemented RAG in production last quarter and it was a game-changer for accuracy. Using embeddings of our documentation and knowledge base reduced hallucinations by about 75%.
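To make the pattern concrete, here's a bare-bones sketch of the embed-retrieve-ground loop, assuming sentence-transformers for the embeddings; the knowledge base, model choice, and prompt template are illustrative only:

```python
# Bare-bones RAG sketch: embed a knowledge base, retrieve the nearest
# chunks for a question, and ground the prompt in them. Model and data
# are illustrative stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

kb = [
    "Refunds are processed within 5 business days.",
    "API keys can be rotated from the admin console.",
    "The service SLA guarantees 99.9% monthly uptime.",
]
kb_vecs = model.encode(kb, normalize_embeddings=True)

def retrieve(question: str, k: int = 2):
    q = model.encode(question, normalize_embeddings=True)
    sims = kb_vecs @ q  # cosine similarity, since vectors are normalized
    return [kb[i] for i in np.argsort(-sims)[:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using ONLY this context:\n{context}\n\nQ: {question}\nA:"
print(prompt)  # pass this to your LLM of choice
```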