[Discussion] Anyone using RAG setups to ground LLMs in trusted data?
By: Noah Chen
Posted: 27/04/2025
Tags: RAG, Vector Databases, Knowledge Base
We’re looking into retrieval-augmented generation to make answers more accurate. Curious who’s done this and if it actually works well in prod?
Upvotes: 60
Downvotes: 0
Comments: 15
Comments
I've been using Pinecone with great success. The hybrid search capability lets us combine semantic search with metadata filtering, which has been perfect for our document retrieval needs.
By: User #13
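For anyone wanting to try the same thing, here's roughly what such a query looks like with the Pinecone Python client. This is a minimal sketch: the index name, filter fields, and placeholder vector are invented for illustration, so swap in your own.

```python
# Minimal sketch: dense query plus a metadata filter with the Pinecone
# Python client. Index name, filter fields, and the placeholder vector
# are hypothetical; replace with your own setup.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-index")  # hypothetical index name

# Stand-in for a real query embedding (use your embedding model here).
query_embedding = [0.0] * 1536

results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"doc_type": {"$eq": "runbook"}, "year": {"$gte": 2024}},
    include_metadata=True,
)
for match in results.matches:
    print(match.id, match.score, match.metadata)
```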
That's an interesting approach. What framework are you using for the LLM abstraction layer? I'd love to take a look at something similar for our projects.
By: User #11
We batch updates every 30 seconds for efficiency, but critical updates can be flagged for immediate processing. It's a good balance of freshness and cost control.
By: User #11
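The scheme is simple enough to sketch in a few lines of Python. Everything here (the queue, the 30-second window, the critical flag, the upsert stub) is illustrative rather than our production code:

```python
# Sketch of the batching scheme described above: updates accumulate in a
# queue and are flushed every 30 seconds, but items flagged as critical
# trigger an immediate flush. All names are illustrative.
import queue
import threading
import time

BATCH_INTERVAL = 30.0  # seconds between routine flushes

updates = queue.Queue()

def embed_and_upsert(batch):
    # Stand-in for the real embed + vector DB upsert step.
    print(f"embedding {len(batch)} update(s)")

def worker():
    batch = []
    deadline = time.monotonic() + BATCH_INTERVAL
    while True:
        timeout = max(0.0, deadline - time.monotonic())
        try:
            doc, critical = updates.get(timeout=timeout)
            batch.append(doc)
        except queue.Empty:
            critical = False
        # Flush on the 30s timer, or immediately for flagged updates.
        if critical or time.monotonic() >= deadline:
            if batch:
                embed_and_upsert(batch)
                batch = []
            deadline = time.monotonic() + BATCH_INTERVAL

threading.Thread(target=worker, daemon=True).start()
updates.put(("routine doc edit", False))   # waits for the next window
updates.put(("security advisory", True))   # flushes immediately
time.sleep(1)  # give the worker a moment in this demo
```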
Do you have any issues with embedding latency for real-time updates? Or are you queueing and processing asynchronously?
By: User #7
We started with their cloud but moved to self-hosted for cost reasons as we scaled. Their Docker setup made it relatively painless.
By: User #9
One challenge we're facing with RAG is keeping the retrieved context up-to-date. Has anyone solved the real-time ingestion problem elegantly?
By: User #7
Thanks for the suggestion! Did you self-host or use their cloud offering?
By: User #11
We're using Qdrant and it's handling our load well (about 10M documents). Worth checking out if you hit scaling issues.
By: User #6
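If it helps anyone evaluate it, here's a minimal sketch of querying a self-hosted Qdrant instance with its Python client; the collection name and query vector are placeholders:

```python
# Minimal sketch of a similarity search against a local Qdrant instance.
# Collection name and vector are hypothetical placeholders.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

hits = client.search(
    collection_name="docs",        # hypothetical collection
    query_vector=[0.0] * 384,      # replace with a real query embedding
    limit=5,
)
for hit in hits:
    print(hit.id, hit.score, hit.payload)
```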
We include 3-5 chunks depending on their size, and yes, we do a second-stage reranking using a cross-encoder. The two-stage approach really improved relevance.
By: User #5
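For anyone unfamiliar with the pattern, a small sketch of that second stage using a sentence-transformers cross-encoder. The model name is one public checkpoint, and first_stage_hits stands in for your vector search results:

```python
# Two-stage retrieve-then-rerank sketch: a cross-encoder scores each
# (query, chunk) pair jointly, then we keep the top few chunks.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate API keys?"
first_stage_hits = [  # stand-ins for the initial vector search results
    "Rotate keys from the admin console under Settings > API.",
    "Our office hours are 9am to 5pm.",
    "API keys expire after 90 days and can be regenerated via the CLI.",
]

scores = reranker.predict([(query, chunk) for chunk in first_stage_hits])
reranked = sorted(zip(scores, first_stage_hits), reverse=True)
top_chunks = [chunk for _, chunk in reranked[:3]]
print(top_chunks)
```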
How many chunks do you typically include in a prompt context? Are you doing any reranking after the initial vector search?
By: User #10
What vector database are you using? We tried Pinecone but ran into scaling issues with large document collections.
By: User #12
We experimented a lot with chunking and found that semantic chunking works better than fixed size. We use section headers and paragraph boundaries as natural break points.
By: User #18
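A rough illustration of that idea in Python: split on section headers first, then pack paragraphs up to a size budget. The regex and the character budget are invented for the example, not a recommendation:

```python
# Illustrative semantic chunker: break markdown-ish text on headers,
# then on blank-line paragraph boundaries, packing paragraphs until a
# size budget is reached. Thresholds here are arbitrary.
import re

MAX_CHUNK_CHARS = 1200  # hypothetical budget

def semantic_chunks(text: str):
    chunks = []
    # Split before header lines (e.g. "# Title"), keeping the header.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    for section in sections:
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            if current and len(current) + len(para) > MAX_CHUNK_CHARS:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks
```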
That's impressive! How are you handling the chunking strategy for your documents? We're struggling to find the right balance between context size and relevance.
By: User #22
We implemented RAG in production last quarter and it was a game-changer for accuracy. Using embeddings of our documentation and knowledge base reduced hallucinations by about 75%.
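To make the pattern concrete, here's a bare-bones sketch of the embed-retrieve-ground loop, assuming sentence-transformers for the embeddings; the knowledge base, model choice, and prompt template are illustrative only:

```python
# Bare-bones RAG sketch: embed a knowledge base, retrieve the nearest
# chunks for a question, and ground the prompt in them. Model and data
# are illustrative stand-ins.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

kb = [
    "Refunds are processed within 5 business days.",
    "API keys can be rotated from the admin console.",
    "The service SLA guarantees 99.9% monthly uptime.",
]
kb_vecs = model.encode(kb, normalize_embeddings=True)

def retrieve(question: str, k: int = 2):
    q = model.encode(question, normalize_embeddings=True)
    sims = kb_vecs @ q  # cosine similarity, since vectors are normalized
    return [kb[i] for i in np.argsort(-sims)[:k]]

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using ONLY this context:\n{context}\n\nQ: {question}\nA:"
print(prompt)  # pass this to your LLM of choice
```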