I am rather new here, so please excuse basic questions or requests for clarification.
We are a small foundation and we operate an observatory providing information on political parties and their funding. We were recently granted a VPS as an in-kind donation and we are trying to use it to run a chatbot that would help answer users’ questions.
The idea is that, since the data can be complex, users could directly quiz the bot to get specific answers and, ideally, to draw charts based on this data.
For now we have AnythingLLM set up on the VPS and we are starting to play with it. Now we are trying to figure out what model we can use (via an inference endpoint, right?) to answer the queries.
Thanks for the replies and links, @John6666 . A couple of follow-up questions, then:
using the search link you provided in your first message, what are the next search criteria to find the right model? Is inference availability a requirement?
for the model you list in your second message, is there a free way to use it? I tried to input "unsloth/functiongemma-270m-it-GGUF" in AnythingLLM where the previous model's name was, but when I try the agent, it says "400 The requested model 'unsloth/functiongemma-270m-it-GGUF' is not supported by any provider you have enabled."
You are on the right track. Starting with a VPS and AnythingLLM is smart, especially for structured data like political funding. Honestly, clean and well-organized data matters far more than the model itself. AI won't make charts on its own; you just need a simple layer to turn questions into visuals. And since it's political data, always show your sources to stay trustworthy. Later, you can explore setups like CustomGPT where the AI sticks to verified info. For now, keep it simple and clear, and keep improving. You've got this.
Thanks @liam255, and sorry for the delayed reply – we were off for a bit. We are actually a bit stuck at the moment, though. Running our own LLM backend server locally for inference, as suggested by @John6666, feels like a pretty tall order (and would most likely require a GPU, wouldn't it?). Short of access to an API, what options would be available?
OpenRouter
Pros: can route by uptime/price/latency; supports tool-aware routing; excellent fallbacks and provider diversity; one endpoint for many models. (OpenRouter)
Cons: slight overhead vs. going direct-to-provider; pricing has an extra "platform fee" layer when buying credits. (OpenRouter)
What I would do in your exact situation
You said "low cost or free tier" and "OpenAI-compatible endpoint", and you are building a RAG chatbot that reads from a database.
Phase 1: build and iterate cheaply
Groq for the chat model (fast and dev-friendly). (Groq Community)
Together or Fireworks for embeddings/rerank (their OpenAI-compatible examples include embeddings, and Together explicitly covers embeddings + function calling patterns via the OpenAI client). (Together.ai Docs)
If you want to avoid committing early, use HF router first, then pin a provider later. (Hugging Face)
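All of these options expose OpenAI-compatible chat endpoints, so the same request shape works against any of them; only the base URL, API key, and model name change, which is what makes it cheap to switch later. A minimal stdlib-only sketch (the base URLs in the comments are illustrative assumptions — check each provider's docs for the current values):

```python
import json
import urllib.request

def build_payload(model: str, messages: list) -> dict:
    """Request body shared by all OpenAI-compatible chat endpoints."""
    return {"model": model, "messages": messages}

def chat(base_url: str, api_key: str, model: str, messages: list) -> str:
    """POST to {base_url}/chat/completions and return the assistant reply."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(model, messages)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Swapping providers is just a different base URL (illustrative examples):
#   chat("https://api.groq.com/openai/v1", key, "<model-id>", msgs)
#   chat("https://router.huggingface.co/v1", key, "<model-id>", msgs)
```

AnythingLLM's "generic OpenAI" style provider settings take exactly these three pieces (base URL, key, model name), so the same swap applies there.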
Phase 2: cut unit cost once behavior is correct
Switch answer-synthesis to DeepInfra Llama 3.3 70B Turbo when you can tolerate slightly higher latency and want the lowest published $/token on that 70B tier.
Add caching/batching if your workflow has repetition (Fireworks makes these savings explicit).
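Caching is also easy to add on your own side, independent of any provider feature. A hypothetical in-process sketch that memoizes identical (model, messages) pairs, so a repeated question never hits the paid API twice (`chat_fn` stands in for whatever client you already use):

```python
import hashlib
import json

class CachedChat:
    """Wrap any chat function with an exact-match response cache.

    This is a sketch, not a provider feature: `chat_fn(model, messages)`
    is any callable returning the answer string.
    """
    def __init__(self, chat_fn):
        self.chat_fn = chat_fn
        self.cache = {}
        self.hits = 0

    def _key(self, model, messages):
        # Canonical JSON so logically identical requests share one key.
        blob = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def ask(self, model, messages):
        key = self._key(model, messages)
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.chat_fn(model, messages)
        return self.cache[key]
```

Exact-match caching pays off most for a public-facing bot where many users ask the same few questions about the same parties.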
Phase 3: harden reliability
Put OpenRouter (or HF router) in front when you need provider failover and routing policies (tools-aware routing, latency/price sorting).
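Even before putting a router in front, you can get basic failover in a few lines by trying providers in order. A sketch with injected callables (any OpenAI-compatible clients; the structure, not the names, is the point):

```python
def ask_with_failover(providers, model_for, messages):
    """Try each provider in order; return the first successful answer.

    `providers` is a list of (name, chat_fn) pairs, where `chat_fn` is any
    callable like chat_fn(model, messages) -> str.  `model_for` maps a
    provider name to the model id it serves, since ids differ per provider.
    """
    errors = []
    for name, chat_fn in providers:
        try:
            return chat_fn(model_for[name], messages)
        except Exception as exc:  # network error, 429, model unavailable...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A dedicated router does this better (health checks, latency/price policies), but the DIY version keeps you online while you evaluate one.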
Practical pitfalls and tips for RAG cost and quality
RAG is input-token heavy.
Most spend is “prompt tokens” because you inject retrieved chunks. This is why caching and reranking matter more than shaving output price.
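A quick back-of-the-envelope helper makes this concrete. With illustrative prices (not real quotes — check current price sheets), a RAG query that injects 4,000 prompt tokens but emits only 300 output tokens spends almost all of its budget on input:

```python
def query_cost(prompt_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one query in dollars, given $/million-token prices."""
    return (prompt_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# Illustrative prices only: $0.60/M input, $0.80/M output.
cost = query_cost(4_000, 300, 0.60, 0.80)
in_share = (4_000 * 0.60) / (4_000 * 0.60 + 300 * 0.80)
# About 91% of the spend here is prompt tokens, so halving the retrieved
# context saves far more than switching to a cheaper output price would.
```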
Rerank before you stuff context.
Even a cheap reranker can cut your prompt size a lot. Together explicitly surfaces a rerank category in pricing, which is a hint they expect this pattern.
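The mechanics don't require a fancy model to demonstrate. A toy sketch using word overlap as the relevance score (a real deployment would swap in a hosted rerank endpoint or a cross-encoder, but the keep-top-k shape is the same):

```python
import re

def rerank(question, chunks, top_k=3):
    """Keep only the chunks most relevant to the question.

    Toy relevance = shared-word count; replace with a real reranker
    in production.  The point is shrinking the injected context.
    """
    def words(text):
        return set(re.findall(r"\w+", text.lower()))

    q_words = words(question)
    scored = sorted(chunks, key=lambda c: len(q_words & words(c)), reverse=True)
    return scored[:top_k]

# Hypothetical retrieved chunks for a funding question:
retrieved = [
    "Party A reported 2.1M EUR in corporate donations in 2023.",
    "The observatory was founded as a small non-profit.",
    "Party A's 2023 filings list its largest individual donors.",
    "Server maintenance is scheduled monthly.",
]
context = rerank("Who donated to Party A in 2023?", retrieved, top_k=2)
# Only the two donation-related chunks get injected into the prompt.
```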
Use tool/function calling only when needed.
If your “database” is structured, function calling can route: interpret question → generate SQL → execute → summarize. Together’s OpenAI-compat docs show function calling patterns.
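For structured funding data, that route can be wired up as a tool schema plus a plain SQL executor on your side. A hypothetical sketch with sqlite3 (the `run_sql` tool name and schema are made up for illustration; the schema format follows the OpenAI function-calling convention, and in production you would restrict the connection to read-only):

```python
import sqlite3

# Tool schema in the OpenAI function-calling format; the model responds
# with a call to `run_sql` carrying a generated query, which your code runs.
RUN_SQL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a read-only SQL query against the funding database.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

def run_sql(conn, query):
    """Execute the model-generated query and return rows to summarize."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT queries are allowed")
    return conn.execute(query).fetchall()

# Demo with an in-memory table of (party, donor, amount):
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donations (party TEXT, donor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO donations VALUES (?, ?, ?)",
    [("Party A", "Acme Corp", 50000.0),
     ("Party A", "J. Doe", 1200.0),
     ("Party B", "Acme Corp", 30000.0)],
)
rows = run_sql(
    conn,
    "SELECT party, SUM(amount) FROM donations GROUP BY party ORDER BY party",
)
# rows -> [('Party A', 51200.0), ('Party B', 30000.0)]
```

The final step is feeding `rows` back to the model as the tool result so it can summarize in plain language (or emit data for your charting layer).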
Prefer routers when you are unsure.
HF router gives pass-through pricing and small credits, and OpenRouter offers routing and fallbacks.
High-quality docs and references (directly relevant)