Buckle up, the AI experiments continue. Sometimes AI struggles with things that are rather obvious, so now I want to extend AI with tools. In this part, we’ll be looking at the Model Context Protocol (MCP). We’ll cover its features (tools, resources, prompts, roots, sampling), how it works, and how to communicate with it (both locally and remotely). Building upon the simple Mistral AI CLI tool, we’ll extend it with a Python weather MCP server so it can answer weather questions. In the end, I’ll think back to the days of WSDL.
The AI is powerful, but it has some limitations that are rather frustrating to deal with. Consider the following (the app is a self-hosted librechat instance, see the footnote for setup details¹).
This answer is simple, authoritative and looks good. However, the actual result is 59 629 701. These tiny, confident inaccuracies can quickly spiral out of control in production usage. If only there were a way to tell the AI: don’t be cocky, here’s a calculator, use it!
AI Is Trained To Use Tools
The first thought I had was this one:
To me, this demonstrates an understanding of tool use. I was surprised to see it working at all - like, what? In which text on Reddit would you find something like this?
Later I found out that LLMs were specifically trained for tool use (there’s an even more in-depth research paper). After the initial training was done, another round of training kept the original model somewhat intact and added these capabilities. This is a fascinating field of study, but alas, it’s not for me - I won’t even pretend to understand what’s going on. Good luck to the researchers, and thank you for your work.
It’s also very fortunate that this capability is built into the APIs, including Mistral’s (where it’s called function calling): you specify the available tools, the LLM generates tool calls, and you provide the tool responses back for the LLM to process.
Model Context Protocol (MCP)
From another angle, there has been a lot of buzz about the Model Context Protocol. It is an emerging de-facto standard for connecting tools and data to these LLMs. So of course, I’ll be trying it out.
The plan is simple - from the previous article, I’ve got a simple CLI tool to interact with Mistral AI, and I’d like to extend it with MCP capabilities. Fortunately, I am late to the party, which means capable libraries are already established.
The MCP uses the well-tested client-server architecture: the client sits on the LLM side, the server provides the extra functionality. Note that the server is responsible for tool execution; the LLM only asks questions and gets answers back. The messages use JSON-RPC 2.0, which is just a particular way of describing functions and parameters. It supports batching, though, which is nice.
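Roughly, a single tool invocation on the wire looks like this - shown here as Python dicts for illustration (the actual messages are JSON; the weather tool and its arguments anticipate the example later in this post):

```python
# A JSON-RPC 2.0 request from the client asking the server to run a tool...
tools_call_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "weather", "arguments": {"location": "London"}},
}

# ...and the matching response carrying the tool's output back to the client.
tools_call_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "{'temp_c': 12.0}"}], "isError": False},
}
```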
The MCP supports more than just tools - you can use:
resources - read-only data available to LLMs,
prompts - parametrized prompt templates,
roots - a way to note down which locations the server should focus on, to prevent unintended consequences,
sampling - sounds like a head-scratcher, but the promise is simple: the MCP server can ask the LLM questions of its own and use the answers in its response. A couple of use-case ideas are outlined here.
However, many clients still lack these features. At least for resources there is a workaround - a resource is basically a tool that returns a constant, as sketched below.
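For instance, with FastMCP (introduced below), such a stand-in for a resource is just a parameterless tool; the tool name and the returned text here are made up for illustration:

```python
from fastmcp import FastMCP

mcp_server = FastMCP(name="Example MCP Server")


# A "resource as a tool": no parameters, always the same read-only payload.
@mcp_server.tool()
def project_glossary() -> str:
    """Return the static project glossary the model may consult."""
    return "MCP: Model Context Protocol\nSSE: Server-Sent Events"
```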
In the Python world, we have the official MCP Python SDK (the `mcp` package). It has a complicated relationship with FastMCP: FastMCP has `mcp` as a dependency.
I think FastMCP is more intuitive, less clunky, and has superior documentation. So I’ve opted to use FastMCP directly, and I’d recommend you do the same. I am paying for this decision by not having the Streamable HTTP transport available (as of today; May the fourth be with you).
The MCP Server
This is the part that provides and executes the external tools. Using FastMCP, a simple but useful MCP server looks something like this:
```python
import os
from typing import Annotated

import requests
from fastmcp import FastMCP
from pydantic import Field

mcp_server = FastMCP(name="Example MCP Server")

WEATHER_API_KEY = os.environ.get("WEATHER_API_KEY")


@mcp_server.tool()
def weather(location: Annotated[str, Field(description="City (London) or ZIP (10001) or IATA (DXB) or coordinates (48.8567,2.3508) to get current weather for")]):
    """Get current weather for location specified by City (London) or ZIP (10001) or IATA (DXB) or coordinates (48.8567,2.3508)"""
    if not WEATHER_API_KEY:
        return "No weather API key provided"
    # Return only the "current" block of the WeatherAPI response
    return requests.get(f"https://api.weatherapi.com/v1/current.json?q={location}&key={WEATHER_API_KEY}").json().get("current", {})


if __name__ == "__main__":
    # STDIO transport by default
    mcp_server.run()
```
Of course, it’s for illustration. But I wanted to provide something working to demonstrate the concept. Shout out to WeatherAPI and their free tier.
I like the annotation approach - you need to tell the LLM how to call your functions, what parameters it needs to provide, as well as some description so it can figure out how the tool can help with its tasks. Have we finally figured out how to force programmers to document their code?
The MCP server exposes several capabilities - you need to get the list of available tools, and then you need a way to call a particular tool. As for how the tool is implemented, that’s out of scope of the MCP protocol; in our case, they are simple Python functions. I’ll cover the protocol details in the client section.
We also have options when connecting to the MCP server. Let’s cover them briefly.
Local Connections - STDIO and In-memory
Let’s cover the in-memory option first, because it’s simple. Both the client and the server live within the same process, so they are simply wired together. Simple, but it doesn’t scale and it’s hard to extend.
STDIO is shorthand for standard input/output; a stdio connection really means running the server as a separate process and exposing its standard input and output to other programs.
This is very useful for integrated tools like Claude Desktop and VS Code, where these servers can run on the workstation and integrate with any tool on the same machine. Usually, the client starts the server as a child process. Which is not at all intuitive (the client starts its own server, wtf?), but it’s the simplest way to set things up.
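Client-side, both local options look roughly like this - a minimal sketch assuming the FastMCP Client API (return types and exact signatures vary between versions, and `weather_server` is a hypothetical module holding the server defined above):

```python
import asyncio

from fastmcp import Client
from weather_server import mcp_server  # hypothetical module containing the server shown above


async def main():
    # In-memory: pass the server object directly; everything stays in one process.
    async with Client(mcp_server) as client:
        print(await client.list_tools())

    # STDIO: the client spawns the server script as a child process
    # and talks to it over its stdin/stdout.
    async with Client("weather_server.py") as client:
        result = await client.call_tool("weather", {"location": "London"})
        print(result)


asyncio.run(main())
```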
Should you want to enable multiple clients to connect to one stdio server, that’s a fantastic rabbit hole to fall into. In Linux, the easier methods are named pipes and terminal tools like screen or tmux. But you can also try writing text into the file representing the process’s input in the proc directory and then using obscure system calls. In Windows, you can either use the official SDK (which was created by embracing, extending and extinguishing), or this, or I don’t even.
Remote Connections - SSE, WebSockets and Streaming HTTP
How a brand new, emerging protocol like MCP dug up the semi-obscure Server-Sent Events spec from 2006, I can’t explain. It is based on HTTP, so let’s just fire up Wireshark (see the pcap) and take a look at what’s going on. A good intro can be found here.
To cover it briefly: you keep one long-lived connection on which you receive the responses, and you issue the various requests as separate POST requests.
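Serving the FastMCP server above over SSE is then roughly a matter of picking a different transport when running it - the exact run() parameters here are my assumption based on FastMCP’s docs, so double-check against the version you’re using:

```python
from weather_server import mcp_server  # hypothetical module with the FastMCP server from above

# Serve the same MCP server over SSE instead of STDIO;
# host, port and the /sse path below are placeholders for illustration.
if __name__ == "__main__":
    mcp_server.run(transport="sse", host="127.0.0.1", port=8000)

# On the client side, you then point at the SSE endpoint instead of a script path, e.g.:
#   async with Client("http://127.0.0.1:8000/sse") as client: ...
```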
I am only covering it briefly because it has problems - you need to keep a really long-lived connection open over the public internet, which is fragile.
To address this, people have turned to WebSockets, which is a more standard approach - the ecosystem is ready, developers have more experience with it, and so on.
Right now, we seem to be moving in the direction of “normal HTTP API calls” as proposed in the Streamable HTTP spec (merged on 24th March 2025). The implementations are still somewhat lagging, but we’ll get there. Cloudflare is already on board, though.
The MCP Client
We’ll need some glue code to put the pieces together client-side. These parts were particularly interesting for me.
MCP Server Tools to Mistral Tools
Mistral supports function calling, as they call it. However, the two specifications are slightly misaligned. Never fear - a simple function transforming JSON A into JSON B is all we need:
```python
from mcp.types import Tool


def tool_mcp_to_mistral(tool: Tool):
    json_tool = vars(tool)
    mistral_tool = dict()
    # might as well throw an exception if we don't have a name and a description
    mistral_tool["name"] = json_tool["name"]
    mistral_tool["description"] = json_tool["description"]
    # inputSchema maps to parameters, but there are optional fields
    mistral_tool["parameters"] = dict()
    mistral_tool["parameters"]["properties"] = json_tool["inputSchema"].get("properties", {})
    mistral_tool["parameters"]["required"] = json_tool["inputSchema"].get("required", [])
    return {"type": "function", "function": mistral_tool}
```
Mistral Function Call Protocol
Mistral is very particular about the types and order of messages you can send to the chat completion endpoint. It’s about ensuring that user messages, assistant messages and tool response messages come in a particular order.
This is described in the docs; just don’t be surprised if you try it and receive 400 errors from the API - tinker with the message order until it works.
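For orientation, here is a rough sketch of the happy path as I understand it, using the mistralai Python client - the model name, the `run_mcp_tool` helper and the error handling are simplified placeholders, so verify the exact fields against the current API docs:

```python
import json

from mistralai import Mistral

client = Mistral(api_key="...")  # placeholder; read the key from the environment instead
model = "mistral-small-latest"   # placeholder model name

messages = [{"role": "user", "content": "What's the weather in London?"}]

# 1) The user message plus the tool definitions go in; the assistant may answer with tool calls.
response = client.chat.complete(model=model, messages=messages, tools=mistral_tools)
assistant_msg = response.choices[0].message
messages.append(assistant_msg)  # 2) keep the assistant message in the history

# 3) Every tool call gets a matching "tool" message with the same tool_call_id...
for call in assistant_msg.tool_calls or []:
    # run_mcp_tool is a hypothetical helper that forwards the call to the MCP server
    result = run_mcp_tool(call.function.name, json.loads(call.function.arguments))
    messages.append({
        "role": "tool",
        "name": call.function.name,
        "content": str(result),
        "tool_call_id": call.id,
    })

# 4) ...and only then do you ask for the final, human-readable answer.
final = client.chat.complete(model=model, messages=messages, tools=mistral_tools)
print(final.choices[0].message.content)
```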
Vibing Queues
Originally, I had written synchronous code. These libraries, however, use async calls (which is the better design), so I had to introduce queues. The plan is simple: one queue for messages from the user (input queue) and one queue for messages to display to the user (output queue). The client can then move all of the objects around asynchronously internally, for all I care.
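Not my actual code, just a minimal sketch of that shape with asyncio queues - the worker body is a stand-in for the real MCP/Mistral client:

```python
import asyncio

input_queue: asyncio.Queue[str] = asyncio.Queue()   # user -> client
output_queue: asyncio.Queue[str] = asyncio.Queue()  # client -> user


async def client_worker():
    # Stand-in for the real client: take a user message, "process" it, emit a reply.
    while True:
        user_message = await input_queue.get()
        await output_queue.put(f"echo: {user_message}")
        input_queue.task_done()


async def main():
    worker = asyncio.create_task(client_worker())
    await input_queue.put("hello")
    print(await output_queue.get())
    worker.cancel()


asyncio.run(main())
```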
I thought, this is something that can be vibe coded. And since vibe coding depends on the vibes, here is the photo setting the stage:
Sadly, after a few shots it still wasn’t working, and the errors were weird. I figured I’d have a better shot at understanding the queues and the details myself. Oh well. But it did get me started, so I’d say it helped a bit.
My Thoughts - Remember SOAP?
First of all, it works!
My reaction was along the lines of: Great, I now only need to write down a bit of generic code and then we can use well-specified MCP servers to integrate with the LLM! Now I can do whatever!
Suddenly I recalled a few words from a lecture about the Simple Object Access Protocol (SOAP). SOAP was supposed to do exactly this - all of the APIs would be specified well enough that an application could just discover new functionality and start using it immediately. Or perhaps you could say that MCP is WSDL.
We now know it didn’t happen. REST APIs won for one reason or another. There are more fun reads about the topic. But I wonder, are we coming back to this? Have we learned our lesson that if we want to have our functionality described we need to describe our functionality? It remains to be seen.
Also, just providing a tiny bit of deterministic tooling to achieve our goals reminds me of… normal programming. Instead of understanding the human requirements, you need to understand the AI requirements. At one point, you’ll face QA issues, security issues, scaling issues and all of the nerdy issues you are hoping to avoid. Maybe it’s copium, but I am slightly more hopeful about the future of IT.
In the next part, I finally want to get into the security of this whole thing. These preliminary steps were important for me to discover the technology and build up an understanding, as well as a testing sandbox and setup.
Librechat Setup
This one was quite simple, so a footnote will suffice. I used the Docker Compose route. Head to the official site, check that your Docker version is new enough, and fire it up. If you use OpenAI or Anthropic, you’re already up and running locally.
With Mistral, you have one more step - you need to explain that to librechat. It can be done in the `librechat.yaml` file. However, the default docker compose file that’s actually used doesn’t include this file. So create it, include it, fill it with the provided example, and set the env var.
Don’t forget to set your own secrets and disable registrations! There’s still a way to create user accounts manually with registration disabled. The rest is technique, as they say - host, route, DNS, `nginx` reverse proxy, getSSL, etc.
Et voilà - your own private chat interface to pretty much any model you’d like.