MCP Tools for Reading arXiv Papers
by Takanori Maehara on 30th August 2025
Motivation
I want an LLM to:
- discover/introduce papers published on arXiv
- discuss the content of a paper where possible
Problem 1: Can't Find Papers
Let's start by naively asking an LLM and seeing what happens (I'm asking the LLM to put its thought process in { ... }).

It tries hard to fetch results by web searching things like "graphon papers arXiv this week 2025" or "graphon arXiv August 2025", but doesn't return satisfactory results. The reason is clear: Claude Desktop doesn't search arXiv directly. This fundamentally means newer papers or those with fewer views will be missed.
So, first I'll implement an MCP Tool that searches arXiv.
Solution 1: Implementing an arXiv Search MCP Tool
arXiv provides a search API, so I simply wrapped it as an MCP tool. After adjusting the parameters appropriately for LLM use, trying the same query yields this:

Clear improvement. Instead of awkward web searches, it's searching arXiv with natural parameters. It even found two papers that were missed without the MCP Tool.
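For reference, here is a minimal sketch of what such a wrapper can look like, assuming the official MCP Python SDK (FastMCP) and the public arXiv query API; the tool name, parameters, and defaults are illustrative and not my actual implementation:

```python
# Minimal sketch: wrap the arXiv query API as an MCP tool.
# Assumes the official MCP Python SDK (FastMCP); parameters are illustrative.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

from mcp.server.fastmcp import FastMCP

ATOM = "{http://www.w3.org/2005/Atom}"
mcp = FastMCP("arxiv-tools")


@mcp.tool()
def search_arxiv(query: str, category: str = "", max_results: int = 10) -> list[dict]:
    """Search arXiv and return title, authors, abstract, and date for each hit."""
    terms = f"all:{query}"
    if category:
        terms += f" AND cat:{category}"
    url = "http://export.arxiv.org/api/query?" + urllib.parse.urlencode({
        "search_query": terms,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    })
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    results = []
    for entry in feed.findall(ATOM + "entry"):
        results.append({
            "id": entry.findtext(ATOM + "id") or "",
            "title": " ".join((entry.findtext(ATOM + "title") or "").split()),
            "published": entry.findtext(ATOM + "published") or "",
            "summary": " ".join((entry.findtext(ATOM + "summary") or "").split()),
            "authors": [a.findtext(ATOM + "name") for a in entry.findall(ATOM + "author")],
        })
    return results


if __name__ == "__main__":
    mcp.run()
```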
Problem 2: Won't Read Paper Contents
Now for the main issue. I was curious about the details of the second paper, so I asked:

The output looks plausible at first glance, but there are significant problems. As you can see from "Failed to fetch...", the LLM tried to retrieve the full text but failed, so it combined the abstract with its own knowledge to produce something plausible.
As researchers know well, there's often a gap between what's written in the abstract and what's actually shown in the body (over-interpreted results, omitted assumptions, etc.). And since LLMs tend to take what's written at face value, they start filling in the gaps on their own, saying things that aren't in the paper or aren't theoretically sound. This is a straight path to hallucination.
The only way to prevent this is to have the LLM actually read the paper.
Solution 2: Implementing an arXiv Content Retrieval MCP Tool
So I added a tool that lets the LLM read arXiv paper contents. The most naive implementation would be to feed the entire paper to the LLM, but this didn't work well in my experiments. Recent LLMs have context windows large enough to hold a full paper (50-100 KB of text), but when I tried this, accuracy on specific questions dropped.
Instead, I mimicked what humans do. When humans read papers, they typically read section by section: first skim the whole paper to grasp the overall section structure, then selectively read the sections that look interesting (usually in the order Introduction → Conclusion → Method/Experiment). Let's have the LLM do the same. The tool has two modes (a sketch follows the list below):
- Overview mode: Returns the paper's section structure (specifically: section names, first paragraph, and total character count)
- Section mode: Returns the contents of specified sections
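A minimal sketch of such a two-mode tool, assuming the HTML rendering of the paper is available at arxiv.org/html/&lt;id&gt; and that sections are marked with h2 headings; the actual implementation's parsing, fallbacks, and caching differ:

```python
# Minimal sketch of a two-mode paper reader (overview / section).
# Assumes arxiv.org/html/<id> serves an HTML full text with <h2> section headings.
import re
import urllib.request

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("arxiv-reader")


def _strip_tags(html: str) -> str:
    """Remove HTML tags and collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", html)).strip()


def _fetch_sections(arxiv_id: str) -> list[tuple[str, str]]:
    """Download the HTML full text and split it into (section title, text) pairs."""
    url = f"https://arxiv.org/html/{arxiv_id}"
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # re.split with a capture group alternates heading text and body text.
    parts = re.split(r"<h2[^>]*>(.*?)</h2>", html, flags=re.S)[1:]
    return [(_strip_tags(t), _strip_tags(b)) for t, b in zip(parts[0::2], parts[1::2])]


@mcp.tool()
def read_paper(arxiv_id: str, section: str = "") -> dict:
    """Overview mode (no section given): section names, opening text, and length.
    Section mode: full text of the requested section."""
    sections = _fetch_sections(arxiv_id)
    if not section:
        return {
            "sections": [
                # "First paragraph" is approximated here by the first 500 characters.
                {"title": t, "opening": body[:500], "characters": len(body)}
                for t, body in sections
            ]
        }
    for title, body in sections:
        if section.lower() in title.lower():
            return {"title": title, "text": body}
    return {"error": f"Section '{section}' not found."}


if __name__ == "__main__":
    mcp.run()
```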
Implementing this and having the LLM use it yields:

As expected, the first Tool Call retrieves the section structure, and subsequent Tool Calls read sections that likely contain the details. The explanation after reading looks like this:



This generates proper detailed explanations rather than just paraphrasing the abstract. Since I've read this paper, I can confirm this is a reasonably accurate explanation.
For more details, you can ask follow-up questions and get answers like this:



Comparison with Existing Packages
A well-known MCP tool for interacting with arXiv is ArXiv MCP Server, but there are at least two points I find unsatisfactory:
- The search tool's date filter implementation is poor. This is a problem with the dependent arxiv library: the arXiv API itself properly implements date filters, but the arxiv library doesn't, so clients must fetch extra results and filter them themselves. In fields with many papers, this means either fetching enormous amounts of data or missing papers.
- read_paper feeds the entire paper at once. In my experiments, this yields insufficient accuracy (the differences become apparent when having it write out definitions or follow proofs). Having the LLM specify where to read is essentially making it do context engineering itself, plus nudging it toward the task, so I think this approach is better for close-reading tasks.
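For reference, the arXiv API accepts a submittedDate range directly in search_query (format YYYYMMDDHHMM, as described in the API user manual), so date filtering can happen on the server side. A minimal query-construction sketch, with an illustrative category and date range:

```python
# The arXiv API accepts a submittedDate range directly in search_query,
# so date filtering happens server-side rather than in the client.
import urllib.parse

params = urllib.parse.urlencode({
    "search_query": "cat:cs.LG AND submittedDate:[202508010000 TO 202508312359]",
    "start": 0,
    "max_results": 20,
})
url = "http://export.arxiv.org/api/query?" + params
```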
Summary
- Providing proper tools yields visibly better performance. Hallucinations also decrease.
- Having the LLM mimic "what humans do" via tools is something of a standard technique.
Implementation Example
https://gitlab.com/-/snippets/4884254
My personal MCP server contains multiple tools beyond these, which makes it hard to read, so I had Claude Code extract only the parts relevant to this article.
The differences between the actual code and the above are:
- Publishing the MCP Server itself on Cloudflare Workers (+ Cloudflare Access) enables secure access from multiple devices
- Routing arXiv access through a home-lab API gateway strictly enforces arXiv's required rate limit of one request per 3 seconds (a client-side sketch of equivalent throttling follows below)
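For completeness, here is a minimal sketch of client-side throttling that would achieve the same request spacing; in my actual setup this is enforced at the gateway, not in the tool:

```python
# Minimal client-side throttle for arXiv's requested rate limit
# (at most one request every 3 seconds).
import threading
import time
import urllib.request

_lock = threading.Lock()
_last_request = 0.0


def polite_get(url: str, min_interval: float = 3.0) -> bytes:
    """Fetch a URL, spacing consecutive requests at least min_interval seconds apart."""
    global _last_request
    with _lock:
        wait = _last_request + min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        _last_request = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```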
The above code is released under the Unlicense, so feel free to use it however you like, including redistribution.