MCP Tools for Reading arXiv Papers

by Takanori Maehara on 30th August 2025

Motivation

I want an LLM to

  1. discover/introduce papers published on arXiv
  2. discuss the content of those papers where possible

Problem 1: Can't Find Papers

Let's start by naively asking an LLM and seeing what happens (I'm asking the LLM to put its thought process in { ... }).

[Screenshot 2025-08-30 at 12.01.33]

It tries hard to fetch results with web searches like "graphon papers arXiv this week 2025" or "graphon arXiv August 2025", but doesn't return satisfactory results. The reason is clear: Claude Desktop doesn't search arXiv directly, and generic web search fundamentally misses newer papers and those with fewer views.

So, first I'll implement an MCP Tool that searches arXiv.

Solution 1: Implementing an arXiv Search MCP Tool

arXiv provides a search API, so I simply wrapped it as an MCP Tool. After adjusting the parameters appropriately for LLM use, trying the same query yields this:

[Screenshot 2025-08-30 at 12.05.48]

Clear improvement. Instead of awkward web searches, it queries arXiv with natural parameters. It even found two papers that were missed without the MCP Tool.
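For reference, here's a minimal sketch of what such a wrapper can look like, using the official TypeScript MCP SDK. The tool name, parameter set, and result formatting are illustrative (the actual code, linked at the end, differs in details); the arXiv API endpoint and query parameters are the real ones.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "arxiv-tools", version: "0.1.0" });

// Illustrative tool definition; the parameter set in my actual server differs.
server.tool(
  "search_arxiv",
  {
    query: z.string().describe("arXiv query string, e.g. 'all:graphon'"),
    maxResults: z.number().int().min(1).max(50).default(10),
  },
  async ({ query, maxResults }) => {
    const url =
      "https://export.arxiv.org/api/query" +
      `?search_query=${encodeURIComponent(query)}` +
      `&max_results=${maxResults}&sortBy=submittedDate&sortOrder=descending`;
    const atom = await (await fetch(url)).text();
    // The API returns Atom XML; in practice you'd parse out the
    // title/summary/id fields rather than returning raw XML.
    return { content: [{ type: "text", text: atom }] };
  },
);

await server.connect(new StdioServerTransport());
```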

Problem 2: Won't Read Paper Contents

Now for the main issue. I was curious about the details of the second paper, so I asked:

[Screenshot 2025-08-30 at 12.02.01]

The output looks plausible at first glance, but there are significant problems. As the "Failed to fetch..." messages show, the LLM tried to retrieve the full text and failed, so it combined the abstract with its own knowledge to produce something plausible-looking.

As researchers know well, there's often a gap between what's written in the abstract and what's actually shown in the body (over-interpreted results, omitted assumptions, etc.). And since LLMs basically affirm whatever is written, they start filling gaps on their own and saying things that aren't in the paper or aren't theoretically correct. This is a straight path to hallucination.

The only way to prevent this is to have the LLM actually read the paper.

Solution 2: Implementing an arXiv Content Retrieval MCP Tool

So I added a tool that lets the LLM read arXiv paper contents. The most naive implementation would feed the entire paper to the LLM, but this didn't work well in my experiments: recent LLMs have context windows large enough for a full paper (50-100 kB of text), but when I tried it, accuracy on specific questions dropped.

Instead, I mimicked what humans do. When humans read papers, they typically read section by section: first skim the whole paper to grasp the overall section structure, then selectively read the sections that look interesting (usually in the order Introduction → Conclusion → Method/Experiment). Let's have the LLM do the same. The tool therefore has two modes (sketched in code after the list):

  1. Overview mode: Returns the paper's section structure (specifically: section names, first paragraph, and total character count)
  2. Section mode: Returns the contents of specified sections
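Here's a condensed sketch of these two modes. It assumes the paper has an HTML rendering at arxiv.org/html/<id> (not every paper does), and the heading-based section splitting is deliberately crude; the helper names are mine, not a fixed API.

```typescript
// Hypothetical helpers; real section extraction wants a proper HTML parser.
async function fetchArxivHtml(id: string): Promise<string> {
  const res = await fetch(`https://arxiv.org/html/${id}`);
  if (!res.ok) throw new Error(`no HTML rendering for ${id}`);
  return res.text();
}

// Crude split on <h2> headings, keeping the paragraphs of each section.
function splitSections(html: string): { title: string; paras: string[] }[] {
  const strip = (s: string) =>
    s.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();
  return html.split(/<h2[^>]*>/).slice(1).map((chunk) => {
    const [rawTitle, ...rest] = chunk.split("</h2>");
    const paras = rest.join("</h2>").split(/<\/p>/).map(strip).filter(Boolean);
    return { title: strip(rawTitle), paras };
  });
}

// Overview mode: section names, first paragraph, and character counts.
async function overview(id: string): Promise<string> {
  return splitSections(await fetchArxivHtml(id))
    .map((s) => `${s.title} (${s.paras.join(" ").length} chars)\n${s.paras[0] ?? ""}`)
    .join("\n\n");
}

// Section mode: the full contents of one named section.
async function readSection(id: string, name: string): Promise<string> {
  const hit = splitSections(await fetchArxivHtml(id))
    .find((s) => s.title.toLowerCase().includes(name.toLowerCase()));
  return hit ? hit.paras.join("\n\n") : `section "${name}" not found`;
}
```

Both functions are then registered as MCP tools in the same way as the search tool above, with the paper ID, mode, and section name as parameters.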

Implementing this and having the LLM use it yields:

[Screenshot 2025-08-30 at 16.01.28]

As expected, the first Tool Call retrieves the section structure, and subsequent Tool Calls read sections that likely contain the details. The explanation after reading looks like this:

[Screenshot 2025-08-30 at 16.01.37]

[Screenshot 2025-08-30 at 16.01.43]

[Screenshot 2025-08-30 at 16.01.56]

This generates proper detailed explanations rather than just paraphrasing the abstract. Since I've read this paper, I can confirm this is a reasonably accurate explanation.

For more details, you can ask follow-up questions and get answers like this:

[Screenshot 2025-08-30 at 16.02.02]

[Screenshot 2025-08-30 at 16.02.12]

[Screenshot 2025-08-30 at 16.02.26]

Comparison with Existing Packages

A well-known MCP tool for interacting with arXiv is ArXiv MCP Server, but there are at least two points I find unsatisfactory:

  1. The search date filter implementation is poor. This is a problem with the underlying arxiv library: the arXiv API itself supports date filters properly, but the arxiv library doesn't expose them, so clients must over-fetch and filter the results themselves. In fields with many papers, this means either fetching enormous numbers of results or missing papers. (A direct date-filtered query is sketched after this list.)

  2. read_paper feeds the entire paper at once. In my experiments, this yields insufficient accuracy (the differences become apparent when having it write out definitions or follow proofs). Having the LLM specify where to read is essentially making it do Context Engineering itself + nudging it toward the task, so I think this approach is better for close-reading tasks.
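For the first point, bypassing the library and querying the API directly works; here is a sketch (the submittedDate syntax is from the arXiv API manual, the query and date range are illustrative):

```typescript
// Direct arXiv API query with a submittedDate range filter.
const query = encodeURIComponent(
  "all:graphon AND submittedDate:[202508010000 TO 202508312359]",
);
const url =
  `https://export.arxiv.org/api/query?search_query=${query}` +
  "&sortBy=submittedDate&sortOrder=descending&max_results=20";
const feed = await (await fetch(url)).text(); // Atom XML
```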

Summary

  1. Providing proper tools yields visibly better performance. Hallucinations also decrease.
  2. Having the LLM mimic "what humans do" via tools is something of a standard technique.

Implementation Example

https://gitlab.com/-/snippets/4884254

My actual Personal MCP Server contains multiple tools beyond these, making it hard to read, so I had Claude Code extract only the parts relevant to this article.

The differences between the actual code and the above are:

  1. Publishing the MCP Server itself on Cloudflare Workers (+ Cloudflare Access) enables secure access from multiple devices
  2. Routing arXiv access through a home-lab API gateway strictly enforces arXiv's requested rate limit of 1 request per 3 seconds (sketched below)
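The rate limiting lives in the gateway in my setup, but the idea can be sketched in-process like this (illustrative, not the gateway code):

```typescript
// Serialize arXiv requests and keep a >= 3-second gap between request starts.
let lastRequest = 0;
let chain: Promise<unknown> = Promise.resolve();

function throttledFetch(url: string): Promise<Response> {
  const next = chain.then(async () => {
    const wait = lastRequest + 3000 - Date.now();
    if (wait > 0) await new Promise((r) => setTimeout(r, wait));
    lastRequest = Date.now();
    return fetch(url);
  });
  chain = next.catch(() => undefined); // keep the chain alive after failures
  return next;
}
```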

The above code is released under the Unlicense, so feel free to use it however you like, including redistribution.