MCP のリリースと仕様をよく読む - フォトシンスエンジニアブログ

どうも daikw - Qiita です。

先週 Anthropic が発表し、話題になっている MCP のリリースと仕様をよく読みました。 MCP は大体 GATT (BLE通信における Service と Characteristic のようなもの) だと理解しました。サンプルもいくつか試して、実装しやすそうでした。何か作ってみたいですね。

Introducing the Model Context Protocol

Introducing the Model Context Protocol \ Anthropic から抜粋して翻訳しました。

Today, we're open-sourcing the Model Context Protocol (MCP), a new standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. Its aim is to help frontier models produce better, more relevant responses.

本日、Model Context Protocol（MCP）をオープンソース化します。これは、コンテンツリポジトリ、ビジネスツール、開発環境など、データが存在するシステムにAIアシスタントを接続するための新しい標準です。これにより、最先端のモデルがより優れた、より適切な応答を生成できるようになります。

これまでも Dify や Toolhouse を始め、コンテキストを基盤モデルに渡す方法はいろんな実装があったような気がしますが、それぞれ独自の実装でした。実装が複数乱立したタイミングで、他モデルプロバイダに先んじて抽象化レイヤを OSS にする戦略と読みました。

As AI assistants gain mainstream adoption, the industry has invested heavily in model capabilities, achieving rapid advances in reasoning and quality. Yet even the most sophisticated models are constrained by their isolation from data—trapped behind information silos and legacy systems. Every new data source requires its own custom implementation, making truly connected systems difficult to scale.

AIアシスタントが主流になるにつれ、業界はモデルの能力に多大な投資を行い、推論や品質において急速な進歩を遂げました。しかし、最も高度なモデルでさえ、データから隔離され、情報のサイロやレガシーシステムの背後に閉じ込められているという制約があります。新しいデータソースごとに独自の実装が必要となり、真に接続されたシステムのスケーリングが困難になっています。

基盤モデルに限らずツールハブ系の製品はたくさんあって、良し悪しあれど一定のシェアをとっていると思います。いろいろなツール・データ源をうまく基盤モデルに渡す方法はそれぞれ必要なのですが、都度開発するのはしんどいです。

MCP addresses this challenge. It provides a universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol. The result is a simpler, more reliable way to give AI systems access to the data they need. The Model Context Protocol is an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools. The architecture is straightforward: developers can either expose their data through MCP servers or build AI applications (MCP clients) that connect to these servers.

MCPはこの課題に対処します。AIシステムとデータソースを接続するための普遍的でオープンな標準を提供し、断片化された統合を単一のプロトコルに置き換えます。その結果、AIシステムが必要とするデータにアクセスするための、よりシンプルで信頼性の高い方法が実現します。 Model Context Protocolは、開発者がデータソースとAI搭載ツールとの間に安全な双方向接続を構築できるオープンな標準です。アーキテクチャはシンプルで、開発者はMCPサーバーを通じてデータを公開するか、これらのサーバーに接続するAIアプリケーション（MCPクライアント）を構築することができます。

いろんなデータプロバイダが MCP サーバを公開することで、インターネットが人間より AI に優しくなりそう。

Claude 3.5 Sonnet is adept at quickly building MCP server implementations, making it easy for organizations and individuals to rapidly connect their most important datasets with a range of AI-powered tools. To help developers start exploring, we’re sharing pre-built MCP servers for popular enterprise systems like Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer.

Claude 3.5 Sonnetは、MCPサーバーの実装を迅速に構築するのに優れており、組織や個人が最も重要なデータセットをさまざまなAI搭載ツールと迅速に接続することを容易にします。開発者が探索を開始できるよう、Google Drive、Slack、GitHub、Git、Postgres、Puppeteerなどの一般的なエンタープライズシステム向けの事前構築済みMCPサーバーを共有しています。

ここを字義通りに捉えると、「MCPサーバの実装を Claude 3.5 Sonnet に任せる」という意味でしょうか。MCPサーバは必ずしも言語モデルを内蔵しないはずで、 Claude を MCP サーバにする訳ではないと思います。

何をどうしたら「MCP サーバの実装に長けている」状態になるのかわからない。。。プロンプトを工夫すれば他の基盤モデルでもそれほど苦労せず実装できそう。特定のキーワードの場合に MCP サーバ生成用のプロンプトに切り替えているとか？

Early adopters like Block and Apollo have integrated MCP into their systems, while development tools companies including Zed, Replit, Codeium, and Sourcegraph are working with MCP to enhance their platforms—enabling AI agents to better retrieve relevant information to further understand the context around a coding task and produce more nuanced and functional code with fewer attempts. "At Block, open source is more than a development model—it’s the foundation of our work and a commitment to creating technology that drives meaningful change and serves as a public good for all,” said Dhanji R. Prasanna, Chief Technology Officer at Block. “Open technologies like the Model Context Protocol are the bridges that connect AI to real-world applications, ensuring innovation is accessible, transparent, and rooted in collaboration. We are excited to partner on a protocol and use it to build agentic systems, which remove the burden of the mechanical so people can focus on the creative.”

BlockやApolloなどの早期導入者は、MCPをシステムに統合しています。一方、Zed、Replit、Codeium、Sourcegraphなどの開発ツール企業は、MCPを活用してプラットフォームを強化し、AIエージェントが関連情報をより適切に取得してコーディングタスクの文脈を深く理解し、より微妙で機能的なコードを少ない試行回数で生成できるように取り組んでいます。「Blockにおいて、オープンソースは単なる開発モデル以上のものであり、私たちの仕事の基盤であり、意義のある変化を促進し、すべての人々の公共の利益となる技術を創造するというコミットメントです」と、Blockの最高技術責任者であるDhanji R. Prasanna氏は述べています。「Model Context Protocolのようなオープン技術は、AIと現実世界のアプリケーションをつなぐ橋渡しであり、イノベーションがアクセス可能で透明性があり、協働に根ざしていることを保証します。私たちは、このプロトコルに協力し、機械的な負担を取り除き、人々が創造的なことに集中できるエージェンティックシステムを構築するためにこれを使用することに興奮しています。」

Instead of maintaining separate connectors for each data source, developers can now build against a standard protocol. As the ecosystem matures, AI systems will maintain context as they move between different tools and datasets, replacing today's fragmented integrations with a more sustainable architecture.

各データソースごとに別々のコネクタを維持するのではなく、開発者は標準プロトコルに基づいて構築できるようになりました。エコシステムが成熟するにつれて、AIシステムは異なるツールやデータセット間を移動する際にも文脈を維持し、現在の断片的な統合をより持続可能なアーキテクチャに置き換えることができます。

他のモデルプロバイダが直接 MCP に対応するリリースはまだないですが、個々のデータソース側は MCP のような共通プロトコルに対応するデメリットはなさそうです。よって LLM ツールハブ系の製品は、それぞれのデータソースに対応するコネクタを開発するよりも、それぞれの MCP サーバを開発するようになりそうです。

ChatGPT 側の Function Calling を利用して、擬似 MCP にすることはできるはずなので、そうやって抽象化していく流れになったかも。

An open community We’re committed to building MCP as a collaborative, open-source project and ecosystem, and we’re eager to hear your feedback. Whether you’re an AI tool developer, an enterprise looking to leverage existing data, or an early adopter exploring the frontier, we invite you to build the future of context-aware AI together.

オープンなコミュニティ私たちは、MCPを協力的なオープンソースプロジェクトおよびエコシステムとして構築することに尽力しており、皆様からのフィードバックを心待ちにしています。AIツール開発者であれ、既存データを活用したい企業であれ、最前線を探るアーリーアダプターであれ、文脈認識型AIの未来を共に構築することをお誘いします。

これは ... 当てつけかもw そういえば、 ClosedAI に名前を変えろって言ってる某ツイ廃がいましたね。

Specification – Model Context Protocol Specification

Specification – Model Context Protocol Specification から、気になったところを抜粋しました。 MCP は LLM 時代におけるプロトコルの提案としては初めての試みでしょうか。 Human-in-the-loop が SHOULD になっているなど、特徴的な部分がいくつかありました。

Overview

MCP provides a standardized way for applications to:

Share contextual information with language models

Expose tools and capabilities to AI systems

Build composable integrations and workflows

The protocol uses JSON-RPC 2.0 messages to establish communication between:

Hosts: LLM applications that initiate connections

Clients: Connectors within the host application

Servers: Services that provide context and capabilities

MCP takes some inspiration from the Language Server Protocol, which standardizes how to add support for programming languages across a whole ecosystem of development tools. In a similar way, MCP standardizes how to integrate additional context and tools into the ecosystem of AI applications.

MCPは、アプリケーションが以下を実現するための標準化された方法を提供します：

言語モデルとコンテキスト情報を共有する

AIシステムにツールと機能を公開する

組み合わせ可能な統合やワークフローを構築する

このプロトコルは、JSON-RPC 2.0メッセージを使用して以下の間で通信を確立します：

Hosts: 接続を開始するLLMアプリケーション

Clients: ホストアプリケーション内のコネクター

Servers: コンテキストや機能を提供するサービス

MCPは、開発ツールのエコシステム全体でプログラミング言語のサポートを追加する方法を標準化したLanguage Server Protocolに一部インスピレーションを得ています。同様に、MCPはAIアプリケーションのエコシステムに追加のコンテキストやツールを統合する方法を標準化します。

MCP では、 Hosts/Clients/Servers の3つのコンポーネントに分離して、それらの間の通信を JSON-RPC 2.0 で規定しています。 MCP と聞いてなんとなく LSP を想起していたのですが、そのまま仕様書に書いてありました。

Architecture

Architecture – Model Context Protocol Specification

MCP is built on several key design principles that inform its architecture and implementation: - Servers should be extremely easy to build - Servers should be highly composable - Servers should not be able to read the whole conversation, nor “see into” other servers - Features can be added to servers and clients progressively

MCPは、そのアーキテクチャと実装を支えるいくつかの重要な設計原則に基づいて構築されています： - サーバーは非常に簡単に構築できるべき - サーバーは高い合成性を持つべき - サーバーは全会話を読み取ったり、他のサーバーを「覗き見る」ことができてはならない - サーバーとクライアントに機能を段階的に追加できる

以下の2点の表現は MCP に特徴的かもしれません

サーバーは全会話を読み取ったり、他のサーバーを「覗き見る」ことができてはならない
- これ自体は当たり前の話だと思いますが、 conversation という表現は言語モデル特有ですね
サーバーは高い合成性を持つべき
- マルチエージェントシステムを意識してそう。ドメインを特化した個別のエージェントの組み合わせが高い精度を出すと言われていますね: AIマルチエージェントの設計・構築(入門編)｜はち

Capability Negotiation

サーバ - クライアント間の通信の初期に、利用するサーバリソースと機能を交渉するフェーズがあり、 Capability Negotiation と呼ばれています。なんとなく、 BLE 通信、特に GATT における Service と Characteristic の関係に似ています。

BLE 通信では Central (~Hosts) が複数の異なる能力を持つ Peripheral (~Servers) のサービスを選択して、その機能を組み合わせてユーザに機能提供するので、似たような構成になるのは自然かもしれません。

参考: BLE(Bluetooth Low Energy)とは？GAPとGATT)

Capability のリストは Lifecycle#Capability Negotiation に記載されています。

Category	Capability	Description
クライアント	roots	ファイルシステムのルートを提供する機能
クライアント	sampling	LLMサンプリングリクエストのサポート
クライアント	experimental	非標準の実験的機能のサポートを説明
サーバー	prompts	プロンプトテンプレートを提供
サーバー	resources	読み取り可能なリソースを提供
サーバー	tools	呼び出し可能なツールを公開
サーバー	logging	構造化されたログメッセージを発行
サーバー	experimental	非標準の実験的機能のサポートを説明

Client Features

クライアント側の機能(Capability)の定義は現状二つあり、 Capability Negotiation の際にサーバに開示されます。

Roots: ファイルシステムを公開する / 現状は listChanged イベントだけサポートしているようです
Sampling: 基盤モデルによる生成(generation)・補完(completion)のことを sampling と呼ぶことがあります。クライアントが、サーバに対して、クライアント側で利用している基盤モデルへの sampling リクエストを許可するかどうかを制御できます。

Client / Sampling

Sampling – Model Context Protocol Specification

サンプリングリクエストは、MCP サーバからのプロンプトインジェクションを誘発しうるため、それを防ぐための仕様・記述が豊富に書かれていました。特に Human-in-the-loop の実装を SHOULD で記載しています。

For trust & safety and security, there SHOULD always be a human in the loop with the ability to deny sampling requests.

Applications SHOULD:

Provide UI that makes it easy and intuitive to review sampling requests

Allow users to view and edit prompts before sending

Present generated responses for review before delivery
信頼性、安全性、セキュリティのために、サンプリングリクエストを拒否する能力を持つ人間が常に介在するべきです。

アプリケーションは以下を行うべきです：

サンプリングリクエストを簡単かつ直感的に確認できるUIを提供する

プロンプトを送信する前にユーザーが確認および編集できるようにする

生成された応答を配信前にレビュー用に提示する

Server Features

Server Features – Model Context Protocol Specification

サーバは、クライアントに 3 つの機能を提供できます。 Capability Negotiation の際にクライアントに開示されます。

最初は Resources と Tools を分離した意図がよくつかめませんでしたが、 MCPサーバに対するリクエストをする主体で整理していると理解しました。以下の表に記載の通り、アプリケーションが要求するであろう情報は Resources、基盤モデルが要求するであろう情報は Tools ということですね。

Primitive	Control	Description	Example
Prompts	User-controlled	ユーザーの選択によって呼び出されるインタラクティブなテンプレート	スラッシュコマンド、メニューオプション
Resources	Application-controlled	クライアントによって添付および管理されるコンテキストデータ	ファイルの内容、git履歴
Tools	Model-controlled	モデルによって公開されるツール	API POST リクエスト、ファイルの書き込み