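Generate responses using the Responses API

10 minutes

The OpenAI Responses API brings together the capabilities of two previously separate APIs (ChatCompletions and Assistants) in a unified experience. It provides stateful, multi-turn response generation, which makes it well suited to conversational AI applications. You can access the Responses API through an OpenAI-compatible client by using either the Foundry SDK or the OpenAI SDK.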

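Understanding the Responses API

The Responses API offers several advantages over traditional chat completions:

Stateful conversations: Maintains conversation context across multiple turns
Unified experience: Combines chat completions and Assistants API patterns
Foundry direct models: Works with models hosted directly in Microsoft Foundry, not just Azure OpenAI models
Simple integration: Access through the OpenAI-compatible client

Note

The Responses API is the recommended approach for generating AI responses in Microsoft Foundry applications. It replaces the older ChatCompletions API for most scenarios.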

Generating a simple response

With the OpenAI-compatible client, you can generate a response by using the responses.create() method:

# Generate a response using the OpenAI-compatible client
response = openai_client.responses.create(
    model="gpt-4.1",  # Your model deployment name
    input="What is Microsoft Foundry?"
)

# Display the response
print(response.output_text)

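The input parameter accepts a text string containing your prompt (it can also accept a list of messages, as in the manual conversation example later in this unit). The model generates a response based on this input.

Understanding the response structure

The response object contains several useful properties:

output_text: The generated text response
id: A unique identifier for this response
status: The state of the response (for example, "completed")
usage: Token usage information (input, output, and total tokens)
model: The model used to generate the response

You can access these properties to work with the response: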

response = openai_client.responses.create(
    model="gpt-4.1",
    input="Explain machine learning in simple terms."
)

print(f"Response: {response.output_text}")
print(f"Response ID: {response.id}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"Status: {response.status}")
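
The properties listed above can be gathered in one place for logging or display. The following is a minimal sketch, not part of any SDK: summarize_response is a hypothetical helper, and fake_response is a stand-in object that only mimics the attribute names of a real response so the example runs without calling the service.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Collect the commonly used response properties into a plain dict."""
    # usage is treated as optional in this sketch
    usage = getattr(response, "usage", None)
    return {
        "id": response.id,
        "status": response.status,
        "model": response.model,
        "text": response.output_text,
        "total_tokens": usage.total_tokens if usage is not None else None,
    }


# Stand-in with the same attribute names as a real response (illustrative values)
fake_response = SimpleNamespace(
    id="resp_example",
    status="completed",
    model="gpt-4.1",
    output_text="Machine learning lets computers learn from data.",
    usage=SimpleNamespace(input_tokens=9, output_tokens=11, total_tokens=20),
)

summary = summarize_response(fake_response)
print(summary["status"], summary["total_tokens"])
```

With a real client, the same helper could be called on the object returned by openai_client.responses.create().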

Adding instructions

In addition to the user input, you can provide instructions (often called a system prompt) that guide the model's behavior:

response = openai_client.responses.create(
    model="gpt-4.1",
    instructions="You are a helpful AI assistant that answers questions clearly and concisely.",
    input="Explain neural networks."
)

print(response.output_text)

Controlling response generation

You can control response generation with additional parameters:

response = openai_client.responses.create(
    model="gpt-4.1",
    instructions="You are a helpful AI assistant that answers questions clearly and concisely.",
    input="Write a creative story about AI.",
    temperature=0.8,  # Higher temperature for more creativity
    max_output_tokens=200  # Limit response length
)

print(response.output_text)

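temperature: Controls randomness (0.0 to 2.0). Higher values make the output more creative and varied
max_output_tokens: Limits the maximum number of tokens in the response
top_p: An alternative to temperature for controlling randomness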
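
Checking these settings before a request is sent can catch mistakes early. The sketch below assumes a hypothetical helper, build_generation_options, that enforces the 0.0 to 2.0 temperature range described above. The 0.0 to 1.0 range for top_p and the rule that only one of temperature and top_p may be set are choices made for this sketch, not requirements stated by the API.

```python
def build_generation_options(temperature=None, top_p=None, max_output_tokens=None):
    """Validate sampling settings and return keyword arguments for responses.create()."""
    if temperature is not None and top_p is not None:
        # Policy of this sketch: adjust one randomness control at a time
        raise ValueError("set either temperature or top_p, not both")

    options = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        options["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        options["top_p"] = top_p
    if max_output_tokens is not None:
        if max_output_tokens < 1:
            raise ValueError("max_output_tokens must be a positive integer")
        options["max_output_tokens"] = max_output_tokens
    return options


options = build_generation_options(temperature=0.8, max_output_tokens=200)
print(options)
```

The returned dictionary can then be unpacked into a request, for example openai_client.responses.create(model="gpt-4.1", input="Write a creative story about AI.", **options).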

Working with Foundry direct models

When you connect to a project endpoint by using the Foundry SDK or the AzureOpenAI client, the Responses API works with both Azure OpenAI models and Foundry direct models (such as Microsoft Phi, DeepSeek, or other models hosted directly in Microsoft Foundry):

# Using a Foundry direct model
response = openai_client.responses.create(
    model="microsoft-phi-4",  # Example Foundry direct model
    instructions="You are a helpful AI assistant that answers questions clearly and concisely.",
    input="What are the benefits of small language models?"
)

print(response.output_text)
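
Because the same call works across model types, one prompt can be sent to several deployments for comparison. The sketch below assumes a hypothetical helper, ask_each_model, and uses a stub client so it runs without a service connection. The deployment names are examples and must match deployments that exist in your own project.

```python
from types import SimpleNamespace


def ask_each_model(client, models, prompt):
    """Send the same prompt to each model deployment and collect the answers."""
    results = {}
    for model in models:
        response = client.responses.create(model=model, input=prompt)
        results[model] = response.output_text
    return results


class _FakeResponses:
    """Stub standing in for openai_client.responses (illustration only)."""

    def create(self, model, input):
        return SimpleNamespace(output_text=f"[{model}] answer to: {input}")


fake_client = SimpleNamespace(responses=_FakeResponses())
answers = ask_each_model(
    fake_client,
    ["gpt-4.1", "microsoft-phi-4"],
    "What are the benefits of small language models?",
)
for model, text in answers.items():
    print(model, "->", text)
```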

Creating conversational experiences

For more complex conversational scenarios, you can provide system instructions and build multi-turn conversations:

# First turn in the conversation
response1 = openai_client.responses.create(
    model="gpt-4.1",
    instructions="You are a helpful AI assistant that explains technology concepts clearly.",
    input="What is machine learning?"
)

print("Assistant:", response1.output_text)

# Continue the conversation
response2 = openai_client.responses.create(
    model="gpt-4.1",
    instructions="You are a helpful AI assistant that explains technology concepts clearly.",
    input="Can you give me an example?",
    previous_response_id=response1.id
)

print("Assistant:", response2.output_text)

In practice, the implementation is likely to be built as a loop in which the user enters messages interactively, based on each response received from the model:

# Track responses
last_response_id = None

# Loop until the user wants to quit
print("Assistant: Enter a prompt (or type 'quit' to exit)")
while True:
    input_text = input('\nYou: ')
    if input_text.lower() == "quit":
        print("Assistant: Goodbye!")
        break

    # Get a response
    response = openai_client.responses.create(
        model="gpt-4.1",  # Your model deployment name
        instructions="You are a helpful AI assistant that explains technology concepts clearly.",
        input=input_text,
        previous_response_id=last_response_id
    )
    assistant_text = response.output_text
    print("\nAssistant:", assistant_text)
    last_response_id = response.id

The output from this example looks similar to this:

Assistant: Enter a prompt (or type 'quit' to exit)

You: What is machine learning?

Assistant: Machine learning is a type of artificial intelligence (AI) that enables computers to learn from data and improve their performance over time without being explicitly programmed. It involves training algorithms on large datasets to recognize patterns, make predictions, or take actions based on those patterns. This allows machines to become more accurate and efficient in their tasks as they are exposed to more data.

You: Can you give me an example?

Assistant: Certainly! Let's look at a simple example of supervised learning—predicting house prices based on features like size, location, and number of rooms.
Imagine you want to build a machine learning model that can predict the price of a house based on various factors.
...
    { the example provided in the model response may be extensive}
...

You: quit

Assistant: Goodbye!

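As the user enters new input in each turn, the data sent to the model includes the instructions system message, the user's input, and the previous responses received from the model. In this way, each new input is grounded in the context provided by the model's responses to earlier inputs.

Alternative: Manual conversation chaining

You can manage the conversation manually by building the message history yourself. This approach gives you more control over what context is included: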
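
The chaining behavior can be wrapped in a small class so that the response ID bookkeeping lives in one place. This is a sketch under stated assumptions: ChatSession is a hypothetical wrapper, not an SDK type, and the stub client only records the arguments it receives so the chaining can be observed without calling a model.

```python
from types import SimpleNamespace


class ChatSession:
    """Chains turns by passing the previous response ID on every call."""

    def __init__(self, client, model, instructions):
        self.client = client
        self.model = model
        self.instructions = instructions
        self.last_response_id = None  # Nothing to link to before the first turn

    def send(self, user_text):
        response = self.client.responses.create(
            model=self.model,
            instructions=self.instructions,
            input=user_text,
            previous_response_id=self.last_response_id,
        )
        # Remember this response so the next turn continues from it
        self.last_response_id = response.id
        return response.output_text


class _FakeResponses:
    """Stub that records each call and returns numbered responses."""

    def __init__(self):
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        number = len(self.calls)
        return SimpleNamespace(id=f"resp_{number}", output_text=f"reply {number}")


fake_client = SimpleNamespace(responses=_FakeResponses())
session = ChatSession(fake_client, "gpt-4.1", "You are a helpful AI assistant.")
session.send("What is machine learning?")
session.send("Can you give me an example?")

# The second call carries the ID of the first response
print(fake_client.responses.calls[1]["previous_response_id"])
```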

try:
    # Start with initial message
    conversation_history = [
        {
            "type": "message",
            "role": "user",
            "content": "What is machine learning?"
        }
    ]
    
    # First response
    response1 = openai_client.responses.create(
        model="gpt-4.1",
        input=conversation_history
    )
    
    print("Assistant:", response1.output_text)
    
    # Add assistant response to history
    conversation_history += response1.output
    
    # Add new user message
    conversation_history.append({
        "type": "message",
        "role": "user", 
        "content": "Can you give me an example?"
    })
    
    # Second response with full history
    response2 = openai_client.responses.create(
        model="gpt-4.1",
        input=conversation_history
    )
    
    print("Assistant:", response2.output_text)

except Exception as ex:
    print(f"Error: {ex}")

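This manual approach is useful when you need to:

Customize which messages are included in the context
Implement conversation pruning to manage token limits
Store and restore conversation history from a database

Retrieving specific previous responses

The Responses API retains response history, so you can retrieve previous responses: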
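
Two of these tasks, pruning and persistence, can be sketched with plain Python. The helpers below (trim_history, save_history, load_history) are hypothetical, use a JSON file in place of a database, and assume every history entry is a plain dictionary. Output items returned by the SDK would need to be converted to dictionaries before being stored this way.

```python
import json
import os
import tempfile


def trim_history(history, max_messages):
    """Keep only the most recent max_messages entries."""
    if max_messages <= 0:
        return []
    return history[-max_messages:]


def save_history(history, path):
    """Write the conversation history to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(history, f)


def load_history(path):
    """Read a conversation history back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


history = [
    {"type": "message", "role": "user", "content": "What is machine learning?"},
    {"type": "message", "role": "assistant", "content": "A way for computers to learn from data."},
    {"type": "message", "role": "user", "content": "Can you give me an example?"},
]

# Drop the oldest message, then round-trip the rest through a file
recent = trim_history(history, max_messages=2)
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "history.json")
    save_history(recent, path)
    restored = load_history(path)

print(len(restored), restored[-1]["content"])
```

Trimming by message count is the simplest policy. A production app might instead trim by estimated token count or summarize older turns.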

try:
    # Retrieve a previous response
    response_id = "resp_67cb61fa3a448190bcf2c42d96f0d1a8"  # Example ID
    previous_response = openai_client.responses.retrieve(response_id)
    
    print(f"Previous response: {previous_response.output_text}")

except Exception as ex:
    print(f"Error: {ex}")

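Context window considerations

The previous_response_id parameter links responses together, maintaining conversation context across multiple API calls.

Keep in mind that retaining conversation history can increase token usage. For a single run, the active context window can include:

System instructions (instructions, safety rules)
The current prompt
Conversation history (previous user and assistant messages)
Tool schemas (functions, OpenAPI specifications, MCP tools, and so on)
Tool outputs (search results, code interpreter output, files)
Retrieved memory or documents (from memory stores, RAG, or file search)

All of these are concatenated, tokenized, and sent to the model together on every request. The SDK helps you manage state, but it doesn't automatically make token usage any cheaper.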
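
One way to keep an eye on this growth is to add up the usage reported with each response. The sketch below assumes a hypothetical TokenUsageTracker class and a budget chosen by the application. The stand-in responses only mimic the usage fields (input_tokens, output_tokens), and their numbers are invented for illustration.

```python
from types import SimpleNamespace


class TokenUsageTracker:
    """Accumulates token usage across the turns of a conversation."""

    def __init__(self, budget):
        self.budget = budget  # Application-defined limit, not an API setting
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, response):
        """Add the usage reported by one response to the running totals."""
        self.input_tokens += response.usage.input_tokens
        self.output_tokens += response.usage.output_tokens

    @property
    def total(self):
        return self.input_tokens + self.output_tokens

    def over_budget(self):
        return self.total > self.budget


# Illustrative numbers: input tokens grow on the second turn because the
# earlier exchange is part of the context again.
turn1 = SimpleNamespace(usage=SimpleNamespace(input_tokens=20, output_tokens=80))
turn2 = SimpleNamespace(usage=SimpleNamespace(input_tokens=110, output_tokens=60))

tracker = TokenUsageTracker(budget=250)
for turn in (turn1, turn2):
    tracker.record(turn)

print(tracker.total, tracker.over_budget())
```

When the tracker reports that the budget is exceeded, an app could start a fresh conversation or switch to manually managed history with older turns removed.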

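Creating responsive chat apps

Responses from a model can take some time to generate, depending on factors such as the specific model used, the size of the context window, and the size of the prompt. Users can become frustrated if the app appears to "freeze" while waiting for a response, so it's important to consider app responsiveness in your implementation.

Streaming responses

For long responses, you can use streaming to receive output incrementally, so users can see partial responses as the output becomes available: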

stream = openai_client.responses.create(
    model="gpt-4.1",
    input="Write a short story about a robot learning to paint.",
    stream=True
)

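for event in stream:
    # Print only the text deltas; other event types carry metadata
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)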

If you're tracking conversation history while streaming, you can capture the response ID when the stream ends, like this:

stream = openai_client.responses.create(
    model="gpt-4.1",
    input="Write a short story about a robot learning to paint.",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="")
    elif event.type == "response.completed":
        response_id = event.response.id
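
The same event handling can be packaged as a reusable function that returns both the full text and the response ID. This is a minimal sketch: consume_stream is a hypothetical helper, and the fake events are stand-ins that mimic only the type, delta, and response.id attributes used here.

```python
from types import SimpleNamespace


def consume_stream(stream, on_text=None):
    """Collect text deltas from a stream and return (full_text, response_id)."""
    chunks = []
    response_id = None
    for event in stream:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
            if on_text is not None:
                on_text(event.delta)  # For example, update the UI as text arrives
        elif event.type == "response.completed":
            response_id = event.response.id
        # Any other event type is ignored in this sketch
    return "".join(chunks), response_id


# Stand-in events so the sketch runs without a service connection
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type="response.output_text.delta", delta="Once upon "),
    SimpleNamespace(type="response.output_text.delta", delta="a time."),
    SimpleNamespace(type="response.completed", response=SimpleNamespace(id="resp_example")),
]

text, response_id = consume_stream(fake_stream, on_text=lambda t: print(t, end="", flush=True))
print()
print("Response ID:", response_id)
```

The returned ID can be passed as previous_response_id on the next request to continue the conversation.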

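Async usage

For high-performance applications, you can use the async client to make non-blocking API calls. Async usage is ideal for long-running requests or for handling multiple requests concurrently without blocking the application. To use it, import AsyncOpenAI instead of OpenAI and use await with each API call: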

import asyncio
from openai import AsyncOpenAI

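# token_provider is not defined in this snippet; it is assumed to be a
# credential (API key or token provider) created elsewhere in your app.
client = AsyncOpenAI(
    base_url="https://<resource-name>.openai.azure.com/openai/v1/",
    api_key=token_provider,
)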

async def main():
    response = await client.responses.create(
        model="gpt-4.1",
        input="Explain quantum computing briefly."
    )
    print(response.output_text)

asyncio.run(main())

Async streaming works in the same way:

async def stream_response():
    stream = await client.responses.create(
        model="gpt-4.1",
        input="Write a haiku about coding.",
        stream=True
    )
    
    async for event in stream:
        # Print only the text deltas as they arrive
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

asyncio.run(stream_response())
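
The async client also makes it possible to run several requests at the same time. The sketch below uses asyncio.gather for that purpose. ask_all is a hypothetical helper, and the stub client stands in for AsyncOpenAI so the example runs without a service connection.

```python
import asyncio
from types import SimpleNamespace


async def ask_all(client, model, prompts):
    """Send every prompt concurrently and return the answers in prompt order."""
    tasks = [client.responses.create(model=model, input=prompt) for prompt in prompts]
    # gather runs the awaitables concurrently and preserves their order
    responses = await asyncio.gather(*tasks)
    return [response.output_text for response in responses]


class _FakeAsyncResponses:
    """Stub standing in for an async client's responses attribute."""

    async def create(self, model, input):
        await asyncio.sleep(0.01)  # Simulate network latency
        return SimpleNamespace(output_text=f"Answer to: {input}")


fake_client = SimpleNamespace(responses=_FakeAsyncResponses())
answers = asyncio.run(
    ask_all(fake_client, "gpt-4.1", ["What is AI?", "What is machine learning?"])
)
for answer in answers:
    print(answer)
```

With a real service, concurrent requests still count against rate limits, so an app may need to cap how many run at once.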

By using the Responses API through the Microsoft Foundry SDK, you can build sophisticated conversational AI applications that maintain context, support multiple model types, and deliver a responsive user experience.
