Develop a vision-based chat app

5 minutes

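To develop a client app that engages in vision-based chats with a multimodal model, you can use the same basic techniques used for text-based chats. You need a connection to the endpoint where the model is deployed, and you use that endpoint to submit prompts consisting of messages to the model and to process the responses.

The key difference is that prompts for a vision-based chat include multi-part user messages that contain both a text content item and an image content item.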

Diagram of a multi-part prompt being submitted to a model.
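As a minimal sketch of what such a multi-part user message can look like, the snippet below builds one as a plain Python dictionary. The item type names ("input_text" and "input_image") follow the Responses API sample in this unit; the helper name build_user_message and the example.com URL are hypothetical and used only for illustration.

```python
def build_user_message(text: str, image_url: str) -> dict:
    """Return a multi-part user message with one text item and one image item.

    build_user_message is a hypothetical helper, not part of any SDK.
    """
    return {
        "role": "user",
        "content": [
            {"type": "input_text", "text": text},
            {"type": "input_image", "image_url": image_url},
        ],
    }


# Placeholder URL for illustration only
message = build_user_message(
    "What desserts could I make with this?",
    "https://example.com/dragon-fruit.jpeg",
)
```

The resulting dictionary could be placed in the input list of a prompt alongside any developer or system message.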

Using the Responses API to submit an image-based prompt

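To include an image in a prompt using the Responses API, you can either specify the URL of a web-based image file, or load a local image, encode its data in Base64 format, and submit a URL in the format data:image/jpeg;base64,{image_data} (replacing "jpeg" with "png" if the image is a PNG file).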
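The encoding step can be sketched as a pair of small helpers. This is one possible approach, not the only one: it assumes the image type can be guessed from the file extension using Python's standard mimetypes module, and the function names encode_data_url and image_file_to_data_url are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def encode_data_url(image_bytes: bytes, mime_type: str) -> str:
    """Base64-encode raw image bytes and wrap them in a data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_file_to_data_url(image_path: Path) -> str:
    """Read a local image file and return a data URL.

    The MIME type is guessed from the file extension (an assumption of this
    sketch); files that do not look like images are rejected.
    """
    mime_type, _ = mimetypes.guess_type(image_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image type for {image_path}")
    return encode_data_url(image_path.read_bytes(), mime_type)
```

For example, image_file_to_data_url(Path("dragon-fruit.jpeg")) would return a string beginning with data:image/jpeg;base64, that can be used wherever an image URL is expected.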

The following Python example shows how to submit an image in a prompt using the Responses API.

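import base64
from pathlib import Path

# Assumption: `client` is an OpenAI client that has already been created
# and connected to the endpoint where the model is deployed.

# Read the image data from a local file
image_path = Path("dragon-fruit.jpeg")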
image_format = "jpeg"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL

# Send the image data in a prompt to the model
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "developer", "content": "You are an AI assistant for chefs planning recipes."},
        {"role": "user", "content": [  
            { "type": "input_text", "text": "What desserts could I make with this?"},
            { "type": "input_image", "image_url": data_url}
        ] } 
    ]
)
print(response.output_text)

Using the ChatCompletions API to submit an image-based prompt

If you're using an Azure OpenAI endpoint to submit prompts to a model that doesn't support the Responses API, you can use the ChatCompletions API instead, for example:

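import base64
from pathlib import Path

# Assumption: `client` is an OpenAI client that has already been created
# and connected to the endpoint where the model is deployed.

# Read the image data from a local file
image_path = Path("orange.jpeg")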
image_format = "jpeg"
with open(image_path, "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

data_url = f"data:image/{image_format};base64,{image_data}" # You can also use a web URL

# Send the image data in a prompt to the model
response = client.chat.completions.create(
    model="Phi-4-multimodal-instruct",
    messages=[
        {"role": "system", "content": "You are an AI assistant for chefs planning recipes."},
        { "role": "user", "content": [  
            { "type": "text", "text": "What can I make with this fruit?"},
            { "type": "image_url", "image_url": {"url": data_url}}
        ] }
    ]
)
print(response.choices[0].message.content)
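The two samples differ mainly in the shape of the content items: the Responses API sample uses "input_text" and "input_image" with the image URL as a plain string, while the ChatCompletions sample uses "text" and "image_url" with the URL nested in an object. The sketch below converts one shape to the other, which may be handy if an app needs to target both APIs. The function name to_chat_completions_content is hypothetical, and only the two item types shown in this unit are handled.

```python
def to_chat_completions_content(responses_content: list[dict]) -> list[dict]:
    """Convert Responses-style content items to ChatCompletions-style items.

    Hypothetical helper: handles only the text and image item types
    that appear in the samples in this unit.
    """
    converted = []
    for item in responses_content:
        if item["type"] == "input_text":
            converted.append({"type": "text", "text": item["text"]})
        elif item["type"] == "input_image":
            # ChatCompletions nests the URL inside an object
            converted.append(
                {"type": "image_url", "image_url": {"url": item["image_url"]}}
            )
        else:
            raise ValueError(f"Unsupported content item type: {item['type']}")
    return converted


# Placeholder data URL for illustration only
chat_content = to_chat_completions_content([
    {"type": "input_text", "text": "What can I make with this fruit?"},
    {"type": "input_image", "image_url": "data:image/jpeg;base64,AAAA"},
])
```

The converted list can then be used as the content of the user message passed in messages.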
