
Run evaluations with multimodal content

LangSmith lets you create dataset examples with file attachments (such as images, audio files, or documents) so that you can reference them when evaluating an application that takes multimodal inputs or produces multimodal outputs.

While you can include multimodal data in examples by base64-encoding it, that approach is inefficient: the encoded data takes up more space than the raw binary, which slows transfers to and from LangSmith. Using attachments instead provides two main advantages:

  1. Faster upload and download speeds thanks to more efficient binary file transfer
  2. Better visualization of the different file types in the LangSmith UI

1. Create examples with attachments

To upload examples with attachments using the SDK, use the create_examples / update_examples Python methods or the uploadExamplesMultipart / updateExamplesMultipart TypeScript methods.

Requires langsmith>=0.3.13

import requests
import uuid
from pathlib import Path
from langsmith import Client

# Publicly available test files
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
wav_url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
img_url = "https://www.w3.org/Graphics/PNG/nurbcup2si.png"

# Fetch the files as bytes
pdf_bytes = requests.get(pdf_url).content
wav_bytes = requests.get(wav_url).content
img_bytes = requests.get(img_url).content

# Create the dataset
ls_client = Client()
dataset_name = "attachment-test-dataset"
dataset = ls_client.create_dataset(
    dataset_name=dataset_name,
    description="Test dataset for evals with publicly available attachments",
)

inputs = {
    "audio_question": "What is in this audio clip?",
    "image_question": "What is in this image?",
}

outputs = {
    "audio_answer": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
    "image_answer": "A mug with a blanket over it.",
}

# Define an example with attachments
example_id = uuid.uuid4()
example = {
    "id": example_id,
    "inputs": inputs,
    "outputs": outputs,
    "attachments": {
        "my_pdf": {"mime_type": "application/pdf", "data": pdf_bytes},
        "my_wav": {"mime_type": "audio/wav", "data": wav_bytes},
        "my_img": {"mime_type": "image/png", "data": img_bytes},
        # Example of an attachment specified via a local file path:
        # "my_local_img": {"mime_type": "image/png", "data": Path(__file__).parent / "my_local_img.png"},
    },
}

# Create the example
ls_client.create_examples(
    dataset_id=dataset.id,
    examples=[example],
    # Uncomment this flag if you'd like to upload attachments from local files:
    # dangerously_allow_filesystem=True
)

Upload from the filesystem

In addition to being passed in as bytes, attachments can be specified as paths to local files. To do so, pass the path as the attachment's data value and set the dangerously_allow_filesystem=True argument:

ls_client.create_examples(..., dangerously_allow_filesystem=True)
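
For a fuller illustration, here is a sketch of an example whose attachment data is given as a local path rather than bytes; the file name my_local_img.png is just a placeholder, as in the commented-out line in the code above:

from pathlib import Path

example_with_path = {
    "inputs": {"image_question": "What is in this image?"},
    "outputs": {"image_answer": "A mug with a blanket over it."},
    "attachments": {
        # The data value is a local file path instead of raw bytes (placeholder file name)
        "my_local_img": {
            "mime_type": "image/png",
            "data": Path(__file__).parent / "my_local_img.png",
        },
    },
}

ls_client.create_examples(
    dataset_id=dataset.id,
    examples=[example_with_path],
    # Required whenever attachment data is read from the local filesystem
    dangerously_allow_filesystem=True,
)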

2. Run an evaluation

Define a target function

Now that we have a dataset with examples that contain attachments, we can define a target function to run over them. The example below simply uses OpenAI's GPT-4o models to answer questions about an image and an audio clip.

The target function you are evaluating must have two positional parameters in order to consume the attachments associated with an example: the first must be named inputs and the second must be named attachments.

  • The inputs parameter is a dict containing the example's input data, excluding the attachments.
  • The attachments parameter is a dict mapping each attachment name to a dict that contains a presigned URL, the mime_type, and a reader over the file's byte content; you can use either the presigned URL or the reader to get the file contents (see the short sketch after this list). Each value in the attachments dict is a dict with the following structure:
{
    "presigned_url": str,
    "mime_type": str,
    "reader": BinaryIO
}
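
As a minimal illustration, a target function might read one attachment's bytes through its reader and hand another to a model via its presigned URL; the sketch below (the function name sketch_target is hypothetical) assumes the attachment names my_wav and my_img from the dataset created above:

def sketch_target(inputs: dict, attachments: dict) -> dict:
    # Read the raw bytes of the WAV attachment through the reader
    wav_bytes = attachments["my_wav"]["reader"].read()
    # Or pass the image's presigned URL straight to any model that accepts image URLs
    img_url = attachments["my_img"]["presigned_url"]
    return {"wav_size": len(wav_bytes), "image_url": img_url}

The full target function below applies the same pattern with OpenAI's audio and vision models:
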
from langsmith.wrappers import wrap_openai

import base64
from openai import OpenAI

client = wrap_openai(OpenAI())

# Define target function that uses attachments
def file_qa(inputs, attachments):
    # Read the audio bytes from the reader and encode them in base64
    audio_reader = attachments["my_wav"]["reader"]
    audio_b64 = base64.b64encode(audio_reader.read()).decode('utf-8')
    audio_completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": inputs["audio_question"],
                    },
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": audio_b64,
                            "format": "wav",
                        },
                    },
                ],
            }
        ],
    )

    # Most models support taking in an image URL directly in addition to base64 encoded images
    # You can pipe the image pre-signed URL directly to the model
    image_url = attachments["my_img"]["presigned_url"]
    image_completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": inputs["image_question"]},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url,
                        },
                    },
                ],
            }
        ],
    )

    return {
        "audio_answer": audio_completion.choices[0].message.content,
        "image_answer": image_completion.choices[0].message.content,
    }

Define a custom evaluator

The rules for determining whether an evaluator should receive attachments are exactly the same as the rules for the target function above.

The evaluator below uses an LLM to judge whether the image and its generated description are consistent. To learn more about how to define LLM-based evaluators, see this guide.

# Assumes you've installed pydantic
from pydantic import BaseModel

def valid_image_description(outputs: dict, attachments: dict) -> bool:
    """Use an LLM to judge if the image description and images are consistent."""

    instructions = """
    Does the description of the following image make sense?
    Please carefully review the image and the description to determine if the description is valid."""

    class Response(BaseModel):
        description_is_valid: bool

    image_url = attachments["my_img"]["presigned_url"]
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": instructions,
            },
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": outputs["image_answer"]},
                ],
            },
        ],
        response_format=Response,
    )

    return response.choices[0].message.parsed.description_is_valid

ls_client.evaluate(
    file_qa,
    data=dataset_name,
    evaluators=[valid_image_description],
)

Update examples with attachments

In the code above, we showed how to add examples with attachments to a dataset. The same examples can also be updated using the SDK.

As with existing examples, the dataset is versioned when you update it with attachments. You can therefore navigate to the dataset's version history to see the changes made to each example. To learn more, see this guide.

When updating an example with attachments, you can update the attachments in a few different ways:

  • Pass in new attachments
  • Rename existing attachments
  • Delete existing attachments

Note that:

  • Any existing attachments that are not explicitly renamed or retained will be deleted.
  • An error will be raised if you pass a non-existent attachment name to retain or rename.
  • If the same attachment name appears in both the attachments and attachments_operations fields, the new attachment takes precedence over the existing one.

example_update = {
    "id": example_id,
    "attachments": {
        # These are net new attachments
        "my_new_file": ("text/plain", b"foo bar"),
    },
    "inputs": inputs,
    "outputs": outputs,
    # Any attachments not in rename/retain will be deleted.
    # In this case, that would be "my_img" if we uploaded it.
    "attachments_operations": {
        # Retained attachments will stay exactly the same
        "retain": ["my_pdf"],
        # Renaming attachments preserves the original data
        "rename": {
            "my_wav": "my_new_wav",
        },
    },
}

ls_client.update_examples(dataset_id=dataset.id, updates=[example_update])
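
As a corollary of the first rule above, an update that retains and renames nothing removes every existing attachment from the example. A minimal sketch, reusing example_id, inputs, and outputs from above:

# Nothing is retained or renamed, so all existing attachments are deleted,
# and no new attachments are added.
strip_attachments_update = {
    "id": example_id,
    "inputs": inputs,
    "outputs": outputs,
    "attachments": {},
    "attachments_operations": {"retain": [], "rename": {}},
}

ls_client.update_examples(dataset_id=dataset.id, updates=[strip_attachments_update])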
