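Configure LangSmith for scale

Self-hosted LangSmith instances can handle a large number of traces. The default deployment configuration handles a substantial load, and you can configure your deployment to scale further.

The table below gives high-level guidance on LangSmith configuration, with scale measured in traces per second (TPS). The recommendations also assume that you are running a relatively recent version of the SDK and of LangSmith (at least 0.10.69). Under high load, deploying LangSmith on Kubernetes is strongly recommended.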

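Expected traces per second (TPS)	Queue pod replicas	Frontend replicas	Platform backend replicas	ClickHouse configuration	Redis cache size
10 TPS	3 (default)	1 (default)	3 (default)	4 vCPU, 16 GB memory (default)	2 GB (default)
100 TPS	10	2	3 (default)	4 vCPU, 16 GB memory (default)	13+ GB
1000 TPS	170	4	20	10 vCPU, 48 GB memory	200+ GB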

Note

Your product's specific usage patterns may require further resource tuning. If so, contact the LangSmith team with any questions.
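The table can also be read as a simple lookup. The following is a minimal sketch, not part of LangSmith: the `Tier` class and the `recommended_tier` helper are hypothetical names, and rounding an expected load up to the next documented tier is an assumption made for illustration. The values are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the sizing table (hypothetical helper type)."""
    tps: int
    queue_replicas: int
    frontend_replicas: int
    platform_backend_replicas: int
    clickhouse: str
    redis_cache: str


# Values copied from the sizing table, ordered by increasing TPS.
TIERS = [
    Tier(10, 3, 1, 3, "4 vCPU, 16 GB memory", "2 GB"),
    Tier(100, 10, 2, 3, "4 vCPU, 16 GB memory", "13+ GB"),
    Tier(1000, 170, 4, 20, "10 vCPU, 48 GB memory", "200+ GB"),
]


def recommended_tier(expected_tps: float) -> Tier:
    """Return the smallest documented tier that covers expected_tps.

    Assumption: loads between two tiers are rounded up to the larger one.
    """
    for tier in TIERS:
        if expected_tps <= tier.tps:
            return tier
    raise ValueError(
        f"{expected_tps} TPS exceeds the documented tiers; contact the LangSmith team"
    )


tier = recommended_tier(50)
print(tier.queue_replicas, tier.redis_cache)
```

Rounding up is deliberately conservative; a workload well below a tier boundary may need fewer resources than the tier lists.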

10 traces per second (TPS)

The default LangSmith configuration can handle 10 TPS.

100 TPS

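To reach 100 TPS, update your configuration as follows:

A Redis cache of at least 13 GB, with a TTL of 1 hour
10 queue pod replicas
A larger ClickHouse PVC to store traces
2 frontend pods, which act as a reverse proxy for incoming requests
Blob storage enabled

Scaling to this level requires roughly 24 vCPUs and 64 GB of memory across the pods in your Kubernetes cluster.

The following is an example values.yaml snippet for this configuration: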

config:
  blobStorage:
    ## Please also set the other keys to connect to your blob storage. See configuration section.
    enabled: true
  settings:
    redisRunsExpirySeconds: "3600"

frontend:
  deployment:
    replicas: 2

queue:
  deployment:
    replicas: 10

redis:
  statefulSet:
    resources:
      requests:
        memory: 13Gi
      limits:
        memory: 13Gi

  # -- For external redis instead use something like below --
  # external:
  #   enabled: true
  #   connectionUrl: "<URL>" OR existingSecretName: "<SECRET-NAME>"

clickhouse:
  statefulSet:
    persistence:
      # This may depend on your configured TTL.
      # We recommend 60Gi for every shortlived TTL day if operating at this scale constantly.
      size: 420Gi # This assumes 7 days TTL and operating at this scale constantly.
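The PVC size in the snippet above follows from the rule in its comments: a fixed amount of storage per day of short-lived TTL, multiplied by the number of TTL days. The sketch below reproduces that arithmetic. The function name `clickhouse_pvc_gib` is hypothetical, and the per-day figures (60Gi at 100 TPS, 600Gi at 1000 TPS) are the ones stated in this guide's values.yaml comments, which assume constant operation at that scale.

```python
import math

SECONDS_PER_DAY = 86_400

# GiB of ClickHouse storage per short-lived TTL day, from the
# values.yaml comments in this guide (constant load assumed).
GIB_PER_TTL_DAY = {100: 60, 1000: 600}


def clickhouse_pvc_gib(gib_per_ttl_day: int, shortlived_ttl_seconds: int) -> int:
    """Estimate the ClickHouse PVC size in GiB for a short-lived TTL.

    Partial days are rounded up so the estimate never undershoots.
    """
    if gib_per_ttl_day <= 0 or shortlived_ttl_seconds <= 0:
        raise ValueError("both arguments must be positive")
    ttl_days = math.ceil(shortlived_ttl_seconds / SECONDS_PER_DAY)
    return gib_per_ttl_day * ttl_days


# 604800 seconds is 7 days.
print(f"{clickhouse_pvc_gib(GIB_PER_TTL_DAY[100], 604_800)}Gi")   # 420Gi
print(f"{clickhouse_pvc_gib(GIB_PER_TTL_DAY[1000], 604_800)}Gi")  # 4200Gi
```

If your traffic is bursty rather than constant, treat the result as an upper bound rather than a requirement.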

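Note

Using an external Redis cache is recommended. In that case, make sure the external cache is provisioned with at least 13 GB, instead of relying on the resource settings in the values file shown above.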

1000 TPS

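To reach 1000 TPS on a self-hosted Kubernetes LangSmith deployment, update your configuration as follows:

An external Redis cache of at least 200 GB, with a TTL of 1 hour
A ClickHouse instance with 10 vCPUs and 48 GB of memory
170 queue pods
20 platform backend pods
4 frontend pods, which act as a reverse proxy for incoming requests
Blob storage enabled

Scaling to this level requires roughly 220 vCPUs and 350 GB of memory across the pods in your Kubernetes cluster.

The following values.yaml snippet configures the recommendations above: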

frontend:
  deployment:
    replicas: 4 # OR enable autoscaling to this level (example below)
# autoscaling:
#   enabled: true
#   maxReplicas: 4
#   minReplicas: 2

platformBackend:
  deployment:
    replicas: 20 # OR enable autoscaling to this level (example below)
    resources:
      requests:
        cpu: "1600m"
# autoscaling:
#   enabled: true
#   maxReplicas: 20
#   minReplicas: 8

## Note that we are actively working on improving performance of this service to reduce the number of replicas.
queue:
  deployment:
    replicas: 170 # OR enable autoscaling to this level (example below)
    resources:
      requests:
        memory: "1.5Gi"
# autoscaling:
#   enabled: true
#   maxReplicas: 170
#   minReplicas: 40

## Ensure your Redis cache is at least 200 GB
redis:
  external:
    enabled: true
    existingSecretName: langsmith-redis-secret # Set the connection url for your external Redis instance (200+ GB)

clickhouse:
  statefulSet:
    persistence:
      # This may depend on your configured TTL (see config section).
      # We recommend 600Gi for every shortlived TTL day if operating at this scale constantly.
      size: 4200Gi # This assumes 7 days TTL and operating at this scale constantly.
    resources:
      requests:
        cpu: "10"
        memory: "48Gi"
      limits:
        cpu: "16"
        memory: "64Gi"

config:
  blobStorage:
    ## Please also set the other keys to connect to your blob storage. See configuration section.
    enabled: true
  settings:
    redisRunsExpirySeconds: "3600"
# ttl:
#   enabled: true
#   ttl_period_seconds:
#     longlived: "7776000"  # 90 days
#     shortlived: "604800"  # 7 days

# These are important environment variables to set.
commonEnv:
  - name: "CLICKHOUSE_ASYNC_INSERT_WAIT_PCT_FLOAT"
    value: "0"

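Important

Make sure the Kubernetes cluster is provisioned with enough resources to scale to the recommended size. After deployment, all pods in the Kubernetes cluster should be in the Running state. Pods stuck in the Pending state may indicate that you have hit node pool limits or need larger nodes.

Also make sure that any ingress controller deployed on the cluster can handle the required load, to prevent bottlenecks.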
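One way to check for pods stuck in Pending is to inspect the JSON that `kubectl get pods -o json` returns, where each item carries its phase in `status.phase`. The following is a minimal sketch under that assumption. The `pending_pods` helper is a hypothetical name, and the sample document is a hand-written, heavily trimmed illustration with made-up pod names, not real cluster output.

```python
import json


def pending_pods(pod_list_json: str) -> list[str]:
    """Return 'namespace/name' for every pod whose phase is Pending."""
    pod_list = json.loads(pod_list_json)
    return [
        f'{pod["metadata"].get("namespace", "default")}/{pod["metadata"]["name"]}'
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") == "Pending"
    ]


# Illustrative sample only: pod names and namespace are hypothetical.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"namespace": "langsmith", "name": "queue-0"},
         "status": {"phase": "Running"}},
        {"metadata": {"namespace": "langsmith", "name": "queue-1"},
         "status": {"phase": "Pending"}},
    ]
})

print(pending_pods(SAMPLE))
```

In practice you would pass the output of `kubectl get pods -n <namespace> -o json` to the function instead of the sample. An empty result means no pod is waiting to be scheduled.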
