Building Triton Inference Server from Source with S3 Support
The official Triton Inference Server Docker images for igpu do not include S3 support by default. To enable S3 support, we need to build the server from source.
Here is an example build script for an igpu image with S3 support (but without the vLLM backend):
```bash
#!/usr/bin/env bash

TRITON_VERSION="${1}"
[[ -z "${TRITON_VERSION}" ]] && TRITON_VERSION="24.08"

IMAGE_NAME="tritonserver"
OFFICIAL_MIN_IMAGE_TAG="${TRITON_VERSION}-py3-igpu-min"
CUSTOM_IMAGE_TAG="${TRITON_VERSION}-igpu-s3"

# Create a directory for Triton and clone the repository
rm -rf triton
mkdir triton && cd triton
git clone --recurse-submodules https://github.com/triton-inference-server/server.git
cd server

# Checkout the desired Triton version
git checkout "r${TRITON_VERSION}"

# Build the Triton Inference Server
sudo python3 build.py \
    --build-parallel 10 \
    --no-force-clone \
    --target-platform igpu \
    --target-machine aarch64 \
    --filesystem s3 \
    --enable-gpu \
    --enable-mali-gpu \
    --enable-metrics \
    --enable-logging \
    --enable-stats \
    --enable-cpu-metrics \
    --enable-nvtx \
    --backend onnxruntime \
    --backend pytorch \
    --backend tensorflow \
    --backend python \
    --backend tensorrt \
    --endpoint http \
    --endpoint grpc \
    --min-compute-capability "5.3" \
    --image "base,nvcr.io/nvidia/${IMAGE_NAME}:${OFFICIAL_MIN_IMAGE_TAG}" \
    --image "gpu-base,nvcr.io/nvidia/${IMAGE_NAME}:${OFFICIAL_MIN_IMAGE_TAG}"

# Tag the image locally without pushing to a registry
docker tag "${IMAGE_NAME}:latest" "${IMAGE_NAME}:${CUSTOM_IMAGE_TAG}"
echo "Docker image '${IMAGE_NAME}:${CUSTOM_IMAGE_TAG}' created successfully."
```
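Once the image exists, a quick way to confirm that S3 support actually works is to point the server at an S3 model repository. A minimal smoke-test sketch, assuming a hypothetical bucket `my-bucket` and placeholder credentials (Triton reads the standard AWS environment variables for S3 repositories):

```bash
# Bucket, credentials, and region below are placeholders; substitute your own.
docker run --rm --runtime nvidia --network host \
    -e AWS_ACCESS_KEY_ID="<access-key-id>" \
    -e AWS_SECRET_ACCESS_KEY="<secret-access-key>" \
    -e AWS_DEFAULT_REGION="us-east-1" \
    "tritonserver:24.08-igpu-s3" \
    tritonserver --model-repository="s3://my-bucket/model_repository"
```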
And here is build_triton.sh, a variant of the script that produces a Triton Server Docker image with vLLM backend support:
```bash
#!/usr/bin/env bash

TRITON_VERSION="${1}"
[[ -z "${TRITON_VERSION}" ]] && TRITON_VERSION="25.04"

IMAGE_NAME="tritonserver"
OFFICIAL_MIN_IMAGE_TAG="${TRITON_VERSION}-py3-igpu-min"
CUSTOM_IMAGE_TAG="${TRITON_VERSION}-igpu-vllm-py3"

# Create a directory for Triton and clone the repository
rm -rf triton
mkdir triton && cd triton
git clone --recurse-submodules https://github.com/triton-inference-server/server.git
cd server

# Checkout the desired Triton version
git checkout "r${TRITON_VERSION}"

# Build the Triton Inference Server
sudo python3 build.py \
    --build-parallel 10 \
    --no-force-clone \
    --target-platform igpu \
    --target-machine aarch64 \
    --enable-gpu \
    --enable-mali-gpu \
    --enable-metrics \
    --enable-logging \
    --enable-stats \
    --enable-cpu-metrics \
    --enable-nvtx \
    --backend python:r${TRITON_VERSION} \
    --backend vllm:r${TRITON_VERSION} \
    --endpoint http \
    --endpoint grpc \
    --min-compute-capability "5.3" \
    --image "base,nvcr.io/nvidia/${IMAGE_NAME}:${OFFICIAL_MIN_IMAGE_TAG}" \
    --image "gpu-base,nvcr.io/nvidia/${IMAGE_NAME}:${OFFICIAL_MIN_IMAGE_TAG}"

# Tag the image locally without pushing to a registry
docker tag "${IMAGE_NAME}:latest" "${IMAGE_NAME}:${CUSTOM_IMAGE_TAG}"
echo "Docker image '${IMAGE_NAME}:${CUSTOM_IMAGE_TAG}' created successfully."
```
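For reference, Triton's vLLM backend serves models described by a model.json of vLLM engine arguments inside each model's version directory. A minimal repository sketch (the model name, directory names, and engine arguments below are placeholders, not from the original notes):

```bash
# Hypothetical model repository layout for the vLLM backend.
mkdir -p model_repository/gemma3/1

# config.pbtxt selects the vLLM backend for this model.
cat > model_repository/gemma3/config.pbtxt <<'EOF'
backend: "vllm"
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
EOF

# model.json holds the vLLM engine arguments (AsyncEngineArgs fields).
cat > model_repository/gemma3/1/model.json <<'EOF'
{
    "model": "google/gemma-3-1b-it",
    "gpu_memory_utilization": 0.8
}
EOF
```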
Network issues can come up during the build, so some preparation is needed first.

Preparation
Modify the triton-inference-server/server/build.py file.
Turn on "Allow LAN" on the proxy so that the Docker build containers can reach it over the LAN (one possible configuration is sketched below).
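The notes above do not record the exact build.py change. One approach that avoids patching build.py entirely (a sketch, assuming a hypothetical LAN proxy at 192.168.1.100:7890) is Docker's client-side proxy configuration, which injects the proxy variables into every container the Docker client starts, including the build containers that build.py launches:

```bash
# Back up any existing Docker client config first; this overwrites it.
# Because build.py is invoked with sudo, docker runs as root, so the
# config may need to live under /root/.docker instead of ~/.docker.
mkdir -p ~/.docker
cat > ~/.docker/config.json <<'EOF'
{
    "proxies": {
        "default": {
            "httpProxy": "http://192.168.1.100:7890",
            "httpsProxy": "http://192.168.1.100:7890",
            "noProxy": "localhost,127.0.0.1"
        }
    }
}
EOF
```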
Build

```bash
./build_triton.sh 25.04
```
The build then ran into new problems.

Problem 1 (see the conclusion further on: with that approach, neither this problem nor Problem 2 appears)
```
156.2 ERROR: Could not find a version that satisfies the requirement triton==3.2.0; platform_machine != "ppc64le" (from vllm) (from versions: none)
156.2 ERROR: No matching distribution found for triton==3.2.0; platform_machine != "ppc64le"
```
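The cause is visible in the message itself: pip finds no triton wheels at all for this platform ("from versions: none"), i.e. PyPI has no aarch64 build that satisfies vLLM's pin. This can be reproduced outside the image build (a diagnostic sketch):

```bash
# On an aarch64 host this fails the same way, confirming that no
# matching triton wheel exists for the platform.
python3 -m pip download "triton==3.2.0" --no-deps -d /tmp/triton-wheels
```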
Problem 2

Running the vLLM OpenAI API server then fails because the compiled extension is missing:

```
python3 -m vllm.entrypoints.openai.api_server --model "google/gemma-3-1b-it"
INFO 05-15 15:10:19 [__init__.py:239] Automatically detected platform cuda.
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/usr/local/lib/python3.12/dist-packages/vllm/__init__.py", line 12, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 16, in <module>
    from vllm.config import (CacheConfig, CompilationConfig, ConfigFormat,
  File "/usr/local/lib/python3.12/dist-packages/vllm/config.py", line 30, in <module>
    from vllm.model_executor.layers.quantization import (QUANTIZATION_METHODS,
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/__init__.py", line 3, in <module>
    from vllm.model_executor.parameter import (BasevLLMParameter,
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/parameter.py", line 9, in <module>
    from vllm.distributed import get_tensor_model_parallel_rank
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/__init__.py", line 3, in <module>
    from .communication_op import *
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/communication_op.py", line 8, in <module>
    from .parallel_state import get_tp_group
  File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 122, in <module>
    from vllm.platforms import current_platform
  File "/usr/local/lib/python3.12/dist-packages/vllm/platforms/__init__.py", line 271, in __getattr__
    _current_platform = resolve_obj_by_qualname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2009, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/platforms/cuda.py", line 15, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'vllm._C'
```
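A quick way to confirm the missing native extension inside the container (a diagnostic sketch; the path comes from the traceback above):

```bash
# An empty result means the installed vllm wheel ships no compiled
# kernels, which matches the ModuleNotFoundError above.
ls /usr/local/lib/python3.12/dist-packages/vllm/_C* 2>/dev/null \
    || echo "no compiled vllm._C extension found"
```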
File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 1955, in __getattr__ module = self._get_module(self._class_to_module[name]) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/utils/import_utils.py", line 1969, in _get_module raise RuntimeError(RuntimeError: Failed to import transformers.processing_utils because of the following error (look up to see its traceback):operator torchvision::nms does not exist
I found this related material:
PyTorch and Torvision version issue: RuntimeError: operator torchvision::nms does not exist
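That error is a classic symptom of mismatched torch and torchvision builds. A quick check (assuming both packages still import):

```bash
# torch and torchvision must come from matching builds (the same wheel
# series); otherwise torchvision's compiled ops, such as nms, fail to
# register against torch.
python3 -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
```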