API Reference

This page provides the complete API documentation for the VLLMClient class.

This module provides a high-level client interface for interacting with a vision-language model (VLM) to perform multimodal inference.

Supported Features

Image-based chat
- image_chat: Uses an image loaded from a file path.
- rgb_image_chat: Uses an in-memory RGB image array.
- rgb_images_chat: Uses an in-memory list of RGB image arrays.

Video-based chat
- video_chat: Samples frames from a video according to a configurable strategy, then generates a context-aware text response.

General information about API usage

- Input Validation
Validate all inputs before passing them to the SDK, and make sure that any file a path points to comes from a trusted source. The SDK assumes that its inputs have already been verified, so invalid inputs may cause errors.

- Return Values
By default, each method returns only the generated text. When optional return flags such as return_history or return_sampling_info are enabled, the method returns a tuple in the following order: text, history, sampling_info. Unpack the returned tuple according to the options you enabled.
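The return convention above can be wrapped in a small helper so that calling code always receives three values. This is a minimal sketch, not part of the SDK: `unpack_result` is a hypothetical name, and it assumes that a disabled option is simply absent from the tuple (for example, that enabling only return_sampling_info yields text followed by sampling_info).

```python
from typing import Any, Optional, Tuple


def unpack_result(
    result: Any,
    return_history: bool = False,
    return_sampling_info: bool = False,
) -> Tuple[str, Optional[list], Optional[dict]]:
    """Normalize a chat result to (text, history, sampling_info).

    Hypothetical helper. Assumes the documented order (text, history,
    sampling_info) and that disabled options are absent from the tuple.
    """
    if not (return_history or return_sampling_info):
        # Default case: the method returns only the generated text.
        return result, None, None

    values = list(result)
    text = values.pop(0)
    history = values.pop(0) if return_history else None
    sampling_info = values.pop(0) if return_sampling_info else None
    return text, history, sampling_info


# Stand-in values for illustration only, not real model output.
text, history, info = unpack_result(("a cat", ["turn-1"]), return_history=True)
print(text, history, info)
```

Pass the same flag values to the helper that were passed to the chat method, so the tuple is interpreted consistently.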

`VLLMClient`

High-level API for running Visual Language Model (VLM) inference based on user prompts. Supports image- and video-based queries, enabling multimodal conversational interactions.

`__init__`

Initializes the VLLMClient.

Parameters:

Name	Type	Description	Default
`device_id`	`int`	Identifier for the computation device (e.g., GPU ID) to use for inference. If None, the default device is selected automatically.	`0`

`image_chat`

Process an image with a user prompt to produce relevant text output.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Text instruction or question that guides the interpretation of the image and shapes the generated response.	required
`image_path`	`str`	Path to the input image file. Supported formats include .png, .jpg, .jpeg.	required
`chat_history`	`list`	A list of prior conversation entries. Defaults to None.	`None`
`generation_config`	`dict`	Parameters that control the text generation process (e.g., temperature, top_k). Defaults to None and uses system settings.	`None`
`return_history`	`bool`	If True, also returns the updated chat history. Defaults to False.	`False`

Returns:

Name	Type	Description
`text`	`str`	AI-generated text derived from the image and prompt.
`history`	`list`	Updated chat history; returned only if `return_history` is True.
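Because the SDK expects validated inputs, one option is to check the image path before each call. The sketch below is illustrative only: `check_image_path` and `ask_about_image` are hypothetical helpers, the case-insensitive extension match is an assumption, and `client` stands for any object exposing `image_chat` with the parameters documented above.

```python
from pathlib import Path

# Extensions listed in the image_path description above.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_image_path(image_path: str) -> str:
    """Return image_path unchanged if it looks usable, else raise."""
    path = Path(image_path)
    # Assumption: extensions are matched case-insensitively.
    if path.suffix.lower() not in SUPPORTED_IMAGE_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"No such image file: {image_path}")
    return image_path


def ask_about_image(client, prompt: str, image_path: str, chat_history=None):
    """Validate the path, run one chat turn, and return (text, history).

    `client` is any object with an image_chat method that accepts the
    documented parameters, for example a VLLMClient instance.
    """
    return client.image_chat(
        prompt=prompt,
        image_path=check_image_path(image_path),
        chat_history=chat_history,
        return_history=True,
    )
```

To continue a conversation, pass the returned history back as `chat_history` on the next call. These checks cover only the file extension and existence; they do not establish that the file content is trustworthy.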

`rgb_image_chat`

Process an in-memory RGB image with a user prompt to produce relevant text output.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Instruction or question that guides the interpretation of the image and shapes the generated response.	required
`image`	`ndarray`	RGB image in HWC format (Height, Width, Channels) as a NumPy array, with channels ordered RGB.	required
`chat_history`	`list`	A list of prior conversation entries. Defaults to None.	`None`
`generation_config`	`dict`	Parameters that control the text generation process (e.g., temperature, top_k). Defaults to None and uses system settings.	`None`
`return_history`	`bool`	If True, also returns the updated chat history. Defaults to False.	`False`

Returns:

Name	Type	Description
`text`	`str`	AI-generated text derived from the image and prompt.
`history`	`list`	Updated chat history; returned only if `return_history` is True.
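The HWC layout required for `image` can be checked before the call. The following is a minimal sketch with a hypothetical helper name, `check_rgb_hwc`; it inspects only the `shape` attribute, so it works with any NumPy array without importing NumPy here.

```python
def check_rgb_hwc(image) -> tuple:
    """Return (height, width) if image has an HWC shape with 3 channels.

    Hypothetical helper. It checks the layout only; it cannot tell
    whether the channels are ordered RGB or BGR.
    """
    shape = getattr(image, "shape", None)
    if shape is None or len(shape) != 3:
        raise ValueError(f"Expected a 3-D HWC array, got shape {shape}")
    height, width, channels = shape
    if channels != 3:
        raise ValueError(f"Expected 3 channels (RGB), got {channels}")
    return height, width
```

A shape check cannot detect channel order. Arrays loaded with OpenCV's `cv2.imread`, for instance, are ordered BGR and need converting to RGB before being passed in.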

`rgb_images_chat`

Performs multimodal inference by analyzing an in-memory list of RGB image arrays alongside a user-provided prompt, generating a contextually relevant textual output.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Instruction or question that guides the interpretation of the images and shapes the generated response.	required
`images`	`list[ndarray]`	List of RGB images, where each image is a NumPy array in HWC (Height, Width, Channels) format with channels ordered as RGB.	required
`chat_history`	`list`	A list of prior conversation entries. Defaults to None.	`None`
`generation_config`	`dict`	Parameters that control the text generation process (e.g., temperature, top_k). Defaults to None and uses system settings.	`None`
`return_history`	`bool`	If True, also returns the updated chat history. Defaults to False.	`False`

Returns:

Name	Type	Description
`text`	`str`	AI-generated text derived from the images and prompt.
`history`	`list`	Updated chat history; returned only if `return_history` is True.

`video_chat`

Process an input video by sampling frames and analyzing them in context with a user prompt to produce text output.

Parameters:

Name	Type	Description	Default
`prompt`	`str`	Text instruction or question used to produce a tailored response from the video.	required
`video_path`	`str`	Path to the input video file. Supported formats include .mp4, .mov, .avi, .3gp, .mkv, and .wmv.	required
`sampling_method`	`str`	Method used for sampling frames from the video. Valid values are "duration", "rand", "middle", and "fps". Defaults to "duration". See the Sampling Methods section for details.	`'duration'`
`sampling_fps`	`float`	Number of frames to sample per second of video. Only required when `sampling_method` is "fps".	`None`
`min_num_frames`	`int`	Minimum number of frames to sample from the video, in multiples of 8. Defaults to 64. Specify when `sampling_method` is "duration", "rand", or "middle".	`64`
`max_num_frames`	`int`	Maximum number of frames to sample from the video, in multiples of 8. Defaults to 512. Specify when `sampling_method` is "duration", "rand", or "middle".	`512`
`generation_config`	`dict`	Parameters that control the text generation process (e.g., temperature, top_k). Defaults to None and uses system settings.	`None`
`chat_history`	`list`	A list of prior conversation entries. Defaults to None.	`None`
`return_history`	`bool`	If True, returns the updated chat history. Defaults to False.	`False`
`return_sampling_info`	`bool`	If True, also returns metadata about the video sampling process. Defaults to False. See the Sampling Info section for details.	`False`

Returns:

Name	Type	Description
`text`	`str`	AI-generated text derived from the video and prompt.
`history`	`list`	Updated chat history; returned only if `return_history` is True.
`sampling_info`	`dict`	Sampling information; returned only if `return_sampling_info` is True.
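The sampling parameters depend on each other: `sampling_fps` applies only to "fps", while the frame-count bounds apply to the other three methods and must be multiples of 8. The sketch below collects those rules in a hypothetical helper, `build_sampling_kwargs`. The requirements that values be positive and that the minimum not exceed the maximum are assumptions, not documented behavior.

```python
from typing import Optional

# Values listed in the sampling_method description above.
VALID_SAMPLING_METHODS = {"duration", "rand", "middle", "fps"}


def build_sampling_kwargs(
    sampling_method: str = "duration",
    sampling_fps: Optional[float] = None,
    min_num_frames: int = 64,
    max_num_frames: int = 512,
) -> dict:
    """Check the sampling arguments and return them as keyword arguments."""
    if sampling_method not in VALID_SAMPLING_METHODS:
        raise ValueError(f"Unknown sampling_method: {sampling_method!r}")

    if sampling_method == "fps":
        # Assumption: the rate must be strictly positive.
        if sampling_fps is None or sampling_fps <= 0:
            raise ValueError("sampling_fps must be positive when sampling_method is 'fps'")
        return {"sampling_method": "fps", "sampling_fps": sampling_fps}

    for name, value in (
        ("min_num_frames", min_num_frames),
        ("max_num_frames", max_num_frames),
    ):
        if value <= 0 or value % 8 != 0:
            raise ValueError(f"{name} must be a positive multiple of 8, got {value}")
    # Assumption: the minimum may not exceed the maximum.
    if min_num_frames > max_num_frames:
        raise ValueError("min_num_frames must not exceed max_num_frames")

    return {
        "sampling_method": sampling_method,
        "min_num_frames": min_num_frames,
        "max_num_frames": max_num_frames,
    }


print(build_sampling_kwargs("fps", sampling_fps=2.0))
```

The returned dictionary can be expanded into a call such as `client.video_chat(prompt=..., video_path=..., **kwargs)`, where `client` is a VLLMClient instance.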
