API Reference

This page provides the complete API documentation for the VLLMClient class.

This module provides a high-level client interface for interacting with a vision-language model to perform multimodal inference.

Supported Features

  • Image-based chat

    • image_chat: Uses an image loaded from a file path.
    • rgb_image_chat: Uses an in-memory RGB image array.
    • rgb_images_chat: Uses an in-memory list of RGB image arrays.
  • Video-based chat

    • video_chat: Samples frames from a video according to configurable strategies before generating a context-aware textual response.

General Information About API Usage

- Input Validation
Validate all inputs before passing them to the SDK, and make sure that any file referenced by a path comes from a trusted source. The SDK assumes that its inputs have already been verified, so invalid inputs may cause errors.

- Return Values
By default, each method returns only the generated text. When optional outputs such as return_history or return_sampling_info are enabled, the method returns a tuple in the order text, history, sampling_info. Unpack the returned tuple according to the options you enabled.
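
The return convention described above can be wrapped in a small helper so that calling code always sees the same three values. This is a minimal sketch, not part of the SDK: unpack_result is a hypothetical name, and it assumes that a bare string is returned when no option is enabled and that the tuple otherwise contains text followed by only the enabled extras, in the documented order.

```python
from typing import Any, Optional, Tuple


def unpack_result(
    result: Any,
    return_history: bool = False,
    return_sampling_info: bool = False,
) -> Tuple[str, Optional[list], Optional[dict]]:
    """Normalize a chat result to (text, history, sampling_info).

    Assumption: with no option enabled the result is a bare string;
    otherwise it is a tuple of text plus the enabled extras, in the
    documented order text, history, sampling_info.
    """
    if not (return_history or return_sampling_info):
        return result, None, None

    values = list(result)
    text = values.pop(0)
    # Consume the optional values in the documented order.
    history = values.pop(0) if return_history else None
    sampling_info = values.pop(0) if return_sampling_info else None
    return text, history, sampling_info
```

Pass the same flags to the helper that were passed to the chat method, so the tuple is interpreted the way it was produced.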

VLLMClient

High-level API for running Visual Language Model (VLM) inference based on user prompts. Supports image- and video-based queries, enabling multimodal conversational interactions.

__init__

Initializes the VLLMClient.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| device_id | int | Identifier for the computation device (e.g., GPU ID) to use for inference. If None, the default device is selected automatically. | 0 |

image_chat

Process an image with a user prompt to produce relevant text output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | Text instruction or question that guides the interpretation of the image and shapes the generated response. | required |
| image_path | str | Path to the input image file. Supported formats include .png, .jpg, .jpeg. | required |
| chat_history | list | A list of prior conversation entries. | None |
| generation_config | dict | Parameters that control the text generation process (e.g., temperature, top_k). If None, system settings are used. | None |
| return_history | bool | If True, also returns the updated chat history. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| text | str | AI-generated text derived from the image and prompt. |
| history | list | Updated chat history; returned only if return_history is True. |
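
Because the SDK expects inputs to be verified beforehand, a caller might check image_path before invoking image_chat. The sketch below is illustrative only: validate_image_path is a hypothetical helper, not an SDK function, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions listed as supported for the image_path parameter of image_chat.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def validate_image_path(image_path: str) -> Path:
    """Return the path as a Path if it names an existing, supported image file."""
    path = Path(image_path)
    # Assumption: extension matching is case-insensitive (".PNG" is accepted).
    if path.suffix.lower() not in SUPPORTED_IMAGE_EXTENSIONS:
        raise ValueError(f"Unsupported image format: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"Image file not found: {path}")
    return path
```

This checks only the file name and its existence. It does not confirm that the file content is a valid or trustworthy image.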

rgb_image_chat

Process an in-memory RGB image with a user prompt to produce relevant text output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | Instruction or question that guides the interpretation of the image and shapes the generated response. | required |
| image | ndarray | RGB image in HWC format (Height, Width, Channels) as a NumPy array, with channels ordered RGB. | required |
| chat_history | list | A list of prior conversation entries. | None |
| generation_config | dict | Parameters that control the text generation process (e.g., temperature, top_k). If None, system settings are used. | None |
| return_history | bool | If True, also returns the updated chat history. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| text | str | AI-generated text derived from the image and prompt. |
| history | list | Updated chat history; returned only if return_history is True. |
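
The HWC layout required for the image parameter can be checked from the array's shape before calling rgb_image_chat. This is a hedged sketch: check_rgb_shape is a hypothetical helper, not an SDK function, and it takes the shape tuple (for example image.shape) rather than the array itself so that it stays free of dependencies.

```python
from typing import Sequence


def check_rgb_shape(shape: Sequence[int]) -> None:
    """Raise ValueError unless shape describes a non-empty HWC image with 3 channels."""
    if len(shape) != 3:
        raise ValueError(f"Expected 3 dimensions (H, W, C), got {len(shape)}")
    height, width, channels = shape
    # HWC puts the channel axis last; RGB means exactly three channels.
    if channels != 3:
        raise ValueError(f"Expected 3 channels (RGB), got {channels}")
    if height <= 0 or width <= 0:
        raise ValueError("Image height and width must be positive")
```

The shape cannot reveal channel order. Arrays from libraries that load images as BGR, such as OpenCV's imread, need their channels reversed first, for example with image[..., ::-1].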

rgb_images_chat

Performs multimodal inference by analyzing an in-memory list of RGB image arrays alongside a user-provided prompt, generating a contextually relevant textual output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | Instruction or question that guides the interpretation of the images and shapes the generated response. | required |
| images | list[ndarray] | List of RGB images, where each image is a NumPy array in HWC (Height, Width, Channels) format with channels ordered as RGB. | required |
| chat_history | list | A list of prior conversation entries. | None |
| generation_config | dict | Parameters that control the text generation process (e.g., temperature, top_k). If None, system settings are used. | None |
| return_history | bool | If True, also returns the updated chat history. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| text | str | AI-generated text derived from the images and prompt. |
| history | list | Updated chat history; returned only if return_history is True. |
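
The chat_history and return_history parameters suggest a multi-turn pattern in which the history returned by one call is passed into the next. The sketch below shows that pattern under stated assumptions: run_conversation is a hypothetical helper, chat_fn stands for any of the chat methods documented here, and the call is assumed to return a (text, history) tuple when return_history is True.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_conversation(
    chat_fn: Callable[..., Tuple[str, list]],
    prompts: List[str],
    **chat_kwargs: Any,
) -> Tuple[List[str], Optional[list]]:
    """Send prompts in order, feeding each turn's history into the next."""
    history: Optional[list] = None  # the first turn starts with no history
    replies: List[str] = []
    for prompt in prompts:
        # Assumption: return_history=True yields a (text, history) tuple.
        text, history = chat_fn(
            prompt=prompt,
            chat_history=history,
            return_history=True,
            **chat_kwargs,
        )
        replies.append(text)
    return replies, history
```

With a client instance, a call might look like run_conversation(client.rgb_images_chat, ["What changed between these images?", "Which image shows it first?"], images=frames), where client and frames are placeholders for your own objects.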

video_chat

Process an input video by sampling frames and analyzing them in context with a user prompt to produce text output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | str | Text instruction or question used to produce a tailored response from the video. | required |
| video_path | str | Path to the input video file. Supported formats include .mp4, .mov, .avi, .3gp, .mkv, and .wmv. | required |
| sampling_method | str | Method used for sampling frames from the video. Valid values are "duration", "rand", "middle", and "fps". See the Sampling Methods section for details. | 'duration' |
| sampling_fps | float | Number of frames to sample per second of video. Required only when sampling_method is "fps". | None |
| min_num_frames | int | Minimum number of frames to sample from the video, in multiples of 8. Used when sampling_method is "duration", "rand", or "middle". | 64 |
| max_num_frames | int | Maximum number of frames to sample from the video, in multiples of 8. Used when sampling_method is "duration", "rand", or "middle". | 512 |
| generation_config | dict | Parameters that control the text generation process. If None, system settings are used. | None |
| chat_history | list | A list of prior conversation entries. | None |
| return_history | bool | If True, also returns the updated chat history. | False |
| return_sampling_info | bool | If True, also returns metadata about the video sampling process. See the Sampling Info section for details. | False |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| text | str | AI-generated text derived from the video and prompt. |
| history | list | Updated chat history; returned only if return_history is True. |
| sampling_info | dict | Sampling information; returned only if return_sampling_info is True. |
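
The sampling parameters of video_chat depend on one another, so it can help to check them together before the call. The following is a minimal sketch rather than SDK behavior: build_sampling_kwargs is a hypothetical helper, and the requirements that sampling_fps be positive, that frame counts be positive, and that min_num_frames not exceed max_num_frames are assumptions not stated in this reference.

```python
from typing import Any, Dict, Optional

# Values documented for the sampling_method parameter.
VALID_SAMPLING_METHODS = {"duration", "rand", "middle", "fps"}


def build_sampling_kwargs(
    sampling_method: str = "duration",
    sampling_fps: Optional[float] = None,
    min_num_frames: int = 64,
    max_num_frames: int = 512,
) -> Dict[str, Any]:
    """Return only the sampling arguments relevant to the chosen method."""
    if sampling_method not in VALID_SAMPLING_METHODS:
        raise ValueError(f"Unknown sampling_method: {sampling_method!r}")

    if sampling_method == "fps":
        # sampling_fps is required only for "fps"; positivity is an assumption.
        if sampling_fps is None or sampling_fps <= 0:
            raise ValueError("sampling_fps must be a positive number for 'fps'")
        return {"sampling_method": "fps", "sampling_fps": sampling_fps}

    # Frame bounds apply to "duration", "rand", and "middle".
    for name, value in (
        ("min_num_frames", min_num_frames),
        ("max_num_frames", max_num_frames),
    ):
        if value <= 0 or value % 8 != 0:
            raise ValueError(f"{name} must be a positive multiple of 8, got {value}")
    if min_num_frames > max_num_frames:  # assumed constraint
        raise ValueError("min_num_frames must not exceed max_num_frames")
    return {
        "sampling_method": sampling_method,
        "min_num_frames": min_num_frames,
        "max_num_frames": max_num_frames,
    }
```

The resulting dictionary could then be expanded into the call with **, alongside prompt and video_path.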

