API Usage Examples

Learn how to use the SDK with practical examples for video analysis and multi-turn conversations.

Best Practices

Follow these guidelines to build efficient applications:

- Load the model once at startup
  - Loading large models is time-consuming.
  - Load the model once and keep it in memory for the lifetime of the application.
- Warm up the model for real-time applications
  - Run a few dummy inferences after loading to initialize internal buffers.
  - This reduces first-request latency, especially for large videos or multiple images.
- Reuse the loaded model for multiple requests
  - Do not reload the model for each request.
  - Store the model instance in a persistent object or global variable.
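The three practices above can be combined in a small holder object. This is a minimal sketch, not part of the SDK: `ModelHolder`, `warm_up`, and `FakeClient` are hypothetical names, and `FakeClient` stands in for `VLLMClient` so the snippet runs without a GPU. In a real application you would pass `VLLMClient` as the factory and a short sample video as the warm-up input; the choice of two warm-up requests is arbitrary.

```python
import threading
from typing import Any, Callable, Optional


class ModelHolder:
    """Builds the client once on first use and returns the same instance afterwards."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory  # e.g. VLLMClient in a real application (assumption)
        self._client: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: only the first caller pays the loading cost,
        # even if several threads ask for the client at the same time.
        if self._client is None:
            with self._lock:
                if self._client is None:
                    self._client = self._factory()
        return self._client

    def warm_up(self, video_path: str, rounds: int = 2) -> None:
        # Send a few throwaway requests so the first real request is not slowed
        # down by one-time initialization. The prompt text is arbitrary.
        client = self.get()
        for _ in range(rounds):
            client.video_chat(prompt="Describe the video.", video_path=video_path)


class FakeClient:
    """Hypothetical stand-in for VLLMClient; counts constructions and calls."""

    instances = 0

    def __init__(self) -> None:
        FakeClient.instances += 1
        self.calls = 0

    def video_chat(self, prompt: str, video_path: str, **kwargs: Any) -> str:
        self.calls += 1
        return f"stub description of {video_path}"


# Create the holder once at startup, warm it up, then reuse it for every request.
MODEL = ModelHolder(FakeClient)
MODEL.warm_up("warmup_sample.mp4")
```

Request handlers would then call `MODEL.get()` and never construct a client themselves.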

Getting Started

Production Deployment

The examples below demonstrate single inference calls. For production applications, run a persistent process that keeps the model in memory across requests. With a web framework such as FastAPI or Flask, load the model once when the server starts and serve every inference request from that instance instead of reloading it.
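The deployment pattern described above looks roughly like this. To keep the sketch dependency-free it uses the standard library's `http.server` rather than FastAPI or Flask, but the structure is the same: create the client once, then let every request handler reuse it. `StubClient`, `build_server`, the `/video_chat` route, and its JSON fields are assumptions made for illustration; in a real deployment you would pass a `VLLMClient()` instance to `build_server`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any


class StubClient:
    """Hypothetical stand-in for VLLMClient so the sketch runs without the SDK."""

    def video_chat(self, prompt: str, video_path: str, **kwargs: Any) -> str:
        return f"(stub) {prompt} [{video_path}]"


def build_server(client: Any, host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Return an HTTP server whose handler reuses the already-loaded client."""

    class InferenceHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/video_chat":
                self.send_error(404, "unknown route")
                return
            try:
                length = int(self.headers.get("Content-Length", 0))
                body = json.loads(self.rfile.read(length) or b"{}")
                prompt, video_path = body["prompt"], body["video_path"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected JSON with 'prompt' and 'video_path'")
                return
            # No model loading here: the client was created before the server started.
            text = client.video_chat(prompt=prompt, video_path=video_path)
            payload = json.dumps({"text": text}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format: str, *args: Any) -> None:
            pass  # silence per-request logging in this example

    # HTTPServer (not ThreadingHTTPServer) handles one request at a time, so the
    # sketch makes no assumption about whether the client is thread-safe.
    return HTTPServer((host, port), InferenceHandler)


# Typical startup (blocks until interrupted):
#     server = build_server(VLLMClient())
#     server.serve_forever()
```

Serving requests one at a time is the conservative choice here; whether the SDK client can safely handle concurrent calls is not covered on this page.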


Example 1: Basic Inference

Python

# Replace <model> with the model package you are using
from woven.vision.ai.wave_<model>.vllm_client import VLLMClient

# Initialize the client (the GPU device ID defaults to 0)
client = VLLMClient()

video_path = "your/test/video.mp4"
prompt = "Describe the video in detail, including objects, actions, and context."
generated_text = client.video_chat(
    prompt=prompt, video_path=video_path
)
print(generated_text)

Output:

Text

The video takes place in a bustling coffee shop with a modern, industrial aesthetic.
A barista, dressed in a black t-shirt and grey apron, is seen preparing coffee.
The barista is focused on the task, using a coffee grinder and a portafilter.
The environment is well-lit with natural light streaming in, highlighting the clean, organized counter and the array of coffee-making equipment.
The barista's movements are precise and practiced, indicating experience.
In the background, another barista is visible, adding to the busy atmosphere.
The colors are warm and inviting, with the reds of the coffee cups and the metallic sheen of the equipment contrasting against the wooden shelves and brick walls.
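Example 1 sends a single request. When several videos need the same question, the client created once can simply be reused in a loop, in line with the best practices above. This is a minimal sketch: `describe_videos` is a hypothetical helper, not an SDK function, and it relies only on the `video_chat(prompt=..., video_path=...)` call shown in Example 1. Checking that each file exists first avoids spending an inference call on a mistyped path; whether the SDK itself validates paths is not covered here.

```python
from pathlib import Path
from typing import Any, Dict, Iterable

DEFAULT_PROMPT = "Describe the video in detail, including objects, actions, and context."


def describe_videos(
    client: Any, video_paths: Iterable[str], prompt: str = DEFAULT_PROMPT
) -> Dict[str, str]:
    """Run the same prompt over several videos with one already-initialized client.

    Returns a mapping from video path to generated text. Missing files are
    skipped rather than sent to the model.
    """
    results: Dict[str, str] = {}
    for path in video_paths:
        if not Path(path).is_file():
            print(f"Skipping missing file: {path}")
            continue
        results[path] = client.video_chat(prompt=prompt, video_path=path)
    return results


# Usage with the SDK (not executed here):
#     client = VLLMClient()
#     for path, text in describe_videos(client, ["a.mp4", "b.mp4"]).items():
#         print(path, text, sep="\n")
```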

Example 2: Multi-Turn Conversations

To maintain conversation context, pass the chat history returned by each call into the next call:

Python

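import json

# Replace <model> with the model package you are using
from woven.vision.ai.wave_<model>.vllm_client import VLLMClient

# Initialize the client once and reuse it for every turn
client = VLLMClient()

# With return_history=True, video_chat returns a (response, chat_history) tuple.
# Pass that chat_history into the next call to continue the conversation.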

video_path = "your/test/video.mp4"
chat_history = None  # Maintained across turns

# Turn 1: Initial question
prompt1 = "What is happening in this video?"
response1, chat_history = client.video_chat(
    prompt=prompt1,
    video_path=video_path,
    chat_history=chat_history,
    return_history=True,
)
print(f"Response 1:\n{response1}")

# Turn 2: Follow-up question
prompt2 = "Can you describe the main character's actions in more detail?"
response2, chat_history = client.video_chat(
    prompt=prompt2,
    video_path=video_path,
    chat_history=chat_history,
    return_history=True,
)
print(f"Response 2:\n{response2}")

# Turn 3: Another follow-up
prompt3 = "What emotions do you think the character is expressing?"
response3, chat_history = client.video_chat(
    prompt=prompt3,
    video_path=video_path,
    chat_history=chat_history,
    return_history=True,
)
print(f"Response 3:\n{response3}")
print(f"Chat History:\n{json.dumps(chat_history, indent=2, ensure_ascii=False)}")

Output (chat_history):

JSON

[
  {
    "role": "<|im_start|>user",
    "content": "<image>\nThe video lasts 21.56s. 64 frames sampled (sample times ≈ ['0.1', '0.4', '0.8', '1.1', '1.5', '1.8', '2.1', '2.5', '2.8', '3.2']s...). Total frames: 64. What is happening in this video?"
  },
  {
    "role": "<|im_start|>assistant",
    "content": "A woman in a coffee shop is preparing a drink. She is wearing a black shirt and a gray apron. She is standing behind a counter with a coffee grinder, a coffee maker, and other coffee-making equipment. She is pouring coffee grounds into the grinder and then grinding them. A man in a white shirt and a gray apron is standing behind her, preparing a drink."
  },
  {
    "role": "<|im_start|>user",
    "content": "Can you describe the main character's actions in more detail?"
  },
  {
    "role": "<|im_start|>assistant",
    "content": "The woman is carefully pouring coffee grounds from a bag into the grinder. She then places the lid on the grinder and turns it on. She watches the coffee grounds being ground, then removes the lid and sets it aside."
  },
  {
    "role": "<|im_start|>user",
    "content": "What emotions do you think the character is expressing?"
  },
  {
    "role": "<|im_start|>assistant",
    "content": "The woman appears to be focused and attentive as she prepares the drink. She seems to be enjoying her work and is taking her time to ensure that the drink is made correctly."
  }
]
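The three turns in Example 2 repeat the same call with a different prompt, so they can also be written as a loop that threads `chat_history` from one call to the next. `run_conversation` is a hypothetical helper that assumes only the `prompt`, `video_path`, `chat_history`, and `return_history` keyword arguments used above. `ScriptedClient` is a stand-in that mimics the `(response, chat_history)` return shape so the sketch runs without the model; its history entries are simplified and do not reproduce the SDK's exact format.

```python
from typing import Any, List, Optional, Tuple


def run_conversation(
    client: Any, video_path: str, prompts: List[str]
) -> Tuple[List[str], Optional[list]]:
    """Ask each prompt in order about the same video, carrying the history forward."""
    chat_history = None  # the first turn starts with no history
    responses: List[str] = []
    for prompt in prompts:
        response, chat_history = client.video_chat(
            prompt=prompt,
            video_path=video_path,
            chat_history=chat_history,
            return_history=True,
        )
        responses.append(response)
    return responses, chat_history


class ScriptedClient:
    """Hypothetical stand-in for VLLMClient that returns numbered answers."""

    def video_chat(self, prompt, video_path, chat_history=None, return_history=False, **kwargs):
        history = list(chat_history or [])
        answer = f"answer {len(history) // 2 + 1}"
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": answer})
        return (answer, history) if return_history else answer
```

With the real SDK, `run_conversation(client, video_path, [prompt1, prompt2, prompt3])` would make the same three calls as Example 2.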

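In the sample output above, each `role` value carries an `<|im_start|>` marker, and the first user turn's `content` also includes the `<image>` placeholder and frame-sampling details. For logging or display, a small helper can strip the marker from the roles. This sketch is based only on the sample output shown here, and the helper names are hypothetical; the exact format may differ between model versions. Keep the original `chat_history` unchanged when passing it back to `video_chat`, since the client presumably expects its own format.

```python
from typing import Dict, List

ROLE_MARKER = "<|im_start|>"


def clean_role(role: str) -> str:
    """Strip the leading '<|im_start|>' marker, if present, from a role string."""
    return role[len(ROLE_MARKER):] if role.startswith(ROLE_MARKER) else role


def readable_history(chat_history: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a new list with cleaned roles; the input list is not modified."""
    return [{**turn, "role": clean_role(turn["role"])} for turn in chat_history]


def assistant_replies(chat_history: List[Dict[str, str]]) -> List[str]:
    """Collect the assistant's answers in conversation order."""
    return [
        turn["content"]
        for turn in readable_history(chat_history)
        if turn["role"] == "assistant"
    ]
```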