Animaker Project Portfolio

A. Talking-Head Video (AI Lip Sync of Animated Characters, Expressions, and Actions)

Link: https://www.steve.ai/
Role: Research Engineer at Animaker

The existing lip sync feature for animated characters lacked realism and did not adapt dynamically to voiceovers, which made the characters look less human-like.

Identifying the right model to generate realistic lip movements that could be mapped onto animated characters.
Model optimization and performance – improving inference speed, throughput/latency, and GPU consumption.

Model Optimization – enhanced efficiency for deployment.
Model Inference – ensured accurate and efficient execution.
Latency & Throughput Optimization – improved performance for real-time processing.
Scalability on AWS GPU – optimized the model for large-scale deployment.
Containerization – created Docker images and managed containerization.
API Development – developed APIs and integrated them with the backend.
Deployment & Scalability – ensured system-wide scalability and efficient resource utilization.
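
As an illustration of how latency and throughput figures like the ones above can be measured, here is a minimal Python sketch. It is not the tooling used at Animaker: the `benchmark` helper and the `fake_inference` stand-in are hypothetical names introduced for this example, and the throughput figure assumes a single worker issuing requests sequentially.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Time repeated calls to fn and summarize latency and throughput."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)

    latencies.sort()
    total = sum(latencies)
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = min(len(latencies) - 1, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "mean_s": statistics.mean(latencies),
        "p50_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        # Sequential, single-worker throughput: completed calls per second.
        "throughput_per_s": len(latencies) / total if total > 0 else float("inf"),
    }


def fake_inference() -> None:
    # Hypothetical stand-in for one model forward pass.
    time.sleep(0.002)


stats = benchmark(fake_inference, runs=20, warmup=2)
print(stats)
```

When timing a real GPU model, the device would need to be synchronized before each clock read (for example with `torch.cuda.synchronize()` in PyTorch), since GPU kernels run asynchronously and the call can otherwise return before the work has finished.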

Achieved realistic lip sync for animated characters in sync with voiceovers.
Enabled adoption across business verticals – L&D, marketing, and content creation, supporting Talking-Head videos and conversational videos.
Model Optimization (Knowledge Distillation) led to a ~60% improvement in inference speed, enabling real-time processing.
Reduced the model's GPU memory footprint to 1.1 GB, allowing more model workers to run in parallel.
Significant AWS GPU cost savings through optimized resource utilization.
High adoption rate – with a 15M+ daily user base, an average of ~250 projects of this category are downloaded daily.
Scalable and future-proof – the model can be adapted for any existing or future animated character designed at Animaker.
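
The portfolio does not say which distillation objective was used for the lip sync model. As a generic illustration of knowledge distillation only, the sketch below computes the classic soft-target loss: the KL divergence between temperature-softened teacher and student output distributions, scaled by the square of the temperature. The function names, the temperature value, and the example logits are all assumptions made for this example.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [2.0, 1.0, 0.1]    # illustrative logits from a large teacher model
untrained = [0.0, 0.0, 0.0]  # illustrative logits from a small student model

print(distillation_loss(untrained, teacher))  # positive: student disagrees with teacher
print(distillation_loss(teacher, teacher))    # zero: student matches teacher
```

In training, a term like this is usually combined with the ordinary task loss, so the smaller student learns from both the ground truth and the teacher's softened outputs.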

Link: https://www.steve.ai/
Role: Research Engineer at Animaker

Deployment on AWS GPU instances – optimizing for faster inference and higher throughput.
Load balancing and scalability – ensuring the model scales efficiently with increasing demand.

Used the segmind/SSD-1B model – a distilled version of SDXL that is 50% smaller and 60% faster while maintaining high-quality text-to-image generation.
Integrated the Tiny AutoEncoder for Stable Diffusion (TAESD) – reduced computational overhead while preserving output quality.
Optimized model inference using the Hugging Face Diffusers pipeline – improved efficiency and streamlined processing.
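
A minimal sketch of how these three pieces can be combined with the Hugging Face `diffusers` library is shown below. It is an assumption-laden illustration, not the production code: the `madebyollin/taesdxl` checkpoint is assumed as the SDXL-compatible Tiny AutoEncoder, and the function names, step count, and device are hypothetical. The heavy imports sit inside `build_pipeline` so that the module can be loaded on a machine without a GPU stack.

```python
MODEL_ID = "segmind/SSD-1B"          # distilled SDXL checkpoint named in this portfolio
TINY_VAE_ID = "madebyollin/taesdxl"  # assumed Tiny AutoEncoder checkpoint for SDXL


def build_pipeline(device: str = "cuda"):
    """Load SSD-1B in half precision and swap its VAE for the tiny autoencoder."""
    # Imported lazily: requires the torch and diffusers packages.
    import torch
    from diffusers import AutoencoderTiny, DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    # Replace the full-size VAE with the much smaller decoder to cut memory and decode time.
    pipe.vae = AutoencoderTiny.from_pretrained(TINY_VAE_ID, torch_dtype=torch.float16)
    return pipe.to(device)


def generate(pipe, prompt: str, steps: int = 25):
    """Run text-to-image generation and return the first image (a PIL image)."""
    return pipe(prompt=prompt, num_inference_steps=steps).images[0]
```

On a GPU instance with the weights available, usage would look like `generate(build_pipeline(), "an astronaut riding a horse").save("sample.png")`. Loading the pipeline once per worker and reusing it across requests avoids paying the model load cost on every call.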

Successfully launched in Steve.ai (Steve 2.0) on Product Hunt as a key AI-powered feature.
Second-highest category by projects created and exported since launch.
With a 15M+ daily user base, an average of ~1,500 projects of this category are generated daily.
Model optimizations and use of distilled/tiny models led to:
- Reduced AWS costs
- Faster inference and higher throughput
- Scalable solutions to support increasing load