Animaker Project Portfolio
A. Talking-Head Video (AI Lip Sync of Animated Characters, Expressions, and Actions)
Link: https://www.steve.ai/
Role: Research Engineer at Animaker
Project Inception
The existing lip sync feature for animated characters lacked realism and did not adapt dynamically to voiceovers, which made the characters appear less human-like.
Challenges Faced
- Identifying the right model to generate realistic lip movements that could be transferred to animated characters.
- Model optimization and performance – improving inference speed and throughput while reducing latency and GPU consumption.
Roles and Responsibilities
1. PoC (Proof of Concept) Phase
- Conducted research and literature review on lip sync techniques.
- Identified and analyzed relevant research papers for implementation.
- Evaluated implementation results and guided further improvements.
2. Production Deployment Phase
- Model Optimization – enhanced efficiency for deployment.
- Model Inference – ensured accurate and efficient execution.
- Latency & Throughput Optimization – improved performance for real-time processing.
- Scalability on AWS GPU – optimized the model for large-scale deployment.
- Containerization – created Docker images and managed containerization.
- API Development – developed APIs and integrated them with the backend.
- Deployment & Scalability – ensured system-wide scalability and efficient resource utilization.
Technologies Used
- GPU Hardware: AWS GPU instances
- Model Optimization: Knowledge Distillation (Teacher-Student Training)
- Containerization: Docker
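The teacher-student training listed above can be illustrated with a minimal sketch. The source does not state the loss actually used for the lip sync model, so this assumes a simple response-based objective for a regression-style output: the student is pulled toward both the teacher's output and the ground-truth target. The names `mse`, `distillation_loss`, and `alpha` are hypothetical and chosen only for illustration.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Blend of teacher-imitation loss and ground-truth loss.

    alpha is an assumed weighting: 1.0 means imitate the teacher only,
    0.0 means ignore the teacher and train on the targets alone.
    """
    imitation = mse(student_out, teacher_out)   # match the larger teacher model
    supervised = mse(student_out, target)       # match the ground truth
    return alpha * imitation + (1 - alpha) * supervised


if __name__ == "__main__":
    # Student agrees with the target but not with the teacher.
    print(distillation_loss([0.0, 0.0], [1.0, 1.0], [0.0, 0.0]))  # 0.5
```

In a real training loop the same blend would be computed on tensors by a deep learning framework; the pure-Python version only shows the shape of the objective.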
Project Impact
- Achieved realistic lip sync for animated characters in sync with voiceovers.
- Enabled adoption across business verticals – L&D, marketing, and content creation, supporting Talking-Head videos and conversational videos.
- Model optimization (Knowledge Distillation) led to a ~60% improvement in inference speed, enabling real-time processing.
- Reduced the model's GPU memory footprint to 1.1 GB, allowing more model workers per GPU and enabling parallel processing.
- Significant AWS GPU cost savings through optimized resource utilization.
- High adoption rate – with a 15M+ daily user base, an average of ~250 projects of this category are downloaded daily.
- Scalable and future-proof – the model can be adapted for any existing or future animated character designed at Animaker.
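The link between the 1.1 GB model size and the number of parallel workers can be made concrete with a rough capacity estimate. This is a sketch under stated assumptions: the 16 GB GPU and the 2 GB reserve are hypothetical figures, not values from the project, and the estimate ignores per-request activation memory, so the real worker count would be lower.

```python
import math


def max_workers(gpu_memory_gb, model_footprint_gb, reserved_gb=0.0):
    """Upper bound on model replicas that fit in GPU memory.

    reserved_gb is an assumed allowance for the CUDA context and other
    overhead; it is not a measured value.
    """
    if model_footprint_gb <= 0:
        raise ValueError("model_footprint_gb must be positive")
    usable = gpu_memory_gb - reserved_gb
    if usable <= 0:
        return 0
    return math.floor(usable / model_footprint_gb)


if __name__ == "__main__":
    # Hypothetical 16 GB GPU, 1.1 GB per model copy, 2 GB held back.
    print(max_workers(16, 1.1, reserved_gb=2))  # 12
```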
B. Text to GenAI Video
Link: https://www.steve.ai/
Role: Research Engineer at Animaker
Challenges Faced
- Deployment on AWS GPU instances – optimizing for faster inference and higher throughput.
- Load balancing and scalability – ensuring the model scales efficiently with increasing demand.
Solutions Implemented
- Utilized Segmind/SSD-1B model – a 50% smaller distilled version of SDXL, achieving a 60% speedup while maintaining high-quality text-to-image generation.
- Integrated Tiny AutoEncoder for Stable Diffusion – reduced computational overhead while preserving output quality.
- Optimized model inference using the Hugging Face Diffusers pipeline (DiffusionPipeline) – improved efficiency and streamlined processing.
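The combination described above (SSD-1B, a tiny autoencoder, and a Diffusers pipeline) could be wired together roughly as follows. This is a sketch, not the production setup: the `madebyollin/taesdxl` checkpoint, half-precision weights, the `cuda` device, and the step count are assumptions, and `build_pipeline` and `generate` are hypothetical helper names.

```python
MODEL_ID = "segmind/SSD-1B"          # distilled SDXL checkpoint named in the text
TINY_VAE_ID = "madebyollin/taesdxl"  # assumed Tiny AutoEncoder checkpoint for SDXL


def build_pipeline(device="cuda"):
    """Load SSD-1B and swap its VAE for the tiny autoencoder."""
    # Imported lazily so this module can be loaded without the GPU stack installed.
    import torch
    from diffusers import AutoencoderTiny, DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # assumed: half precision to cut memory use
        use_safetensors=True,
        variant="fp16",
    )
    # Replace the full-size VAE with the tiny one to speed up decoding.
    pipe.vae = AutoencoderTiny.from_pretrained(TINY_VAE_ID, torch_dtype=torch.float16)
    return pipe.to(device)


def generate(pipe, prompt, steps=25):
    """Run one text-to-image request and return the first image."""
    return pipe(prompt=prompt, num_inference_steps=steps).images[0]
```

A caller would build the pipeline once per worker process and reuse it across requests, for example `image = generate(build_pipeline(), "a city skyline at dusk")`, since loading the weights is far more expensive than a single inference call.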
Technologies Used
- Stable Diffusion Text-to-Image Model: SDXL 1.0
- Segmind/SSD-1B Model: optimized, distilled version of SDXL
- Tiny AutoEncoder for Stable Diffusion
- Hugging Face Diffusers pipeline (DiffusionPipeline) with optimization steps
- GPU Hardware: AWS GPU instances
- Containerization: Docker
Project Impact
- Successfully launched in Steve.ai (Steve 2.0) on Product Hunt as a key AI-powered feature.
- Second-highest category by number of projects created and exported since launch.
- With a 15M+ daily user base, an average of ~1,500 projects of this category are generated daily.
- Model optimizations and use of distilled/tiny models led to:
  - Reduced AWS costs
  - Faster inference and higher throughput
  - Scalable solutions to support increasing load