KlingAI Video Generation - Ultra-Detailed Workflow for Claude Code¶
🎯 Purpose¶
This document provides step-by-step instructions for Claude Code to automate KlingAI video generation with proper verification. This is specifically for the USA-based Claude instance that needs more detailed guidance.
⚠️ CRITICAL CONCEPTS¶
1. Why Incremental (Not Batch)?¶
- Cost Protection: Each video = 35 credits. Verify after 1 video, not 6.
- Quality Control: Catch design/text problems at video 1, not after wasting 210 credits on 6 bad videos.
- Proven Failures:
  - Oct 28, 2025: Generated all 6 videos → All had wrong mascot → 210 credits wasted
  - Oct 30, 2025: B12 video used "zoom in" → Text disappeared → Caught early, only 35 credits lost
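The credit arithmetic behind the incremental rule can be sketched in a few lines. This is only an illustration: the function name `credits_at_risk` is hypothetical, and the only figures taken from this document are 35 credits per video and 6 videos per batch.

```python
CREDITS_PER_VIDEO = 35  # per-video cost stated in this document


def credits_at_risk(videos_generated_before_check: int) -> int:
    """Credits already spent by the time the first verification happens."""
    return CREDITS_PER_VIDEO * videos_generated_before_check


batch_loss = credits_at_risk(6)        # verify only after all 6 videos
incremental_loss = credits_at_risk(1)  # verify after the first video
print(batch_loss, incremental_loss)
```

If the defect is already present in the first video, as in the Oct 28 case, verifying after one video limits the loss to a single generation.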
2. Two Verification Types (BOTH REQUIRED)¶
A. Mascot Design Verification (Compare to reference image)
- Face color: Cream/beige (NOT purple)
- Eyebrows: Brown (NOT purple)
- Eye outlines: White circles
- Cheeks: Soft pink blush
B. Text Visibility Verification (Check 3 frames)
- Frame 1 (0s): Text visible?
- Frame 2 (2.5s): Text STILL visible?
- Frame 3 (5s): Text STILL visible at end?
🛠️ TOOLS YOU HAVE AVAILABLE¶
Playwright Tools (Browser Automation)¶
mcp__playwright__playwright_navigate - Open KlingAI in browser
mcp__playwright__playwright_screenshot - Take screenshot and SEE the page
mcp__playwright__playwright_fill - Type into text fields
mcp__playwright__playwright_click - Click buttons
mcp__playwright__playwright_upload_file - Upload image files
mcp__playwright__playwright_get_visible_text - Read text from page
mcp__playwright__playwright_close - Close browser
File Tools (Frame Extraction & Verification)¶
YOUR SUPERPOWER: Multimodal Vision¶
- When you use Read on an image, you can SEE it visually
- You can compare mascot design to reference
- You can check if text is readable in video frames
📋 COMPLETE WORKFLOW (DO THIS FOR EACH SYMPTOM)¶
SYMPTOM 1 (Do this first, then verify before moving to Symptom 2)¶
Step 1: Generate Image in ChatGPT¶
YOU CANNOT AUTOMATE THIS - The user must do this manually in ChatGPT because:
- ChatGPT in Norway gives better text rendering
- Regional differences in DALL-E quality
- User has the Nutri-E Brand Assets project with mascot reference
What user provides you:
- Downloaded image file path (e.g., /Users/post/Downloads/vitamin-d-fatigue.png)
Step 2: Upload Image to KlingAI and Generate Video¶
2.1: Open KlingAI
Tool: mcp__playwright__playwright_navigate
Parameters:
url: "https://klingai.com"
headless: false (so user can see what's happening)
width: 1920
height: 1080
2.2: Take Screenshot to See Current State
Tool: mcp__playwright__playwright_screenshot
Parameters:
name: "klingai-homepage"
savePng: true
downloadsDir: "/Users/post/Downloads/klingai-screenshots"
IMPORTANT: After screenshot, USE YOUR EYES:
- Can you see the upload button?
- Is user logged in?
- Any error messages?
2.3: Navigate to Video Generation
Based on what you SEE in the screenshot, click the appropriate button.
Common selectors (try in order):
Tool: mcp__playwright__playwright_click
Parameters:
selector: "button:has-text('Generate Video')"
OR: selector: "a[href*='video']"
OR: selector: ".video-generation-button"
If click fails: Take another screenshot and analyze what's visible.
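The "try in order" rule for selectors can be written as a small fallback loop. This is a sketch only: `click` is a hypothetical callable standing in for a call to mcp__playwright__playwright_click that returns True when the click succeeds, and the selector strings are the guesses listed above, not confirmed KlingAI markup.

```python
from typing import Callable, Iterable, Optional

# Candidate selectors from this step, in the order they should be tried.
VIDEO_SELECTORS = [
    "button:has-text('Generate Video')",
    "a[href*='video']",
    ".video-generation-button",
]


def click_first_match(selectors: Iterable[str],
                      click: Callable[[str], bool]) -> Optional[str]:
    """Try each selector in order; return the first one that clicked, else None."""
    for selector in selectors:
        if click(selector):
            return selector
    return None  # caller should take a new screenshot and re-inspect the page
```

A `None` result corresponds to the "If click fails" case: take another screenshot and decide from what is actually visible.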
2.4: Upload Image
Tool: mcp__playwright__playwright_upload_file
Parameters:
selector: "input[type='file']"
filePath: "/Users/post/Downloads/vitamin-d-fatigue.png"
2.5: Fill in Prompt (CRITICAL TEXT VISIBILITY RULES)
PROMPT TEMPLATE (USE THIS EXACTLY):
static camera, no camera movement, keep all text visible throughout entire video,
[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face)
[EMOTION]: showing [tired/weak/anxious/etc.] expression,
[MOVEMENT]: subtle [yawning/drooping/shaking/etc.] animation,
NO zoom, NO camera pan, text must remain fully visible from start to end,
5 seconds, vertical video, maintain all text readability
WHY THESE RULES:
- ❌ "zoom in" / "camera zoom" → Text disappears (proven failure Oct 30)
- ✅ "static camera" + "keep all text visible" → Text stays readable
- ✅ "no camera movement" → Prevents accidental text cropping
Tool: mcp__playwright__playwright_fill
Parameters:
selector: "textarea[name='prompt']" (or find via screenshot)
value: "[Your detailed prompt from template above]"
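One way to fill the template consistently and catch risky wording before spending credits is sketched below. The helper names `build_prompt` and `forbidden_phrases` are hypothetical; the template text and the two flagged phrases ("zoom in", "camera zoom") come from this section.

```python
from typing import List

PROMPT_TEMPLATE = "\n".join([
    "static camera, no camera movement, keep all text visible throughout entire video,",
    "[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face)",
    "[EMOTION]: showing {emotion} expression,",
    "[MOVEMENT]: subtle {movement} animation,",
    "NO zoom, NO camera pan, text must remain fully visible from start to end,",
    "5 seconds, vertical video, maintain all text readability",
])

# Phrases this document reports as causing text to disappear.
FORBIDDEN = ["zoom in", "camera zoom"]


def build_prompt(emotion: str, movement: str) -> str:
    """Fill the emotion and movement slots of the template."""
    return PROMPT_TEMPLATE.format(emotion=emotion, movement=movement)


def forbidden_phrases(prompt: str) -> List[str]:
    """Return any forbidden phrases found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [phrase for phrase in FORBIDDEN if phrase in lowered]
```

For example, `build_prompt("tired", "yawning")` yields the template with both slots filled, and `forbidden_phrases` returns an empty list for it because "NO zoom" does not contain either flagged phrase.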
2.6: Start Generation
2.7: Monitor Progress
Set up a loop to check every 30 seconds:
Loop:
1. Take screenshot
2. Read visible text
3. Look for:
- "Generating..." → Keep waiting
- "Complete" → Proceed to download
- "Error" → Alert user
- Progress percentage → Report to user
4. Wait 30 seconds
5. Repeat
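The monitoring loop can be sketched as follows. Assumptions to note: `read_text` is a hypothetical callable wrapping mcp__playwright__playwright_get_visible_text, the status words are the ones listed above rather than confirmed KlingAI UI strings, and the `max_checks` cap is an addition so the loop cannot run forever.

```python
import re
import time
from typing import Callable, Optional


def classify_status(page_text: str) -> str:
    """Map visible page text to one of the states listed in this step."""
    if "Error" in page_text:
        return "error"
    if "Complete" in page_text:
        return "complete"
    if "Generating" in page_text:
        return "generating"
    return "unknown"


def progress_percent(page_text: str) -> Optional[int]:
    """Extract a progress percentage such as '40%' if one is shown."""
    match = re.search(r"(\d{1,3})\s*%", page_text)
    return int(match.group(1)) if match else None


def wait_for_video(read_text: Callable[[], str],
                   sleep: Callable[[float], None] = time.sleep,
                   interval_s: float = 30,
                   max_checks: int = 40) -> str:
    """Poll until the page reports completion or an error, or give up."""
    for _ in range(max_checks):
        status = classify_status(read_text())
        if status in ("complete", "error"):
            return status
        sleep(interval_s)  # still generating (or unknown): wait and re-check
    return "timeout"
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; in practice a screenshot would be taken alongside each `read_text` call.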
2.8: Download Video
When generation complete:
Tool: mcp__playwright__playwright_click
Parameters:
selector: "button:has-text('Download')"
OR: selector: "a[download]"
Video will download to: /Users/post/Downloads/
Step 3: Extract Frames for Verification¶
3.1: Find Downloaded Video
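A minimal sketch for locating the file, assuming the most recently modified .mp4 in the downloads folder is the video that was just generated. The helper name `newest_mp4` is hypothetical, and the assumption breaks if another .mp4 lands in the folder at the same time.

```python
from pathlib import Path
from typing import Optional


def newest_mp4(downloads_dir: str) -> Optional[Path]:
    """Return the most recently modified .mp4 in the folder, or None if there is none."""
    candidates = list(Path(downloads_dir).expanduser().glob("*.mp4"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `newest_mp4("~/Downloads")` before and after the download is a simple way to confirm that a new file actually arrived.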
3.2: Extract 3 Frames (Start, Middle, End)
Tool: Bash
Command: mkdir -p ~/Downloads/klingai-frames && i=1; for t in 0 2.5 4.9; do ffmpeg -y -ss "$t" -i ~/Downloads/[video-name].mp4 -frames:v 1 ~/Downloads/klingai-frames/symptom1-frame-$i.png; i=$((i+1)); done
Description: Extract frames at 0s, 2.5s, and the end of the video (~5s)
This creates:
- symptom1-frame-1.png (0 seconds)
- symptom1-frame-2.png (2.5 seconds)
- symptom1-frame-3.png (end of video; seek to 4.9s, because a 5-second clip usually has no frame at exactly 5.0s)
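Because generated clips may not be exactly 5 seconds long, the three sample points can be derived from the measured duration instead of being hardcoded. The sketch below only builds the ffmpeg argument lists; the helper names and the 0.1s end margin are assumptions, and the commands would still be run through the Bash tool.

```python
from typing import List


def sample_timestamps(duration_s: float, end_margin_s: float = 0.1) -> List[float]:
    """Start, middle, and just-before-end timestamps for a clip of the given length."""
    end = max(0.0, duration_s - end_margin_s)
    return [0.0, round(duration_s / 2, 3), round(end, 3)]


def frame_command(video: str, timestamp_s: float, out_png: str) -> List[str]:
    """ffmpeg arguments that seek to a timestamp and write a single frame."""
    return ["ffmpeg", "-y", "-ss", f"{timestamp_s:.3f}",
            "-i", video, "-frames:v", "1", out_png]


commands = [
    frame_command("video.mp4", t, f"symptom1-frame-{i}.png")
    for i, t in enumerate(sample_timestamps(5.0), start=1)
]
```

Seeking with `-ss` and writing one frame per command avoids relying on a frame existing at an exact timestamp.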
Step 4: VERIFICATION (CRITICAL - DO NOT SKIP)¶
4.1: Get Mascot Reference Image
Use Read on the mascot reference image (listed under Related Documentation as /website/images/nutri-e-mascot.png).
NOW YOU CAN SEE THE REFERENCE - Remember these details:
- Face: Cream/beige color INSIDE purple hood
- Eyebrows: Brown (not purple!)
- Eyes: White circular outlines with black pupils
- Cheeks: Soft pink blush
- Body: Purple hoodie/robe
4.2: Verify Frame 1 (Start of Video)
CHECK THESE THINGS:
1. Mascot Design:
   - ✓ Face cream/beige? (NOT purple face)
   - ✓ Eyebrows brown? (NOT purple eyebrows)
   - ✓ White eye outlines? (NOT missing)
   - ✓ Pink cheeks? (NOT missing)
2. Text Visibility:
   - ✓ All text fully visible?
   - ✓ No text cut off at edges?
   - ✓ Text sharp and readable?
4.3: Verify Frame 2 (Middle of Video)
CHECK:
- ✓ Text STILL fully visible? (Check for zoom issues)
- ✓ No text moved out of frame?
4.4: Verify Frame 3 (End of Video)
CHECK:
- ✓ Text STILL fully visible at end?
- ✓ Mascot still looks correct?
Step 5: DECISION POINT (STOP HERE IF FAIL)¶
IF ALL CHECKS PASS ✅:
- Report to user: "Symptom 1 verified ✅. Ready for Symptom 2."
- Save video with proper name: vitamin-d-fatigue-verified.mp4
- PROCEED TO SYMPTOM 2
IF ANY CHECK FAILS ❌:
- STOP IMMEDIATELY
- Report specific failures to user:
  - "Mascot face is purple (should be cream/beige)"
  - "Text disappeared at 2.5s (zoom issue)"
  - "Eyebrows are purple (should be brown)"
- DO NOT PROCEED TO SYMPTOM 2
- Wait for user to fix prompt or provide new image
- Cost: Only 35 credits lost (not 210)
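The decision rule is all-or-nothing: a single failed check stops the run. A minimal sketch, with hypothetical helper names and check labels, where each check result is something Claude has already judged by looking at the frames:

```python
from typing import Dict, List, Tuple


def decide(checks: Dict[str, bool]) -> Tuple[bool, List[str]]:
    """Return (all_passed, names_of_failed_checks)."""
    failures = [name for name, passed in checks.items() if not passed]
    return (len(failures) == 0, failures)


def report(symptom: str, checks: Dict[str, bool]) -> str:
    """One-line status message for the user."""
    all_passed, failures = decide(checks)
    if all_passed:
        return f"{symptom} verified ✅. Ready for the next symptom."
    return f"{symptom} FAILED ❌ - STOP. Failures: " + "; ".join(failures)


print(report("Symptom 1", {
    "face is cream/beige": True,
    "eyebrows are brown": True,
    "text visible at 2.5s": False,
}))
```

Listing the failed check names in the message matches the requirement to report specific failures rather than a bare pass/fail.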
SYMPTOM 2 (Only after Symptom 1 passes)¶
Repeat Steps 1-5 above with new image for Symptom 2.
DO NOT BATCH - Verify Symptom 2 before moving to Symptom 3.
SYMPTOM 3-6 (One at a time)¶
Repeat the same workflow for each remaining symptom, verifying each one individually.
🚨 COMMON MISTAKES TO AVOID¶
Mistake 1: Batching Videos¶
❌ WRONG: Generate all 6 videos → Download all → Verify all
✅ RIGHT: Generate 1 → Download 1 → Verify 1 → Proceed
Mistake 2: Skipping Frame Extraction¶
❌ WRONG: Just download video and assume it's good
✅ RIGHT: Extract 3 frames and verify text visibility throughout
Mistake 3: Not Using Visual Verification¶
❌ WRONG: Check file exists and call it done
✅ RIGHT: Use Read tool to actually SEE frames and check mascot/text
Mistake 4: Ignoring Prompt Rules¶
❌ WRONG: Use "zoom in" or "camera movement" in prompt
✅ RIGHT: Use "static camera, no zoom, keep all text visible"
Mistake 5: Not Comparing to Reference¶
❌ WRONG: Trust that mascot looks right without checking
✅ RIGHT: Read reference image first, then compare each frame
🔧 TROUBLESHOOTING¶
Problem: Can't Find Upload Button¶
Solution:
1. Take screenshot
2. Use playwright_get_visible_text to see all text
3. Look for alternative selectors
4. Try: button:has-text('Upload'), input[type='file'], .upload-btn
Problem: Video Download Not Starting¶
Solution:
1. Take screenshot to see current state
2. Check for error messages
3. Look for "Share" or "Export" buttons instead of "Download"
4. Try right-click context menu on video player
Problem: Frame Extraction Fails¶
Solution:
# Check video file exists
ls -lh ~/Downloads/*.mp4
# Check video duration
ffmpeg -i ~/Downloads/video.mp4 2>&1 | grep Duration
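# Fall back to selecting by frame number (0, 75, 150 assume roughly 30 fps and a clip slightly longer than 5s; lower the last index if the clip has fewer frames)
ffmpeg -i video.mp4 -vf "select='eq(n\,0)+eq(n\,75)+eq(n\,150)'" -vsync 0 frames/frame-%d.png -y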
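When extraction fails, the first thing to confirm is the clip's real length. The sketch below parses the "Duration:" line that ffmpeg prints for an input file; the helper name `parse_duration_seconds` is hypothetical, and the input is assumed to be the text captured from the `ffmpeg -i ... 2>&1 | grep Duration` command above.

```python
import re
from typing import Optional


def parse_duration_seconds(ffmpeg_output: str) -> Optional[float]:
    """Convert a 'Duration: HH:MM:SS.xx' line from ffmpeg output into seconds."""
    match = re.search(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", ffmpeg_output)
    if match is None:
        return None  # e.g. 'Duration: N/A' or no Duration line at all
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
```

If the parsed duration is shorter than the last requested timestamp, that frame will not be produced and the sample points need to be moved earlier.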
Problem: Can't See Images with Read Tool¶
Solution:
- Make sure file path is absolute (starts with /Users/)
- Check file exists: ls -lh /path/to/image.png
- Verify it's a valid image: file /path/to/image.png
📊 SUCCESS CHECKLIST¶
After completing ALL 6 symptoms, you should have:
- ✅ 6 verified videos (1 per symptom)
- ✅ 18 frame images (3 per video)
- ✅ All mascot designs match reference
- ✅ All text visible throughout all videos
- ✅ NIH citations visible and readable
- ✅ Total cost: 210 credits (if all passed first try)
- ✅ OR: 210 + (35 × regenerations) if some needed fixes
💡 KEY TAKEAWAYS FOR USA CLAUDE¶
- You can SEE images - Use the Read tool liberally
- Take screenshots constantly - Before and after every action
- Verify incrementally - One symptom at a time
- Use the prompt template - Proven to prevent text disappearance
- Compare to reference - Always check mascot against /website/images/nutri-e-mascot.png
- Stop on first failure - Don't waste credits on remaining videos if one fails
- Report clearly - Tell user exactly what passed/failed with specific details
📝 EXAMPLE VERIFICATION REPORT¶
SYMPTOM 1: Vitamin D Fatigue
✅ Video generated successfully
✅ Frame 1 (0s): Mascot design correct, text fully visible
✅ Frame 2 (2.5s): Text still fully visible, no zoom issues
✅ Frame 3 (5s): Text remains visible, mascot consistent
STATUS: VERIFIED ✅ - Ready for Symptom 2
SYMPTOM 2: Vitamin D Weakness
✅ Video generated successfully
❌ Frame 1 (0s): Mascot face is PURPLE (should be cream/beige)
❌ Frame 2 (2.5s): Eyebrows are PURPLE (should be brown)
STATUS: FAILED ❌ - STOP HERE
ACTION: Regenerate with corrected mascot design
COST: 35 credits (saved 140 by not generating Symptoms 3-6)
🎓 UNDERSTANDING YOUR CAPABILITIES¶
What You CAN Do:¶
- Control browsers with Playwright
- Take screenshots and SEE them visually
- Read images and compare designs
- Extract video frames with ffmpeg
- Make intelligent decisions based on visual feedback
- Adapt to UI changes dynamically
What You CANNOT Do:¶
- Generate images in ChatGPT (user must do this in Norway for better text)
- Access KlingAI API directly (must use browser automation)
- Skip verification (this is critical to prevent credit waste)
Why You're Better Than Hardcoded Scripts:¶
- You have VISION - can see and verify quality
- You can ADAPT - handle UI changes dynamically
- You can DECIDE - stop on failures intelligently
- You can REPORT - explain what went wrong specifically
🔗 RELATED DOCUMENTATION¶
- Main workflow: /MARKETING_GUIDE.md (KlingAI + CapCut section)
- Mascot reference: /website/images/nutri-e-mascot.png
- Project instructions: /CLAUDE.md
Remember: The incremental workflow exists to protect against costly mistakes. Trust the process, verify thoroughly, and stop immediately if anything fails. Your vision capabilities are your superpower - use them! 👁️✨