KlingAI Video Generation - Ultra-Detailed Workflow for Claude Code

🎯 Purpose

This document gives Claude Code step-by-step instructions for automating KlingAI video generation, with verification after every video. It is written for the USA-based Claude instance, which needs more detailed guidance.


⚠️ CRITICAL CONCEPTS

1. Why Incremental (Not Batch)?

  • Cost Protection: Each video = 35 credits. Verify after 1 video, not 6.
  • Quality Control: Catch design/text problems at video 1, not after wasting 210 credits on 6 bad videos.
  • Proven Failures:
      - Oct 28, 2025: Generated all 6 videos → All had wrong mascot → 210 credits wasted
      - Oct 30, 2025: B12 video used "zoom in" → Text disappeared → Caught early, only 35 credits lost
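
The cost argument above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that a design problem which appears at one video also affects every later video in a batch run (as in the Oct 28 failure); the function name `credits_wasted` is illustrative and not part of any tool.

```python
CREDITS_PER_VIDEO = 35  # per-video cost stated in this document


def credits_wasted(total_videos: int, first_bad_video: int, batch: bool) -> int:
    """Credits spent on unusable videos when a problem first appears at `first_bad_video` (1-based)."""
    if not 1 <= first_bad_video <= total_videos:
        raise ValueError("first_bad_video must be between 1 and total_videos")
    if batch:
        # Batch: nothing is checked until every video exists, so (by assumption)
        # the problem carries through all remaining videos.
        bad_videos = total_videos - first_bad_video + 1
    else:
        # Incremental: verification stops the run at the first bad video.
        bad_videos = 1
    return bad_videos * CREDITS_PER_VIDEO


print(credits_wasted(6, 1, batch=True))   # 210, the Oct 28 scenario
print(credits_wasted(6, 1, batch=False))  # 35, the same problem caught at video 1
```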

2. Two Verification Types (BOTH REQUIRED)

A. Mascot Design Verification (compare to reference image)

  • Face color: Cream/beige (NOT purple)
  • Eyebrows: Brown (NOT purple)
  • Eye outlines: White circles
  • Cheeks: Soft pink blush

B. Text Visibility Verification (check 3 frames)

  • Frame 1 (0s): Text visible?
  • Frame 2 (2.5s): Text STILL visible?
  • Frame 3 (5s): Text STILL visible at end?


🛠️ TOOLS YOU HAVE AVAILABLE

Playwright Tools (Browser Automation)

mcp__playwright__playwright_navigate - Open KlingAI in browser
mcp__playwright__playwright_screenshot - Take screenshot and SEE the page
mcp__playwright__playwright_fill - Type into text fields
mcp__playwright__playwright_click - Click buttons
mcp__playwright__playwright_upload_file - Upload image files
mcp__playwright__playwright_get_visible_text - Read text from page
mcp__playwright__playwright_close - Close browser

File Tools (Frame Extraction & Verification)

Bash - Run ffmpeg commands to extract frames
Read - View images visually (YOU CAN SEE IMAGES!)

YOUR SUPERPOWER: Multimodal Vision

  • When you use Read on an image, you can SEE it visually
  • You can compare mascot design to reference
  • You can check if text is readable in video frames

📋 COMPLETE WORKFLOW (DO THIS FOR EACH SYMPTOM)

SYMPTOM 1 (Do this first, then verify before moving to Symptom 2)

Step 1: Generate Image in ChatGPT

YOU CANNOT AUTOMATE THIS. The user must do this manually in ChatGPT because:

  • ChatGPT in Norway gives better text rendering
  • Regional differences in DALL-E quality
  • User has the Nutri-E Brand Assets project with mascot reference

What the user provides you:

  • Downloaded image file path (e.g., /Users/post/Downloads/vitamin-d-fatigue.png)

Step 2: Upload Image to KlingAI and Generate Video

2.1: Open KlingAI

Tool: mcp__playwright__playwright_navigate
Parameters:
  url: "https://klingai.com"
  headless: false  (so user can see what's happening)
  width: 1920
  height: 1080

2.2: Take Screenshot to See Current State

Tool: mcp__playwright__playwright_screenshot
Parameters:
  name: "klingai-homepage"
  savePng: true
  downloadsDir: "/Users/post/Downloads/klingai-screenshots"

IMPORTANT: After the screenshot, USE YOUR EYES:

  • Can you see the upload button?
  • Is the user logged in?
  • Any error messages?

2.3: Navigate to Video Generation

Based on what you SEE in the screenshot, click the appropriate button.

Common selectors (try in order):

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Generate Video')"

OR: selector: "a[href*='video']"
OR: selector: ".video-generation-button"

If click fails: Take another screenshot and analyze what's visible.
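
The "try in order" rule for selectors can be sketched as a small fallback loop. This is only an illustration: `click` is a hypothetical callable standing in for whatever issues the `mcp__playwright__playwright_click` call, and the assumption that a failed click surfaces as an exception is mine, not something the tool documents here.

```python
from typing import Callable, Iterable, Optional

# Selectors listed in step 2.3, in the order they should be tried.
VIDEO_SELECTORS = [
    "button:has-text('Generate Video')",
    "a[href*='video']",
    ".video-generation-button",
]


def click_first_matching(click: Callable[[str], None], selectors: Iterable[str]) -> Optional[str]:
    """Try each selector in order; return the one that worked, or None if all failed."""
    for selector in selectors:
        try:
            click(selector)
            return selector
        except Exception as exc:  # assumed failure mode of the hypothetical click wrapper
            print(f"click failed for {selector!r}: {exc}")
    return None
```

A `None` result is the cue to take another screenshot and look at the page, rather than guessing further selectors.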

2.4: Upload Image

Tool: mcp__playwright__playwright_upload_file
Parameters:
  selector: "input[type='file']"
  filePath: "/Users/post/Downloads/vitamin-d-fatigue.png"

2.5: Fill in Prompt (CRITICAL TEXT VISIBILITY RULES)

PROMPT TEMPLATE (USE THIS EXACTLY):

static camera, no camera movement, keep all text visible throughout entire video,
[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face)
[EMOTION]: showing [tired/weak/anxious/etc.] expression,
[MOVEMENT]: subtle [yawning/drooping/shaking/etc.] animation,
NO zoom, NO camera pan, text must remain fully visible from start to end,
5 seconds, vertical video, maintain all text readability

WHY THESE RULES:

  • ❌ "zoom in" / "camera zoom" → Text disappears (proven failure Oct 30)
  • ✅ "static camera" + "keep all text visible" → Text stays readable
  • ✅ "no camera movement" → Prevents accidental text cropping
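
One way to keep the template intact while filling in the two variable slots is to build the prompt programmatically and reject any camera wording in the inputs. The sketch below copies the template text from step 2.5; the helper name `build_prompt` and the exact list of rejected words are assumptions made for illustration.

```python
import re

# Template text copied from step 2.5; only the two bracketed choices are variable.
PROMPT_TEMPLATE = (
    "static camera, no camera movement, keep all text visible throughout entire video,\n"
    "[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face)\n"
    "[EMOTION]: showing {emotion} expression,\n"
    "[MOVEMENT]: subtle {movement} animation,\n"
    "NO zoom, NO camera pan, text must remain fully visible from start to end,\n"
    "5 seconds, vertical video, maintain all text readability"
)

# Whole-word match, so "panicked" is allowed but "pan" and "zoom in" are not.
_CAMERA_WORDS = re.compile(r"\b(zoom\w*|pan|panning|camera)\b", re.IGNORECASE)


def build_prompt(emotion: str, movement: str) -> str:
    """Fill the template, refusing inputs that would reintroduce camera motion."""
    for label, value in (("emotion", emotion), ("movement", movement)):
        if _CAMERA_WORDS.search(value):
            raise ValueError(f"{label} {value!r} mentions camera motion")
    return PROMPT_TEMPLATE.format(emotion=emotion, movement=movement)


print(build_prompt("tired", "yawning"))
```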

Tool: mcp__playwright__playwright_fill
Parameters:
  selector: "textarea[name='prompt']"  (or find via screenshot)
  value: "[Your detailed prompt from template above]"

2.6: Start Generation

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Generate')"

2.7: Monitor Progress

Set up a loop to check every 30 seconds:

Loop:
  1. Take screenshot
  2. Read visible text
  3. Look for:
     - "Generating..." → Keep waiting
     - "Complete" → Proceed to download
     - "Error" → Alert user
     - Progress percentage → Report to user
  4. Wait 30 seconds
  5. Repeat
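
The loop above has no exit if the page never reports a final state, so a bounded version may be safer. In this sketch, `read_page_text` is a hypothetical callable that returns the page's visible text (for example via `mcp__playwright__playwright_get_visible_text`), the status words are the ones listed above, and the 15-minute timeout is an assumed limit, not a documented KlingAI value.

```python
import time
from typing import Callable


def wait_for_generation(
    read_page_text: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 900.0,  # assumed upper bound, not from KlingAI
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the page shows a final state; return 'complete', 'error', or 'timeout'."""
    waited = 0.0
    while waited <= timeout_seconds:
        text = read_page_text().lower()
        if "error" in text:
            return "error"      # alert the user
        if "complete" in text:
            return "complete"   # proceed to download
        sleep(poll_seconds)     # still generating: keep waiting
        waited += poll_seconds
    return "timeout"
```

Passing `sleep` as a parameter keeps the function testable without real waiting.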

2.8: Download Video

When generation complete:

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Download')"

OR: selector: "a[download]"

Video will download to: /Users/post/Downloads/

Step 3: Extract Frames for Verification

3.1: Find Downloaded Video

Tool: Bash
Command: ls -lt ~/Downloads/*.mp4 | head -1
Description: Find most recent video file
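
If the shell glob is awkward to work with (for example, when no .mp4 exists yet and `ls` prints an error), the same lookup can be done in Python. A minimal sketch; `newest_video` is an illustrative helper name.

```python
from pathlib import Path
from typing import Optional


def newest_video(directory: Path, pattern: str = "*.mp4") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None if there is none."""
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


print(newest_video(Path.home() / "Downloads"))
```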

3.2: Extract 3 Frames (Start, Middle, End)

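Tool: Bash
Command: mkdir -p ~/Downloads/klingai-frames && i=1; for t in 0 2.5 4.9; do ffmpeg -y -ss $t -i ~/Downloads/[video-name].mp4 -frames:v 1 ~/Downloads/klingai-frames/symptom1-frame-$i.png; i=$((i+1)); done
Description: Extract one frame each at 0s, 2.5s, and 4.9s

The last frame is sampled at 4.9s rather than 5s because a 5-second clip has no frame at exactly 5.0s.

This creates:

  • symptom1-frame-1.png (0 seconds)
  • symptom1-frame-2.png (2.5 seconds)
  • symptom1-frame-3.png (end of video, about 4.9 seconds)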
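
For clips whose length is not exactly 5 seconds, the sample points can be derived from the duration instead of being hard-coded. The sketch below only builds the ffmpeg argument lists; it assumes ffmpeg is on the PATH when they are run, and it samples the last frame 0.1s before the end because a clip has no frame at exactly its own duration. `frame_commands` is an illustrative name.

```python
from pathlib import Path
from typing import List


def frame_commands(video: Path, out_dir: Path, prefix: str, duration: float = 5.0) -> List[List[str]]:
    """Build one ffmpeg command per sample point: start, middle, and just before the end."""
    timestamps = [0.0, duration / 2, max(duration - 0.1, 0.0)]
    commands = []
    for index, ts in enumerate(timestamps, start=1):
        target = out_dir / f"{prefix}-frame-{index}.png"
        # -ss before -i seeks in the input; -frames:v 1 writes a single frame.
        commands.append(
            ["ffmpeg", "-y", "-ss", f"{ts:.2f}", "-i", str(video), "-frames:v", "1", str(target)]
        )
    return commands


for cmd in frame_commands(Path("video.mp4"), Path("klingai-frames"), "symptom1"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the output directory exists.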

Step 4: VERIFICATION (CRITICAL - DO NOT SKIP)

4.1: Get Mascot Reference Image

Tool: Read
Parameters:
  file_path: "website/images/nutri-e-mascot.png"

NOW YOU CAN SEE THE REFERENCE. Remember these details:

  • Face: Cream/beige color INSIDE purple hood
  • Eyebrows: Brown (not purple!)
  • Eyes: White circular outlines with black pupils
  • Cheeks: Soft pink blush
  • Body: Purple hoodie/robe

4.2: Verify Frame 1 (Start of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-1.png"

CHECK THESE THINGS:

  1. Mascot Design:
     • ✓ Face cream/beige? (NOT purple face)
     • ✓ Eyebrows brown? (NOT purple eyebrows)
     • ✓ White eye outlines? (NOT missing)
     • ✓ Pink cheeks? (NOT missing)
  2. Text Visibility:
     • ✓ All text fully visible?
     • ✓ No text cut off at edges?
     • ✓ Text sharp and readable?

4.3: Verify Frame 2 (Middle of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-2.png"

CHECK:

  • ✓ Text STILL fully visible? (Check for zoom issues)
  • ✓ No text moved out of frame?

4.4: Verify Frame 3 (End of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-3.png"

CHECK:

  • ✓ Text STILL fully visible at end?
  • ✓ Mascot still looks correct?

Step 5: DECISION POINT (STOP HERE IF FAIL)

IF ALL CHECKS PASS ✅:

  • Report to user: "Symptom 1 verified ✅. Ready for Symptom 2."
  • Save video with proper name: vitamin-d-fatigue-verified.mp4
  • PROCEED TO SYMPTOM 2

IF ANY CHECK FAILS ❌:

  • STOP IMMEDIATELY
  • Report specific failures to user:
      - "Mascot face is purple (should be cream/beige)"
      - "Text disappeared at 2.5s (zoom issue)"
      - "Eyebrows are purple (should be brown)"
  • DO NOT PROCEED TO SYMPTOM 2
  • Wait for user to fix prompt or provide new image
  • Cost: Only 35 credits lost (not 210)
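
The decision rule is simple enough to write down as code: any single failed check stops the run. This sketch assumes the visual checks have already been made and recorded as named pass/fail values; `Verdict` and `decide` are illustrative names, and the check labels are whatever the verifier chooses to report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Verdict:
    proceed: bool
    message: str


def decide(symptom: int, checks: Dict[str, bool], total_symptoms: int = 6) -> Verdict:
    """Turn named pass/fail checks into the proceed-or-stop decision from Step 5."""
    failures = [name for name, passed in checks.items() if not passed]
    if failures:
        # One failure is enough: stop and report exactly what went wrong.
        return Verdict(False, f"Symptom {symptom} FAILED ❌. STOP. Failed: " + "; ".join(failures))
    if symptom < total_symptoms:
        return Verdict(True, f"Symptom {symptom} verified ✅. Ready for Symptom {symptom + 1}.")
    return Verdict(True, f"Symptom {symptom} verified ✅. All symptoms complete.")


result = decide(1, {
    "Mascot face is cream/beige": True,
    "Eyebrows are brown": True,
    "Text visible at 2.5s": False,
})
print(result.message)
```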


SYMPTOM 2 (Only after Symptom 1 passes)

Repeat Steps 1-5 above with new image for Symptom 2.

DO NOT BATCH - Verify Symptom 2 before moving to Symptom 3.


SYMPTOM 3-6 (One at a time)

Repeat the same workflow for each remaining symptom, verifying each one individually.


🚨 COMMON MISTAKES TO AVOID

Mistake 1: Batching Videos

❌ WRONG: Generate all 6 videos → Download all → Verify all
✅ RIGHT: Generate 1 → Download 1 → Verify 1 → Proceed

Mistake 2: Skipping Frame Extraction

❌ WRONG: Just download video and assume it's good
✅ RIGHT: Extract 3 frames and verify text visibility throughout

Mistake 3: Not Using Visual Verification

❌ WRONG: Check file exists and call it done
✅ RIGHT: Use Read tool to actually SEE frames and check mascot/text

Mistake 4: Ignoring Prompt Rules

❌ WRONG: Use "zoom in" or "camera movement" in prompt
✅ RIGHT: Use "static camera, no zoom, keep all text visible"

Mistake 5: Not Comparing to Reference

❌ WRONG: Trust that mascot looks right without checking
✅ RIGHT: Read reference image first, then compare each frame


🔧 TROUBLESHOOTING

Problem: Can't Find Upload Button

Solution:

  1. Take screenshot
  2. Use playwright_get_visible_text to see all text
  3. Look for alternative selectors
  4. Try: button:has-text('Upload'), input[type='file'], .upload-btn

Problem: Video Download Not Starting

Solution:

  1. Take screenshot to see current state
  2. Check for error messages
  3. Look for "Share" or "Export" buttons instead of "Download"
  4. Try right-click context menu on video player

Problem: Frame Extraction Fails

Solution:

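# Check video file exists
ls -lh ~/Downloads/*.mp4

# Check video duration
ffmpeg -i ~/Downloads/video.mp4 2>&1 | grep Duration

# Try frame-number selection instead of timestamps.
# The frame numbers below assume a 5-second clip at about 30 fps (frames 0-149);
# scale them down if the clip is shorter or uses a lower frame rate.
mkdir -p ~/Downloads/klingai-frames
ffmpeg -y -i ~/Downloads/video.mp4 -vf "select='eq(n\,0)+eq(n\,75)+eq(n\,145)'" -vsync 0 ~/Downloads/klingai-frames/frame-%d.png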
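
The Duration line that ffmpeg prints can be turned into a number of seconds, which makes it easier to pick sample points that actually exist in the clip. A small sketch, assuming the usual `Duration: HH:MM:SS.ff` layout of ffmpeg's banner output; `parse_duration_seconds` is an illustrative helper.

```python
import re
from typing import Optional

_DURATION = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration_seconds(ffmpeg_output: str) -> Optional[float]:
    """Extract the clip length in seconds from ffmpeg's text output, or None if absent."""
    match = _DURATION.search(ffmpeg_output)
    if match is None:
        return None  # e.g. "Duration: N/A" or no Duration line at all
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


sample = "  Duration: 00:00:05.04, start: 0.000000, bitrate: 2100 kb/s"
print(parse_duration_seconds(sample))
```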

Problem: Can't See Images with Read Tool

Solution:

  • Make sure file path is absolute (starts with /Users/)
  • Check file exists: ls -lh /path/to/image.png
  • Verify it's a valid image: file /path/to/image.png
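
The three checks above can also be run in one pass before calling Read. This sketch covers PNG files only (the format used for frames in this workflow) by comparing the first 8 bytes against the standard PNG signature; `image_problems` is an illustrative name.

```python
from pathlib import Path
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def image_problems(path_str: str) -> List[str]:
    """Return a list of reasons the image may not be readable; empty means it looks fine."""
    path = Path(path_str)
    problems = []
    if not path.is_absolute():
        problems.append("path is not absolute")
    if not path.is_file():
        problems.append("file does not exist")
        return problems
    with path.open("rb") as handle:
        if handle.read(8) != PNG_SIGNATURE:
            problems.append("file does not start with the PNG signature")
    return problems


print(image_problems("klingai-frames/symptom1-frame-1.png"))
```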


📊 SUCCESS CHECKLIST

After completing ALL 6 symptoms, you should have:

  • ✅ 6 verified videos (1 per symptom)
  • ✅ 18 frame images (3 per video)
  • ✅ All mascot designs match reference
  • ✅ All text visible throughout all videos
  • ✅ NIH citations visible and readable
  • ✅ Total cost: 210 credits (if all passed first try)
  • ✅ OR: 210 + (35 × regenerations) if some needed fixes

💡 KEY TAKEAWAYS FOR USA CLAUDE

  1. You can SEE images - Use the Read tool liberally
  2. Take screenshots constantly - Before and after every action
  3. Verify incrementally - One symptom at a time
  4. Use the prompt template - Proven to prevent text disappearance
  5. Compare to reference - Always check mascot against /website/images/nutri-e-mascot.png
  6. Stop on first failure - Don't waste credits on remaining videos if one fails
  7. Report clearly - Tell user exactly what passed/failed with specific details

📝 EXAMPLE VERIFICATION REPORT

SYMPTOM 1: Vitamin D Fatigue
✅ Video generated successfully
✅ Frame 1 (0s): Mascot design correct, text fully visible
✅ Frame 2 (2.5s): Text still fully visible, no zoom issues
✅ Frame 3 (5s): Text remains visible, mascot consistent
STATUS: VERIFIED ✅ - Ready for Symptom 2

SYMPTOM 2: Vitamin D Weakness
✅ Video generated successfully
❌ Frame 1 (0s): Mascot face is PURPLE (should be cream/beige)
❌ Frame 2 (2.5s): Eyebrows are PURPLE (should be brown)
STATUS: FAILED ❌ - STOP HERE
ACTION: Regenerate with corrected mascot design
COST: 35 credits (saved 140 by not generating Symptoms 3-6)

🎓 UNDERSTANDING YOUR CAPABILITIES

What You CAN Do:

  • Control browsers with Playwright
  • Take screenshots and SEE them visually
  • Read images and compare designs
  • Extract video frames with ffmpeg
  • Make intelligent decisions based on visual feedback
  • Adapt to UI changes dynamically

What You CANNOT Do:

  • Generate images in ChatGPT (user must do this in Norway for better text)
  • Access KlingAI API directly (must use browser automation)
  • Skip verification (this is critical to prevent credit waste)

Why You're Better Than Hardcoded Scripts:

  • You have VISION - can see and verify quality
  • You can ADAPT - handle UI changes dynamically
  • You can DECIDE - stop on failures intelligently
  • You can REPORT - explain what went wrong specifically

Related Documentation

  • Main workflow: /MARKETING_GUIDE.md (KlingAI + CapCut section)
  • Mascot reference: /website/images/nutri-e-mascot.png
  • Project instructions: /CLAUDE.md

Remember: The incremental workflow exists to protect against costly mistakes. Trust the process, verify thoroughly, and stop immediately if anything fails. Your vision capabilities are your superpower - use them! 👁️✨