KlingAI Video Generation - Ultra-Detailed Workflow for Claude Code

🎯 Purpose

This document provides step-by-step instructions for Claude Code to automate KlingAI video generation with proper verification. This is specifically for the USA-based Claude instance that needs more detailed guidance.

⚠️ CRITICAL CONCEPTS

1. Why Incremental (Not Batch)?

Cost Protection: Each video = 35 credits. Verify after 1 video, not 6.
Quality Control: Catch design/text problems at video 1, not after wasting 210 credits on 6 bad videos.
Proven Failures:
Oct 28, 2025: Generated all 6 videos → All had wrong mascot → 210 credits wasted
Oct 30, 2025: B12 video used "zoom in" → Text disappeared → Caught early, only 35 credits lost

2. Two Verification Types (BOTH REQUIRED)

A. Mascot Design Verification (compare to reference image)
- Face color: Cream/beige (NOT purple)
- Eyebrows: Brown (NOT purple)
- Eye outlines: White circles
- Cheeks: Soft pink blush

B. Text Visibility Verification (check 3 frames)
- Frame 1 (0s): Text visible?
- Frame 2 (2.5s): Text STILL visible?
- Frame 3 (end, ~5s): Text STILL visible at end?

🛠️ TOOLS YOU HAVE AVAILABLE

Playwright Tools (Browser Automation)

mcp__playwright__playwright_navigate - Open KlingAI in browser
mcp__playwright__playwright_screenshot - Take screenshot and SEE the page
mcp__playwright__playwright_fill - Type into text fields
mcp__playwright__playwright_click - Click buttons
mcp__playwright__playwright_upload_file - Upload image files
mcp__playwright__playwright_get_visible_text - Read text from page
mcp__playwright__playwright_close - Close browser

File Tools (Frame Extraction & Verification)

Bash - Run ffmpeg commands to extract frames
Read - View images visually (YOU CAN SEE IMAGES!)

YOUR SUPERPOWER: Multimodal Vision

When you use Read on an image, you can SEE it visually
You can compare mascot design to reference
You can check if text is readable in video frames

📋 COMPLETE WORKFLOW (DO THIS FOR EACH SYMPTOM)

SYMPTOM 1 (Do this first, then verify before moving to Symptom 2)

Step 1: Generate Image in ChatGPT

YOU CANNOT AUTOMATE THIS. The user must do this manually in ChatGPT because:
- ChatGPT in Norway gives better text rendering
- Regional differences in DALL-E quality
- User has the Nutri-E Brand Assets project with mascot reference

What the user provides you:
- Downloaded image file path (e.g., /Users/post/Downloads/vitamin-d-fatigue.png)

Step 2: Upload Image to KlingAI and Generate Video

2.1: Open KlingAI

Tool: mcp__playwright__playwright_navigate
Parameters:
  url: "https://klingai.com"
  headless: false  (so user can see what's happening)
  width: 1920
  height: 1080

2.2: Take Screenshot to See Current State

Tool: mcp__playwright__playwright_screenshot
Parameters:
  name: "klingai-homepage"
  savePng: true
  downloadsDir: "/Users/post/Downloads/klingai-screenshots"

IMPORTANT: After the screenshot, USE YOUR EYES:
- Can you see the upload button?
- Is the user logged in?
- Any error messages?

2.3: Navigate to Video Generation

Based on what you SEE in the screenshot, click the appropriate button.

Common selectors (try in order):

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Generate Video')"

OR: selector: "a[href*='video']"
OR: selector: ".video-generation-button"

If click fails: Take another screenshot and analyze what's visible.

2.4: Upload Image

Tool: mcp__playwright__playwright_upload_file
Parameters:
  selector: "input[type='file']"
  filePath: "/Users/post/Downloads/vitamin-d-fatigue.png"

2.5: Fill in Prompt (CRITICAL TEXT VISIBILITY RULES)

PROMPT TEMPLATE (USE THIS EXACTLY; only pick one option for [tired/weak/anxious/etc.] and one for [yawning/drooping/shaking/etc.]):

static camera, no camera movement, keep all text visible throughout entire video,
[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face),
[EMOTION]: showing [tired/weak/anxious/etc.] expression,
[MOVEMENT]: subtle [yawning/drooping/shaking/etc.] animation,
NO zoom, NO camera pan, text must remain fully visible from start to end,
5 seconds, vertical video, maintain all text readability

WHY THESE RULES:
- ❌ "zoom in" / "camera zoom" → Text disappears (proven failure Oct 30)
- ✅ "static camera" + "keep all text visible" → Text stays readable
- ✅ "no camera movement" → Prevents accidental text cropping
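If a shell step assembles the prompt before it is passed to playwright_fill, a small helper can fill in the two bracketed choices and reject prompts that break the rules above. This is a sketch under stated assumptions, not part of any KlingAI tooling. `build_prompt` and `lint_prompt` are hypothetical names. The template text is copied from above with the bracket labels kept literally. The banned and required phrases are only the ones listed in WHY THESE RULES.

```shell
# Hypothetical helper: fill the two bracketed choices in the prompt template.
# $1 = emotion (e.g. tired), $2 = movement (e.g. yawning)
build_prompt() {
  emotion="$1"
  movement="$2"
  printf '%s\n' \
    "static camera, no camera movement, keep all text visible throughout entire video," \
    "[CHARACTER]: Nutri-E mascot (purple hooded character with cream/beige face)," \
    "[EMOTION]: showing $emotion expression," \
    "[MOVEMENT]: subtle $movement animation," \
    "NO zoom, NO camera pan, text must remain fully visible from start to end," \
    "5 seconds, vertical video, maintain all text readability"
}

# Hypothetical check: fail (exit status 1) if a banned phrase is present
# or a required phrase is missing. Matching is case-insensitive.
lint_prompt() {
  p="$1"
  for banned in "zoom in" "camera zoom"; do
    if printf '%s\n' "$p" | grep -qi -- "$banned"; then
      echo "banned phrase: $banned" >&2
      return 1
    fi
  done
  for required in "static camera" "keep all text visible"; do
    if ! printf '%s\n' "$p" | grep -qi -- "$required"; then
      echo "missing phrase: $required" >&2
      return 1
    fi
  done
  return 0
}
```

For example, `p=$(build_prompt tired yawning) && lint_prompt "$p"` succeeds. A prompt asking for a "slow zoom in" fails. The template's own "NO zoom" does not trip the check, because only the exact phrases "zoom in" and "camera zoom" are matched.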

Tool: mcp__playwright__playwright_fill
Parameters:
  selector: "textarea[name='prompt']"  (or find via screenshot)
  value: "[Your detailed prompt from template above]"

2.6: Start Generation

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Generate')"

2.7: Monitor Progress

Set up a loop to check every 30 seconds:

Loop:
  1. Take screenshot
  2. Read visible text
  3. Look for:
     - "Generating..." → Keep waiting
     - "Complete" → Proceed to download
     - "Error" → Alert user
     - Progress percentage → Report to user
  4. Wait 30 seconds (e.g. run sleep 30 with the Bash tool)
  5. Repeat
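The decision rules in step 3 of the loop can be kept in one place by classifying the text that playwright_get_visible_text returns. This sketch assumes the page shows the literal words "Generating", "Complete" and "Error" (as listed above), and writes progress like "42%". The function name `next_action` and its output words are hypothetical. "Error" is checked first so that a failure is never mistaken for completion.

```shell
# Hypothetical helper: map page text to the next step of the monitoring loop.
# Order matters: Error wins over Complete, which wins over Generating/progress.
# If there is a percentage, the first one on the page is reported.
# Unrecognized text yields "unknown" (take a screenshot and look).
next_action() {
  text="$1"
  case "$text" in
    *Error*) echo "alert" ;;
    *Complete*) echo "download" ;;
    *Generating*|*%*)
      pct=$(printf '%s\n' "$text" | grep -oE '[0-9]{1,3}%' | head -n 1)
      echo "wait${pct:+ $pct}" ;;
    *) echo "unknown" ;;
  esac
}
```

In the loop, `wait` means sleep 30 and poll again, and `download` moves on to step 2.8. `alert` stops and tells the user. `unknown` means take another screenshot before deciding.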

2.8: Download Video

When generation complete:

Tool: mcp__playwright__playwright_click
Parameters:
  selector: "button:has-text('Download')"

OR: selector: "a[download]"

Video will download to: /Users/post/Downloads/

Step 3: Extract Frames for Verification

3.1: Find Downloaded Video

Tool: Bash
Command: ls -lt ~/Downloads/*.mp4 | head -1
Description: Find most recent video file
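`ls -lt` only shows the newest file that already exists. If the Download click from step 2.8 is still in progress, the newest .mp4 may belong to an earlier symptom. A hedged alternative is to touch a marker file just before clicking Download, then wait for an .mp4 that is newer than the marker. `wait_for_download` is a hypothetical helper. It assumes, as step 2.8 does, that the file lands in ~/Downloads. It also assumes the file only appears under its final .mp4 name once the download is complete. If partial files do show up as .mp4, also check that the size has stopped changing.

```shell
# Hypothetical helper: wait for a new .mp4 to appear after the Download click.
# Before clicking Download:  touch /tmp/kling-marker
# After clicking:            wait_for_download /tmp/kling-marker
wait_for_download() {
  marker="$1"                      # file touched right before clicking Download
  dir="${2:-$HOME/Downloads}"      # download folder assumed by step 2.8
  timeout="${3:-300}"              # seconds to wait before giving up
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # First .mp4 directly in $dir (not subfolders) modified after the marker
    found=$(find "$dir" -maxdepth 1 -type f -name '*.mp4' -newer "$marker" 2>/dev/null | head -n 1)
    if [ -n "$found" ]; then
      printf '%s\n' "$found"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "no new .mp4 in $dir after ${timeout}s" >&2
  return 1
}
```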

3.2: Extract 3 Frames (Start, Middle, End)

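Tool: Bash
Command: mkdir -p ~/Downloads/klingai-frames
  ffmpeg -y -ss 0 -i ~/Downloads/[video-name].mp4 -frames:v 1 ~/Downloads/klingai-frames/symptom1-frame-1.png
  ffmpeg -y -ss 2.5 -i ~/Downloads/[video-name].mp4 -frames:v 1 ~/Downloads/klingai-frames/symptom1-frame-2.png
  ffmpeg -y -sseof -0.1 -i ~/Downloads/[video-name].mp4 -frames:v 1 ~/Downloads/klingai-frames/symptom1-frame-3.png
Description: Extract frames at 0s, 2.5s and just before the end (a 5s clip has no frame at exactly 5.0s, and ffmpeg does not create the output folder)

This creates:
- symptom1-frame-1.png (0 seconds)
- symptom1-frame-2.png (2.5 seconds)
- symptom1-frame-3.png (last frame, about 0.1 seconds before the end)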
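Symptoms 2-6 repeat the same extraction, so it can be wrapped in a function that reads the real clip length instead of assuming 5 seconds. This sketch assumes `ffprobe` (which ships alongside ffmpeg) is installed. `frame_times` and `extract_frames` are hypothetical names. The 0.1s end offset is an assumption chosen to land on the final frame.

```shell
# Hypothetical helpers: choose start/middle/end timestamps from the real clip
# length, then grab one PNG at each. Requires ffmpeg and ffprobe on PATH.

# Prints "0 <middle> <end>" for a clip of $1 seconds. The end point sits 0.1s
# before the duration, because no frame starts exactly at the duration.
frame_times() {
  awk -v d="$1" 'BEGIN { e = d - 0.1; if (e < 0) e = 0; printf "0 %.2f %.2f\n", d / 2, e }'
}

# extract_frames VIDEO PREFIX [OUTDIR]
# Writes OUTDIR/PREFIX-frame-1.png, -2.png and -3.png.
extract_frames() {
  video="$1"
  prefix="$2"
  outdir="${3:-$HOME/Downloads/klingai-frames}"
  mkdir -p "$outdir" || return 1
  duration=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$video") || return 1
  i=1
  for t in $(frame_times "$duration"); do
    ffmpeg -v error -y -ss "$t" -i "$video" -frames:v 1 \
      "$outdir/$prefix-frame-$i.png" || return 1
    i=$((i + 1))
  done
}
```

For Symptom 2 this would be `extract_frames ~/Downloads/[video-name].mp4 symptom2`. For a 5-second clip, `frame_times 5` prints `0 2.50 4.90`.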

Step 4: VERIFICATION (CRITICAL - DO NOT SKIP)

4.1: Get Mascot Reference Image

Tool: Read
Parameters:
  file_path: "website/images/nutri-e-mascot.png"  (relative to the project root; Read expects an absolute path, so prefix the project's location)

NOW YOU CAN SEE THE REFERENCE. Remember these details:
- Face: Cream/beige color INSIDE purple hood
- Eyebrows: Brown (not purple!)
- Eyes: White circular outlines with black pupils
- Cheeks: Soft pink blush
- Body: Purple hoodie/robe

4.2: Verify Frame 1 (Start of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-1.png"

CHECK THESE THINGS:
1. Mascot Design:
   - ✓ Face cream/beige? (NOT purple face)
   - ✓ Eyebrows brown? (NOT purple eyebrows)
   - ✓ White eye outlines? (NOT missing)
   - ✓ Pink cheeks? (NOT missing)
2. Text Visibility:
   - ✓ All text fully visible?
   - ✓ No text cut off at edges?
   - ✓ Text sharp and readable?

4.3: Verify Frame 2 (Middle of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-2.png"

CHECK:
- ✓ Text STILL fully visible? (Check for zoom issues)
- ✓ No text moved out of frame?

4.4: Verify Frame 3 (End of Video)

Tool: Read
Parameters:
  file_path: "/Users/post/Downloads/klingai-frames/symptom1-frame-3.png"

CHECK:
- ✓ Text STILL fully visible at end?
- ✓ Mascot still looks correct?

Step 5: DECISION POINT (STOP HERE IF FAIL)

IF ALL CHECKS PASS ✅:
- Report to user: "Symptom 1 verified ✅. Ready for Symptom 2."
- Save video with proper name: vitamin-d-fatigue-verified.mp4
- PROCEED TO SYMPTOM 2

IF ANY CHECK FAILS ❌:
- STOP IMMEDIATELY
- Report specific failures to user:
  - "Mascot face is purple (should be cream/beige)"
  - "Text disappeared at 2.5s (zoom issue)"
  - "Eyebrows are purple (should be brown)"
- DO NOT PROCEED TO SYMPTOM 2
- Wait for user to fix prompt or provide new image
- Cost: Only 35 credits lost (not 210)
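The "save video with proper name" step can be a rename that refuses to overwrite an earlier verified clip. Here is a minimal sketch. `save_verified` is a hypothetical helper, and the `<slug>-verified.mp4` pattern comes from the vitamin-d-fatigue-verified.mp4 example above.

```shell
# Hypothetical helper: rename a clip that passed every check to
# <slug>-verified.mp4 in the same folder, never clobbering an existing one.
save_verified() {
  src="$1"     # downloaded clip, e.g. the file found in step 3.1
  slug="$2"    # e.g. vitamin-d-fatigue
  dest="$(dirname "$src")/$slug-verified.mp4"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite $dest" >&2
    return 1
  fi
  mv -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

On failure, nothing is renamed. The unverified download stays where it is, so the user can inspect it.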

SYMPTOM 2 (Only after Symptom 1 passes)

Repeat Steps 1-5 above with new image for Symptom 2.

DO NOT BATCH - Verify Symptom 2 before moving to Symptom 3.

SYMPTOM 3-6 (One at a time)

Repeat the same workflow for each remaining symptom, verifying each one individually.

🚨 COMMON MISTAKES TO AVOID

Mistake 1: Batching Videos

❌ WRONG: Generate all 6 videos → Download all → Verify all
✅ RIGHT: Generate 1 → Download 1 → Verify 1 → Proceed

Mistake 2: Skipping Frame Extraction

❌ WRONG: Just download the video and assume it's good
✅ RIGHT: Extract 3 frames and verify text visibility throughout

Mistake 3: Not Using Visual Verification

❌ WRONG: Check that the file exists and call it done
✅ RIGHT: Use the Read tool to actually SEE frames and check mascot/text

Mistake 4: Ignoring Prompt Rules

❌ WRONG: Request "zoom in" or any other camera movement in the prompt
✅ RIGHT: Use "static camera, no zoom, keep all text visible"

Mistake 5: Not Comparing to Reference

❌ WRONG: Trust that the mascot looks right without checking
✅ RIGHT: Read the reference image first, then compare each frame

🔧 TROUBLESHOOTING

Problem: Can't Find Upload Button

Solution:
1. Take screenshot
2. Use playwright_get_visible_text to see all text
3. Look for alternative selectors
4. Try: button:has-text('Upload'), input[type='file'], .upload-btn

Problem: Video Download Not Starting

Solution:
1. Take screenshot to see current state
2. Check for error messages
3. Look for "Share" or "Export" buttons instead of "Download"
4. Try right-click context menu on video player

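Problem: Frame Extraction Fails

Solution:

# Check video file exists
ls -lh ~/Downloads/*.mp4

# Check video duration
ffmpeg -i ~/Downloads/video.mp4 2>&1 | grep Duration

# Fallback when fixed timestamps miss (e.g. the clip is shorter than expected):
# grab the first frame, a middle frame, and the last frame
mkdir -p frames
ffmpeg -y -i video.mp4 -frames:v 1 frames/frame-1.png
ffmpeg -y -ss [half of the Duration above] -i video.mp4 -frames:v 1 frames/frame-2.png
ffmpeg -y -sseof -0.1 -i video.mp4 -frames:v 1 frames/frame-3.png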
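To act on the Duration check programmatically, the banner line can be converted to seconds and compared with the 2.5s midpoint. This sketch assumes ffmpeg's usual `Duration: HH:MM:SS.xx,` banner format. `duration_seconds` is a hypothetical name. `ffprobe -show_entries format=duration` is an alternative that prints seconds directly.

```shell
# Hypothetical helper: turn ffmpeg's "Duration: HH:MM:SS.xx, ..." line into
# seconds. Reads ffmpeg's stderr banner on stdin; prints nothing if absent.
duration_seconds() {
  sed -n 's/.*Duration: \([0-9]*\):\([0-9]*\):\([0-9.]*\),.*/\1 \2 \3/p' |
    head -n 1 |
    awk '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}
```

Usage: `ffmpeg -i ~/Downloads/video.mp4 2>&1 | duration_seconds`. For a banner showing `00:00:05.04`, it prints `5.04`.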

Problem: Can't See Images with Read Tool

Solution:
- Make sure the file path is absolute (starts with /, e.g. /Users/post/...)
- Check the file exists: ls -lh /path/to/image.png
- Verify it's a valid image: file /path/to/image.png
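Before calling Read, a quick shell check can confirm that all three frames exist and really are PNG files. That rules out the causes above in one step. This sketch checks the standard 8-byte PNG signature. `is_png` and `check_frames` are hypothetical names. The default folder is the klingai-frames directory from step 3.2.

```shell
# Hypothetical helper: true if $1 is a regular file starting with the
# PNG signature 89 50 4E 47 0D 0A 1A 0A.
is_png() {
  [ -f "$1" ] || return 1
  sig=$(head -c 8 "$1" | od -A n -t x1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

# check_frames PREFIX [DIR]: verify PREFIX-frame-1..3.png are all valid PNGs.
check_frames() {
  prefix="$1"
  dir="${2:-$HOME/Downloads/klingai-frames}"
  for i in 1 2 3; do
    f="$dir/$prefix-frame-$i.png"
    is_png "$f" || { echo "missing or not a PNG: $f" >&2; return 1; }
  done
  return 0
}
```

For example, run `check_frames symptom1` before the three Read calls in step 4.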

📊 SUCCESS CHECKLIST

After completing ALL 6 symptoms, you should have:

✅ 6 verified videos (1 per symptom)
✅ 18 frame images (3 per video)
✅ All mascot designs match reference
✅ All text visible throughout all videos
✅ NIH citations visible and readable
✅ Total cost: 210 credits (if all passed first try)
✅ OR: 210 + (35 × regenerations) if some needed fixes
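The cost lines in this checklist follow from 35 credits per video and 6 symptoms. Here is a small worked check in shell; the function names are illustrative only and are not part of any tool.

```shell
# Credit arithmetic from this guide: 35 credits per generated video, 6 symptoms.
CREDITS_PER_VIDEO=35
SYMPTOMS=6

# Total spend for the full set: every symptom once, plus any regenerations ($1).
total_credits() {
  regenerations="${1:-0}"
  echo $(( CREDITS_PER_VIDEO * (SYMPTOMS + regenerations) ))
}

# Credits not spent when the run stops at failing symptom number $1.
credits_saved() {
  failed_at="$1"
  echo $(( CREDITS_PER_VIDEO * (SYMPTOMS - failed_at) ))
}
```

`total_credits` gives 210 and `total_credits 2` gives 280. `credits_saved 2` gives 140, matching the Symptom 2 example report.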

💡 KEY TAKEAWAYS FOR USA CLAUDE

You can SEE images - Use the Read tool liberally
Take screenshots constantly - Before and after every action
Verify incrementally - One symptom at a time
Use the prompt template - Proven to prevent text disappearance
Compare to reference - Always check mascot against /website/images/nutri-e-mascot.png
Stop on first failure - Don't waste credits on remaining videos if one fails
Report clearly - Tell user exactly what passed/failed with specific details

📝 EXAMPLE VERIFICATION REPORT

SYMPTOM 1: Vitamin D Fatigue
✅ Video generated successfully
✅ Frame 1 (0s): Mascot design correct, text fully visible
✅ Frame 2 (2.5s): Text still fully visible, no zoom issues
✅ Frame 3 (5s): Text remains visible, mascot consistent
STATUS: VERIFIED ✅ - Ready for Symptom 2

SYMPTOM 2: Vitamin D Weakness
✅ Video generated successfully
❌ Frame 1 (0s): Mascot face is PURPLE (should be cream/beige)
❌ Frame 2 (2.5s): Eyebrows are PURPLE (should be brown)
STATUS: FAILED ❌ - STOP HERE
ACTION: Regenerate with corrected mascot design
COST: 35 credits (saved 140 by not generating Symptoms 3-6)

🎓 UNDERSTANDING YOUR CAPABILITIES

What You CAN Do:

Control browsers with Playwright
Take screenshots and SEE them visually
Read images and compare designs
Extract video frames with ffmpeg
Make intelligent decisions based on visual feedback
Adapt to UI changes dynamically

What You CANNOT Do:

Generate images in ChatGPT (user must do this in Norway for better text)
Access KlingAI API directly (must use browser automation)
Skip verification (this is critical to prevent credit waste)

Why You're Better Than Hardcoded Scripts:

You have VISION - can see and verify quality
You can ADAPT - handle UI changes dynamically
You can DECIDE - stop on failures intelligently
You can REPORT - explain what went wrong specifically

📚 RELATED FILES

Main workflow: /MARKETING_GUIDE.md (KlingAI + CapCut section)
Mascot reference: /website/images/nutri-e-mascot.png
Project instructions: /CLAUDE.md

Remember: The incremental workflow exists to protect against costly mistakes. Trust the process, verify thoroughly, and stop immediately if anything fails. Your vision capabilities are your superpower - use them! 👁️✨