Creative Scoring Framework for DTC Brands: Rate Your Ads
A creative scoring framework for DTC brands is a structured system that evaluates ad performance across a consistent set of metrics at each stage of the viewer funnel, allowing objective comparison between creatives and data-driven decisions about what to scale, optimize, or retire.
Last updated: February 2026
Why DTC Brands Need a Scoring Framework
Without a scoring framework, creative decisions are subjective. The founder's favorite ad gets scaled. The creative director's aesthetic preferences drive production. The agency presents what looks best in their portfolio.
All of these inputs have some value but none reliably predict which creative will drive the most profitable revenue at scale. A structured scoring framework replaces opinion with data.
The benefits:
- Objective comparison between creatives across campaigns and dates
- Clear criteria for graduation (which ads to scale) and retirement (which to pause)
- Institutional knowledge that builds over time as your database grows
- Removes emotional attachment to specific creative executions
- Helps identify patterns across winners and losers that inform future briefs
MHI Media implements creative scoring frameworks for every DTC client. The brands that adopt systematic scoring consistently improve creative performance over time compared to those making ad-hoc decisions.
The Four-Stage Creative Funnel
Every viewer who encounters your video ad goes through a sequential funnel:
Stage 1: Scroll Stop (Impression to 3-second view)
Metric: Thumb Stop Rate (TSR) = 3-second views / impressions
Stage 2: Sustained Engagement (3-second view to click consideration)
Metric: Hold Rate = average watch % of video
Stage 3: Intent (Impression to landing page click)
Metric: Link CTR = link clicks / impressions
Stage 4: Conversion (Landing page visit to purchase)
Metric: Purchase CVR = purchases / landing page views
A creative only scores well if it performs at each stage. Many creatives are strong in stage 1 but weak in stage 2 or 3. The scoring framework identifies exactly where in the funnel a creative is losing potential buyers.
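As a quick sketch, the four stage metrics can be computed from raw ad-level counts. The field names below (`three_sec_views`, `avg_watch_pct`, and so on) are illustrative placeholders, not actual Ads Manager export columns:

```python
# Hypothetical ad-level counts; values and field names are illustrative only.
ad = {
    "impressions": 120_000,
    "three_sec_views": 38_000,
    "avg_watch_pct": 0.34,       # platform-reported average % of video watched
    "link_clicks": 1_500,
    "landing_page_views": 1_200,
    "purchases": 36,
}

def funnel_metrics(ad: dict) -> dict:
    """Compute the four stage metrics defined above."""
    return {
        "tsr": ad["three_sec_views"] / ad["impressions"],           # Stage 1
        "hold_rate": ad["avg_watch_pct"],                           # Stage 2
        "link_ctr": ad["link_clicks"] / ad["impressions"],          # Stage 3
        "purchase_cvr": ad["purchases"] / ad["landing_page_views"], # Stage 4
    }

m = funnel_metrics(ad)
```

Computing all four side by side makes the "where is this creative leaking buyers" question concrete: a strong TSR with a weak link CTR, for example, points at stage 3.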
Building Your Scoring Metrics
Establish category-specific benchmarks before scoring. Your benchmarks should reflect your actual account performance, not generic industry averages.
How to establish your benchmarks:
- Pull 90 days of ad-level data from Ads Manager
- Calculate each metric per creative
- Find the median and top quartile for each metric
- Set your scoring thresholds at: Below Median (0-1 points), Median to Top Quartile (2-3 points), Above Top Quartile (4-5 points)
Your benchmarks will differ from another brand's based on your category, price point, and audience. A skincare brand's CTR benchmarks will not apply to a supplement brand.
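The benchmark steps above can be sketched with the standard library. The CTR values here are made-up sample data; in practice you would feed in one list per metric from your 90-day ad-level export:

```python
import statistics

# Made-up sample data: link CTR per creative from a 90-day export.
ctrs = [0.004, 0.007, 0.009, 0.011, 0.012, 0.014, 0.016, 0.019, 0.022, 0.031]

def thresholds(values):
    """Return the (median, top quartile) cut points for one metric."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return q2, q3

def band(value, median, top_q):
    """Map a metric value to the coarse point bands described above."""
    if value < median:
        return "0-1 points"
    if value < top_q:
        return "2-3 points"
    return "4-5 points"

median, top_q = thresholds(ctrs)
```

Running `thresholds` once per metric, per quarter, keeps the bands anchored to your own account rather than generic industry averages.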
The MHI Media Creative Scoring Card
Here is the scoring card framework we use for DTC clients:
METRIC 1: Thumb Stop Rate (TSR)
Points scoring (for 30-second video in Feed):
- 0-15%: 0 points (hook is failing completely)
- 15-25%: 1 point (weak hook, needs improvement)
- 25-35%: 3 points (acceptable, average performance)
- 35-45%: 4 points (good hook, reliably stopping scrolls)
- 45%+: 5 points (exceptional hook)
METRIC 2: Hold Rate
Points scoring (30-second video):
- Under 20%: 0 points
- 20-30%: 1 point
- 30-40%: 3 points
- 40-50%: 4 points
- 50%+: 5 points
METRIC 3: Link CTR
Points scoring:
- Under 0.5%: 0 points
- 0.5-1.0%: 1 point
- 1.0-1.8%: 3 points
- 1.8-2.5%: 4 points
- 2.5%+: 5 points
METRIC 4: CPA Ratio (Your CPA / Target CPA)
Points scoring:
- Above 2x target: 0 points
- 1.5-2x target: 1 point
- 1.0-1.5x target: 3 points
- 0.8-1.0x target (at or below target): 4 points
- Under 0.8x target (significantly below target): 5 points
Total Score: 0-20 points
- 0-5: Retire immediately
- 6-10: Underperformer, consider pause
- 11-14: Acceptable, optimize specific weak stages
- 15-17: Good, maintain and test variations
- 18-20: Winner, scale aggressively
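One way to encode the full scoring card, assuming strict upper bounds for the ascending metrics (so "45%+" scores 5) and inclusive bounds for the CPA ratio (so exactly "at target" scores 4, per the band labels above):

```python
def band_score(value, cuts, points):
    """cuts: ascending upper bounds; points has one entry per band (len(cuts)+1)."""
    for cut, pts in zip(cuts, points):
        if value < cut:
            return pts
    return points[-1]

# Bands transcribed from the scoring card (30-second Feed video).
TSR_CUTS,  TSR_PTS  = [0.15, 0.25, 0.35, 0.45], [0, 1, 3, 4, 5]
HOLD_CUTS, HOLD_PTS = [0.20, 0.30, 0.40, 0.50], [0, 1, 3, 4, 5]
CTR_CUTS,  CTR_PTS  = [0.005, 0.010, 0.018, 0.025], [0, 1, 3, 4, 5]

def cpa_score(ratio):
    """CPA ratio scores in the opposite direction: lower is better."""
    if ratio < 0.8:
        return 5   # significantly below target
    if ratio <= 1.0:
        return 4   # at or below target
    if ratio <= 1.5:
        return 3
    if ratio <= 2.0:
        return 1
    return 0       # above 2x target

def total_score(tsr, hold, ctr, cpa_ratio):
    parts = {
        "tsr": band_score(tsr, TSR_CUTS, TSR_PTS),
        "hold": band_score(hold, HOLD_CUTS, HOLD_PTS),
        "ctr": band_score(ctr, CTR_CUTS, CTR_PTS),
        "cpa": cpa_score(cpa_ratio),
    }
    return sum(parts.values()), parts

score, parts = total_score(tsr=0.38, hold=0.42, ctr=0.019, cpa_ratio=0.9)
```

Returning the per-metric `parts` alongside the total matters for the optimize band: the lowest component tells you which funnel stage to rework.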
Applying the Framework in Practice
Minimum data requirements before scoring:
- Minimum 7 days runtime
- Minimum 1,000 impressions
- Minimum 20 landing page views
- Minimum 5 purchases (for reliable CPA)
Without sufficient data, scores are unreliable. Extend the measurement window for lower-budget accounts before scoring.
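A minimal gate implementing these minimums, run before any creative is scored:

```python
# Thresholds from the minimum data requirements above.
MIN_DAYS, MIN_IMPRESSIONS, MIN_LPV, MIN_PURCHASES = 7, 1_000, 20, 5

def ready_to_score(days_live, impressions, landing_page_views, purchases):
    """True only when all four minimums are met; otherwise keep collecting data."""
    return (days_live >= MIN_DAYS
            and impressions >= MIN_IMPRESSIONS
            and landing_page_views >= MIN_LPV
            and purchases >= MIN_PURCHASES)
```

For lower-budget accounts, raise `MIN_DAYS` rather than lowering the volume thresholds; a longer window at the same minimum impressions preserves score reliability.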
Scoring cadence:
Review all active creatives weekly. Update scores based on the most recent 7-day data (not all-time data). A creative that was a winner 3 months ago but is fatiguing now should see its TSR and CTR scores decline, triggering a review.
Score normalization:
Scores should be calculated over consistent date ranges. Comparing one creative's 7-day score against another's 30-day score is not a fair comparison: the 30-day window averages over more auction conditions and is inherently more stable, so the two scores measure different things.
Track scores over time:
Record each creative's score weekly. A declining score over 3+ consecutive weeks is a strong signal of fatigue. A score that improves in week 2 and 3 after launch suggests a creative that needed time to find its audience.
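The fatigue signal described above, a decline sustained over 3+ consecutive weeks, can be sketched as:

```python
def declining_streak(weekly_scores):
    """Count consecutive week-over-week declines ending at the most recent week."""
    streak = 0
    for i in range(len(weekly_scores) - 1, 0, -1):
        if weekly_scores[i] < weekly_scores[i - 1]:
            streak += 1
        else:
            break
    return streak

def is_fatiguing(weekly_scores, weeks=3):
    """Flag a creative whose score has declined for `weeks`+ consecutive weeks."""
    return declining_streak(weekly_scores) >= weeks
```

Run this against the recorded weekly history; a creative at `[17, 18, 16, 14, 12]` is flagged even though its latest score (12) is still above the retire threshold.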
Using Scores to Make Decisions
Scale decisions (score 15-20):
Increase budget on the winner's ad set or campaign by 15-20%. Add the winning creative directly to your main prospecting CBO or ASC campaign. Begin producing variations of the winning concept.
Optimize decisions (score 11-14):
Identify which stage is weakest (lowest score component). For low TSR: create hook variations. For low hold rate: edit mid-video content. For low CTR with good TSR/hold: strengthen offer communication and CTA.
Retire decisions (score 0-10):
Pause the creative. Document what you learned: what angle it tested, why you think it underperformed, and what you will try differently next time. Never delete ads (preserve data).
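The three decision bands, plus the weakest-stage lookup for the optimize case, can be sketched as follows. The `"cpa"` remediation string is an assumption, since the text above only names fixes for TSR, hold rate, and CTR:

```python
def decision(total):
    """Map a 0-20 total score to the action bands above."""
    if total >= 15:
        return "scale"
    if total >= 11:
        return "optimize"
    return "retire"

# Remediation per weakest stage; the "cpa" entry is assumed, not from the text.
FIXES = {
    "tsr": "create hook variations",
    "hold": "edit mid-video content",
    "ctr": "strengthen offer communication and CTA",
    "cpa": "revisit offer/landing page alignment",  # assumption
}

def next_action(total, parts):
    """parts: per-metric scores, e.g. {"tsr": 4, "hold": 2, "ctr": 3, "cpa": 4}."""
    act = decision(total)
    if act == "optimize":
        weakest = min(parts, key=parts.get)  # lowest-scoring funnel stage
        return f"optimize: {FIXES[weakest]}"
    return act
```

Keeping the decision logic in one place makes the weekly review mechanical: score, decide, act, and log the outcome against the creative's record.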
Replacement pipeline:
When a winner begins showing declining scores week-over-week (even if still above threshold), that is the trigger to accelerate production of the next batch. Do not wait until the score drops below threshold to produce replacements.
Common Scoring Mistakes
Scoring too early: Evaluating a creative after 3 days and 200 impressions produces unreliable data. Wait for minimum data thresholds.
Ignoring data normalization: Comparing a creative's Q4 performance with its Q1 performance as if conditions were identical.
Over-weighting a single metric: Brands that score on CPA alone miss early creative quality signals. Brands that score on TSR alone scale creative that stops the scroll but does not convert.
Not tracking score trends: A weekly score is a point-in-time assessment. The trend over 4+ weeks is more informative than any individual score.
FAQ
Should I create my own benchmarks or use industry averages?
Create your own account-specific benchmarks. Industry averages from other brands are directionally useful but your specific product, audience, and creative style will produce different natural performance levels.
How do I score static image ads without video metrics?
Replace TSR and hold rate with alternative metrics: CTR (primary attention metric for static), image-level engagement rate if available, and CPA as the primary outcome metric.
Can the scoring framework identify winning creative concepts vs executions?
Yes. If multiple executions of the same concept all score highly (e.g., three different founder story ads all score 16-18), the concept itself is validated. If only one execution of a concept performs well, the execution quality is the differentiator.
How many data points should be in my benchmark dataset?
A minimum of 20 creative evaluations over 90+ days provides a reliable benchmark. More data produces more accurate thresholds.
Should agency partners use the same scoring framework?
Yes. Sharing your scoring framework with any agency or creative partner ensures they are producing content optimized for your actual performance metrics, not for what looks good in their portfolio.
How do I handle brand awareness campaigns in the scoring framework?
Adjust the CPA metric: for awareness campaigns, use cost-per-brand-interaction or cost-per-thousand-reach instead of CPA. The framework adapts to different campaign objectives by adjusting the final stage metric.