
PLUS is a personalized learning platform serving thousands of middle schoolers
Internship
As the design system evolved to support diverse learning experiences, layout decisions—especially around spacing—began to lack consistency. Tokens overlapped in value, yet carried different intentions. Designers often reused tokens without considering whether their application was semantically appropriate.

As an organization that onboards 10+ designers every year, we needed a scalable token system that empowers new designers to build with clarity.
Prior to this work, our layout token system was built around loosely defined categories (e.g., within-component vs. between-component spacing), and this division introduced overlapping values and unclear scopes.
For instance, spacer-2 (4px) was used for both within-component and between-component spacing, and designers applied it interchangeably. Tokens from other domains (e.g., radius, stroke) were sometimes applied where spacing tokens should have been scoped, resulting in inconsistent files and confusing handoff.
Rather than overhauling the entire system, we focused on tightening what already worked.
Shifted to a size-driven tiering system
Introduced fixed increments to remove token overlaps
Scoped tokens by property (e.g., padding tokens only show spacing variables)

A new token set was introduced, spacing-00 through spacing-1000, with increments following a 2- or 4-pixel logic, removing guesswork and reducing the training needed to onboard new designers.
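As a rough sketch of what that could look like (the exact stops and pixel values below are assumptions; the case study only specifies the spacing-00 to spacing-1000 range and the 2- or 4-pixel increments):

```python
# Hypothetical stops for the consolidated spacing scale.
SPACING_TOKENS = {
    "spacing-00": 0,
    "spacing-25": 2,    # +2
    "spacing-50": 4,    # +2
    "spacing-100": 8,   # +4
    "spacing-150": 12,  # +4
    "spacing-200": 16,  # +4
    # ...the scale continues in fixed steps up to spacing-1000
}

# Each token now resolves to exactly one value, so two tokens can no longer
# share a pixel value while carrying different intentions.
assert len(set(SPACING_TOKENS.values())) == len(SPACING_TOKENS)
```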
With the foundation solidified, we turned to the next problem: actual semantic usage across the system. Even with clean tokens, were they being used correctly?
Were we using the tokens in ways that respected their semantic intent?
As we scaled our design system, we started hearing the same questions from newly onboarded designers:
“Which spacing token am I supposed to use here?” “Why is this small component using Medium spacing?”
Nothing is visually wrong. A button using 12px makes perfect sense. However, a button is naturally seen as a small component, and now that 12px falls into our Medium tier, it appears semantically inconsistent, especially to new team members who interpreted token tiers as strict usage boundaries.
The disconnect between what looks right and what reads right in the system highlighted the fact that we had no mechanism to distinguish visual intent from semantic alignment. To troubleshoot this, I went through every atom and molecule in our current system to detect and visualize semantic misuses across the board.
Starting with a spreadsheet, I uncovered a pattern of semantic mismatches across the system.
I first audited every UI component manually, documenting its actual usage of horizontal and vertical spacing tokens. This gave me a ground truth of how spacing was applied in practice.
To make sense of that information, I wrote a script that categorized each token into semantic tiers and assigned a perceived semantic size to each component.
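A minimal sketch of that categorization step, assuming hypothetical tier boundaries and perceived sizes (the real script and audit spreadsheet are internal):

```python
def token_tier(px: int) -> str:
    """Map a token's pixel value to a semantic tier (assumed boundaries)."""
    if px <= 8:
        return "small"
    if px <= 24:
        return "medium"
    return "large"

# Perceived semantic size assigned to each component (illustrative values).
PERCEIVED_SIZE = {"Button": "small", "Tooltip": "small", "Card": "medium"}

# One audited usage: which component used which token, and for which property.
usage = {"component": "Button", "property": "h-padding", "token": "spacing-150", "px": 12}
usage["token_tier"] = token_tier(usage["px"])                 # -> "medium"
usage["perceived_size"] = PERCEIVED_SIZE[usage["component"]]  # -> "small"
```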
But manually checking if each usage matched the component’s perceived size would’ve been too time-consuming and inconsistent.
So I built a Jupyter Notebook to visualize mismatches. This let me quickly scan where components used tokens that didn’t align with their expected tier, revealing inconsistencies across padding, spacing, and size semantics.
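A simplified sketch of the notebook's mismatch check, assuming a pandas/matplotlib setup; the component rows below are illustrative, not the real audit data:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Each row pairs a component's perceived size with the tier of the token it uses.
df = pd.DataFrame(
    [
        ("Button",  "small",  "h-padding", "medium"),
        ("Button",  "small",  "v-padding", "small"),
        ("Card",    "medium", "h-spacing", "medium"),
        ("Tooltip", "small",  "h-padding", "medium"),
    ],
    columns=["component", "perceived_size", "property", "token_tier"],
)

# A usage is flagged when the token's tier differs from the component's perceived size.
df["mismatch"] = df["token_tier"] != df["perceived_size"]
print(f"{df['mismatch'].mean():.1%} of audited usages are mismatched")

# Mismatch rate per layout property (e.g., H-padding vs. V-spacing).
df.groupby("property")["mismatch"].mean().plot(kind="bar", ylabel="mismatch rate")
plt.tight_layout()
plt.show()
```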
43.9% of components showed semantic inconsistencies
This wasn't a small problem affecting a few edge cases. Nearly half of our design system was operating outside its intended semantic boundaries.
Small components often relied on Medium tokens to meet accessibility and visual requirements.
Horizontal mismatches dominated, especially padding, with 28.1% in H-padding mismatches and 17.5% in H-spacing mismatches, suggesting that horizontal layout demands are more complex and varied than vertical ones. These mismatches weren't errors; they were applied consistently across component families as intentional choices.
We realized the issue wasn't designers making mistakes; it was the system being too rigid. Tokens like space-200 were often functionally the right choice for small components like buttons.
Instead of forcing strict adherence, I reframed cross-semantic usage as a pattern worth documenting.
A guideline for semantic usage gave us a structured way to legitimize cross-tier token usage. For each recurring pattern, we built component-family templates (sketched below) that clearly defined:
The base semantic size
Approved token exceptions
Why those exceptions existed
This let us shift from enforcing rules to systematizing exceptions.
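A hypothetical sketch of what one such template could look like as data; the field names and the Button example are illustrative, not the published guideline:

```python
BUTTON_TEMPLATE = {
    "family": "Button",
    "base_semantic_size": "small",
    "approved_exceptions": [
        {
            "token": "space-200",
            "property": "h-padding",
            # Why the exception exists: the larger step is needed to satisfy
            # accessibility (touch target) and visual-balance requirements.
            "reason": "accessibility / visual balance",
        },
    ],
}
```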
The guideline documentation website is live.
Takeaways
A lot of what I thought were “mistakes” were actually smart choices made by designers trying to make components feel right.
Our system was too rigid — real components don’t always fit nicely into predefined size buckets.
Instead of forcing everything to follow the rules, I learned it’s more helpful to understand why people bend them.
Patterns in so-called “inconsistencies” can actually reveal what the system needs next.
The best systems aren’t the ones that enforce the most rules. They need to evolve with how people actually work.