Unicode Normalizer

Normalize Unicode text to NFC, NFD, NFKC, or NFKD form and optionally strip combining accents.


What This Tool Does

Unicode defines multiple ways to represent the same visible character. The letter “é”, for example, can be stored as a single precomposed code point (U+00E9) or as the letter “e” followed by a combining acute accent (U+0301). Both look identical on screen yet compare as unequal in code. This tool applies JavaScript's built-in String.prototype.normalize() to convert your text into whichever canonical or compatibility form you choose, ensuring consistent character representation across databases, APIs, and text-processing pipelines. The optional accent-stripping step removes all combining diacritical marks (U+0300–U+036F) after decomposition, leaving only the unaccented base characters.
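The core behavior can be sketched in a few lines of JavaScript using the same String.prototype.normalize() API the tool is built on:

```javascript
// Two encodings of the same visible character "é"
const precomposed = "\u00E9";   // single precomposed code point
const decomposed  = "e\u0301";  // "e" + combining acute accent

console.log(precomposed === decomposed);                                    // false
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC"));  // true

// Optional accent stripping: decompose, then delete combining marks U+0300–U+036F
const stripped = "caf\u00E9".normalize("NFD").replace(/[\u0300-\u036F]/g, "");
console.log(stripped); // "cafe"
```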

How to Use

  1. Paste or type your text into the input panel.
  2. Select the normalization form — NFC is recommended for most web and storage use cases.
  3. Optionally enable “Strip combining accents” to remove diacritical marks.
  4. Copy or download the normalized output from the right panel.
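The steps above boil down to a short pipeline. This sketch is illustrative — the function name and parameters are assumptions, not the tool's actual internals:

```javascript
// Hypothetical sketch of the tool's pipeline: pick a form, optionally strip accents.
function normalizeText(text, form = "NFC", stripAccents = false) {
  let result = text.normalize(form);
  if (stripAccents) {
    // Decompose first so precomposed characters expose their combining marks
    result = result.normalize("NFD").replace(/[\u0300-\u036F]/g, "");
  }
  return result;
}

console.log(normalizeText("na\u00EFve", "NFC", true)); // "naive"
```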

Frequently Asked Questions

What is Unicode normalization and why does it matter?

Unicode allows the same perceived character to be encoded in more than one byte sequence. Without normalization, string comparisons, database lookups, and search indexes can produce incorrect results because “café” stored two different ways will not match. Normalization converts text to a predictable, canonical form so that identical-looking strings are also byte-identical. It is a standard step before hashing passwords, indexing text, or comparing user input in any language-aware application.
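A concrete failure mode: an index built from one encoding misses lookups in the other unless both sides are normalized first. This is a minimal sketch with a plain Set standing in for a database index:

```javascript
const index = new Set(["caf\u00E9"]);  // stored precomposed (NFC)
const userInput = "cafe\u0301";        // arrives decomposed (NFD)

console.log(index.has(userInput));                   // false — same word, different bytes
console.log(index.has(userInput.normalize("NFC")));  // true after normalizing
```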

What is the difference between NFC and NFD?

NFD (Canonical Decomposition) breaks every composed character into its base letter plus one or more combining marks — “é” becomes “e” + U+0301. NFC (Canonical Decomposition followed by Canonical Composition) then re-composes those sequences back into the shortest precomposed form where one exists. NFC is what most operating systems, web browsers, and databases store by default, so it is the safest choice for output that will be displayed or persisted. NFD is useful when you need to inspect or manipulate individual combining marks programmatically.
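The difference between the two forms is easy to observe from the string length and code points:

```javascript
const nfc = "\u00E9".normalize("NFC");
const nfd = "\u00E9".normalize("NFD");

console.log(nfc.length); // 1 — single precomposed code point
console.log(nfd.length); // 2 — base letter plus combining mark

// Inspect the individual code points of the NFD form
console.log(Array.from(nfd).map(c => c.codePointAt(0).toString(16))); // [ '65', '301' ]
```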

When should I use NFKC instead of NFC?

NFKC applies compatibility decomposition before re-composing, which means it collapses visually similar but technically distinct characters: the ligature “ﬁ” (U+FB01) becomes the two letters “fi”, full-width Latin letters become their ASCII equivalents, and superscript digits turn into regular digits. This makes NFKC the right choice for search normalization, keyword matching, and any situation where you want full-width “Ａ” and ASCII “A” to be treated the same. Avoid it when you need to preserve typographic ligatures or CJK compatibility characters for display.
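Each of those compatibility collapses can be checked directly:

```javascript
console.log("\uFB01".normalize("NFKC")); // "fi" — the ﬁ ligature splits into two letters
console.log("\uFF21".normalize("NFKC")); // "A"  — full-width Ａ becomes ASCII A
console.log("\u00B2".normalize("NFKC")); // "2"  — superscript ² becomes a plain digit

// Canonical NFC leaves compatibility characters untouched
console.log("\uFB01".normalize("NFC") === "\uFB01"); // true
```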

What happens when I enable “Strip combining accents”?

After decomposing the text, all combining diacritical marks in the Unicode range U+0300–U+036F are deleted. The result is unaccented base characters — “café” becomes “cafe” and “naïve” becomes “naive”. This is useful for generating ASCII-safe slugs, creating fallback plain-text versions of content, or normalizing user input for fuzzy search. The accent stripping works correctly regardless of which normalization form you select because the tool always decomposes first before removing combining marks.
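Because the stripping step decomposes internally, the input's encoding does not change the result — a sketch of that form-independence:

```javascript
const strip = (s) => s.normalize("NFD").replace(/[\u0300-\u036F]/g, "");

console.log(strip("caf\u00E9"));      // "cafe"  — precomposed input
console.log(strip("cafe\u0301"));     // "cafe"  — already-decomposed input, same result
console.log(strip("na\u00EFve"));     // "naive"
```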
