Almanac of UUIDs · Part V

The Namespace and Random Standards

5.1 UUID Version 3 & 5: Name-Based (Deterministic)

Often, a system needs to bridge legacy identifiers (like URLs, email addresses, or ISO Object IDs) into the UUID universe. Uniqueness is required, but consistency is paramount: the same URL input must always yield the same UUID output, regardless of when or where it is computed.

Mechanism

  1. Namespace: Start with a seed UUID known as the Namespace (e.g., NAMESPACE_URL).
  2. Input: Concatenate the binary bytes of the Namespace UUID with the binary bytes of the name string (e.g., https://example.com).
  3. Hash: Compute the cryptographic hash of the concatenated data.
    • v3: Uses MD5 (128 bits).
    • v5: Uses SHA-1 (160 bits).
  4. Truncate & Tag: For v5, keep only the leftmost 128 bits of the SHA-1 hash (the MD5 output used by v3 is already 128 bits). Then, for both versions, overwrite the Version and Variant bits to match the spec.
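
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the helper name uuid5_by_hand is hypothetical, and it assumes the name string is encoded as UTF-8 before hashing. The result is compared against the standard library's uuid.uuid5 as a sanity check.

```python
import hashlib
import uuid


def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper that builds a v5 UUID step by step."""
    # Steps 1-2: the 16 namespace bytes followed by the name bytes
    # (UTF-8 encoding of the name is an assumption of this sketch).
    data = namespace.bytes + name.encode("utf-8")
    # Step 3: SHA-1 produces a 20-byte (160-bit) digest.
    digest = hashlib.sha1(data).digest()
    # Step 4: keep the first 16 bytes, then set the version and variant bits.
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x50  # version nibble = 0101 (5)
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits = 10
    return uuid.UUID(bytes=bytes(raw))


name = "https://example.com"
by_hand = uuid5_by_hand(uuid.NAMESPACE_URL, name)
from_stdlib = uuid.uuid5(uuid.NAMESPACE_URL, name)
print(by_hand, by_hand == from_stdlib)
```

Calling the helper twice with the same namespace and name returns the same UUID, which is the consistency property this section relies on.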

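RFC 9562 recommends v5 over v3 wherever possible (v5 SHOULD be used in lieu of v3). MD5 is cryptographically broken; collisions can be engineered. SHA-1 collisions have also been demonstrated in practice, so neither version should be treated as a security mechanism, but SHA-1 still offers a higher safety margin than MD5.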

5.2 UUID Version 4: Random

Version 4 represents the shift toward simplicity. By relying on pure randomness, v4 eliminates the complexity of synchronizing clocks, managing MAC addresses, or hashing inputs. It is the most widely used version in the industry today.

1. Generate 128 random bits.
2. Set the 4 Version bits to 0100 (4).
3. Set the 2 Variant bits to 10.
4. Result: 6 bits are fixed, leaving 122 bits of pure entropy.
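
As a rough sketch of these steps in Python, the snippet below draws 16 random bytes and stamps in the version and variant bits. The helper name uuid4_by_hand is hypothetical, and using the secrets module as the random source is an assumption of this example; in practice uuid.uuid4() from the standard library does the same job.

```python
import secrets
import uuid


def uuid4_by_hand() -> uuid.UUID:
    """Hypothetical helper that builds a v4 UUID step by step."""
    # Step 1: 16 bytes = 128 random bits from the OS random source.
    raw = bytearray(secrets.token_bytes(16))
    # Step 2: overwrite the high nibble of byte 6 with the version, 0100 (4).
    raw[6] = (raw[6] & 0x0F) | 0x40
    # Step 3: overwrite the top two bits of byte 8 with the variant, 10.
    raw[8] = (raw[8] & 0x3F) | 0x80
    # Step 4: the remaining 122 bits are left random.
    return uuid.UUID(bytes=bytes(raw))


u = uuid4_by_hand()
print(u, u.version)
```

In the printed form, the version shows up as the first hex digit of the third group, and the variant forces the first hex digit of the fourth group to be 8, 9, a, or b.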

The Mathematics of Collision (Birthday Paradox)

Engineers often fear v4 collisions. This fear is mathematically unfounded for general use cases.

  • Total v4 Space: 2^122 (approx. 5.3 × 10^36).
  • To reach a 50% probability of at least one collision, one would need to generate roughly 2.71 quintillion (2.71 × 10^18) UUIDs.

If you have a system generating 1 billion UUIDs per second, it would take about 86 years to reach a 50% collision probability. You are more likely to be hit by a meteorite while winning the lottery than to generate a v4 collision in a standard business application.
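
The arithmetic behind these figures can be checked with a short Python sketch. It uses the standard birthday approximation n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the v4 space and p the target collision probability. The function name and the choice of a 365.25-day year are assumptions made for this example.

```python
import math

SPACE = 2 ** 122  # number of distinct v4 UUIDs (122 random bits)


def uuids_for_collision_probability(p: float, space: int = SPACE) -> float:
    """Approximate count of random draws needed for collision probability p."""
    # Birthday approximation: n ≈ sqrt(2 * N * ln(1 / (1 - p)))
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)

# Time to generate that many UUIDs at one billion per second.
seconds = n_half / 1e9
years = seconds / (365.25 * 24 * 3600)

print(f"UUIDs for a 50% collision chance: {n_half:.3e}")
print(f"Years at 1e9 UUIDs per second: {years:.1f}")
```

Because the required count grows with the square root of the space, lowering the target probability shrinks the count substantially, so the 50% figure should not be read as a safe operating limit.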