3.21 Find the Duplicate Number

3.21.1 Problem Metadata

3.21.2 Description

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive, there is only one repeated number in nums. Return this repeated number.

You must solve the problem without modifying the array nums and using only constant extra space.

3.21.3 Examples

Example 1:

Input: nums = [1,3,4,2,2]
Output: 2

Example 2:

Input: nums = [3,1,3,4,2]
Output: 3

Example 3:

Input: nums = [3,3,3,3,3]
Output: 3

3.21.4 Constraints

  • \(1 \le n \le 10^5\)
  • nums.length = \(n + 1\)
  • \(1 \le \text{nums}[i] \le n\)
  • All the integers in nums appear only once except for precisely one integer which appears two or more times

3.21.5 Follow-up

  • How can we prove that at least one duplicate number must exist in nums?
  • Can you solve the problem in linear runtime complexity?

3.21.6 Solution - Floyd’s Cycle Detection (Tortoise and Hare)

3.21.6.1 Walkthrough

This solution uses Floyd’s Cycle Detection algorithm to find the duplicate number in O(n) time and O(1) space, meeting the problem’s constraints.

Core Strategy:

The key insight is to treat the array as an implicit linked list where: - Each index i is a “node” - Each value nums[i] is a “pointer” to the next node at index nums[i] - The duplicate number creates a cycle because multiple indices point to the same value

Why a cycle exists:

Since there are n + 1 integers in the range [1, n], by the Pigeonhole Principle, at least one number must appear twice. When two different indices contain the same value, they both “point” to the same next index, creating a cycle in our implicit linked list.

Floyd’s Cycle Detection has two phases:

Phase 1: Detect the cycle - Use two pointers: slow (moves 1 step) and fast (moves 2 steps) - They will eventually meet somewhere inside the cycle - The meeting point is NOT necessarily the duplicate

Phase 2: Find the cycle entrance - Reset slow to the start (nums[0]) - Move both slow and fast one step at a time - When they meet, that position is the cycle entrance (the duplicate number)

Why Phase 2 works (Mathematical Proof):

Let: - x = distance from start to cycle entrance - y = distance from cycle entrance to meeting point in Phase 1 - C = cycle length

When slow and fast meet in Phase 1: - Slow traveled: x + y - Fast traveled: x + y + nC (some multiple of cycle length) - Since fast moves twice as fast: 2(x + y) = x + y + nC - Simplifying: x + y = nC - Therefore: x = nC - y

This means the distance from start to entrance equals the distance from meeting point to entrance (after some full cycles). Moving both pointers one step at a time from these positions will make them meet exactly at the entrance.

Example Walkthrough:

Using nums = [1,3,4,2,2]:

Index:  0  1  2  3  4
Value: [1, 3, 4, 2, 2]

Implicit linked list visualization:
0 → 1 → 3 → 2 → 4 → 2 (cycle!)
         ↑_________|

Phase 1: Find intersection point
Start: slow=1, fast=1
Step 1: slow=nums[1]=3, fast=nums[nums[1]]=nums[3]=2
Step 2: slow=nums[3]=2, fast=nums[nums[2]]=nums[4]=2
They meet at value 2!

Phase 2: Find cycle entrance
Reset slow to nums[0]=1
Step 1: slow=nums[1]=3, fast=nums[2]=4
Step 2: slow=nums[3]=2, fast=nums[4]=2
They meet at 2 → This is the duplicate!

Visual Representation:

nums = [1, 3, 4, 2, 2]
        ↓  ↓  ↓  ↓  ↓
index:  0→ 1→ 3→ 2→ 4
                 ↑_____|  (cycle: 2→4→2)

The duplicate value 2 appears at indices 3 and 4, creating the cycle entrance at index 2.

3.21.6.2 Analysis

  • Time Complexity: O(n)
    • Phase 1: At most O(n) steps to detect the cycle
    • Phase 2: At most O(n) steps to find the entrance
    • Total: O(n) + O(n) = O(n)
    • Each array element is visited a constant number of times
  • Space Complexity: O(1)
    • Only uses two pointer variables (slow and fast)
    • No additional data structures needed
    • Meets the problem’s constant space requirement
    • Does not modify the input array

Why This Approach is Optimal:

  • Cannot do better than O(n) time since we must examine elements to find the duplicate
  • O(1) space is optimal given the constraint of not modifying the array
  • Hash table approach would use O(n) space (violates constraint)
  • Sorting would modify the array (violates constraint)
  • Floyd’s algorithm achieves both optimal time and space

3.21.6.3 Implementation Steps

Phase 1: Detect the cycle (find intersection point)

  1. Initialize both pointers to nums[0]:
    • slow = nums[0]
    • fast = nums[0]
  2. Use a do-while loop to move pointers until they meet:
    • Move slow one step: slow = nums[slow]
    • Move fast two steps: fast = nums[nums[fast]]
    • Continue until slow == fast
  3. When they meet, we’ve found an intersection point inside the cycle

Phase 2: Find the cycle entrance (the duplicate)

  1. Reset slow to the start:
    • slow = nums[0]
    • Keep fast at the meeting point
  2. Move both pointers one step at a time until they meet:
    • slow = nums[slow]
    • fast = nums[fast]
    • Continue until slow == fast
  3. Return slow (or fast, they’re equal) as the duplicate number

Why do-while for Phase 1?

We use do-while because both pointers start at the same position (nums[0]). A regular while loop would immediately exit since slow == fast initially.

3.21.6.4 Code - Java

class Solution {
    public int findDuplicate(int[] nums) {
        // Phase 1: Find intersection point in the cycle
        int slow = nums[0];
        int fast = nums[0];

        // Use do-while since both start at same position
        do {
            slow = nums[slow];           // Move 1 step
            fast = nums[nums[fast]];     // Move 2 steps
        } while (slow != fast);

        // Phase 2: Find the cycle entrance (the duplicate)
        slow = nums[0];  // Reset slow to start

        // Move both at same speed until they meet
        while (slow != fast) {
            slow = nums[slow];
            fast = nums[fast];
        }

        return slow;  // The duplicate number
    }
}