3.21 Find the Duplicate Number
3.21.1 Problem Metadata
- Platform: LeetCode
- Problem ID: 287
- Difficulty: Medium
- URL: https://leetcode.com/problems/find-the-duplicate-number/
- Tags: NeetCode 150
- Techniques: Two Pointers, Array, Floyd’s Cycle Detection
3.21.2 Description
Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive, there is only one repeated number in nums. Return this repeated number.
You must solve the problem without modifying the array nums and using only constant extra space.
3.21.3 Examples
Example 1:
Input: nums = [1,3,4,2,2]
Output: 2
Example 2:
Input: nums = [3,1,3,4,2]
Output: 3
Example 3:
Input: nums = [3,3,3,3,3]
Output: 3
3.21.4 Constraints
- \(1 \le n \le 10^5\)
nums.length= \(n + 1\)- \(1 \le \text{nums}[i] \le n\)
- All the integers in
numsappear only once except for precisely one integer which appears two or more times
3.21.5 Follow-up
- How can we prove that at least one duplicate number must exist in
nums? - Can you solve the problem in linear runtime complexity?
3.21.6 Solution - Floyd’s Cycle Detection (Tortoise and Hare)
3.21.6.1 Walkthrough
This solution uses Floyd’s Cycle Detection algorithm to find the duplicate number in O(n) time and O(1) space, meeting the problem’s constraints.
Core Strategy:
The key insight is to treat the array as an implicit linked list where:
- Each index i is a “node”
- Each value nums[i] is a “pointer” to the next node at index nums[i]
- The duplicate number creates a cycle because multiple indices point to the same value
Why a cycle exists:
Since there are n + 1 integers in the range [1, n], by the Pigeonhole Principle, at least one number must appear twice. When two different indices contain the same value, they both “point” to the same next index, creating a cycle in our implicit linked list.
Floyd’s Cycle Detection has two phases:
Phase 1: Detect the cycle
- Use two pointers: slow (moves 1 step) and fast (moves 2 steps)
- They will eventually meet somewhere inside the cycle
- The meeting point is NOT necessarily the duplicate
Phase 2: Find the cycle entrance
- Reset slow to the start (nums[0])
- Move both slow and fast one step at a time
- When they meet, that position is the cycle entrance (the duplicate number)
Why Phase 2 works (Mathematical Proof):
Let:
- x = distance from start to cycle entrance
- y = distance from cycle entrance to meeting point in Phase 1
- C = cycle length
When slow and fast meet in Phase 1:
- Slow traveled: x + y
- Fast traveled: x + y + nC (some multiple of cycle length)
- Since fast moves twice as fast: 2(x + y) = x + y + nC
- Simplifying: x + y = nC
- Therefore: x = nC - y
This means the distance from start to entrance equals the distance from meeting point to entrance (after some full cycles). Moving both pointers one step at a time from these positions will make them meet exactly at the entrance.
Example Walkthrough:
Using nums = [1,3,4,2,2]:
Index: 0 1 2 3 4
Value: [1, 3, 4, 2, 2]
Implicit linked list visualization:
0 → 1 → 3 → 2 → 4 → 2 (cycle!)
↑_________|
Phase 1: Find intersection point
Start: slow=1, fast=1
Step 1: slow=nums[1]=3, fast=nums[nums[1]]=nums[3]=2
Step 2: slow=nums[3]=2, fast=nums[nums[2]]=nums[4]=2
They meet at value 2!
Phase 2: Find cycle entrance
Reset slow to nums[0]=1
Step 1: slow=nums[1]=3, fast=nums[2]=4
Step 2: slow=nums[3]=2, fast=nums[4]=2
They meet at 2 → This is the duplicate!
Visual Representation:
nums = [1, 3, 4, 2, 2]
↓ ↓ ↓ ↓ ↓
index: 0→ 1→ 3→ 2→ 4
↑_____| (cycle: 2→4→2)
The duplicate value 2 appears at indices 3 and 4, creating the cycle entrance at index 2.
3.21.6.2 Analysis
- Time Complexity: O(n)
- Phase 1: At most O(n) steps to detect the cycle
- Phase 2: At most O(n) steps to find the entrance
- Total: O(n) + O(n) = O(n)
- Each array element is visited a constant number of times
- Space Complexity: O(1)
- Only uses two pointer variables (
slowandfast) - No additional data structures needed
- Meets the problem’s constant space requirement
- Does not modify the input array
- Only uses two pointer variables (
Why This Approach is Optimal:
- Cannot do better than O(n) time since we must examine elements to find the duplicate
- O(1) space is optimal given the constraint of not modifying the array
- Hash table approach would use O(n) space (violates constraint)
- Sorting would modify the array (violates constraint)
- Floyd’s algorithm achieves both optimal time and space
3.21.6.3 Implementation Steps
Phase 1: Detect the cycle (find intersection point)
- Initialize both pointers to
nums[0]:slow = nums[0]fast = nums[0]
- Use a do-while loop to move pointers until they meet:
- Move
slowone step:slow = nums[slow] - Move
fasttwo steps:fast = nums[nums[fast]] - Continue until
slow == fast
- Move
- When they meet, we’ve found an intersection point inside the cycle
Phase 2: Find the cycle entrance (the duplicate)
- Reset
slowto the start:slow = nums[0]- Keep
fastat the meeting point
- Move both pointers one step at a time until they meet:
slow = nums[slow]fast = nums[fast]- Continue until
slow == fast
- Return
slow(orfast, they’re equal) as the duplicate number
Why do-while for Phase 1?
We use do-while because both pointers start at the same position (nums[0]). A regular while loop would immediately exit since slow == fast initially.
3.21.6.4 Code - Java
class Solution {
public int findDuplicate(int[] nums) {
// Phase 1: Find intersection point in the cycle
int slow = nums[0];
int fast = nums[0];
// Use do-while since both start at same position
do {
slow = nums[slow]; // Move 1 step
fast = nums[nums[fast]]; // Move 2 steps
} while (slow != fast);
// Phase 2: Find the cycle entrance (the duplicate)
slow = nums[0]; // Reset slow to start
// Move both at same speed until they meet
while (slow != fast) {
slow = nums[slow];
fast = nums[fast];
}
return slow; // The duplicate number
}
}