2.3 Sorting Algorithms
Sorting is a fundamental operation in computer science that arranges elements in a specific order (ascending or descending). Understanding different sorting algorithms, their trade-offs, and when to use each is essential for technical interviews.
2.3.1 Core Concepts
Stability: A sorting algorithm is stable if elements with equal keys maintain their relative order after sorting. This matters when sorting by multiple criteria.
In-place: An algorithm is in-place if it uses only O(1) extra space (excluding the input array).
Comparison-based: Most common sorting algorithms compare elements pairwise. Any comparison-based sort requires \(\Omega(n \log n)\) comparisons in the worst case; this is the theoretical lower bound.
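To see why stability matters when sorting by multiple criteria, the following sketch (class and record names are illustrative) sorts name/score pairs by score using Java's Arrays.sort for objects, which is guaranteed stable; entries with equal scores keep their original relative order:

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableSortDemo {
    public record Entry(String name, int score) {}

    public static Entry[] sortByScore(Entry[] entries) {
        Entry[] copy = entries.clone();
        // Arrays.sort on object arrays is stable (Timsort), so entries
        // with equal scores keep their original relative order
        Arrays.sort(copy, Comparator.comparingInt(Entry::score));
        return copy;
    }

    public static void main(String[] args) {
        Entry[] input = {
            new Entry("alice", 90),
            new Entry("bob", 85),
            new Entry("carol", 90)
        };
        // "alice" still precedes "carol" among the two 90s
        System.out.println(Arrays.toString(sortByScore(input)));
    }
}
```

If the entries were first sorted by name, a second stable sort by score would yield score order with ties broken alphabetically, which is exactly the multi-criteria pattern mentioned above.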
2.3.2 Bubble Sort
Bubble Sort is one of the simplest sorting algorithms: it repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. Passes over the list are repeated until no swaps are needed.
Key Characteristics:
- Compares and swaps adjacent elements
- Largest elements “bubble up” to their correct position
- Easy to understand and implement
- Rarely used in practice due to poor performance

Algorithm:
1. Compare each pair of adjacent elements
2. Swap if they are in the wrong order
3. Repeat until no swaps are needed
Implementation:
public void bubbleSort(int[] arr) {
    int n = arr.length;
    for (int i = 0; i < n - 1; i++) {
        boolean swapped = false;
        for (int j = 0; j < n - 1 - i; j++) {
            if (arr[j] > arr[j + 1]) {
                // Swap adjacent elements
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
                swapped = true;
            }
        }
        // Optimization: if no swaps, array is sorted
        if (!swapped) break;
    }
}

Complexity Analysis:
- Time: \(O(n^2)\) worst/average case, \(O(n)\) best case (already sorted with optimization)
- Space: \(O(1)\) - in-place
- Stable: Yes
When to Use: Almost never in practice. Useful for educational purposes or when simplicity is more important than performance on very small arrays.
2.3.3 Insertion Sort
Insertion Sort builds the final sorted array one element at a time by repeatedly picking the next element and inserting it into its correct position among the previously sorted elements.
Key Characteristics:
- Builds sorted portion from left to right
- Efficient for small arrays and nearly sorted data
- Online algorithm (can sort as data arrives)
- Used as the base case in hybrid sorting algorithms

Algorithm:
1. Start with the second element (first element is trivially sorted)
2. Compare with elements in the sorted portion (moving right to left)
3. Shift larger elements right to make room
4. Insert the current element in its correct position
5. Repeat for all remaining elements
Implementation:
public void insertionSort(int[] arr) {
    int n = arr.length;
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        // Shift elements greater than key to the right
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        // Insert key at correct position
        arr[j + 1] = key;
    }
}

Complexity Analysis:
- Time: \(O(n^2)\) worst/average case, \(O(n)\) best case (already sorted)
- Space: \(O(1)\) - in-place
- Stable: Yes
When to Use:
- Small arrays (n < 10-20)
- Nearly sorted data
- As the base case in hybrid algorithms (e.g., Timsort uses insertion sort for small subarrays)
- When you need an online algorithm
2.3.4 Merge Sort
Merge Sort is a divide-and-conquer algorithm that divides the array into halves, recursively sorts each half, and then merges the sorted halves back together.
Key Characteristics:
- Divide and conquer approach
- Guaranteed \(O(n \log n)\) performance
- Stable sorting algorithm
- Requires \(O(n)\) extra space for arrays
- Well-suited for linked lists (merging relinks nodes, so no temporary arrays are needed)

Algorithm:
1. Divide: Split the array into two halves
2. Conquer: Recursively sort each half
3. Combine: Merge the two sorted halves
Implementation:
public void mergeSort(int[] arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        // Recursively sort both halves
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        // Merge sorted halves
        merge(arr, left, mid, right);
    }
}

private void merge(int[] arr, int left, int mid, int right) {
    // Create temporary arrays
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int[] leftArr = new int[n1];
    int[] rightArr = new int[n2];
    // Copy data to temporary arrays
    for (int i = 0; i < n1; i++) {
        leftArr[i] = arr[left + i];
    }
    for (int j = 0; j < n2; j++) {
        rightArr[j] = arr[mid + 1 + j];
    }
    // Merge temporary arrays back
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (leftArr[i] <= rightArr[j]) {
            arr[k] = leftArr[i];
            i++;
        } else {
            arr[k] = rightArr[j];
            j++;
        }
        k++;
    }
    // Copy remaining elements
    while (i < n1) {
        arr[k] = leftArr[i];
        i++;
        k++;
    }
    while (j < n2) {
        arr[k] = rightArr[j];
        j++;
        k++;
    }
}

Complexity Analysis:
- Time: \(O(n \log n)\) for all cases (worst, average, best)
- Space: \(O(n)\) for temporary arrays
- Stable: Yes
When to Use:
- When stability is required
- When guaranteed \(O(n \log n)\) is needed (no worst-case degradation)
- Sorting linked lists (can achieve \(O(1)\) space)
- External sorting (sorting data too large to fit in memory)
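Since linked lists are a natural fit for merge sort, here is a minimal sketch (class and method names are illustrative): the list is split with slow/fast pointers and merged by relinking nodes, so no temporary arrays are needed:

```java
public class ListMergeSort {
    static class Node {
        int val;
        Node next;
        Node(int val) { this.val = val; }
    }

    public static Node sort(Node head) {
        if (head == null || head.next == null) return head;
        // Split: slow advances by 1, fast by 2, so slow stops at the middle
        Node slow = head, fast = head.next;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
        }
        Node second = slow.next;
        slow.next = null; // Cut the list in two
        return merge(sort(head), sort(second));
    }

    private static Node merge(Node a, Node b) {
        Node dummy = new Node(0);
        Node tail = dummy;
        // Relink nodes in sorted order; <= keeps the sort stable
        while (a != null && b != null) {
            if (a.val <= b.val) { tail.next = a; a = a.next; }
            else                { tail.next = b; b = b.next; }
            tail = tail.next;
        }
        tail.next = (a != null) ? a : b;
        return dummy.next;
    }

    // Helpers for building and reading lists (for demonstration only)
    public static Node fromArray(int[] values) {
        Node dummy = new Node(0), tail = dummy;
        for (int v : values) { tail.next = new Node(v); tail = tail.next; }
        return dummy.next;
    }

    public static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append(',');
            sb.append(n.val);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = fromArray(new int[]{5, 2, 8, 1, 9, 3});
        System.out.println(render(sort(head))); // 1,2,3,5,8,9
    }
}
```

The recursion stack is still \(O(\log n)\); a bottom-up variant that merges runs of doubling length achieves the \(O(1)\) extra space mentioned above.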
2.3.5 Quick Sort
Quick Sort is a divide-and-conquer algorithm that selects a “pivot” element and partitions the array around the pivot, placing smaller elements before it and larger elements after it.
Key Characteristics:
- Divide and conquer with in-place partitioning
- Average case \(O(n \log n)\), but \(O(n^2)\) worst case
- Not stable (relative order of equal elements may change)
- Cache-friendly due to in-place operations
- Most practical choice for general-purpose sorting

Algorithm:
1. Choose pivot: Select an element as pivot (first, last, middle, or random)
2. Partition: Rearrange so elements < pivot are left, elements > pivot are right
3. Recurse: Recursively sort the left and right partitions
Implementation (Lomuto Partition Scheme):
public void quickSort(int[] arr, int low, int high) {
    if (low < high) {
        int pivotIndex = partition(arr, low, high);
        quickSort(arr, low, pivotIndex - 1);
        quickSort(arr, pivotIndex + 1, high);
    }
}

private int partition(int[] arr, int low, int high) {
    int pivot = arr[high]; // Choose last element as pivot
    int i = low - 1; // Index of smaller element
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            // Swap arr[i] and arr[j]
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
    // Place pivot in correct position
    int temp = arr[i + 1];
    arr[i + 1] = arr[high];
    arr[high] = temp;
    return i + 1;
}

Implementation (Hoare Partition Scheme - More Efficient):
private int partitionHoare(int[] arr, int low, int high) {
    int pivot = arr[low + (high - low) / 2]; // Middle element as pivot
    int i = low - 1;
    int j = high + 1;
    while (true) {
        do {
            i++;
        } while (arr[i] < pivot);
        do {
            j--;
        } while (arr[j] > pivot);
        if (i >= j) {
            // Note: j is a split point, not the pivot's final position;
            // the driver must recurse on [low..j] and [j+1..high]
            return j;
        }
        // Swap arr[i] and arr[j]
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}

Complexity Analysis:
- Time: \(O(n \log n)\) average, \(O(n^2)\) worst case (e.g., already sorted input with a poor pivot choice)
- Space: \(O(\log n)\) for the recursion stack on average, \(O(n)\) worst case
- Stable: No
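Because partitionHoare returns a split point rather than the pivot's final index, its recursive driver differs from the Lomuto version. A self-contained sketch (class name is illustrative):

```java
import java.util.Arrays;

public class HoareQuickSort {
    public static void quickSort(int[] arr, int low, int high) {
        if (low < high) {
            int p = partitionHoare(arr, low, high);
            // Recurse on [low..p] and [p+1..high]; with Hoare partitioning
            // the pivot is NOT excluded from the recursive calls
            quickSort(arr, low, p);
            quickSort(arr, p + 1, high);
        }
    }

    private static int partitionHoare(int[] arr, int low, int high) {
        int pivot = arr[low + (high - low) / 2]; // Middle element as pivot
        int i = low - 1, j = high + 1;
        while (true) {
            do { i++; } while (arr[i] < pivot);
            do { j--; } while (arr[j] > pivot);
            if (i >= j) return j;
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }

    public static void main(String[] args) {
        int[] data = {9, 4, 7, 1, 8, 2, 5};
        quickSort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 7, 8, 9]
    }
}
```

Using the Lomuto-style recursion (`low..p-1` and `p+1..high`) with a Hoare partition is a classic bug: it can drop or duplicate elements in the ordering of recursive work and loop forever on some inputs.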
Pivot Selection Strategies:
- First/Last element: Simple but vulnerable to sorted input
- Middle element: Better for sorted input
- Random element: Avoids worst case with high probability
- Median-of-three: Choose median of first, middle, and last elements
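A sketch of the median-of-three strategy (helper name is illustrative): picking the median of the first, middle, and last elements defuses the already-sorted worst case that plagues first/last pivot choices:

```java
public class MedianOfThree {
    // Returns the index of the median of arr[low], arr[mid], arr[high]
    public static int medianOfThree(int[] arr, int low, int high) {
        int mid = low + (high - low) / 2;
        int a = arr[low], b = arr[mid], c = arr[high];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return low;
        return high;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5};
        // On sorted input the median of {1, 3, 5} is the middle element,
        // giving a balanced split instead of the O(n^2) degenerate case
        System.out.println(medianOfThree(sorted, 0, 4));
    }
}
```

To reuse the Lomuto partition above unchanged, swap the chosen median into arr[high] before partitioning.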
When to Use:
- General-purpose sorting when average performance matters
- When in-place sorting is required
- When stability is not required
- Arrays (not as efficient for linked lists)
2.3.6 Topological Sort
Topological Sort is an ordering of vertices in a Directed Acyclic Graph (DAG) such that for every directed edge (u, v), vertex u comes before vertex v in the ordering. It’s not a comparison-based sort like the others, but rather a graph algorithm that produces a linear ordering of dependencies.
Key Characteristics:
- Only works on Directed Acyclic Graphs (DAGs)
- Multiple valid topological orderings may exist
- Used for dependency resolution and scheduling
- Detects cycles (if a topological sort is impossible, the graph has a cycle)

Common Use Cases:
- Build systems (compile dependencies in correct order)
- Course scheduling (prerequisites before courses)
- Task scheduling with dependencies
- Package managers (install dependencies first)
Algorithm 1: Kahn’s Algorithm (BFS-based)
Uses in-degree tracking: repeatedly remove nodes with no incoming edges.
public List<Integer> topologicalSortKahn(int numNodes, int[][] edges) {
    // Build adjacency list and in-degree array
    List<List<Integer>> graph = new ArrayList<>();
    int[] inDegree = new int[numNodes];
    for (int i = 0; i < numNodes; i++) {
        graph.add(new ArrayList<>());
    }
    for (int[] edge : edges) {
        graph.get(edge[0]).add(edge[1]);
        inDegree[edge[1]]++;
    }
    // Start with nodes having no incoming edges
    Queue<Integer> queue = new LinkedList<>();
    for (int i = 0; i < numNodes; i++) {
        if (inDegree[i] == 0) {
            queue.offer(i);
        }
    }
    List<Integer> result = new ArrayList<>();
    while (!queue.isEmpty()) {
        int node = queue.poll();
        result.add(node);
        // Remove this node's outgoing edges
        for (int neighbor : graph.get(node)) {
            inDegree[neighbor]--;
            if (inDegree[neighbor] == 0) {
                queue.offer(neighbor);
            }
        }
    }
    // Check for cycle
    if (result.size() != numNodes) {
        return Collections.emptyList(); // Cycle detected
    }
    return result;
}

Algorithm 2: DFS-based
Uses post-order DFS traversal: add node to result after visiting all descendants.
public List<Integer> topologicalSortDFS(int numNodes, int[][] edges) {
    List<List<Integer>> graph = new ArrayList<>();
    for (int i = 0; i < numNodes; i++) {
        graph.add(new ArrayList<>());
    }
    for (int[] edge : edges) {
        graph.get(edge[0]).add(edge[1]);
    }
    boolean[] visited = new boolean[numNodes];
    boolean[] inStack = new boolean[numNodes]; // For cycle detection
    Deque<Integer> stack = new ArrayDeque<>();
    for (int i = 0; i < numNodes; i++) {
        if (!visited[i]) {
            if (!dfs(graph, i, visited, inStack, stack)) {
                return Collections.emptyList(); // Cycle detected
            }
        }
    }
    List<Integer> result = new ArrayList<>();
    while (!stack.isEmpty()) {
        result.add(stack.pop());
    }
    return result;
}

private boolean dfs(List<List<Integer>> graph, int node,
                    boolean[] visited, boolean[] inStack, Deque<Integer> stack) {
    visited[node] = true;
    inStack[node] = true;
    for (int neighbor : graph.get(node)) {
        if (inStack[neighbor]) {
            return false; // Cycle detected
        }
        if (!visited[neighbor]) {
            if (!dfs(graph, neighbor, visited, inStack, stack)) {
                return false;
            }
        }
    }
    inStack[node] = false;
    stack.push(node); // Add after all descendants
    return true;
}

Complexity Analysis:
- Time: \(O(V + E)\) where V is vertices and E is edges
- Space: \(O(V)\) for visited arrays and result
Kahn’s vs DFS Comparison:
| Aspect | Kahn’s (BFS) | DFS-based |
|---|---|---|
| Approach | Remove zero in-degree nodes | Post-order traversal |
| Cycle Detection | result.size() != numNodes | Track nodes in current path |
| Implementation | Queue + in-degree array | Recursion + stack |
| Parallelization | Easier (process level by level) | Harder |
| Use Case | When you need levels/layers | When any valid ordering suffices |
When to Use Topological Sort:
- Problems involving dependencies or prerequisites
- “Order of execution” problems
- Course schedule problems (LeetCode 207, 210)
- Build order problems
- Detecting cycles in directed graphs
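As a usage sketch, Kahn's algorithm solves the course-schedule problem mentioned above (class and method names are illustrative): courses are nodes, prerequisite pairs are edges, and the schedule is feasible exactly when every course eventually reaches in-degree zero:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CourseSchedule {
    // prerequisites[i] = {course, prereq} means prereq must be taken first
    public static boolean canFinish(int numCourses, int[][] prerequisites) {
        List<List<Integer>> graph = new ArrayList<>();
        int[] inDegree = new int[numCourses];
        for (int i = 0; i < numCourses; i++) graph.add(new ArrayList<>());
        for (int[] p : prerequisites) {
            graph.get(p[1]).add(p[0]); // Edge prereq -> course
            inDegree[p[0]]++;
        }
        Queue<Integer> queue = new ArrayDeque<>();
        for (int i = 0; i < numCourses; i++) {
            if (inDegree[i] == 0) queue.offer(i);
        }
        int processed = 0;
        while (!queue.isEmpty()) {
            int course = queue.poll();
            processed++;
            for (int next : graph.get(course)) {
                if (--inDegree[next] == 0) queue.offer(next);
            }
        }
        // If a cycle exists, the courses on it never reach in-degree 0
        return processed == numCourses;
    }

    public static void main(String[] args) {
        // Course 1 depends on course 0: feasible
        System.out.println(canFinish(2, new int[][]{{1, 0}}));          // true
        // Courses 0 and 1 depend on each other: cycle, infeasible
        System.out.println(canFinish(2, new int[][]{{1, 0}, {0, 1}})); // false
    }
}
```

Collecting the polled courses into a list instead of counting them yields one valid ordering, which is the LeetCode 210 variant.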
2.3.7 When to Use Which Sorting Algorithm
Choosing the right sorting algorithm depends on several factors: input size, data characteristics, memory constraints, and stability requirements.
2.3.7.1 Decision Matrix
| Factor | Bubble Sort | Insertion Sort | Merge Sort | Quick Sort |
|---|---|---|---|---|
| Time (Average) | \(O(n^2)\) | \(O(n^2)\) | \(O(n \log n)\) | \(O(n \log n)\) |
| Time (Worst) | \(O(n^2)\) | \(O(n^2)\) | \(O(n \log n)\) | \(O(n^2)\) |
| Space | \(O(1)\) | \(O(1)\) | \(O(n)\) | \(O(\log n)\) |
| Stable | Yes | Yes | Yes | No |
| In-place | Yes | Yes | No | Yes |
| Adaptive | Yes* | Yes | No | No |
*With early termination optimization
2.3.7.2 Decision Tree
┌─────────────────────────────┐
│      Need to sort data?     │
└──────────────┬──────────────┘
               │
┌──────────────┴──────────────┐
│ Is n very small (< 10-20)?  │
└──────┬──────────────┬───────┘
      YES            NO
       │              │
       ▼              │
┌───────────────┐     │
│Insertion Sort │     │
│ Simple, fast  │     │
│ for small n   │     │
└───────────────┘     │
       ┌──────────────┴──────────────┐
       │   Is data nearly sorted?    │
       └──────┬──────────────┬───────┘
             YES            NO
              │              │
              ▼              │
      ┌───────────────┐      │
      │Insertion Sort │      │
      │   O(n) for    │      │
      │ nearly sorted │      │
      └───────────────┘      │
              ┌──────────────┴──────────────┐
              │   Is stability required?    │
              └──────┬──────────────┬───────┘
                    YES            NO
                     │              │
                     ▼              │
             ┌───────────────┐      │
             │  Merge Sort   │      │
             │  Guaranteed   │      │
             │  O(n log n)   │      │
             │   + stable    │      │
             └───────────────┘      │
                     ┌──────────────┴──────────────┐
                     │    Is worst-case O(n²)      │
                     │        acceptable?          │
                     └──────┬──────────────┬───────┘
                           YES            NO
                            │              │
                            ▼              ▼
                    ┌───────────────┐ ┌───────────────┐
                    │  Quick Sort   │ │  Merge Sort   │
                    │  Fastest avg  │ │  Guaranteed   │
                    │   In-place    │ │  O(n log n)   │
                    └───────────────┘ └───────────────┘
2.3.7.3 Practical Guidelines
Use Insertion Sort when:
- Array size is small (n < 10-20)
- Data is nearly sorted
- Need an online algorithm (sorting as data arrives)
- As base case in hybrid algorithms

Use Merge Sort when:
- Stability is required
- Guaranteed \(O(n \log n)\) is needed
- Sorting linked lists
- External sorting (data doesn’t fit in memory)
- Parallelization is desired (easily parallelizable)

Use Quick Sort when:
- General-purpose sorting
- In-place sorting is required
- Average-case performance is acceptable
- Working with arrays (cache-friendly)
- Memory is constrained

Use Bubble Sort when:
- Educational purposes only
- Extremely simple implementation is needed
- Array is very small and simplicity trumps performance
2.3.7.4 Real-World Implementations
Most production sorting implementations use hybrid algorithms:
- Java’s Arrays.sort(): Uses Dual-Pivot Quick Sort for primitives, Timsort for objects
- Python’s sorted(): Uses Timsort (hybrid of Merge Sort and Insertion Sort)
- C++’s std::sort(): Uses Introsort (Quick Sort + Heap Sort + Insertion Sort)
Timsort combines the best of Merge Sort and Insertion Sort:
1. Divide array into small “runs”
2. Sort each run with Insertion Sort (efficient for small/sorted data)
3. Merge runs using Merge Sort (stable, \(O(n \log n)\))
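A simplified hybrid in this spirit (not the real Timsort, which also detects natural runs and uses galloping merges) switches to insertion sort below a size cutoff; the cutoff of 16 and the class name are illustrative choices:

```java
import java.util.Arrays;

public class HybridSort {
    private static final int CUTOFF = 16; // Illustrative threshold

    public static void sort(int[] arr) {
        sort(arr, 0, arr.length - 1, new int[arr.length]);
    }

    private static void sort(int[] arr, int left, int right, int[] buf) {
        if (right - left + 1 <= CUTOFF) {
            insertionSort(arr, left, right); // Fast on small ranges
            return;
        }
        int mid = left + (right - left) / 2;
        sort(arr, left, mid, buf);
        sort(arr, mid + 1, right, buf);
        merge(arr, left, mid, right, buf);
    }

    private static void insertionSort(int[] arr, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int key = arr[i];
            int j = i - 1;
            while (j >= left && arr[j] > key) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = key;
        }
    }

    private static void merge(int[] arr, int left, int mid, int right, int[] buf) {
        System.arraycopy(arr, left, buf, left, right - left + 1);
        int i = left, j = mid + 1, k = left;
        while (i <= mid && j <= right) {
            // <= keeps the sort stable, as in the merge sort section
            arr[k++] = (buf[i] <= buf[j]) ? buf[i++] : buf[j++];
        }
        while (i <= mid) arr[k++] = buf[i++];
        while (j <= right) arr[k++] = buf[j++];
    }

    public static void main(String[] args) {
        int[] data = {20, 3, 17, 8, 1, 12, 5, 9, 30, 2};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

Even this crude cutoff captures the key idea: insertion sort's low constant factors beat merge sort's recursion overhead on small subarrays, while the merge layer preserves the \(O(n \log n)\) guarantee and stability overall.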
2.3.7.5 Interview Tips
- Know the trade-offs: Be prepared to discuss time/space complexity, stability, and use cases
- Mention optimizations: Random pivot for Quick Sort, early termination for Bubble Sort
- Consider the data: Nearly sorted? Small? Linked list? These affect algorithm choice
- Hybrid approaches: Mentioning Timsort or Introsort shows deeper understanding
- In-place vs. stable: These are often the deciding factors between Quick Sort and Merge Sort