10.7 Encode and Decode Strings
10.7.1 Problem Metadata
- Platform: LeetCode
- Problem ID: 271
- Difficulty: Medium
- URL: https://leetcode.com/problems/encode-and-decode-strings/
- Tags: Blind 75, NeetCode 150
- Techniques: String, Array, Design
10.7.2 Description
Design an algorithm to encode a list of strings to a single string. The encoded string is then decoded back to the original list of strings.
Please implement:
- String encode(List<String> strs) - Encodes a list of strings to a single string
- List<String> decode(String s) - Decodes a single string to a list of strings
10.7.3 Examples
Example 1:
Input: strs = ["lint","code","love","you"]
Output: ["lint","code","love","you"]
Explanation: One possible encode method is: "4#lint4#code4#love3#you"
Example 2:
Input: strs = ["we", "say", ":", "yes"]
Output: ["we", "say", ":", "yes"]
Explanation: Strings can contain any characters including delimiters.
Example 3:
Input: strs = [""]
Output: [""]
Explanation: Empty strings must be handled correctly.
10.7.4 Constraints
- \(0 \le\) strs.length \(\le\) 200
- \(0 \le\) strs[i].length \(\le\) 200
strs[i]contains any possible characters including UTF-8 characters- The encoded string can contain any characters
10.7.5 Solution - Length Prefix Encoding
10.7.5.1 Walkthrough
This problem requires designing a robust encoding scheme that handles strings containing any possible characters, including potential delimiter characters. The key challenge is avoiding ambiguity when encoding and decoding.
Why Simple Delimiters Fail:
Common naive approaches fail because strings can contain the delimiter itself:
- Using , as delimiter: ["a,b", "c"] → "a,b,c" → ambiguous decode
- Using : as delimiter: ["a:b", "c"] → "a:b:c" → ambiguous decode
- Using special characters: strings can contain ANY character
Solution: Length Prefix Encoding
The robust approach is to prefix each string with its length, followed by a delimiter that won’t appear in length representations:
Format: length#content
For a list of strings ["lint", "code", "love", "you"]:
- "lint" has length 4 → encode as "4#lint"
- "code" has length 4 → encode as "4#code"
- "love" has length 4 → encode as "4#love"
- "you" has length 3 → encode as "3#you"
- Encoded result: "4#lint4#code4#love3#you"
Why This Works:
- Unambiguous parsing: When decoding, we first read the length number, then skip the
#, then read exactly that many characters - Content independence: The actual string content can contain any characters including
#, because we know exactly how many characters to read - No escaping needed: Unlike delimiter-based approaches, we don’t need to escape special characters
Detailed Decoding Process:
When decoding "4#lint4#code4#love3#you":
Step 1: Read characters until '#' → "4"
Parse length = 4
Read next 4 characters → "lint"
Result: ["lint"]
Step 2: Read characters until '#' → "4"
Parse length = 4
Read next 4 characters → "code"
Result: ["lint", "code"]
Step 3: Read characters until '#' → "4"
Parse length = 4
Read next 4 characters → "love"
Result: ["lint", "code", "love"]
Step 4: Read characters until '#' → "3"
Parse length = 3
Read next 3 characters → "you"
Result: ["lint", "code", "love", "you"]
Edge Cases Handled:
- Empty strings:
[""]→"0#"→[""] - Strings containing delimiter:
["a#b"]→"3#a#b"→ we read exactly 3 chars, so we get"a#b"correctly - Strings containing numbers:
["123"]→"3#123"→ length prefix is separate from content - Multiple consecutive delimiters:
["##"]→"2###"→ read 2 chars after first#
Example with tricky input ["we", "say", ":", "yes", "#"]:
Encoding:
"we" → length 2 → "2#we"
"say" → length 3 → "3#say"
":" → length 1 → "1#:"
"yes" → length 3 → "3#yes"
"#" → length 1 → "1##"
Encoded: "2#we3#say1#:3#yes1##"
Decoding:
"2#we3#say1#:3#yes1##"
↑
Read until '#' → "2", read 2 chars → "we"
"2#we3#say1#:3#yes1##"
↑
Read until '#' → "3", read 3 chars → "say"
"2#we3#say1#:3#yes1##"
↑
Read until '#' → "1", read 1 char → ":"
"2#we3#say1#:3#yes1##"
↑
Read until '#' → "3", read 3 chars → "yes"
"2#we3#say1#:3#yes1##"
↑
Read until '#' → "1", read 1 char → "#"
Result: ["we", "say", ":", "yes", "#"] ✓
10.7.5.2 Analysis
- Time Complexity:
- Encode: O(n) where n is the total number of characters across all strings
- We iterate through each string once and append characters
- String concatenation using StringBuilder is O(1) amortized
- Decode: O(n) where n is the length of the encoded string
- Single pass through the encoded string
- Substring extraction is O(length of substring) but total work is O(n)
- Total: O(n)
- Encode: O(n) where n is the total number of characters across all strings
- Space Complexity: O(n)
- Encode: StringBuilder storage proportional to output size
- Decode: Result list storage proportional to total string length
- Both are O(n) where n is total characters
10.7.5.3 Implementation Steps
Encode Algorithm:
- Create a
StringBuilderto build the encoded string - For each string in the input list:
- Append the string’s length
- Append the delimiter
'#' - Append the string content
- Return the final encoded string
Decode Algorithm:
- Initialize an empty result list
- Initialize pointer
i = 0to track position in encoded string - While
i < s.length():- Find the position of next
'#'delimiter starting fromi - Extract substring from
ito'#'position and parse as integer (this is the length) - Move pointer past the
'#':i = delimiter_position + 1 - Extract substring of exactly
lengthcharacters starting fromi - Add the extracted string to result list
- Move pointer forward by
length:i += length
- Find the position of next
- Return the result list
10.7.5.4 Code - Java
public class Codec {
private static final char DELIM = '#';
// Encodes a list of strings to a single string.
public String encode(List<String> strs) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < strs.size(); i++) {
String str = strs.get(i);
// CRITICAL: MUST NOT use sb.append(str.length() + DELIM + str)
// That causes operator precedence issues - Java will treat int + char as arithmetic!
sb.append(str.length()).append(DELIM).append(str);
}
return sb.toString();
}
// Decodes a single string to a list of strings.
public List<String> decode(String s) {
List<String> result = new ArrayList<>();
int i = 0;
while (i < s.length()) {
// Read length digits until we hit DELIM
// Example: "3#abc" where i=0, j will stop at position 1
int j = i;
while (s.charAt(j) != DELIM) {
j++;
}
// Parse the length
int length = Integer.parseInt(s.substring(i, j));
// Skip the DELIM character
j++;
// Extract exactly 'length' characters
int k = j + length;
String str = s.substring(j, k);
result.add(str);
// Move to the next encoded string
i = k;
}
return result;
}
}