#64 ๐ 3163. String Compression III ๐ง ๐
When working with large strings, especially those filled with repeating characters, compression can save valuable space. In LeetCode problem #3163: String Compression III, weโre asked to create a compressed version of a string while following a specific rule: we can capture only up to 9 consecutive characters of the same type at a time. This twist challenges us to think carefully about how to segment the string efficiently without missing any repetitions.
Whether youโre just starting with string manipulation or looking to brush up on your algorithmic skills, this problem is a great one to dissect. Letโs jump into it and solve it step-by-step! ๐
Problem Statement ๐
Given a string word
, we need to compress it using the following algorithm:
- Initialize an empty string
comp
. - While
word
has characters left:- Identify the longest prefix of
word
that consists of the same characterc
but does not exceed 9 consecutive occurrences. - Remove this prefix from
word
. - Append the length of this prefix followed by
c
tocomp
.
- Identify the longest prefix of
- Once weโve processed the entire string,
comp
holds the compressed version ofword
.
Examples ๐
Example 1:
- Input:
word = "abcde"
- Output:
"1a1b1c1d1e"
Explanation: Here, each character appears only once, so we append "1"
followed by each character to get "1a1b1c1d1e"
.
Example 2:
- Input:
word = "aaaaaaaaaaaaaabb"
- Output:
"9a5a2b"
Explanation:
- We start with
"aaaaaaaaa"
(9a
s), which we encode as"9a"
. - Then we capture
"aaaaa"
(5a
s) as"5a"
. - Finally,
"bb"
is encoded as"2b"
.
Constraints
- String length:
1 <= word.length <= 200,000
word
contains only lowercase English letters.
Initial Thoughts ๐ค
The problem is all about grouping characters into clusters of up to 9 occurrences. This limit on the number of characters in each group forces us to consider only the first 9 repetitions of each character at a time. Hereโs the plan:
- Traverse
word
character by character, counting consecutive occurrences until we reach either a new character or the limit of 9 repetitions. - After each count, add the count and character to
comp
and proceed with the next group of the same character if more instances remain.
Edge Cases ๐งฉ
Considering edge cases helps us ensure our solution is robust:
- Single Character String:
word = "a"
. Since only one character exists, the output should be"1a"
. - String of Unique Characters:
word = "abcdefg"
. With each character unique, each will have only one occurrence:"1a1b1c1d1e1f1g"
. - All Characters Repeat but are Fewer Than 9:
word = "aaa"
. The output here should be"3a"
. - Very Long String of a Single Character:
word = "aaaaaaaaaaaaaaaa"
. With 16a
s, this tests the segmentation rule. Expected output:"9a7a"
, as we capture 9 and then the remaining 7. - Mixed Repeats and Singles:
word = "aaaabbbccddeeeeeeeeee"
. Here, the pattern and limit constraints will apply to different groups.
Approach 1: Basic Solution ๐ ๏ธ
Strategy ๐
In this approach, weโll keep a pointer to mark the start of each new group. For each character in the string:
- Count consecutive occurrences of the character until either the character changes or we reach the max of 9.
- Append the count and character to the result and move to the next character group.
Using this approach, we can process each group in constant time.
Complexity Analysis ๐ฐ๏ธ
- Time Complexity: O(n) - We pass through each character in
word
once. - Space Complexity: O(n) - Storing the compressed string in
comp
.
Basic Solution Code in Python ๐
def compress_string(word: str) -> str:
comp = ""
i = 0
while i < len(word):
char = word[i]
count = 0
# Count up to 9 occurrences of the current character
while i < len(word) and word[i] == char and count < 9:
i += 1
count += 1
# Append the count and character to the result
comp += str(count) + char
return comp
Optimized Solution ๐
To make our approach even more efficient, weโll use a list to build comp
rather than concatenating strings. This is because concatenating strings repeatedly is inefficient in Python due to creating new strings each time.
Optimized Solution Code in Python ๐
def compress_string(word: str) -> str:
comp = []
i = 0
while i < len(word):
char = word[i]
count = 0
# Count up to 9 occurrences of the current character
while i < len(word) and word[i] == char and count < 9:
i += 1
count += 1
comp.append(f"{count}{char}")
return "".join(comp)
Complexity
This approach shares the same complexity as the first:
- Time Complexity: O(n)
- Space Complexity: O(n)
Example Walkthrough ๐
Letโs look at a more detailed example to see the solution in action.
Example 1: word = "aaaaaaaaaaaaaabb"
- Initialize
comp
as an empty list. - Count the first 9
a
s:- We count up to the limit, so
count = 9
andcomp = ["9a"]
.
- We count up to the limit, so
- Move to the next group of
a
s:- We count the remaining 5
a
s:count = 5
andcomp = ["9a", "5a"]
.
- We count the remaining 5
- Move to
b
s:- Count 2
b
s:comp = ["9a", "5a", "2b"]
.
- Count 2
Example 2: word = "aaaaabbbbccccc"
Hereโs how it breaks down:
- First group of 5
a
s:comp = ["5a"]
. - Then 4
b
s:comp = ["5a", "4b"]
. - Finally, 5
c
s:comp = ["5a", "4b", "5c"]
.
Edge Case Walkthrough ๐
Letโs walk through a few tricky cases to ensure our solution is handling edge cases correctly.
Case 1: Single Character String
- Input:
word = "a"
- Output:
comp = ["1a"]
, resulting in"1a"
Case 2: Completely Unique Characters
- Input:
word = "abcdefg"
- Output:
comp = ["1a", "1b", "1c", "1d", "1e", "1f", "1g"]
, which translates to"1a1b1c1d1e1f1g"
Case 3: String of 16 Identical Characters
- Input:
word = "aaaaaaaaaaaaaaaa"
- Output:
comp = ["9a", "7a"]
, so we get"9a7a"
Conclusion ๐
Weโve solved String Compression III by taking advantage of character repetition patterns and efficiently building compressed groups. Hereโs what we learned:
- Constraints can guide our solution: Limiting to 9 repetitions per group let us decide to segment each character group into smaller, manageable clusters.
- Building with lists is efficient: By using a list to construct our final string, we reduced unnecessary operations, keeping our solution optimal.
- Edge cases are essential: Single characters, unique characters, and large repetitions helped verify the robustness of our solution.
Takeaways ๐
- Leveraging the problemโs constraints allows us to find optimal solutions.
- Efficient string operations can make a big difference in performance, especially for large inputs.
- Carefully handling edge cases leads to a more reliable solution.
With this knowledge, youโre now ready to tackle similar problems involving character manipulation and compression. Happy coding! ๐