09 March 2019

Codeforces Round #543 F(Div2) (Compress String)

Hi Everyone,

Writing a post after long time, encountered a nice question this week and thought of writing a post about it. Here it goes:

Problem Statement:
Given a string S, you need to encode it such that cost of encoding (definer later) is minimum. Cost of encoding is defined in below two ways assuming our S is encoded into k substrings (t1_t2_t3_t4__tk).
1. Cost is A if |ti| = 1
2. Cost is B if |ti| > 1, you can only do this if "ti" is substring of "t1_t2_t(i-1)".
For more understanding please look at : https://codeforces.com/contest/1121/problem/F

My Approach:
I started from how to build encoded string. If we observe that first substring "t1" must have length 1 because there is no "t0". Thus for first substring we need to pay A amount.
Now what happens after "t1", we have some subtring "t1_t(i-1)" when we are choosing ith substring, what would be best case scenario here? If length of our "ti" is maximum, that is best deal we can get by giving amount B, now how to find maximum length of substring which we can add using B amount of money? Let's try to observe:
We had "t1_t2_t3_t(i-1)" and we need "ti", if we try to come back to string S, we are at some index "idx" and want to find max "j" such that substring of S from (idx to j) is already present in substring of S(0 to idx-1). Basically we need to find S(idx to endOfString) in S(0 to idx-1) and if it's not present, find maximum match which was there. This can be done very easily in O(n^2) for every index. But we need to do better as O(n^2) for every index will become O(n^3) and will give TLE.
Is it somewhat similar to finding if a string exists in another string? We need to find S(idx to endOfString) maximum prefix present in S(0 to idx-1). So if we perform same KMP here and find out the maximum length of prefix present in S(0 to idx-1), we will know what maximum length we can have for "ti".
But still we have two options:
1. Add all these characters one by one using A amount every time.
2. Add all of these in one go spending B amount once.

So depending on what is minimum we should choose our option.

How to Implement:
We can maintain a simple 2-D DP having two values for every index, first will be answer if we chose A amount to add this character and further solved for next indices, second will be if we chose B amount to add maximum length substring which we can add and then solved further.

Implementation:
Let me know if there is any mistake in the post or anything is not clear. Thanks for reading.