Given a string, find the longest palindromic substring in linear time i.e. substring that is also a palindrome. Subsequently, Find all palindromic substrings of a given string
For example, longest palindromic substring of S = “babaabca” is “baab”. In a previous post I described a simple O(n^2) time solution for this problem. But there is a very beautiful and completely mathematical solution to find longest palindromic substring in linear time. This is called Manacher’s Algorithm.
Manacher’s Algorithm
First of all lets observe closely to a palindrome in order to find some interesting properties. For example, S1 = “abaaba” and S2=”abcba”, both are palindrome but what is the non-trivial (i.e. not length or characters) difference between them? S1 is a palindrome centered around the invisible space between i=2 and i=3 (non-existent space!). On the other hand S2 is centered around character at i=2 (ie. c). In order to graciously handle the center of a palindrome irrespective of the odd/even length, lets transform the palindrome by inserting special character $ in between characters. Then S1=”abba” and S2=”abcba” will be transformed into T1=”$a$b$a$a$b$a$” centered at i=6 and T2=”$a$b$c$b$a$” centered at i=5. Now, we can see that centers are existent and lengths are consistent 2*n+1, where n=length of original string. For example,
i' c i ----------------------------------------------------- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| ----------------------------------------------------- T1=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ | -----------------------------------------------------
Next, observe that from the symmetric property of a (transformed) palindrome T around the center c, T[c-k] = T[c+k] for 0<= k<= c. That is positions c-k and c+k are mirror to each other. Let’s put it another way, for an index i on the right of center c, the mirror index i’ is on the left of c such that c-i’=i-c => i’=2*c-i and vice versa. That is,
For each position i on the right of center c of a palindromic substring, the mirror position of i is, i’=2*c-i, and vice versa.
Let us define an array P[0..2*n] such that P[i] equals to the length of the palindrome centered at i. Note that, length is actually measured by number of characters in the original string (by ignoring special chars $). Also let min and max be respectively the leftmost and rightmost boundary of a palindromic substring centered at c. So, min=c-P[c] and max=c+P[c]. For example, for palindrome S=”abaaba”, the transformed palindrome T, mirror center c=6, length array P[0..12], min=c-P[c]=6-6=0, max=c+P[c]=6+6=12 and two sample mirrored indices i and i’ are shown in the following figure.
min i' c i max ----------------------------------------------------- | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| ----------------------------------------------------- T=| $ | a | $ | b | $ | a | $ | a | $ | b | $ | a | $ | ----------------------------------------------------- P=| 0 | 1 | 0 | 3 | 0 | 5 | 6 | 1 | 0 | 3 | 0 | 1 | 0 | -----------------------------------------------------
With such a length array P, we can find the length of longest palindromic substring by looking into the max element of P. That is,
P[i] is the length of a palindromic substring with center at i in the transformed string T, ie. center at i/2 in the original string S; Hence the longest palindromic substring would be the substring of length P[imax] starting from index (imax-P[imax])/2 such that imax is the index of maximum element in P.
Let us draw a similar figure in the following for our non-palindromic example string S=”babaabca”.
min c max |----------------|-----------------| -------------------------------------------------------------------- idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| --------------------------------------------------------------------- T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ | --------------------------------------------------------------------- P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | ---------------------------------------------------------------------
Question is how to compute P efficiently. The symmetric property suggests the following cases that we could potentially use to compute P[i] by using previously computed P[i’] at the mirror index i’ on the left, hence skipping a lot of computations. Let’s suppose that we have a reference palindrome (first palindrome) to start with.
-
A third palindrome whose center is within the right side of a first palindrome will have exactly the same length as that of a second palindrome anchored at the mirror center on the left side, if the second palindrome is within the bounds of the first palindrome by at least one character.
For example in the following figure with the first palindrome centered at c=8 and bounded by min=4 and max=12, length of the third palindrome centered at i=9 (with mirror index i’= 2*c-i = 7) is, P[i] = P[i’] = 1. This is because the second palindrome centered at i’ is within the bounds of first palindrome. Similarly, P[10] = P[6] = 0.
|<---3rd--->| |<---2nd--->| |<----------1st Palindrome-------->| min i' c i max |------------|---|---|-------------| -------------------------------------------------------------------- idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| --------------------------------------------------------------------- T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ | --------------------------------------------------------------------- P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | ? | ? | ? | ? | ? | ? | ? | ? | ---------------------------------------------------------------------
Now, question is how to check this case? Note that, due to symmetric property length of segment [min..i’] is equals to the length of segment [i..max]. Also, note that 2nd palindrome is completely within 1st palindrome iff left edge of the 2nd palindrome is inside the left boundary, min of the 1st palindrome. That is,
i'-P[i'] >= min =>P[i']-i' < -min [negate] =>P[i'] < i'-min =>P[i'] < max-i [(max-i)=(i'-min) due to symmetric property].
Combining all the facts in case 1,
P[i] = P[i’], iff (max-i) > P[i’]
-
If the second palindrome meets or extends beyond the left bound of the first palindrome, then the third palindrome is guaranteed to have at least the length from its own center to the right outermost character of the first palindrome. This length is the same from the center of the second palindrome to the left outermost character of the first palindrome.
For example in the following figure, second palindrome centered at i=5 extends beyond the left bound of the first palindrome. So, in this case we can’t say P[i]=P[i’]. But length of the third palindrome centered at i=11, P[i] is at least the length from its center i=11 to the right bound max=12 of first palindrome centered at c. That is, P[i]>=1. This means third palindrome could be extended past max if and only if next immediate character past max matches exactly with the mirrored character, and we continue this check beyond. For example, in this case P[13]!=P[9] and it can’t be extended. So, P[i] = 1.
|<------2nd palindrome----->| |<---3rd----|-->? |<----------1st Palindrome-------->| min i' c i max |----|-----------|-----------|-----| -------------------------------------------------------------------- idx= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| --------------------------------------------------------------------- T=| $ | b | $ | a | $ | b | $ | a | $ | a | $ | b | $ | c | $ | a | $ | --------------------------------------------------------------------- P=| 0 | 1 | 0 | 3 | 0 | 3 | 0 | 1 | 4 | 1 | 0 | ? | ? | ? | ? | ? | ? | ---------------------------------------------------------------------
So, how to check this case? This is simply the failed check for case 1. That is, second palindrome will extend past left edge of first palindrome iff,
i'-P[i'] < min =>P[i']-i' >= -min [negate] =>P[i'] >= i'-min =>P[i'] >= max-i [(max-i)=(i'-min) due to symmetric property].
That is, P[i] is at least (max-i) iff (max-i)<=P[i’].
P[i]>=(max-i), iff (max-i) <= P[i’]
Now, if the third palindrome does extend beyond max then we need to update the center and the boundary of the new palindrome.
If the palindrome centered at i does expand past max then we have new (extended) palindrome, hence a new center at c=i. Update max to the rightmost boundary of the new palindrome.
Combining all the facts in case 1 and case 2, we can come up with a very beautiful little formulae:
Case 1: P[i] = P[i'], iff (max-i) > P[i'] Case 2: P[i]>=(max-i), iff (max-i) <= P[i'] That is, P[i] >= min(P[i'], max-i).
That is, P[i]=min(P[i’], max-i) when the third palindrome is not extendable past max. Otherwise, we have new third palindrome at center at c=i and new max=i+P[i].
-
Neither the first nor second palindrome provides information to help determine the palindromic length of a fourth palindrome whose center is outside the right side of the first palindrome.
That is, we can’t determine preemptively P[i] if i>max. That is,
P[i] = 0, iff max-i < 0
Combining all the cases, we conclude the formulae:
Case 1: P[i] = P[i'], iff (max-i) > P[i'] (no expansion) Case 2: P[i]>=(max-i), iff (max-i) <= P[i'] (check for expansion) Case 3: P[i] = 0, iff max-i < 0 That is, P[i] = (max-i)>0 ? min(P[i'], max-i) : 0.
P[i] = max>i ? min(P[i’], max-i) : 0. In case we can expand beyond max then we expand by matching characters beyond max with the mirrored character with respect to new center at c=i. Finally when we have a mismatch we update new max=i+P[i].
I have put all the above maths in the following simple code. Note that, the expansion is only necessary in the special case where P[i’]==max-i as we don’t know if the palindrome at P[i] may be longer. This is handled by setting P[i]=min(P[i’],max-i) and then always expanding. This way of doing it doesn’t increase the time complexity, because if no expansion is necessary, the time taken to do the expansion is constant. That is the inner while loop for expanding the palindrome centered at i takes at most a total of n steps, and then positioning and testing each centers take a total of n steps too. So, overall complexity is O(n).
public static String normalize(final String in) { final StringBuffer sb = new StringBuffer(); sb.append('$'); for (int i = 0; i < in.length(); i++) { sb.append(in.charAt(i)); sb.append('$'); } return sb.toString(); } public static String longestPalindromeLinear(final String in) { /* * Normalize the length of the string by inserting special character ‘$’ in the space between each pair of * characters in the input and the two outside edges. This is done merely to make the computation identical for * both odd and even length input string. */ final String S = normalize(in); int c = 0; // contains center of current palindrome int max = 0; // contains the right edge of current palindrome // P[i] contains the length of longest palindrome (in S, not original) centered at i final int[] P = new int[S.length()]; for (int i = 0; i < P.length; i++) { P[i] = 0; } // for each position find longest palindrome centered at the position, save length in P for (int i = 1; i < S.length() - 1; i++) { final int i_mirror = 2 * c - i; // as (C - i_mirror) = (i - C) due to symmetric property /* * When max-i > P[i_mirror] then palindrome at center i_prime contained completely within palindrome * centered at c. Else max-i <= P[i_mirror] means that the palindrome at ceter i_mirror doesn’t fully * contained in the palindrome centered at c. In that case palindrome at i could be extended past max. */ P[i] = (max > i) ? Math.min(P[i_mirror], max - i) : 0; // Try to expand the palindrome centered at i. if the palindrome centered at i could be extended past max // then extend max to the right edge of newly extended palindrome while ((i + P[i] + 1) < S.length() && (i - P[i] - 1) >= 0 && S.charAt(i + P[i] + 1) == S.charAt(i - P[i] - 1)) { P[i]++; } // If palindrome centered at was extend past max then update Center to i and update the right edge if (i + P[i] > max) { c = i; max = i + P[i]; } } return extractLongest(in, P); } public static String extractLongest(final String in, final int[] P) { int longestCenter = 0; int longestLength = 0; for (int i = 0; i < P.length; i++) { if (P[i] > longestLength) { longestLength = P[i]; longestCenter = i; } } final int offset = (longestCenter - longestLength) / 2; return in.substring(offset, offset + longestLength); } public Set<String> allPalindromicSubstring(final String in, final int[] P) { final HashSet<String> all = new HashSet<String>(); for (int i = 0; i < P.length; i++) { if (P[i] != 0) { final int offset = (i - P[i]) / 2; all.add(in.substring(offset, offset + P[i])); } } return all; }
Another beautiful thing to note about this algorithm that we can find all the palindromic substring of a given string in linear time by using the length array P[i]. This is because P[i] is actually suggesting a palindrome of length of P[i] centered at position i. So, we could simply traverse P[i] and calculate palindromic substring of length P[i] starting at S[(i-P[i]-1)/2]. I have also put the code for finding all the palindromic substring in the above snippet.
This post significantly uses information from this wiki page describing Manacher algorithm.