Longest Duplicate Substring

Given a string S, consider all duplicated substrings: (contiguous) substrings of S that occur 2 or more times.  (The occurrences may overlap.)

Return any duplicated substring that has the longest possible length.  (If S does not have a duplicated substring, the answer is "".)

Example 1:

Input: "banana"
Output: "ana"

Example 2:

Input: "abcd"
Output: ""

Note:

  1. 2 <= S.length <= 10^5
  2. S consists of lowercase English letters.

 class Solution {
    public int search(int L, int a, long modulus, int n, int[] nums) {
  long h = 0;
  for(int i = 0; i < L; ++i) h = (h * a + nums[i]) % modulus;

  HashSet<Long> seen = new HashSet();
  seen.add(h);
  long aL = 1;
  for (int i = 1; i <= L; ++i) aL = (aL * a) % modulus;

  for(int start = 1; start < n - L + 1; ++start) {
    h = (h * a - nums[start - 1] * aL % modulus + modulus) % modulus;
    h = (h + nums[start + L - 1]) % modulus;
    if (seen.contains(h)) return start;
    seen.add(h);
  }
  return -1;
}

public String longestDupSubstring(String S) {
  int n = S.length();
  int[] nums = new int[n];
  for(int i = 0; i < n; ++i) nums[i] = (int)S.charAt(i) - (int)'a';
  int a = 26;
  long modulus = (long)Math.pow(2, 32);

  int left = 1, right = n;
  int L;
  while (left != right) {
    L = left + (right - left) / 2;
    if (search(L, a, modulus, n, nums) != -1) left = L + 1;
    else right = L;
  }

  int start = search(left - 1, a, modulus, n, nums);
  return start != -1 ? S.substring(start, start + left - 1) : "";
}
}


Most popular algorithms for solving string problems are Aho-CorasickKMPand Rabin-Karp.

Before proceeding to explanation, please go through Rolling Hash for it's basic understanding. It is bit harder to understand solution, if you are new to Rolling Hash.

Here, we are going to solve problem using binary search and rolling hash. Suffix array is also one of the way to solve this solution, but it is quiet long.

Solution can be come up using hints provided on Leetcode.

Below are steps to be followed to solve this problem..,
1) Convert string to integer array.[For hashing purposes] While converting, store each value ascii by subtracting ascii value of 'a'.
2) Base value for rolling hash to be set as 26 and modulus value also need to be set. The reason for setting modulus value is to avoid overflow during hashing.
3) Then we need to do binary search and for this, left will be 1 and right will be n(length of string)
4) L will store the repeated length substring.
5) Do binary search for a string and call search method.
6) In Search method, initially do hash value for all integers in a array.
7) A constant value to be added in hash value along with modulus for rolling hash.
8) A HashSet is created inorder to store seen hashes, which means it records all hash value.
9) While iterating, if any value is already found in seen set, it is returned. Else, it is added to seen set as new value.
10) Repeat above steps from Step6 to Step 9, until binary search faile.
11) After that finally do a search on left-1 to n.
12) Finally after step 11, if start is equal to -1 then return empty string (""). Else, return substring(start, start+left-1).



Click Follow button to receive updates from us instantly.

Please let us know about your views in comment section.

[Thanks to MassiveAlgorithms]

Happy Coding...😍

Comments

Popular posts from this blog

First Unique Character in a String

Balanced Binary Tree

Majority Element

Sequential Digits