Word Break Problem

Given an input string and a dictionary of words, find out if the input string can be segmented into a space-separated sequence of dictionary words.

For example, Given a dictionary { i, like, ili, kesa, msun, sam, sung, mobile} then

text “ilikemobile” is breakable into [“i”,”like”,”mobile”] text “ilikesamobile” is breakable into [“ili”,”kesa”,”mobile”] text “ilikesammobile” is breakable into [“i”,”like”,”sam”,”mobile”] text “likeimobile” is breakable into [“like”,”i”,”mobile”]

but

text “ilikesa” is not brekable
text “likeis” is not brekable
text “ilikesamsun” is not brekable

We can solve it recursively by splitting at each prefix and search it in dictionary. If the prefix is present in dictionary then we recur for the suffix (rest of the string). If the recursive call for suffix returns true then we return true, otherwise we try next prefix. If we have tried all prefixes and none of them resulted in a solution, we return false.

 

public static boolean wordBreak(Set<String> dictionary, String text){
	//base case
	if(text.isEmpty()){
		return true;
	}
	//break the string at i+1 such that prefix text[...i] is in dict and suffix text[i+1...] is breakable
	for(int i = 0; i<text.length(); i++){
		if(dictionary.contains(text.substring(0, i+1)) && wordBreak(dictionary, text.substring(i+1))){
			return true;
		}
	}
	
	return false;
}

Repetitive Subproblems
The recursive solution has many subproblems that are repeating at various stage of the recursion. For example, for string abcd the recursion tree will look like

           abcd
          / \   \     
        bcd  cd  d
       /   \  |
      cd   d  d         

We can see that subproblems for “cd” is repeating twice. So we ca use DP to cache the solution

Dynamic Programming Solution
Straight forward dp solution in Java. We use dp[i] table to store results computed for each prefix string[0..i]. A prefix string[0..i] is breakable if and only if either the whole prefix is contained in dictionary ot there is a smaller prefix at index j < i such that prefix string[0..j] is breakable and substring(j+1..i) is contained in the dictionary. Below is the implementation of the above using DP that runs in O(n^2) time.

public static boolean wordBreakDP(Set<String> dictionary, String text){
	int n = text.length();
	if(n == 0){
		return true;
	}
	
	//dp[i] = true if there is a solution in prefix text[0..i]
	boolean[] dp = new boolean[n];  
	
	//try all possible prefixes
	for(int i = 0; i< n; i++){
		if(dictionary.contains(text.substring(0, i+1))){
			dp[i] = true;
		}
		else{
			//Try to break in each position j < i
			for(int j = 0; j < i; j++){
				if(dp[j] && dictionary.contains(text.subSequence(j+1, i+1))){
					dp[i] = true;
					break;
				}
			}
		}
		
	}
	
	return dp[n-1];
}

 

Find All Possible Valid Sentences by Breaking
We do the similar recursion but with a DP map to cache the subproblem solutions.

 

public static HashMap<String, ArrayList<String>> wordBreakMap = new HashMap<String, ArrayList<String>>();
public static ArrayList<String> wordBreakAll(Set<String> dictionary, String text){
	//if already computed the current substring text then return from map
	if(wordBreakMap.containsKey(text)){
		return wordBreakMap.get(text);
	}
	ArrayList<String>  result = new ArrayList<String>();
	
	//if the whole word is in the dictionary then we add this to final result
	if(dictionary.contains(text)){
		result.add(text);
	}
	
	//try each prefix and extend
	for(int i = 0; i< text.length(); i++){
		String prefix = text.substring(0, i+1);
		if(dictionary.contains(prefix)){
			//extend
			String suffix = text.substring(i+1);
			ArrayList<String> subRes = wordBreakAll(dictionary, suffix);
			for(String word : subRes){
				result.add(prefix+" "+word);
			}
		}
	}
	
	wordBreakMap.put(text, result);
	return result;
}

Leave a Reply

Your email address will not be published. Required fields are marked *