Check if a string is a substring of another string
I read article with nice exercise about checking is string is a substring of another. The content of the exercise is:
Write a program that takes 2 string parameters from the command line. Program must verify if the second string is a substring of the first string (you cannot use substr, substring or any other standard function including regular expression libraries). Each occurrence of * in the second substring means that it can be a match for zero or more characters of the first string. Consider the example: Input string 1: abcd Input string 2: a*c Program should evaluate that the string 2 is a substring of the string 1. Additionally asterisk (*) may be considered as a regular character, if it is preceded by a backslash (\). Backslash (\) is considered as a regular character in all cases other than preceding the asterisk (*).
I wrote simple app, which firstly check that second string is NOT longer than first(but there is one problem, when test at («ab», «a*b») this is correct test, but the method fails):
public static boolean checkCharactersCount(String firstString, String secondString) < return (firstString.length() >0 && secondString.length() > 0) && (firstString.length() > secondString.length());
public static boolean checkSubstring(String firstString, String secondString) < int correctCharCounter = 0; int lastCorrectCharAtIndex = -1; for (int i = 0; i < secondString.length(); i++) < for (int j = 0; j < firstString.length(); j++) < if (j >lastCorrectCharAtIndex) < if ((secondString.charAt(i) == firstString.charAt(j)) || secondString.charAt(i) == '*') < correctCharCounter++; lastCorrectCharAtIndex = j; >if (correctCharCounter >= secondString.length()) return true; > > > return false; >
- My code as above does not support character continuity(for example test: checkSubstring(«abacd», «bcd») return true, but it is wrong! — should return false)
- Any idea how to verify special symbol as «\*» ? Sample to test (checkSubstring(«abc», «\b»)
How is Your vision of solution? 🙂
Side note: the escaping rules are somewhat strange, they don’t allow to specify a backslash followed by a wildcard.
@Henry Yep, hard to write it ;P We need to use double backslash(«\\»), that define us second backslash ass real backlash or other sign 😛
Note that your length check will rule out «ab» with «a*b», even though that should work, as the asterisk can stand for zero characters.
4 Answers 4
Try this: (Comments added for explanation)
// only for non empty Strings public boolean isSubString(String string1,String string2) < // step 1: split by *, but not by \* Listlist1 = new ArrayList(); char[]cs = string2.toCharArray(); int lastIndex = 0 ; char lastChar = 0 ; int i = 0 ; for(; i < cs.length ; ++i) < if(cs[i]=='*' && lastChar!='\\') < list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*")); //earlier buggy line: //list1.add(new String(cs,lastIndex,i-lastIndex)); lastIndex = i + 1 ; >lastChar = cs[i]; > if(lastIndex < i ) < list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*")); >// step 2: check indices of each string in the list // Note: all indices should be in proper order. lastIndex = 0; for(String str : list1) < int newIndex = string1.indexOf(str,lastIndex); if(newIndex < 0) < return false; >lastIndex = newIndex+str.length(); > return true; >
in case you are not allowed to use String.indexOf() then write a function public int indexOf(String string1,String string2, int index2) and replace this statement
int newIndex = string1.indexOf(str,lastInxdex);
int newIndex = indexOf(string1, str,lastInxdex);
Appendix A: The code i tested:
package jdk.conf; import java.util.ArrayList; import java.util.List; public class Test01 < public static void main(String[] args) < Test01 test01 = new Test01(); System.out.println(test01.isSubString("abcd", "a*c")); System.out.println(test01.isSubString("abcd", "bcd")); System.out.println(test01.isSubString("abcd", "*b")); System.out.println(test01.isSubString("abcd", "ac")); System.out.println(test01.isSubString("abcd", "bd")); System.out.println(test01.isSubString("abcd", "b*d")); System.out.println(test01.isSubString("abcd", "b\\*d")); System.out.println(test01.isSubString("abcd", "\\*d")); System.out.println(test01.isSubString("abcd", "b\\*")); System.out.println(test01.isSubString("a*cd", "\\*b")); System.out.println(test01.isSubString("", "b\\*")); System.out.println(test01.isSubString("abcd", "")); System.out.println(test01.isSubString("a*bd", "\\*b")); >// only for non empty Strings public boolean isSubString(String string1,String string2) < // step 1: split by *, but not by \* Listlist1 = new ArrayList(); char[]cs = string2.toCharArray(); int lastIndex = 0 ; char lastChar = 0 ; int i = 0 ; for(; i < cs.length ; ++i) < if(cs[i]=='*' && lastChar!='\\') < list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*")); lastIndex = i + 1 ; >lastChar = cs[i]; > if(lastIndex < i ) < list1.add(new String(cs,lastIndex,i-lastIndex).replace("\\*", "*")); >// step 2: check indices of each string in the list // Note: all indices should be in proper order. lastIndex = 0; for(String str : list1) < int newIndex = string1.indexOf(str,lastIndex); if(newIndex < 0) < return false; >lastIndex = newIndex+str.length(); > return true; > >
true true true false false true false false false false false true true
I’d do this in a couple of stages.
Let’s call the potential substring p and the string which we’re testing for containing the substring s.
Break down the «contains» portion as a series of questions of «does p match starting at the Nth position of s?»; obviously you go through s from the first position onward to see if p matches at any position of s.
In matching, we have the possibility of running into a «*»; in that case, we want to know if the portion of p after the * is a substring of the portion of s after matching the portion of p up until the *. That suggests a recursive routine, taking the portion to be matched and the string to match and returning true/false. When you encounter a *, form the two new strings and call yourself.
If instead you encounter a \, then you just continue the regular matching with the next character instead of making the recursive call. Given that you need to do that, I suppose it might be easiest if you build pPrime from the original p, so that you can remove the backslash(es) when they’re encountered, analagous to removing the asterisk(s) from the wildcard matching.
I haven’t actually written any code on this, you only asked for approach.
Nice vision of solution! The temporary string without backslash sounds good 😉 I’m curious how you will solve it in the code
I found this a nice challenge to do. This exercise really forces us to think on a very low level of the language and algorithms in general. No lambdas, no streams, no regex, no find, no substring, nothing. Just the old CharAt, some fors and what not. Essentially I made a lookup method which looks up for the first character of the string to be found and then another lookup which takes into consideration your rules from that point onwards. If it fails, it goes back to the first index found, adds one and it does how many iterations necessary until the end of the string. If no match is found it’s supposed to return false. If only one is found, then that is enough to consider it a substring. The most important corner cases are considered at the beginning of the calculus so that if false is detected as certain that it doesn’t go further. Thus ‘*’ alone means any character match and we can escape it with \. I tried to include most corner cases and this was really a challenge. I’m not entirely sure if my code covers all cases for you but it should cover quite a few. I really wanted to help you so this is my approach and here is my code:
package com.jesperancinha.string; public class StringExercise < private static final char ASTERISK = '*'; private static final char BACKSLASH = '\\'; public boolean checkIsSubString(String mainString, String checkString) < int nextIndex = getNextIndex(0, checkString.charAt(0), mainString); if (nextIndex == -1) < return false; >boolean result = checkFromIndex(nextIndex, mainString, checkString); while (nextIndex < mainString.length() - 1 && nextIndex >-1) < if (!result) < nextIndex = getNextIndex(nextIndex + 1, checkString.charAt(0), mainString); if (nextIndex >-1) < result = checkFromIndex(nextIndex, mainString, checkString); >> else < return result; >> return result; > private int getNextIndex(int start, char charAt, String mainString) < if (charAt == ASTERISK || charAt == BACKSLASH) < return start; >for (int i = start; i < mainString.length(); i++) < if (mainString.charAt(i) == charAt) < return i; >> return -1; > private boolean checkFromIndex(int nextIndex, String mainString, String checkString) < for (int i = 0, j = 0; i < checkString.length(); i++, j++) < if (i < (checkString.length() - 2) && checkString.charAt(i) == BACKSLASH && checkString.charAt(i + 1) == ASTERISK) < i++; if (mainString.charAt(j + nextIndex) == BACKSLASH) < j++; >if (checkString.charAt(i) != mainString.charAt(j + nextIndex)) < return false; >> if (i > 0 && checkString.charAt(i - 1) != BACKSLASH && checkString.charAt(i) == ASTERISK) < if (i < checkString.length() - 1 && (j + nextIndex) < (mainString.length() - 1) && checkString.charAt(i + 1) != mainString.charAt(j + nextIndex + 1)) < i--; >else < if (j + nextIndex == mainString.length() - 1 && checkString.charAt(checkString.length() - 1) != ASTERISK && checkString.charAt(checkString.length() - 2) != BACKSLASH) < return false; >> > else < if ((j + nextIndex) < (mainString.length() - 2) && mainString.charAt(j + nextIndex) != checkString.charAt(i)) < return false; >> > return true; > >
I have made a set of unit tests but if I would put the whole class here it would be too long and the only thing I want to show you is the test cases I implemented in the unit tests. Here is the condensed version of my unit tests for this case:
package com.jesperancinha.string; import static org.assertj.core.api.Assertions.assertThat; import org.junit.jupiter.api.Test; class StringExerciseMegaTest < @Test void checkIsSubString() < StringExercise stringExercise = new StringExercise(); boolean test = stringExercise.checkIsSubString("abcd", "a*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("abcd", "a\\*c"); assertThat(test).isFalse(); test = stringExercise.checkIsSubString("a*c", "a\\*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdsadasa*c", "a\\*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a\\*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a\\*c"); assertThat(test).isFalse(); test = stringExercise.checkIsSubString("aasdsadasa**csdfdsfdsfdsf", "a*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdsadasa*csdfdsfdsfdsf", "a*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "a*c"); assertThat(test).isFalse(); test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "dwer"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriouiauoisdf9977675tyhfgh", "75tyhfgh"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriou\\iauoisdf9977675tyhfgh", "riou\\iauois"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9977675tyhfgh", "riou\\\\*iauois"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\\\*977675tyhfgh"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*977675tyhfgh"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("\\*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "\\*aasdwer"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("*aasdweriou\\*iauoisdf9\\*977675tyhfgh", "*aasdwer"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("abcd", "bc"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("abcd", "zbc"); assertThat(test).isFalse(); test = stringExercise.checkIsSubString("abcd", "*bc*"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("*bcd", "\\*bc*"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("abcd", "a*c"); assertThat(test).isTrue(); test = stringExercise.checkIsSubString("abcd", "az*bc"); assertThat(test).isFalse(); >>
In Java, how do I check if a string contains a substring (ignoring case)? [duplicate]
I have two String s, str1 and str2 . How do I check if str2 is contained within str1 , ignoring case?
Both indexOf and contains go character by character, so if you need faster string searching (which you can get), then you would need to implement one of many published algorithms.
6 Answers 6
str1.toUpperCase().contains(str2.toUpperCase())
Original answer was using toLowerCase() method. But as some people correctly noticed, there are some exceptions in Unicode and it’s better to use toUpperCase() . Because:
There are languages knowing more than one lower case variant for one upper case variant.
@RadijatoR yes, it’s called indexOf, like int pos = str1.indexOf(str2) or case insensitive as int pos = str1.toLowerCase().indexOf(str2.toLowerCase())
String string = "Madam, I am Adam"; // Starts with boolean b = string.startsWith("Mad"); // true // Ends with b = string.endsWith("dam"); // true // Anywhere b = string.indexOf("I am") >= 0; // true // To ignore case, regular expressions must be used // Starts with b = string.matches("(?i)mad.*"); // Ends with b = string.matches("(?i).*adam"); // Anywhere b = string.matches("(?i).*i am.*");
Your «indexOf» example should use >= 0, not > 0, since 0 is valid if the substring occurs at the beginning of the string. (Doesn’t in your example, but could in other cases.) Added this response since people are obviously still searching and finding this answer.
If you are able to use org.apache.commons.lang.StringUtils, I suggest using the following:
String container = "aBcDeFg"; String content = "dE"; boolean containerContainsContent = StringUtils.containsIgnoreCase(container, content);
You can use the toLowerCase() method:
public boolean contains( String haystack, String needle ) < haystack = haystack == null ? "" : haystack; needle = needle == null ? "" : needle; // Works, but is not the best. //return haystack.toLowerCase().indexOf( needle.toLowerCase() ) >-1 return haystack.toLowerCase().contains( needle.toLowerCase() ) >
Notice that by creating your own method, you can reuse it. Then, when someone points out that you should use contains instead of indexOf , you have only a single line of code to change.
I also favor the RegEx solution. The code will be much cleaner. I would hesitate to use toLowerCase() in situations where I knew the strings were going to be large, since strings are immutable and would have to be copied. Also, the matches() solution might be confusing because it takes a regular expression as an argument (searching for «Need$le» cold be problematic).
Building on some of the above examples:
public boolean containsIgnoreCase( String haystack, String needle ) < if(needle.equals("")) return true; if(haystack == null || needle == null || haystack .equals("")) return false; Pattern p = Pattern.compile(needle,Pattern.CASE_INSENSITIVE+Pattern.LITERAL); Matcher m = p.matcher(haystack); return m.find(); >example call: String needle = "Need$le"; String haystack = "This is a haystack that might have a need$le in it."; if( containsIgnoreCase( haystack, needle) )
(Note: you might want to handle NULL and empty strings differently depending on your needs. I think they way I have it is closer to the Java spec for strings.)
Speed critical solutions could include iterating through the haystack character by character looking for the first character of the needle. When the first character is matched (case insenstively), begin iterating through the needle character by character, looking for the corresponding character in the haystack and returning «true» if all characters get matched. If a non-matched character is encountered, resume iteration through the haystack at the next character, returning «false» if a position > haystack.length() — needle.length() is reached.
How to check if a string contains a substring containing spaces?
Say I have a string like this in java: «this is
the spaces are a red herring — they are treated just like any other character and make no difference to any normal substring search function
6 Answers 6
If you are looking to see if a String contains another specific sequence of characters then you could do something like this :
String stringToTest = "blah blah blah"; if(stringToTest.contains("blah"))
You could also use matches. For a decent explanation on matching Strings I would advise you check out the Java Oracle tutorials for Regular Expressions at :
If you have any number of white space between each character of your matching string, I think you are better off removing all white spaces from the string you are trying to match before the search. I.e. :
String searchedString = "this is ok"; String stringToMatch = ""; boolean foundMatch = searchedString.replaceAll(" ", "").contains(stringToMatch.replaceAll(" ",""));
Put it all into a string variable, say s, then do s.contains(«); this will return true if is in s.
For this purpose you need to use String#contains(CharSequence) .
Note, there can be any number of white spaces in between the various characters.
For this purpose String#trim() method is used to returns a copy of the string, with leading and trailing whitespace omitted.
String myStr = "this is ok"; if (myStr.trim().contains("")) < //Do something. >