Java regexp not contain

How would you use a regular expression to ignore strings that contain a specific substring?

How would I go about using a negative lookbehind (or any other method) in a regular expression to ignore strings that contain a specific substring? I’ve read two previous Stack Overflow questions:
java-regexp-for-file-filtering
regex-to-match-against-something-that-is-not-a-specific-substring
They are nearly what I want. My problem is that the string doesn’t end with what I want to ignore; if it did, this would not be a problem. I have a feeling this has to do with the fact that lookarounds are zero-width and something is matching on the second pass through the string, but I’m not too sure of the internals. If anyone is willing to take the time to explain it, I will greatly appreciate it.

Here is an example of an input string that I want to ignore:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/ HTTP/1.1" 200 2246

Here is an example of an input string that I want to keep for further evaluation:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/content.js HTTP/1.1" 200 2246

The key for me is that I want to ignore any HTTP GET that requests a document root’s default page. Following is my little test harness and the best regex I’ve come up with so far.

import java.util.regex.*; // Pattern, Matcher, PatternSyntaxException

public static void main(String[] args) {
    String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
    //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
    //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/"; // This works
    //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/"; // This works
    String inRegEx = "^.*(?:GET).*$(?..."; // the rest of this regex was lost when the page was extracted
    try {
        // [lost in extraction: compiling inRegEx, matching it against inString,
        //  and printing the match when one is found]
        if (/* match found */) {
            // ...
        } else {
            System.out.printf("No match found.%n");
        }
    } catch (PatternSyntaxException pse) {
        System.out.println("Invalid RegEx: " + inRegEx);
        pse.printStackTrace();
    }
}

So you’re only interested in something that’s explicitly requesting a "file" (e.g. /path/to/file.txt) and not something pointing at a "directory" (e.g. /path/to/)? Is the only requirement that the requested URI end with some "extension" (.js in your example)?


Correct on the first question: I only want "files", not "directories". The file name and extension don’t matter; I just want to ignore requests to the document root.
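A lookaround placed after `$` can only inspect the end of the whole line ("... 200 2246"), not the end of the request path, which is consistent with the harness only working on strings that stop at "HTTP/". One way to express the clarified requirement (keep a request only when its path does not end in a slash) is to put the negative lookbehind right after the path instead. A minimal sketch, assuming every log line contains `"GET <path> HTTP/` exactly as in the examples; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class LogFilter {
    // \S+ grabs the whole request path; the negative lookbehind (?<!/) rejects
    // a path whose last character is '/'. Backtracking can shorten \S+, but a
    // shorter prefix is always followed by a non-space character, never by
    // " HTTP/", so a directory request cannot match at all.
    private static final Pattern FILE_REQUEST =
            Pattern.compile("\"GET \\S+(?<!/) HTTP/");

    static boolean isFileRequest(String logLine) {
        return FILE_REQUEST.matcher(logLine).find();
    }

    public static void main(String[] args) {
        String dir  = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String file = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
        System.out.println(isFileRequest(dir));  // false
        System.out.println(isFileRequest(file)); // true
    }
}
```

Because the check is tied to the path rather than to the end of the line, it behaves the same whether or not the line continues after "HTTP/".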


Which regular expression to find a string not containing a substring?

I am working on a simple tool to check Java coding guidelines on a project. One of those guidelines is to verify that no variable is declared as "private static ...": only "private static final ..." is allowed. I am wondering how I can get this result. I wrote this one:

pattern = "private\\s*static\\s*(?!final)";

Then the pattern wouldn’t match it, as intended, because private\\s*static doesn’t match private final static. It’s not an issue, Med.

5 Answers

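Close, but you should move the second whitespace inside the lookahead. Otherwise the second \\s* can backtrack to match zero characters; the lookahead then sees " final" (with its leading space), which does not start with "final", so "private static final" is still matched: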

pattern = "private\\s*static(?!\\s*final)"; 
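The backtracking problem is easy to observe with `Matcher.find()`. This small harness (class and method names are hypothetical) runs both patterns against a non-final and a final declaration:

```java
import java.util.regex.Pattern;

public class StaticFinalCheck {
    // Question's pattern: the second \s* sits outside the lookahead.
    static final Pattern OUTSIDE = Pattern.compile("private\\s*static\\s*(?!final)");
    // Answer's pattern: the whitespace is part of the lookahead.
    static final Pattern INSIDE = Pattern.compile("private\\s*static(?!\\s*final)");

    // true if the line is flagged as a non-final "private static" declaration
    static boolean flagged(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "private static int counter = 0;",
            "private static final int MAX = 10;",
        };
        for (String line : lines) {
            System.out.printf("%-40s outside=%b inside=%b%n",
                    line, flagged(OUTSIDE, line), flagged(INSIDE, line));
        }
    }
}
```

Both patterns flag the `counter` line, but only `OUTSIDE` also flags the `MAX` line: its `\s*` gives back the space so that the lookahead is tested at " final".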

I think that you are going about this problem the wrong way. Writing a half-decent style checker is a difficult task, especially if you are going to cope with all of the possible "trivial" variations of constructs (e.g. different modifier orders) and all points of potential fragility (e.g. "hits" on stuff in comments and string literals).

IMO, a better approach would be to use an existing source code checker and define your own style-checking rules. This is easy to do in PMD. PMD has the advantage that its rules operate on a parsed AST (abstract syntax tree). This makes them much less sensitive to syntactic variations, etc., than anything implemented using regex matches on source files.

Using \s+ to require the space made it start working for me.

I also added .+ to match the rest of the line, which might not be what you’re after.
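The exact pattern from this answer is not shown above. Assuming it had roughly the shape `private\s+static\s+(?!final).+`, requiring whitespace does stop the zero-length backtrack, but only while there is exactly one whitespace character before `final`: with two spaces, `\s+` can still give one back. A quick check (the class name and the `PLUS` pattern are assumptions for illustration):

```java
import java.util.regex.Pattern;

public class PlusVariantCheck {
    // Assumed shape of the \s+ variant; the answer's exact regex is not shown.
    static final Pattern PLUS = Pattern.compile("private\\s+static\\s+(?!final).+");
    // Whitespace moved inside the lookahead, as suggested in the first answer.
    static final Pattern INSIDE = Pattern.compile("private\\s*static(?!\\s*final)");

    public static void main(String[] args) {
        String oneSpace = "private static final int X = 1;";
        String twoSpaces = "private static  final int X = 1;";
        System.out.println(PLUS.matcher(oneSpace).find());    // false
        System.out.println(PLUS.matcher(twoSpaces).find());   // true (false positive)
        System.out.println(INSIDE.matcher(twoSpaces).find()); // false
    }
}
```

Keeping the whitespace inside the negative lookahead avoids this, because the lookahead itself consumes any amount of whitespace before testing for `final`.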


Regex not to match a set of Strings

How do I construct a regex that matches only strings that do not contain any of a set of substrings? For example, I want to validate the Address Line 1 text box so that it won’t contain any secondary address parts such as ‘Apt’, ‘Bldg’, ‘Ste’, ‘Unit’, etc.

4 Answers

A regex can be used to verify that a string does not contain a set of words. Here is a tested Java code snippet with a commented regex which does precisely this:

if (s.matches("(?sxi)" +
        "# Match string containing no 'bad' words.\n" +
        "^              # Anchor to start of string.\n" +
        "(?:            # Step through string one char at a time.\n" +
        "  (?!          # Negative lookahead to exclude words.\n" +
        "    \\b        # All bad words begin on a word boundary\n" +
        "    (?:        # List of 'bad' words NOT to be matched.\n" +
        "      Apt      # Cannot be 'Apt',\n" +
        "    | Bldg     # or 'Bldg',\n" +
        "    | Ste      # or 'Ste',\n" +
        "    | Unit     # or 'Unit'.\n" +
        "    )          # End list of words NOT to be matched.\n" +
        "    \\b        # All bad words end on a word boundary\n" +
        "  )            # Not at the beginning of bad word.\n" +
        "  .            # Ok. Safe to match this character.\n" +
        ")*             # Zero or more 'not-start-of-bad-word' chars.\n" +
        "$              # Anchor to end of string.")) {
    // String has no bad words.
    System.out.print("OK: String has no bad words.\n");
} else {
    // String has bad words.
    System.out.print("ERR: String has bad words.\n");
}

This assumes that the words must be "whole" words and that the "bad" words should be recognized regardless of case. Note also (as others have correctly stated) that this is not as efficient as simply checking for the presence of bad words and then taking the logical NOT.
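If the list of forbidden words changes, the same regex can be generated instead of written by hand. A minimal sketch (the `BadWordFilter` class and `noneOf` helper are hypothetical) that builds the compact, uncommented form of the pattern above, using `Pattern.quote` so that a word containing regex metacharacters is treated literally:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BadWordFilter {
    // Compact equivalent of the commented regex: step through the string one
    // character at a time, refusing to start a match at any whole bad word.
    static Pattern noneOf(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?si)^(?:(?!\\b(?:" + alternation + ")\\b).)*$");
    }

    public static void main(String[] args) {
        Pattern ok = noneOf(List.of("Apt", "Bldg", "Ste", "Unit"));
        System.out.println(ok.matcher("123 Main Street").matches());   // true
        System.out.println(ok.matcher("123 Main St Apt 4").matches()); // false
        System.out.println(ok.matcher("42 Unity Road").matches());     // true
    }
}
```

"Unity" passes because the trailing `\b` requires "Unit" to end at a word boundary.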

+1, awesome. 🙂 I still don’t get it, though. A ‘negative lookahead’ (is this a lookbehind?) is especially confusing. Do you know a page with a step-by-step introduction to all these possibilities? Another question: what does (?sxi) at the beginning mean? I guess i = ignore case?

@user unknown: The (?sxi) expression at the start sets the single-line (dot matches newline), free-spacing (whitespace ignored, # starts a comment) and ignore-case modifier flags. There is an excellent online tutorial at: www.regular-expressions.info. The time you spend there will pay for itself many times over. Happy regexing!

Thanks. I’m a regular user of sed and have learned the basics of regexes: character classes [a-n], [[:alnum:]], quantifiers (a?b+c){3,4}*, capturing groups "(ab*c?)" with "$1" (\1 in sed), start/end of string/line ^$, and negated character classes [^m-z]. But ?: and ?! are still alien to me. Maybe I’ll find them among your links. 🙂

@user unknown: If you want to really know regex (in the Neo "I know kung fu!" sense), the absolute best way is to sit down and read Jeffrey Friedl’s classic, Mastering Regular Expressions (3rd Edition). This is, hands down, the most useful book I have ever read. (The Slashdot review rated it 11 out of 10!)
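In Java, the inline `(?sxi)` group is equivalent to passing `Pattern.DOTALL`, `Pattern.COMMENTS` and `Pattern.CASE_INSENSITIVE` to `Pattern.compile`. This small check (hypothetical class name) exercises all three flags at once: the spaces and the `#` comment are ignored, `apt` matches "APT", and `.` matches a newline:

```java
import java.util.regex.Pattern;

public class InlineFlags {
    public static void main(String[] args) {
        // In free-spacing mode this body means just: apt.4
        String body = "apt . 4   # apt, any char even a newline, then 4";
        String input = "APT\n4";

        Pattern inline = Pattern.compile("(?sxi)" + body);
        Pattern viaFlags = Pattern.compile(body,
                Pattern.DOTALL | Pattern.COMMENTS | Pattern.CASE_INSENSITIVE);
        Pattern noFlags = Pattern.compile(body); // spaces and '#' are literal here

        System.out.println(inline.matcher(input).matches());   // true
        System.out.println(viaFlags.matcher(input).matches()); // true
        System.out.println(noFlags.matcher(input).matches());  // false
    }
}
```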

Rather than trying to construct a regex to match strings that don’t contain these substrings, why not construct a regex to match strings that do contain one or more of them? Then, if that regex returns true, you know that you have an invalid string.

Thanks for the reply, but I need a regex that itself fails to match the invalid strings, because I have to pass it as a parameter to a JSF tag.

@greg: Ok. You should add that fact to your question. Also, constructing the complementary (i.e. opposite) regex is probably going to be very tricky indeed! So you may need to come up with an alternative strategy.
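Where the calling code can negate the result itself (which, per the comments, is not the case for a JSF tag attribute), the "match the bad words, then NOT" approach is straightforward. A minimal sketch; the class and `isValidAddressLine1` are hypothetical names:

```java
import java.util.regex.Pattern;

public class ContainsBadWord {
    // Matches any secondary-address word as a whole word, ignoring case.
    private static final Pattern BAD = Pattern.compile(
            "\\b(?:Apt|Bldg|Ste|Unit)\\b", Pattern.CASE_INSENSITIVE);

    // Valid when no bad word is found anywhere in the string.
    static boolean isValidAddressLine1(String s) {
        return !BAD.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidAddressLine1("123 Main Street"));    // true
        System.out.println(isValidAddressLine1("123 Main St, APT 4")); // false
    }
}
```

`find()` stops at the first bad word, so this avoids the per-character lookahead of the single-regex version.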

A more theoretical answer:

Deterministic finite automata (DFAs) and regular expressions describe exactly the same class of languages, the regular languages: for every regular language, you can construct a DFA that accepts exactly the strings in that language, and you can also construct a regular expression that matches exactly those strings. Thus, for any regular expression, you can construct a DFA that accepts exactly the same strings, and vice versa.

A nondeterministic finite automaton (NFA) can be turned into a DFA by constructing a DFA state for every combination (subset) of NFA states. (With |Q| NFA states there are at most 2^|Q| such subsets, which is still a finite number.)

With that knowledge, we can complement a DFA A and produce a DFA A’ which accepts every string that A rejects, and rejects every string that A accepts.

This is done by first making A complete (every state has a transition on every input symbol; add a non-accepting "dead" state to catch the missing transitions) and then swapping accepting and non-accepting states. Note that turning the end states into start states and the start state into an end state (with reversed transitions) does something different: it produces an automaton for the reversed language, not the complement.

The only remaining step is to turn the new DFA back into a regular expression. The standard method (state elimination) removes states one at a time and combines the labels on the remaining transitions: | (alternation) for branches, concatenation for states visited in sequence, and * (Kleene star) for loops. It is mechanical, but the resulting expression can be very large.
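To make the complement step concrete, here is a small hand-written sketch (not code from the answer; class and method names are hypothetical). It is a complete DFA whose state is the length of the longest prefix of "TEST" that ends the input read so far, plus an absorbing state for "TEST has occurred". Accepting only in that absorbing state gives "contains TEST"; flipping the accepting set gives the complement, "does not contain TEST":

```java
public class NotContainsTestDfa {
    private static final String WORD = "TEST";
    private static final int SEEN = WORD.length(); // absorbing state: WORD occurred

    // Complete transition function: defined for every state and every char.
    static int step(int state, char c) {
        if (state == SEEN) {
            return SEEN; // once WORD has been seen, it stays seen
        }
        // New state = longest prefix of WORD that is a suffix of (matched prefix + c).
        String tail = WORD.substring(0, state) + c;
        for (int k = Math.min(tail.length(), WORD.length()); k > 0; k--) {
            if (tail.endsWith(WORD.substring(0, k))) {
                return k;
            }
        }
        return 0;
    }

    static int run(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            state = step(state, c);
        }
        return state;
    }

    // DFA for "contains TEST": the only accepting state is SEEN.
    static boolean containsTest(String input) {
        return run(input) == SEEN;
    }

    // Complement: same transitions, accepting set flipped.
    static boolean lacksTest(String input) {
        return run(input) != SEEN;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"TEST", "TEST TEST", "123 TEST 456", "Peter", "TETEST"}) {
            System.out.printf("%-14s contains=%b complement=%b%n",
                    s, containsTest(s), lacksTest(s));
        }
    }
}
```

The flip is only valid because `step` is total; a partial DFA would silently reject inputs with missing transitions in both versions.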


Regular Expressions: A String should not contain the word "TEST"


posted 13 years ago

  • The following regular expression matches all strings that contain the word TEST, no matter whether it appears at the beginning or at the end:

    What would a regular expression look like that matches all strings that do not contain the word TEST?

    For testing I am using the following code:

    I know that it is possible to program this with the indexOf method of the String class, but I need to implement this using regular expressions.

  • What’s wrong with the code you have? If matcher.matches() returns false, the string doesn’t contain TEST.

  • You need to use the negative lookahead construct:
    This will match only if the string doesn’t contain TEST:

  • Why keep it simple when you can make it incredibly complicated? 🙂

  • Thank you for all your help, guys! Of course you are right: the simplest code is the best for the given problem.
    But I wanted to figure out how to solve it with a regular expression. Thank you, Ireneusz Kordal, for your correct regular expression!

    It gives exactly the results I needed:

    This program returns false when a string contains the substring "TEST" and true otherwise (the substring "TEST" is forbidden in a string):
    OK! String "TEST" delivers false
    OK! String "TEST TEST" delivers false
    OK! String "123 TEST 456" delivers false
    OK! String "Peter" delivers true
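The regexes from this thread did not survive extraction, so the following is only a sketch of the two approaches discussed (class and method names are hypothetical): matching `.*TEST.*` and negating the result in code, and a common negative-lookahead form, `^(?!.*TEST).*$`, which may differ from the one posted above. `Pattern.DOTALL` is an added assumption so that strings containing line breaks are handled too:

```java
import java.util.regex.Pattern;

public class NoTestWord {
    // Approach 1: match strings that DO contain TEST, then negate in code.
    private static final Pattern CONTAINS = Pattern.compile(".*TEST.*", Pattern.DOTALL);
    // Approach 2: one regex that matches only strings WITHOUT TEST; the negative
    // lookahead (?!.*TEST) fails if TEST occurs anywhere after the start.
    private static final Pattern LACKS = Pattern.compile("^(?!.*TEST).*$", Pattern.DOTALL);

    static boolean allowedByNegation(String s) {
        return !CONTAINS.matcher(s).matches();
    }

    static boolean allowedByLookahead(String s) {
        return LACKS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"TEST", "TEST TEST", "123 TEST 456", "Peter"};
        for (String s : inputs) {
            System.out.printf("\"%s\" -> %b / %b%n",
                    s, allowedByNegation(s), allowedByLookahead(s));
        }
    }
}
```

Both methods agree with the expected results listed above: false for the three strings containing "TEST" and true for "Peter".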
