- Regex to match anything
- 4 Answers
- PHP Regular Expressions
- Syntax
- Regular Expression Functions
- Using preg_match()
- Example
- Using preg_match_all()
- Example
- Using preg_replace()
- Example
- Regular Expression Modifiers
- Regular Expression Patterns
- Metacharacters
- Quantifiers
- Grouping
- Example
- Complete RegExp Reference
- regex to match any string whether Unicode or not?
- empty regex matches any string
- 2 Answers
Regex to match anything
I know it seems a bit redundant but I’d like a regex to match anything. At the moment we are using ^*$ but it doesn’t seem to match no matter what the text. I do a manual check for no text but the test view we use is always validated with a regex. However, sometimes we need it to validate anything using a regex. i.e. it doesn’t matter what is in the text field, it can be anything. I don’t actually produce the regex and I’m a complete beginner with them.
4 Answers
The regex .* will match anything (including the empty string, as Junuxx points out).
Checking for nothing doesn’t matter too much as I’m doing that outside of the regex. I will try both though, thanks.
The chosen answer is slightly incorrect, as it won't match line breaks or returns. This regex to match anything is useful if your desired selection includes any line breaks: [\s\S]+
[\s\S] matches a character that is either a whitespace character (including line break characters), or a character that is not a whitespace character. Since all characters are either whitespace or non-whitespace, this character class matches any character. The + matches one or more of the preceding expression.

^ is the beginning-of-line anchor, so it will be a "zero-width match," meaning it won't match any actual characters (and the first character matched after the ^ will be the first character of the string). Similarly, $ is the end-of-line anchor.
* is a quantifier. It will not by itself match anything; it only indicates how many times a portion of the pattern can be matched. Specifically, it indicates that the previous "atom" (that is, the previous character or the previous parenthesized sub-pattern) can match any number of times.
To actually match some set of characters, you need to use a character class. As RichieHindle pointed out, the character class you need here is . , which represents any character except newlines (and it can be made to match newlines as well using the appropriate flag). So .* represents * (any number) matches on . (any character). Similarly, .+ represents + (at least one) matches on . (any character).
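The difference between these patterns can be sketched in PHP as follows. This is only an illustration: the sample string and variable names are made up, and the patterns are anchored with ^ and $ so that the whole input has to match.

```php
<?php
// Hypothetical sample input containing a line break.
$text = "first line\nsecond line";

// The dot does not match "\n", so an anchored .* cannot span both lines.
$dotOnly = preg_match('/^.*$/', $text);

// The s modifier lets the dot match newlines as well.
$dotAll = preg_match('/^.*$/s', $text);

// [\s\S] matches any character regardless of modifiers.
$anyChar = preg_match('/^[\s\S]*$/', $text);

// * accepts the empty string, + requires at least one character.
$emptyStar = preg_match('/^.*$/', '');
$emptyPlus = preg_match('/^.+$/', '');

var_dump($dotOnly, $dotAll, $anyChar, $emptyStar, $emptyPlus);
```

If the field may legitimately be empty, the * variants are the ones to use; the + variants reject empty input.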
PHP Regular Expressions
A regular expression is a sequence of characters that forms a search pattern. When you search for data in a text, you can use this search pattern to describe what you are searching for.
A regular expression can be a single character, or a more complicated pattern.
Regular expressions can be used to perform all types of text search and text replace operations.
Syntax
In PHP, regular expressions are strings composed of delimiters, a pattern and optional modifiers.
For example, in the expression "/w3schools/i", / is the delimiter, w3schools is the pattern that is being searched for, and i is a modifier that makes the search case-insensitive.
The delimiter can be any character that is not a letter, number, backslash or space. The most common delimiter is the forward slash (/), but when your pattern contains forward slashes it is convenient to choose other delimiters such as # or ~.
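As a small sketch of why an alternative delimiter can be convenient, the following compares the same pattern written with / and with #. The URL and variable names are hypothetical and serve only as sample data.

```php
<?php
// Hypothetical URL used only for illustration.
$url = "https://example.com/docs/regex";

// With / as the delimiter, every slash in the pattern must be escaped.
$withSlashes = preg_match('/^https:\/\/example\.com\/docs\//', $url);

// With # as the delimiter, the slashes can be written as-is.
$withHash = preg_match('#^https://example\.com/docs/#', $url);

var_dump($withSlashes, $withHash);
```

Both calls test the same thing; the second is simply easier to read.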
Regular Expression Functions
PHP provides a variety of functions that allow you to use regular expressions. The preg_match() , preg_match_all() and preg_replace() functions are some of the most commonly used ones:
Function | Description |
---|---|
preg_match() | Returns 1 if the pattern was found in the string and 0 if not |
preg_match_all() | Returns the number of times the pattern was found in the string, which may also be 0 |
preg_replace() | Returns a new string where matched patterns have been replaced with another string |
Using preg_match()
The preg_match() function will tell you whether a string contains matches of a pattern.
Example
Use a regular expression to do a case-insensitive search for "w3schools" in a string:
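The code for this example did not survive in the text, so here is a minimal sketch of what such a search could look like. The sample string and the $found variable are assumptions; the pattern is the /w3schools/i expression described in the Syntax section.

```php
<?php
// Sample input (assumed for illustration).
$str = "Visit W3Schools";
$pattern = "/w3schools/i";

// preg_match() stops at the first match and returns 1, or 0 if nothing matched.
$found = preg_match($pattern, $str);
echo $found;
```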
Using preg_match_all()
The preg_match_all() function will tell you how many matches were found for a pattern in a string.
Example
Use a regular expression to do a case-insensitive count of the number of occurrences of "ain" in a string:
<?php
$str = "The rain in SPAIN falls mainly on the plains.";
$pattern = "/ain/i";
echo preg_match_all($pattern, $str); // Outputs 4
?>
Using preg_replace()
The preg_replace() function will replace all of the matches of the pattern in a string with another string.
Example
Use a case-insensitive regular expression to replace Microsoft with W3Schools in a string:
<?php
$str = "Visit Microsoft!";
$pattern = "/microsoft/i";
echo preg_replace($pattern, "W3Schools", $str); // Outputs "Visit W3Schools!"
?>
Regular Expression Modifiers
Modifiers can change how a search is performed.
Modifier | Description |
---|---|
i | Performs a case-insensitive search |
m | Performs a multiline search (patterns that search for the beginning or end of a string will match the beginning or end of each line) |
u | Enables correct matching of UTF-8 encoded patterns |
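To make the effect of the i and m modifiers concrete, here is a short sketch. The three-line sample string and the variable names are invented for illustration.

```php
<?php
// Hypothetical multi-line input.
$text = "apple\nBanana\ncherry";

// Without m, ^ and $ only match at the start and end of the whole string.
$withoutM = preg_match('/^banana$/i', $text);

// With m, ^ and $ also match at the start and end of each line.
$withM = preg_match('/^banana$/im', $text);

// Without i, the search is case-sensitive, so "Banana" is not found.
$caseSensitive = preg_match('/^banana$/m', $text);

var_dump($withoutM, $withM, $caseSensitive);
```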
Regular Expression Patterns
Brackets are used to find a range of characters:
Expression | Description |
---|---|
[abc] | Find one character from the options between the brackets |
[^abc] | Find any character NOT between the brackets |
[0-9] | Find one character from the range 0 to 9 |
Metacharacters
Metacharacters are characters with a special meaning:
Metacharacter | Description |
---|---|
| | Find a match for any one of the patterns separated by | as in: cat|dog|fish |
. | Find just one instance of any character |
^ | Finds a match at the beginning of a string as in: ^Hello |
$ | Finds a match at the end of the string as in: World$ |
\d | Find a digit |
\s | Find a whitespace character |
\b | Find a match at the beginning of a word like this: \bWORD, or at the end of a word like this: WORD\b |
\x{xxxx} | Find the Unicode character specified by the hexadecimal number xxxx (use together with the u modifier; PHP's PCRE functions do not support the \uxxxx form) |
Quantifiers
Quantifiers define quantities:
Quantifier | Description |
---|---|
n+ | Matches any string that contains at least one n |
n* | Matches any string that contains zero or more occurrences of n |
n? | Matches any string that contains zero or one occurrences of n |
n{x} | Matches any string that contains a sequence of X n's |
n{x,y} | Matches any string that contains a sequence of X to Y n's |
n{x,} | Matches any string that contains a sequence of at least X n's |
Note: If your expression needs to search for one of the special characters you can use a backslash ( \ ) to escape them. For example, to search for one or more question marks you can use the following expression: $pattern = '/\?+/';
Grouping
You can use parentheses ( ) to apply quantifiers to entire patterns. They also can be used to select parts of the pattern to be used as a match.
Example
Use grouping to search for the word "banana" by looking for ba followed by two instances of na:
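The code for this example is missing from the text. A minimal sketch of the described pattern might look like this; the sample sentence and the $found and $matches names are assumptions.

```php
<?php
// Sample input (assumed for illustration).
$str = "Apples and bananas.";

// (na){2} applies the quantifier to the whole group "na".
$pattern = "/ba(na){2}/i";

// $matches[0] holds the full match; $matches[1] holds the text
// captured by the group on its last repetition.
$found = preg_match($pattern, $str, $matches);
echo $found;
```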
Complete RegExp Reference
The reference contains descriptions and examples of all Regular Expression functions.
regex to match any string whether Unicode or not?
If your PCRE is compiled with unicode support, you can just match against the letter space from the unicode standard.
Please note the u modifier, which enables Unicode matching.
<?php
$string = "<title>نص عربى English text</title>";
$pattern = '/<title>(.*)<\/title>/u';
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    print($matches[0][1]."\n");
} else {
    echo 'No matches.';
}
?>
rasjani@laptop:~$ php unitest.php
نص عربى English text
rasjani@laptop:~$
It works well for me, but only if the page containing the title is UTF-8 encoded! If the page is encoded in windows-1256, it doesn't work.
@D3VELOPER: correct. To do what you want with PCRE you need to normalize the encoding before you apply the regex. You can use iconv to convert from windows-1256 to UTF-8.
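The normalization step could be sketched as below. This is only an illustration under some assumptions: the legacy page is simulated by converting a UTF-8 sample with iconv, the charset name WINDOWS-1256 is assumed to be available in the local iconv implementation, and all variable names are hypothetical.

```php
<?php
// Simulate a page delivered in windows-1256 (hypothetical sample data).
$utf8Page = "<title>نص عربى English text</title>";
$legacyPage = iconv('UTF-8', 'WINDOWS-1256', $utf8Page);

$pattern = '/<title>(.*)<\/title>/u';

// With the u modifier the subject must be valid UTF-8,
// so matching against the raw legacy bytes does not succeed.
$rawResult = preg_match($pattern, $legacyPage);

// Normalize the encoding to UTF-8 first, then apply the regex.
$normalized = iconv('WINDOWS-1256', 'UTF-8', $legacyPage);
$matched = preg_match($pattern, $normalized, $m);
$title = ($matched === 1) ? $m[1] : null;

var_dump($rawResult, $title);
```

In a real page the source encoding would have to be detected or read from the response headers before calling iconv.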
The (. ) will only match something which is exactly 6 characters long, and it will only match ‘?’. To match ‘any’ character, use ‘.’ and to match repeating number of them use ‘.*’
Matching HTML tags like that is not easy in regex, so you should probably use a HTML parser for that instead.
As an approximation you could do something like /<title>([^<]*)<\/title>/, which will almost work, as long as your text does not contain a '<'.
empty regex matches any string
I was trying to find a regex that matches any string, and after some searching I found that almost all the answers say that [\s\S] or .* will match any string. But while playing a bit with PHP preg_match I found that an empty regex matches any string!
if(preg_match("//u", "")) echo "empty string matchs\n";
else echo "empty string does not match\n";
if(preg_match("//u", "abc")) echo "abc matchs\n";
else echo "abc does not match\n";
if(preg_match("//u", "\n")) echo "new line matchs\n";
else echo "new line does not match\n";
if(preg_match("//u", "/")) echo "/ matchs\n";
else echo "/ does not match\n";
exit;
empty string matchs
abc matchs
new line matchs
/ matchs
live demo (https://eval.in/845001)
Can I use this empty regex safely to match anything? And what does an empty regex mean?
If you are asking why I would need a regex that matches anything, that is because I'm using a function that requires a regex parameter as part of its string validation functionality, and I want it to accept anything.
2 Answers
An empty regex pattern // matches at the start, at the end, and at any position between characters in a string. See this demo at eval.in: preg_match_all('//', "foo", $out); returns 4 empty matches.
As preg_match would just check for the first match it should be fine to use the empty pattern. However generally I’d probably prefer /^/ which matches start of the string that every string has.
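The behaviour described above can be checked with a few lines of PHP. The variable names are made up for this sketch.

```php
<?php
// The empty pattern matches at every position: before "f", between the
// letters, and at the end of "foo", which gives 4 empty matches.
$count = preg_match_all('//', "foo", $out);

// preg_match() only looks for the first match, so the empty pattern
// succeeds even on an empty subject.
$emptyOnEmpty = preg_match('//', "");

// /^/ matches the start of the string, which every string has.
$startOnEmpty = preg_match('/^/', "");
$startOnText  = preg_match('/^/', "abc\n");

var_dump($count, $out[0], $emptyOnEmpty, $startOnEmpty, $startOnText);
```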
[\s\S] (shorthand for whitespace together with non-whitespace in a character class) means just any character. It is usually used to also match newlines where no flag is available for making the dot match line breaks. It was often used with JS regex, which did not support the s flag before ES2018. Similar are [\D\d] (digits and non-digits) and [\w\W] (word characters and non-word characters). Also possible with JS regex is [^], a negated empty character class for "not nothing".

Using /[\s\S]/ or one of the others without a quantifier will require at least one character.
It is also worth mentioning that your patterns use the u flag for Unicode regex. There is probably no reason to use this flag together with an empty pattern or when just checking for the start of the string. The following escape sequences might be interesting with PCRE Unicode regex:
- \X matches a Unicode grapheme. It is similar to the dot in u mode, except that the dot does not match newlines.
- \C matches one data unit (similar to using the dot without the u flag on Unicode input).
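As a rough illustration of how these units differ, the sketch below counts graphemes, code points, and bytes in a string made of "e" plus a combining accent. The sample string and variable names are assumptions, and \C itself is left out of the sketch.

```php
<?php
// "e" followed by U+0301 COMBINING ACUTE ACCENT:
// one visible character, two code points, three UTF-8 bytes.
$text = "e\u{0301}";

// \X consumes a whole grapheme (base letter plus combining mark).
$graphemes = preg_match_all('/\X/u', $text);

// In u mode the dot matches one code point at a time.
$codePoints = preg_match_all('/./u', $text);

// Without u the dot matches one byte at a time.
$bytes = preg_match_all('/./', $text);

var_dump($graphemes, $codePoints, $bytes);
```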
Well, I don't really see why one would need a pattern to match any string, but I wrote this up out of interest 🙂