Regular expressions for HTML links

The Perfect URL Regular Expression

Simply copy and paste the URL regex below for the language of your choice.

PHP (use with preg_match)

%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu

PHP (with validate filter)

if (filter_var($url, FILTER_VALIDATE_URL) !== false) { ... }

Python

JavaScript

HTML5

Below is the regex used in type="url" from RFC3986:

Perl

^(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&%\$#\=~_\-]+))*$

Ruby

/\A(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\z/i

Go (use the govalidator IsURL())

package main

import (
    "fmt"

    "github.com/asaskevich/govalidator"
)

func main() {
    str := "https://www.urlregex.com"
    validURL := govalidator.IsURL(str)
    fmt.Printf("%s is a valid URL : %v \n", str, validURL)
}

Objective-C

Swift

func canOpenURL(string: String?) -> Bool {
    let regEx = "((https|http)://)((\\w|-)+)(([.]|[/])((\\w|-)+))+"
    let predicate = NSPredicate(format: "SELF MATCHES %@", argumentArray: [regEx])
    return predicate.evaluateWithObject(string)
}

if canOpenURL("https://www.urlregex.com") {
    print("valid url.")
} else {
    print("invalid url.")
}


Swift (use canOpenURL)

UIApplication.sharedApplication().canOpenURL(urlString)

Java

^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]

VB.NET

C#

^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:[0-9]*)*(\/?)([a-zA-Z0-9\-\.\?\,\'\/\\\+&%\$#_]*)?$

MySQL

SELECT field FROM table WHERE field REGEXP "^(https?://|www\\.)[\.A-Za-z0-9\-]+\\.[a-zA-Z]{2,4}"

Bonus: What does the following regex do?

Regex Cheat Sheet

Modifiers:

Brackets:

[abc] Match a single character a, b, or c
[^abc] Match any character except a, b, or c
[A-z] Match any character from uppercase A to lowercase z (the range also includes [ \ ] ^ _ and `)
(ab|cd|ef) Match either ab, cd, or ef
(…) Capture anything enclosed

Metacharacters

^ Start of line
$ End of line
. Match any character
\w Match a word character
\W Match a non-word character
\d Match a digit
\D Match any non-digit character
\s Match a whitespace character
\S Match a non-whitespace character
\b Match a position at the beginning or end of a word (zero-width)
\B Match a position that is not at the beginning or end of a word
\0 Match a NUL character
\t Match a tab character
\xxx Match a character specified by octal number xxx
\xdd Match a character specified by hexadecimal number dd
\uxxxx Match a Unicode character specified by hexadecimal number xxxx

Quantifiers

n+ Match at least one n
n* Match zero or more n’s
n? Match zero or one n
n{X} Match sequence of X n’s
n{X,Y} Match sequence of X to Y n’s
n{X,} Match sequence of X or more n’s
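
A few of the cheat-sheet entries can be checked directly in JavaScript. The sketch below is illustrative only: the sample inputs are made up, and the brace quantifiers are assumed to be written as n{X}, n{X,Y}, and n{X,}.

```javascript
// Each entry pairs a cheat-sheet pattern with a sample input and the
// result we expect from RegExp.prototype.test().
const checks = [
  { pattern: /^[abc]$/, input: "b", expected: true },        // [abc]
  { pattern: /^[^abc]$/, input: "b", expected: false },      // [^abc]
  { pattern: /^(ab|cd|ef)$/, input: "cd", expected: true },  // alternation
  { pattern: /^\d{3}$/, input: "123", expected: true },      // n{X}
  { pattern: /^\d{2,4}$/, input: "12345", expected: false }, // n{X,Y}
  { pattern: /^\w{2,}$/, input: "a", expected: false },      // n{X,}
  { pattern: /\bcat\b/, input: "a cat sat", expected: true },    // \b at word edges
  { pattern: /\bcat\b/, input: "concatenate", expected: false }, // no boundary inside a word
  { pattern: /^[A-z]$/, input: "_", expected: true },        // [A-z] is wider than letters
];

const results = checks.map(({ pattern, input, expected }) => ({
  pattern: String(pattern),
  input,
  ok: pattern.test(input) === expected,
}));

console.log(results);
```

The last entry shows why [A-Za-z] is usually the safer way to write "any letter".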


URL RegEx Pattern – How to Write a Regular Expression for a URL

Kolade Chris


Regular expressions provide a powerful and flexible way to define patterns and match specific strings, be it usernames, passwords, phone numbers, or even URLs.

In this article, I’ll show you the fundamentals of crafting a regular expression for URLs. Whether you need to validate user input, extract components from URLs, or perform any other URL-related tasks, understanding how to construct a regex for URLs can greatly enhance URL validation in your applications.

First, let me show you what a URL is.


What is a URL?

A URL, short for Uniform Resource Locator, is a string that identifies the location of a resource on the web. It typically consists of various components, including:

  • the protocol – for instance, HTTP or HTTPS
  • domain name – for example, freecodecamp.org
  • subdomain – for example, chinese.freecodecamp.org
  • port number – for example, 3000, 4000, or 5000
  • path – for example, freecodecamp.org/news
  • query parameters – for example, https://example.com/search?q=apple&category=fruits
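
The components in the list above can be read off a URL with the built-in URL class, which is available in browsers and in Node.js. This is a minimal sketch; the address below is a made-up example that happens to contain every component.

```javascript
// Hypothetical URL combining a protocol, subdomain, domain, port, path, and query.
const url = new URL("https://chinese.freecodecamp.org:3000/news?q=apple&category=fruits");

const parts = {
  protocol: url.protocol, // includes the trailing colon
  hostname: url.hostname, // subdomain plus domain name
  port: url.port,         // a string; empty when the default port is used
  path: url.pathname,
  query: Object.fromEntries(url.searchParams),
};

console.log(parts);
```

Because the parser already understands URL structure, it is often a useful complement to a regex when you only need to pull components apart.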

How to Write a Regular Expression for a URL

A URL can be a base URL (without a subdomain, path, or query parameters), or it can contain a subdomain, a path, or other components. Because of this, you have to tailor your regular expression to the form of URL you expect.

If users type in a base URL, tailor your regex for that; if you expect a URL with a subdomain or path, tailor it for that instead. You can also write a more complex regex that accepts a URL in any of these forms.

Here’s a regex pattern that matches a base URL of any domain extension:

(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})(\.[a-zA-Z0-9]{2,})?

This would match domains like https://www.freecodecamp.org , http://www.freecodecamp.org/ , freeCodeCamp.org , google.co.uk , facebook.net , google.com.ng , google.com.in , and many other base URLs.
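
As a rough check of the base-URL pattern, the sketch below assumes each character class repeats at least twice ({2,}) and compares the pattern as written with an anchored copy. The anchored variant and the sample strings are illustrative additions, not part of the original pattern.

```javascript
// Base-URL pattern, assuming every label is at least two characters long.
const base = /(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})(\.[a-zA-Z0-9]{2,})?/;

// Anchored copy: the whole string must be a base URL, not just part of it.
const anchoredBase = new RegExp("^(?:" + base.source + ")$");

const samples = [
  "https://www.freecodecamp.org",
  "google.co.uk",
  "freecodecamp",                      // no dot, so no domain extension
  "https://www.freecodecamp.org/news", // has a path
];

const report = samples.map((s) => ({
  sample: s,
  unanchored: base.test(s),
  anchored: anchoredBase.test(s),
}));

console.log(report);
```

Without anchors the pattern also reports a match for the URL with a path, because the base part of that string matches on its own.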

The pattern below matches any URL with a path:

(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})(\.[a-zA-Z0-9]{2,})?\/[a-zA-Z0-9]{2,}

This includes URLs like https://www.freecodecamp.org/news , http://www.freecodecamp.org/ukrainian , and others.

If you want to match a URL with a subdomain, the pattern below can do it for you:

(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})?

This would match subdomains like https://www.chinese.freecodecamp.org ,
chinese.freecodecamp.org , https://chinese.freecodecamp.org , and others.
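
One caveat worth seeing in code: the subdomain pattern only counts dot-separated labels, so it cannot tell a subdomain from a two-part extension. The sketch below assumes the {2,} quantifiers and adds ^ and $ anchors for whole-string checks; the sample strings are illustrative.

```javascript
// Subdomain pattern, anchored, assuming labels of two or more characters.
const subdomain = /^(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})?$/;

const outcomes = {
  "chinese.freecodecamp.org": subdomain.test("chinese.freecodecamp.org"),
  "https://chinese.freecodecamp.org": subdomain.test("https://chinese.freecodecamp.org"),
  "freecodecamp.org": subdomain.test("freecodecamp.org"), // only two labels
  "google.co.uk": subdomain.test("google.co.uk"),         // three labels, but no subdomain
};

console.log(outcomes);
```

If the distinction matters, a list of known extensions (or a dedicated library) is needed on top of the regex.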

If you want a regex that matches any URL that is base, has a subdomain, or a path, you can combine all the patterns I’ve shown you like this:

(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?\/[a-zA-Z0-9]{2,}|((https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?)|(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})?


Not the prettiest way to do things, but it works.

Testing the RegEx with JavaScript

Testing the regex with the test() method of a JavaScript RegExp returned true:

const pattern = /(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?\/[a-zA-Z0-9]{2,}|((https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?)|(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})?/g;

const urls = `https://www.freecodecamp.org
http://www.freecodecamp.org
google.co.uk
facebook.net
google.com.ng
google.com.in
freecodecamp.org
yoruba.freecodecamp.org
freecodecamp.org/yoruba
http://www.freecodecamp.org/news
freecodecamp.org/news
chinese.freecodecamp.org
https://chinese.freecodecamp.org`;

// test() reports whether the pattern matches anywhere in the string
console.log(pattern.test(urls)); // true
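
Calling test() on one long string only shows that some part of it matches. To check candidates one at a time, a sketch along the following lines may help. It assumes the {2,} quantifiers in the combined pattern, and the anchored wrapper and sample strings are illustrative additions.

```javascript
// Combined pattern as a raw string so the backslashes survive unchanged.
const source = String.raw`(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?\/[a-zA-Z0-9]{2,}|((https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z]{2,}(\.[a-zA-Z]{2,})(\.[a-zA-Z]{2,})?)|(https:\/\/www\.|http:\/\/www\.|https:\/\/|http:\/\/)?[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,}(\.[a-zA-Z0-9]{2,})?`;

// No g flag: test() then keeps no state between calls.
const anywhere = new RegExp(source);
const wholeString = new RegExp("^(?:" + source + ")$");

const candidates = [
  "https://www.freecodecamp.org",
  "freecodecamp.org/news",
  "chinese.freecodecamp.org",
  "freecodecamp",
  "visit freecodecamp.org now",
];

const verdicts = candidates.map((candidate) => ({
  candidate,
  anywhere: anywhere.test(candidate),
  wholeString: wholeString.test(candidate),
}));

console.log(verdicts);
```

The last candidate matches "anywhere" but not as a whole string, which is the difference that matters when validating user input.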


Conclusion

The regular expression patterns for matching a URL depend on your specific need, since URLs can take various forms. So when you write a pattern for a URL, write it to suit the form of URL you expect.

Writing one regex that matches all kinds of URLs works, but it’s not the best approach because the result is very hard to read and debug.
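
One way to keep a broader pattern readable is to assemble it from small named pieces. The sketch below is a simplification under stated assumptions, not the combined pattern from this article: the piece names are hypothetical, labels are assumed to be two or more letters or digits, and a host is assumed to have two to four labels.

```javascript
// Hypothetical building blocks, each small enough to read and test alone.
const scheme = String.raw`(?:https?:\/\/)?`;    // optional http:// or https://
const www = String.raw`(?:www\.)?`;             // optional www. prefix
const label = String.raw`[a-zA-Z0-9]{2,}`;      // one dot-separated part of the host
const host = label + "(?:\\." + label + "){1,3}"; // two to four labels in total
const path = String.raw`(?:\/[a-zA-Z0-9]{2,})?`; // optional single path segment

const urlPattern = new RegExp("^" + scheme + www + host + path + "$");

const checked = {
  "https://www.freecodecamp.org/news": urlPattern.test("https://www.freecodecamp.org/news"),
  "google.co.uk": urlPattern.test("google.co.uk"),
  "chinese.freecodecamp.org": urlPattern.test("chinese.freecodecamp.org"),
  "freecodecamp": urlPattern.test("freecodecamp"),
  "ftp://freecodecamp.org": urlPattern.test("ftp://freecodecamp.org"),
};

console.log(checked);
```

Each piece can be changed or debugged on its own, which is harder to do with one long literal.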

Kolade Chris


Web developer and technical writer focusing on frontend technologies. I also dabble in a lot of other technologies.



JavaScript | A regular expression for extracting HTML links from a string

In this article we look at HTML <a> elements, which have paired tags: an opening <a> and a closing </a>.

We treat the tags themselves, the characters inside them, and the characters between them as an HTML link.

What do we treat as the address in an HTML link?

From the HTML element we are interested in its href attribute, whose value after the equals sign must be enclosed in double quotes.

The sequence of characters between the first double quote and the second double quote is what we treat as the address.
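
A minimal sketch of that rule in JavaScript, assuming a single well-formed link; the sample markup and the variable names are illustrative.

```javascript
// Sample markup with one link; the address sits between the double quotes.
const html = '<a href="https://example.com/page">Example</a>';

// [^"]* stops at the second double quote, so group 1 holds only the address.
const hrefMatch = html.match(/<a\b[^>]*?\bhref\s*=\s*"([^"]*)"/i);
const address = hrefMatch ? hrefMatch[1] : null;

console.log(address);
```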

Nuances of extracting HTML links

HTML is a flexible language. This means that the textual form of an HTML element may contain runs of multiple spaces.

Line breaks are allowed as well.

These points impose additional conditions on the regular expression we build in JavaScript.

In general, the only extra characters that may appear between our LESS-THAN and GREATER-THAN signs are whitespace characters that do not affect how the HTML is parsed.
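
The following sketch shows one way a pattern might tolerate extra spaces and line breaks inside the tags. The pattern and the three sample strings are assumptions made for illustration, not the author's original expression.

```javascript
// Three spellings of the same link: compact, padded with spaces, and split across lines.
const samples = [
  '<a href="https://example.com/">plain</a>',
  '<a    href  =  "https://example.com/"   >spaces</a   >',
  '<a\n  href="https://example.com/"\n>\n  line breaks\n</a>',
];

// \s and [^>] both match line breaks, so the layout inside the tag does not matter.
const link = /<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>([\s\S]*?)<\/a\s*>/i;

const parsed = samples.map((sample) => {
  const m = sample.match(link);
  return m ? { href: m[1], text: m[2].trim() } : null;
});

console.log(parsed);
```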

Getting all HTML links from a string

We will guard against every situation, even one where the server returns a badly formed link with multiple spaces.
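
As a sketch of this step, String.prototype.match with a global pattern returns every complete link as a string. The page content and the pattern below are illustrative assumptions.

```javascript
// Hypothetical page fragment with two links, one of them sloppily formatted.
const page = [
  "<p>Intro</p>",
  '<a href="https://example.com/one">One</a>',
  '<a   href = "https://example.com/two"  >',
  "  Two",
  "</a  >",
  "<span>not a link</span>",
].join("\n");

// The g flag makes match() return all matches instead of only the first.
const linkPattern = /<a\s[^>]*?href\s*=\s*"[^"]*"[^>]*>[\s\S]*?<\/a\s*>/gi;
const links = page.match(linkPattern) || [];

console.log(links);
```

match() returns null when nothing is found, hence the fallback to an empty array.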

An array of the indices at which each match begins
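
One way to collect the starting positions is String.prototype.matchAll, whose match objects carry an index property. The input string and pattern here are illustrative assumptions.

```javascript
// Two links separated by a little filler text.
const text = 'x<a href="a.html">A</a>yy<a href="b.html">B</a>';

// matchAll() requires the g flag and yields one match object per link.
const anchorPattern = /<a\s[^>]*?href\s*=\s*"[^"]*"[^>]*>[\s\S]*?<\/a\s*>/gi;
const starts = Array.from(text.matchAll(anchorPattern), (m) => m.index);

console.log(starts); // [1, 25]
```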

Only the sequence containing the HTML element

Getting all HREF values from the HTML links in a string
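
A sketch of collecting only the addresses, using a capture group around the quoted value. The markup and the pattern are illustrative assumptions; an <a> element without an href attribute is skipped.

```javascript
// Hypothetical markup: a named anchor without href, then two real links.
const markup =
  '<a name="top">Top</a><a href="/one">One</a>\n<a\n  href = "https://example.com/two">Two</a>';

// Group 1 captures the characters between the double quotes after href=.
const hrefPattern = /<a\s[^>]*?href\s*=\s*"([^"]*)"/gi;
const hrefs = Array.from(markup.matchAll(hrefPattern), (m) => m[1]);

console.log(hrefs);
```

For markup that is not under your control, a real HTML parser such as the browser's DOMParser is usually more robust than a regex.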
