A fully featured full text search engine written in PHP
teamtnt/tntsearch
TNTSearch is a full-text search (FTS) engine written entirely in PHP. A simple configuration allows you to add an amazing search experience in just minutes. Features include:
- Fuzzy search
- Search as you type
- Geo-search
- Text classification
- Stemming
- Custom tokenizers
- BM25 ranking algorithm
- Boolean search
- Result highlighting
- Dynamic index updates (no need to reindex each time)
- Easily deployable via Packagist.org
We also created some demo pages that show tolerant retrieval with n-grams in action. The package includes helper functions such as Jaro-Winkler and cosine similarity for distance calculations. It supports stemming for English, Croatian, Arabic, Italian, Russian, Portuguese and Ukrainian. If the built-in stemmers aren't enough, the engine lets you plug in any compatible Snowball stemmer. Some forks of the package even support Chinese. Contributions for other languages are welcome!
Unlike many other engines, the index can be easily updated without doing a reindex or using deltas.
Support us on Open Collective
The easiest way to install TNTSearch is via composer:
composer require teamtnt/tntsearch
Before you proceed, make sure your server meets the following requirements:
- PHP >= 7.1
- PDO PHP Extension
- SQLite PHP Extension
- mbstring PHP Extension
In order to be able to make full text search queries, you have to create an index.
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig([
        'driver'   => 'mysql',
        'host'     => 'localhost',
        'database' => 'dbname',
        'username' => 'user',
        'password' => 'pass',
        'storage'  => '/var/www/tntsearch/examples/',
        'stemmer'  => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class // optional
    ]);

    $indexer = $tnt->createIndex('name.index');
    $indexer->query('SELECT id, article FROM articles;');
    //$indexer->setLanguage('german');
    $indexer->run();
Important: the "storage" setting specifies the folder where all of your indexes are saved, so make sure you have write permission for it; otherwise an exception is thrown when the index is created.
Note: If your primary key is not named id, set it like this:
$indexer->setPrimaryKey('article_id');
Making the primary key searchable
By default, the primary key isn’t searchable. If you want to make it searchable, simply run:
Searching for a phrase or keyword is trivial:
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig($config);
    $tnt->selectIndex("name.index");

    $res = $tnt->search("This is a test search", 12);

    print_r($res); // returns an array of 12 document ids that best match your query

    // to display the results you need an additional query against your application database
    // SELECT * FROM articles WHERE id IN $res ORDER BY FIELD(id, $res);
The ORDER BY FIELD clause is important: without it, the database engine will not return the results in the required order.
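The follow-up query described above can be assembled from the ranked ids with bound placeholders. The sketch below is only an illustration under stated assumptions: `buildOrderedLookupQuery` is a hypothetical helper that is not part of TNTSearch, the ids are passed in as a plain array (how you extract them from `$res` depends on your TNTSearch version), and `FIELD()` is specific to MySQL.

```php
<?php
// Hypothetical helper (not part of TNTSearch): builds a MySQL query that
// fetches the matched rows and preserves the ranking order of $ids.
function buildOrderedLookupQuery(array $ids): array
{
    if (count($ids) === 0) {
        return ['', []]; // nothing matched, nothing to fetch
    }

    // One "?" placeholder per id, e.g. "?,?,?" for three ids.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));

    $sql = "SELECT * FROM articles WHERE id IN ($placeholders) "
         . "ORDER BY FIELD(id, $placeholders)";

    // The id list is bound twice: once for IN (...), once for FIELD(...).
    return [$sql, array_merge($ids, $ids)];
}

[$sql, $params] = buildOrderedLookupQuery([7, 3, 12]);
echo $sql, PHP_EOL;
```

The returned SQL string and parameter list can then be handed to a prepared statement, for example through PDO.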
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig($config);
    $tnt->selectIndex("name.index");

    // this will return all documents that have romeo in it but not juliet
    $res = $tnt->searchBoolean("romeo -juliet");

    // returns all documents that have romeo or hamlet in it
    $res = $tnt->searchBoolean("romeo or hamlet");

    // returns all documents that have either romeo AND juliet or prince AND hamlet
    $res = $tnt->searchBoolean("(romeo juliet) or (prince hamlet)");
The fuzziness can be tweaked by setting the following member variables:
    public $fuzzy_prefix_length  = 2;
    public $fuzzy_max_expansions = 50;
    public $fuzzy_distance       = 2; // represents the Levenshtein distance
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig($config);
    $tnt->selectIndex("name.index");
    $tnt->fuzziness = true;

    // when the fuzziness flag is set to true, the keyword juleit will return
    // documents that match the word juliet, the default Levenshtein distance is 2
    $res = $tnt->search("juleit");
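To see why "juleit" can match "juliet" at a Levenshtein distance of 2, the idea can be sketched with PHP's built-in `levenshtein()` function. This is a rough illustration, not TNTSearch's actual implementation: `fuzzyCandidates` is a hypothetical helper, the term list is made up, and treating the prefix length as "the first N characters must match exactly" is an assumption about what `fuzzy_prefix_length` means.

```php
<?php
// Hypothetical helper: keeps the terms that share a prefix with the keyword
// and lie within a maximum Levenshtein distance of it.
function fuzzyCandidates(
    string $keyword,
    array $terms,
    int $prefixLength = 2,
    int $maxDistance = 2
): array {
    $prefix  = substr($keyword, 0, $prefixLength);
    $matches = [];

    foreach ($terms as $term) {
        // Assumption: the first $prefixLength characters must match exactly.
        if (strncmp($term, $prefix, strlen($prefix)) !== 0) {
            continue;
        }

        $distance = levenshtein($keyword, $term);
        if ($distance <= $maxDistance) {
            $matches[$term] = $distance; // term => edit distance
        }
    }

    return $matches;
}

echo levenshtein('juleit', 'juliet'), PHP_EOL; // 2: two substitutions

$found = fuzzyCandidates('juleit', ['juliet', 'julius', 'romeo', 'jules']);
print_r($found);
```

Raising the maximum distance admits more candidates at the cost of precision; a longer required prefix narrows the candidate set before any distance is computed.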
Once you have created an index, you don't need to rebuild it each time your document collection changes. TNTSearch supports dynamic index updates.
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig($config);
    $tnt->selectIndex("name.index");

    $index = $tnt->getIndex();

    // to insert a new document to the index
    $index->insert(['id' => '11', 'title' => 'new title', 'article' => 'new article']);

    // to update an existing document
    $index->update(11, ['id' => '11', 'title' => 'updated title', 'article' => 'updated article']);

    // to delete the document from index
    $index->delete(12);
First, create your own tokenizer class. It should extend the AbstractTokenizer class, define the word-split $pattern value, and implement TokenizerInterface:
    use TeamTNT\TNTSearch\Support\AbstractTokenizer;
    use TeamTNT\TNTSearch\Support\TokenizerInterface;

    class SomeTokenizer extends AbstractTokenizer implements TokenizerInterface
    {
        static protected $pattern = '/[\s,\.]+/';

        public function tokenize($text)
        {
            return preg_split($this->getPattern(), strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        }
    }
This tokenizer will split words using spaces, commas and periods.
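The effect of that pattern can be checked without the library. The snippet below is a small sketch that applies the same regular expression with plain `preg_split()`; `splitWords` is a hypothetical stand-in for the `tokenize()` method above, and the sample sentence is made up.

```php
<?php
// Same pattern as SomeTokenizer::$pattern: one or more whitespace
// characters, commas, or periods act as a single separator.
$pattern = '/[\s,\.]+/';

// Hypothetical stand-in for SomeTokenizer::tokenize().
function splitWords(string $text, string $pattern): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing separators would otherwise produce.
    return preg_split($pattern, strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

$tokens = splitWords('Romeo, Juliet. Prince  Hamlet', $pattern);
print_r($tokens);
```

Because the period is a separator, a string such as "3.14" is split into two tokens, which is worth keeping in mind when indexing numeric data.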
After you have the tokenizer ready, you should pass it to TNTIndexer via setTokenizer method.
    $someTokenizer = new SomeTokenizer;

    $indexer = new TNTIndexer;
    $indexer->setTokenizer($someTokenizer);
Another way would be to pass the tokenizer via config:
    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;

    $tnt->loadConfig([
        'driver'    => 'mysql',
        'host'      => 'localhost',
        'database'  => 'dbname',
        'username'  => 'user',
        'password'  => 'pass',
        'storage'   => '/var/www/tntsearch/examples/',
        'stemmer'   => \TeamTNT\TNTSearch\Stemmer\PorterStemmer::class, // optional
        'tokenizer' => \TeamTNT\TNTSearch\Support\SomeTokenizer::class
    ]);

    $indexer = $tnt->createIndex('name.index');
    $indexer->query('SELECT id, article FROM articles;');
    $indexer->run();
    $candyShopIndexer = new TNTGeoIndexer;
    $candyShopIndexer->loadConfig($config);
    $candyShopIndexer->createIndex('candyShops.index');
    $candyShopIndexer->query('SELECT id, longitude, latitude FROM candy_shops;');
    $candyShopIndexer->run();

    $currentLocation = [
        'longitude' => 11.576124,
        'latitude'  => 48.137154
    ];

    $distance = 2; // km

    $candyShopIndex = new TNTGeoSearch();
    $candyShopIndex->loadConfig($config);
    $candyShopIndex->selectIndex('candyShops.index');

    $candyShops = $candyShopIndex->findNearest($currentLocation, $distance, 10);
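For intuition about what "within 2 km of the current location" means, a great-circle distance check can be sketched in plain PHP with the haversine formula. This is not how TNTSearch's geo index works internally (that is not described here); `haversineKm` is a hypothetical helper and the shop coordinates are made up for illustration.

```php
<?php
// Great-circle distance in kilometres between two points given in degrees
// (haversine formula, mean Earth radius of 6371 km).
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0;
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);

    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;

    // min() guards against rounding pushing sqrt($a) slightly above 1.
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

$currentLocation = ['longitude' => 11.576124, 'latitude' => 48.137154];
$distance = 2; // km

// Hypothetical sample data: name => [latitude, longitude]
$shops = [
    'Near shop' => [48.140, 11.580],
    'Far shop'  => [48.200, 11.700],
];

// Keep only the shops inside the radius (array keys are preserved).
$nearby = array_filter($shops, function (array $shop) use ($currentLocation, $distance) {
    return haversineKm(
        $currentLocation['latitude'],
        $currentLocation['longitude'],
        $shop[0],
        $shop[1]
    ) <= $distance;
});

print_r(array_keys($nearby));
```

A linear scan like this is fine for a handful of points; a dedicated geo index exists precisely to avoid computing the distance to every row.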
    use TeamTNT\TNTSearch\Classifier\TNTClassifier;

    $classifier = new TNTClassifier();
    $classifier->learn("A great game", "Sports");
    $classifier->learn("The election was over", "Not sports");
    $classifier->learn("Very clean match", "Sports");
    $classifier->learn("A clean but forgettable game", "Sports");

    $guess = $classifier->predict("It was a close election");
    var_dump($guess['label']); // returns "Not sports"

    $classifier = new TNTClassifier();
    $classifier->load('sports.cls');
You’re free to use this package, but if it makes it to your production environment, we would highly appreciate you sending us a PS4 game of your choice. This way you support us to further develop and add new features.
Our address is: TNT Studio, Sv. Mateja 19, 10010 Zagreb, Croatia.
We’ll publish all received games here
Support us with a monthly donation and help us continue our activities. [Become a backer]
Become a sponsor and get your logo on our README on Github with a link to your site. [Become a sponsor]
The MIT License (MIT). Please see License File for more information.
From Croatia with ♥ by TNT Studio (@tntstudiohr, blog)
Finding files in PHP
The glob() function is a convenient way to find files on a server: it returns the list of paths that match a given pattern.
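A minimal, self-contained sketch of a glob() call follows. The temporary directory and the file names in it are made up so that the example can run anywhere; only `glob()` itself is the function under discussion.

```php
<?php
// Build a throwaway directory so the example is reproducible.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['a.jpg', 'b.jpg', 'notes.txt'] as $name) {
    touch($dir . '/' . $name);
}

// glob() returns the full paths that match the pattern,
// sorted alphabetically by default.
$jpgs  = glob($dir . '/*.jpg');
$names = array_map('basename', $jpgs);
print_r($names);

// Clean up the temporary files and directory.
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

Note that glob() can return false on error, so production code should check the result before iterating over it.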
The pattern can contain the following special characters:
| Character | Meaning |
| --- | --- |
| * | Matches zero or more of any character. |
| ? | Matches exactly one character. |
| [...] | Matches one character from the group. |
| [!...] | Matches one character that is not in the group. |
| {a,b,c} | Matches any of the listed substrings; works only with the GLOB_BRACE flag. |
| \ (backslash) | Escapes the following character, unless the GLOB_NOESCAPE flag is used. |
| Flag | Description |
| --- | --- |
| GLOB_MARK | Appends a slash to each directory returned. |
| GLOB_NOSORT | Returns files in the order in which they appear in the directory (unsorted). Without this flag, names are sorted alphabetically. |
| GLOB_NOCHECK | Returns the search pattern itself if no files matched it. |
| GLOB_NOESCAPE | Backslashes do not escape metacharacters. |
| GLOB_BRACE | Expands {a,b,c} to match "a", "b", or "c". |
| GLOB_ONLYDIR | Returns only the directories that match the pattern. |
| GLOB_ERR | Stops on read errors (for example, unreadable directories); by default errors are ignored. |
Several flags can be combined:
    $files = glob('/tmp/*.jpg', GLOB_NOSORT | GLOB_ERR);
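The following sketch shows two of the flags from the table in action, combined with the bitwise OR operator. The temporary directory layout is made up for the example.

```php
<?php
// Throwaway layout: one subdirectory and one regular file.
$dir = sys_get_temp_dir() . '/glob_flags_' . uniqid();
mkdir($dir . '/sub', 0777, true);
touch($dir . '/file.txt');

// GLOB_ONLYDIR keeps directories only; GLOB_MARK appends a
// trailing slash to each of them.
$dirs = glob($dir . '/*', GLOB_ONLYDIR | GLOB_MARK);
print_r($dirs);

// With GLOB_NOCHECK, a pattern that matches nothing is returned as-is
// instead of an empty array.
$pattern  = $dir . '/*.jpg';
$fallback = glob($pattern, GLOB_NOCHECK);
print_r($fallback);

// Clean up.
unlink($dir . '/file.txt');
rmdir($dir . '/sub');
rmdir($dir);
```

GLOB_NOCHECK is easy to misread: the returned element is the pattern string, not an existing file, so it should not be passed on to file functions without a check.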
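All examples below use a tmp folder with the following contents:
    tmp/
        1.svg
        2.jpg
        22-f.gif
        22.svg
        img.png
        path/
            icon-rew.png
            marker.png
            psd/
                1.psd
                2.psd
                index.psd
            sh-1.png
            title-1.png
        prod.png
        style-1.txt
        style-2.css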
Searching a directory
Listing all files and directories
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/*') as $file) {
        $files[] = basename($file);
    }

    print_r($files);
Result:
    Array
    (
        [0] => 1.svg
        [1] => 2.jpg
        [2] => 22-f.gif
        [3] => 22.svg
        [4] => img.png
        [5] => path
        [6] => prod.png
        [7] => style-1.txt
        [8] => style-2.css
    )
Files only
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/*') as $file) {
        if (is_file($file)) {
            $files[] = basename($file);
        }
    }

    print_r($files);
Result:
    Array
    (
        [0] => 1.svg
        [1] => 2.jpg
        [2] => 22-f.gif
        [3] => 22.svg
        [4] => img.png
        [5] => prod.png
        [6] => style-1.txt
        [7] => style-2.css
    )
Directories only
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/*') as $file) {
        if (is_dir($file)) {
            $files[] = basename($file);
        }
    }

    print_r($files);
Result:
    Array
    (
        [0] => path
    )
Searching by extension
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/*.svg') as $file) {
        $files[] = basename($file);
    }

    print_r($files);
Result:
    Array
    (
        [0] => 1.svg
        [1] => 22.svg
    )
Searching by several extensions
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/*.{jpg,png}', GLOB_BRACE) as $file) {
        $files[] = basename($file);
    }

    print_r($files);
Result:
    Array
    (
        [0] => 2.jpg
        [1] => img.png
        [2] => prod.png
    )
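GLOB_BRACE is not available on some non-GNU systems (the PHP manual mentions Solaris and Alpine Linux), so a portable alternative can be useful. The sketch below merges one glob() call per extension; `glob_extensions` is a hypothetical helper, and the temporary directory only mimics a few files from the tmp folder.

```php
<?php
// Hypothetical helper: match several extensions without GLOB_BRACE.
function glob_extensions(string $dir, array $extensions): array
{
    $found = [];
    foreach ($extensions as $ext) {
        // glob() may return false on error, so fall back to an empty list.
        $found = array_merge($found, glob($dir . '/*.' . $ext) ?: []);
    }

    // Restore a single alphabetical order across all extensions.
    sort($found, SORT_STRING);

    return array_map('basename', $found);
}

// Throwaway directory with a few sample files.
$dir = sys_get_temp_dir() . '/glob_ext_' . uniqid();
mkdir($dir);
foreach (['1.svg', '2.jpg', 'img.png', 'prod.png'] as $name) {
    touch($dir . '/' . $name);
}

$files = glob_extensions($dir, ['jpg', 'png']);
print_r($files);

// Clean up.
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

The trade-off is one directory scan per extension, which is negligible for small folders.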
Searching by file name
    $dir = __DIR__ . '/tmp';

    $files = array();
    foreach (glob($dir . '/style*.*') as $file) {
        $files[] = basename($file);
    }

    print_r($files);
Result:
    Array
    (
        [0] => style-1.txt
        [1] => style-2.css
    )
    $dir = __DIR__ . '/tmp';

    $files = array();
    // names that start with a digit
    foreach (glob($dir . '/[0-9]*.*') as $obj) {
        $files[] = basename($obj);
    }

    print_r($files);
Result:
    Array
    (
        [0] => 1.svg
        [1] => 2.jpg
        [2] => 22-f.gif
        [3] => 22.svg
    )
Searching a directory tree
Listing all files
    function glob_tree_files($path, $_base_path = null)
    {
        if (is_null($_base_path)) {
            $_base_path = '';
        } else {
            $_base_path .= basename($path) . '/';
        }

        $out = array();
        foreach (glob($path . '/*') as $file) {
            if (is_dir($file)) {
                $out = array_merge($out, glob_tree_files($file, $_base_path));
            } else {
                $out[] = $_base_path . basename($file);
            }
        }

        return $out;
    }

    $dir = __DIR__ . '/tmp';
    $files = glob_tree_files($dir);
    print_r($files);
Result:
    Array
    (
        [0] => 1.svg
        [1] => 2.jpg
        [2] => 22-f.gif
        [3] => 22.svg
        [4] => img.png
        [5] => path/icon-rew.png
        [6] => path/marker.png
        [7] => path/psd/1.psd
        [8] => path/psd/2.psd
        [9] => path/psd/index.psd
        [10] => path/sh-1.png
        [11] => path/title-1.png
        [12] => prod.png
        [13] => style-1.txt
        [14] => style-2.css
    )
Listing all directories
    function glob_tree_dirs($path, $_base_path = null)
    {
        if (is_null($_base_path)) {
            $_base_path = '';
        } else {
            $_base_path .= basename($path) . '/';
        }

        $out = array();
        foreach (glob($path . '/*', GLOB_ONLYDIR) as $file) {
            if (is_dir($file)) {
                $out[] = $_base_path . basename($file);
                $out = array_merge($out, glob_tree_dirs($file, $_base_path));
            }
        }

        return $out;
    }

    $dir = __DIR__ . '/tmp';
    $files = glob_tree_dirs($dir);
    print_r($files);
Result:
    Array
    (
        [0] => path
        [1] => path/psd
    )
Searching by name or extension
    function glob_tree_search($path, $pattern, $_base_path = null)
    {
        if (is_null($_base_path)) {
            $_base_path = '';
        } else {
            $_base_path .= basename($path) . '/';
        }

        $out = array();
        foreach (glob($path . '/' . $pattern, GLOB_BRACE) as $file) {
            $out[] = $_base_path . basename($file);
        }

        foreach (glob($path . '/*', GLOB_ONLYDIR) as $file) {
            $out = array_merge($out, glob_tree_search($file, $pattern, $_base_path));
        }

        return $out;
    }

    $path = __DIR__ . '/tmp';
    $files = glob_tree_search($path, '*.{jpg,png}');
    print_r($files);
Result:
    Array
    (
        [0] => 2.jpg
        [1] => img.png
        [2] => prod.png
        [3] => path/icon-rew.png
        [4] => path/marker.png
        [5] => path/sh-1.png
        [6] => path/title-1.png
    )
To get full paths to the files in the resulting lists, simply remove the basename() call.
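For the recursive helpers, keeping full paths also means the relative prefix is no longer needed. A minimal sketch of such a variant follows; `glob_tree_paths` is a hypothetical name and the temporary tree is made up for the example.

```php
<?php
// Hypothetical variant of the tree walk that returns full paths.
function glob_tree_paths(string $path): array
{
    $out = [];
    // glob() may return false on error, so fall back to an empty list.
    foreach (glob($path . '/*') ?: [] as $entry) {
        if (is_dir($entry)) {
            // Descend into the subdirectory and append its files.
            $out = array_merge($out, glob_tree_paths($entry));
        } else {
            $out[] = $entry; // no basename(): keep the full path
        }
    }
    return $out;
}

// Throwaway tree: two files at the top level, one in a subdirectory.
$root = sys_get_temp_dir() . '/glob_tree_' . uniqid();
mkdir($root . '/path', 0777, true);
touch($root . '/a.png');
touch($root . '/b.txt');
touch($root . '/path/c.png');

$paths = glob_tree_paths($root);
print_r($paths);

// Clean up.
array_map('unlink', $paths);
rmdir($root . '/path');
rmdir($root);
```

Every returned entry starts with the root directory that was passed in, so the results can be handed directly to functions such as filesize() or unlink().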