- PHP Tutorial: MySQL Database Indexing principle
- A simple but powerful search algorithm in PHP, MySQL
- Requirement :
- Creating the Database :
- Now Let’s start the indexing. What’s the big deal with indexing anyway? We can just compare and find the matching strings of a query right?
- TO ERR IS HYUMANN
- Here we will use a powerful function in PHP called metaphone(). It returns a value that corresponds to the sound of that given word
- 1. First, establish the connection
- 2. Fetch the article_title, article_content from the database
- 3. For each row of article_title and article_content, split the content into single words using the function explode() and append the metaphone($word) with a space to the variable $sound. Update the corresponding column ‘indexing’
- Now we have successfully indexed our title and content and stored them in the ‘indexing’ column. Now we will implement the search logic in the file ‘sample_search.php’
- 1. Once the query is submitted, we convert the search query into a metaphone string. Then search for that string in the indexing column using the “LIKE” query in SQL
- Explanation:
- 2. Now we display the results if the number of results > zero
- Sample Output for search query “data”
- 3. If the number of results is zero or no results found
- Resources:
PHP Tutorial: MySQL Database Indexing principle
PHP tutorial for several days did not bring you a knowledge of PHP, today make up Ah! This article is about: MySQL database indexing principle, hope to bring help to everyone!
The first part mainly discusses the mathematical basis of MySQL database index from the data structure and algorithm theory level.
The second part discusses the topics such as clustered index, nonclustered index, and overlay index, which is based on the schema of the index in INNODB data storage engine in MySQL database.
The third part discusses the strategy of high performance using indexes in MySQL.
I. Data structure and algorithm theory
INNODB Storage Engine Implementation index data structure is B + Tree, the following describes several data structures, step by step explain why to use B + Tree
The B + Tree index is constructed similar to a binary tree and quickly finds data based on key values. But the B + tree does not represent two forks, but rather represents balance. Note: The B + Tree index can find only the page where the data row is located. The database then reads the page into memory, then finds it in memory, and then the data is finally found.
The following is a binary lookup method: Arranging records in order (increment or decrement), in the search process using a jumping way to find, for example: 5, 10, 19, 21, 31, 37, 42, 48, 50, 52 This 10 number,:
You can find 48 with three times of search speed. If it is a sequential lookup, it will take 8 times. For the above 10 numbers, the average number of lookups for sequential lookups is 5.5 times, and the binary lookup method is 2.9 times, in the worst case, the number of sequential lookups is 10, and the number of binary lookups is 4. Binary find slots in page directory in InnoDB are stored in the order of the primary key, and the query for each specific record is searched by two points in page directory.
The number represents the key value for each node, and the key value of the Zuozi is always less than the key value followed by the binary lookup tree, and the key value of the right subtree is always greater than the key value followed. The key values are obtained through the middle order traversal: 2, 3, 5, 6, 7, 8.
The binary lookup tree has an average number of lookups of 2.3 times. But binary search trees can be arbitrarily constructed, such as constructs
But this is similar to the order lookup, so it refers to the idea of a balanced binary tree, the AVL tree.
Definition: Conform to the definition of binary search tree, and then must meet the height of the left and right two subtrees of any node with a maximum difference of 1.
Balanced binary Tree Although the search speed is very fast but the cost of maintaining a balanced binary tree is very large, it usually takes 1 or more left and right spins to get the balance of the tree after insertion or renewal.
All records are in the leaf node, and are sequentially stored, each leaf node (page units) is a logical continuous storage, is a two-way circular linked list.
B + Tree Inserts you must ensure that the records in the post-insertion leaf node are still sorted, so there are three things to consider when inserting:
B + Tree Index in the database has a feature is its high fan out, so in the database, B + Tree height is generally 2-3 layers, that is, to find a key value of the row records, up to 2-3 io, and the general disk can do at least 100 io per second, 2-3 times means the query time is only 0.02-0.03 seconds.
Two, clustered index, non-clustered index
The difference between a clustered index and a nonclustered index is whether the page node holds a whole row of records
The InnoDB Storage Engine table is an indexed organization table in which the data in the table is stored in the primary key order. The clustered index constructs a B + tree according to the primary key of each table, and the leaf node holds the row record data for the entire table, so that the leaf node of the clustered index becomes the data page. This attribute of the clustered index determines that the data in the indexed organization table is also part of the index. As well as the B + tree data structure, each data page is linked through a doubly linked list.
The actual data can only be sorted by a B + tree, so each table can have only one clustered index. In many cases, the query optimizer is very inclined to take a clustered index because the clustered index allows us to find the data directly on the leaf nodes of the index. In addition, because the logical order of the data is defined, the clustered index can be quickly accessed against the range-worthy query. The query optimizer can quickly discover that a range of data needs to be scanned. Note that the records in each page are also maintained by a doubly linked list.
Also called a secondary index, the page level does not contain all the data for the row. In addition to the key values, the page node contains a bookmark in the index at each page level that tells the InnoDB storage engine where to find the row data that corresponds to the index. Because the InnoDB Storage engine table is an indexed organization table, the secondary index bookmark for the InnoDB storage engine is the clustered index key for the corresponding row data. Is the relationship between a clustered index and a secondary index:
When looking for data through a secondary index, the InnoDB storage engine traverses the secondary index and obtains a primary key to the primary key index through a leaf-level pointer, and then finds a complete row record through the primary key index. For example: To find data in a secondary index tree with a height of 3, you need to traverse the secondary index 3 times to find the specified primary key, and if the clustered index tree has the same height of 3, you also need to do three lookups on the clustered index to find the page where the full row data resides. Therefore, 6 logical IO is required to access the final data page.
PHP Tutorial: MySQL Database Indexing principle
This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
A simple but powerful search algorithm in PHP, MySQL
Hello all, I’ve been working with PHP for a while and in one of my projects, I had the need to implement a search algorithm. After some learning, I have found some best ways to implement it.
Requirement :
Let’s assume we are building a Tech aggregator website and we store the article URLs, title, content of an article in the mysqli database. Now our task is to display the best results of the article URLs based on the search terms provided by the user.
Creating the Database :
First of all, we have to create the database to store the given values and a separate column called “indexing” to store the indexing values. When a user enters a search string we will search in this column. It is used for a fast and effective search. The database looks like this.
There are plenty of different methods for indexing. But you should choose the best and robust method. When a search query is received, we only look at the indexing table and return the results instead of searching the entire table. Now let’s create a very basic layout for the search form. Let’s name it “sample_search.php”.
Let’s ignore the UI and focus on the backend works. The form output looks like this
Now Let’s start the indexing. What’s the big deal with indexing anyway? We can just compare and find the matching strings of a query right?
Not quite. Let’s assume the user searches for the word “science”. We just search for the string “science” and return the content that matches the word “science”. Now, what if the user as a result of a typo, entered search query as “synce” or “Science” or “SCience” or “SCIEnce” or “SCIENS” or “Sience”. If we follow the same procedure as above, we are most likely to return an empty result.
TO ERR IS HYUMANN
And of course, most people know the correct spelling of science. But what if some mistake in spelling happened. There’s a clue to fix this problem. Without any word processing techniques, we can see that all the words “synce”, “Sience”, “Science”,etc., sound similar.
Here we will use a powerful function in PHP called metaphone(). It returns a value that corresponds to the sound of that given word
Now try running the following PHP code
The output of the above code looks like this
Abraca Tabra! We get the same sounds for the different spellings of science. Now if we do the indexing in such a way that we store the sounds of each of the words in the content and title. We can easily return the most reliable test results with the given search string. Here are the steps followed for indexing
1. First, establish the connection
2. Fetch the article_title, article_content from the database
3. For each row of article_title and article_content, split the content into single words using the function explode() and append the metaphone($word) with a space to the variable $sound. Update the corresponding column ‘indexing’
Now we have successfully indexed our title and content and stored them in the ‘indexing’ column. Now we will implement the search logic in the file ‘sample_search.php’
1. Once the query is submitted, we convert the search query into a metaphone string. Then search for that string in the indexing column using the “LIKE” query in SQL
But remember, while indexing, we have added the metaphone of the words in the contents individually. So if a string has two or more words, like “HELLO WORLD”, we won’t be able to find its corresponding metaphone in ‘indexing’ column. Hence we split the words and append the metaphone of the individual words to the search string.
Explanation:
metaphone(“Hello World”)= “HLWRLT”. But in our ‘indexing’ column it is saved as “HL WRLT” (with space as they are separate words). So we have to insert a space accordingly and then search for it.
2. Now we display the results if the number of results > zero
Sample Output for search query “data”
3. If the number of results is zero or no results found
In some cases, we don’t get any exact matches. But a part of the search query has matched. For e.g. Let the search string be “Magnificent Asia”. Now let’s assume this exact string is not found but there are separate articles with “Asia” and “Magnificent”. So we split the search query into search words and find all the articles that match the metaphone of search words.
If we still don’t find any words, then we display the message “No results found”
You can add ranking to each of the articles, by using the data like number of visits, number of shares etc., But that’s for another post. Hope you all found this helpful.