Building a Recommendation System in Python

Recommendation System in Python

Many websites collect data from their users and use it to predict what those users like and dislike, so that they can recommend content the users will enjoy. Recommender systems suggest items and ideas that match a particular user's tastes.

There are different types of recommender systems:

- Collaborative Filtering: recommends items based on similarity measures between users and/or items. The basic assumption behind the algorithm is that users with similar interests have common preferences.
- Content-Based Recommendation: a supervised machine learning approach that induces a classifier to discriminate between items that are interesting and uninteresting to the user.

Content-Based Recommendation System: Content-based systems recommend items to a customer that are similar to items the customer previously rated highly. They use the features and properties of the items, and from these properties they can calculate the similarity between items.

In a content-based recommendation system, we first create a profile for each item that represents the item's properties. From these item profiles, a profile is inferred for each user, and the user profiles are then used to recommend items from the catalog.

Content-Based Recommendation System

Item profile:

In a content-based recommendation system, we need to build a profile for each item that contains its important properties. For example, if the item is a movie, its important properties are its actors, director, release year, and genre; for a document, they are the type of content and the set of important words in it.
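To make item profiles concrete, here is a minimal Python sketch. It assumes each movie's profile is simply a set of features such as actors, director, genre, and release decade. The movie titles and feature values are hypothetical. It measures similarity as the cosine of boolean feature vectors, which is one common choice and not necessarily the one the article has in mind.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical item profiles: each movie is described by a set of
# "property:value" features (illustrative data, not from the article).
item_profiles: Dict[str, Set[str]] = {
    "Movie 1": {"actor:Smith", "director:Lee", "genre:drama", "decade:1990s"},
    "Movie 2": {"actor:Smith", "director:Kim", "genre:drama", "decade:2000s"},
    "Movie 3": {"actor:Jones", "director:Kim", "genre:comedy", "decade:2010s"},
}


def cosine_binary(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two boolean feature vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def most_similar(item: str, profiles: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank every other item by its similarity to `item`, best first."""
    target = profiles[item]
    scores = [(other, cosine_binary(target, features))
              for other, features in profiles.items() if other != item]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


print(most_similar("Movie 1", item_profiles))
# [('Movie 2', 0.5), ('Movie 3', 0.0)]
```

Movie 2 shares an actor and a genre with Movie 1, so it ranks first. Movie 3 shares no features with Movie 1.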

Let's look at how to create an item profile for documents. First, we compute TF-IDF weights. The TF (term frequency) of a word measures how often it appears in a document, normalized here by the frequency of the most frequent term in that document. The IDF (inverse document frequency) of a word measures how significant the term is across the whole corpus: terms that appear in few documents get a high IDF. They are calculated with the following formulas:

$$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}}$$

where f_ij is the frequency of term (feature) i in document (item) j, and the maximum is taken over all terms k in document j.

$$IDF_i = \log_e \frac{N}{n_i}$$

where n_i is the number of documents that mention term i, and N is the total number of documents.

TF-IDF score:

$$w_{ij} = TF_{ij} \times IDF_i$$

Here, the document profile is the set of words with the highest TF-IDF scores, together with their scores.
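A minimal sketch of these formulas in plain Python follows. It assumes a naive lower-case whitespace tokenizer. The function name `tf_idf_profiles`, the `top_n` cut-off, and the mini-corpus are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def tf_idf_profiles(docs: Dict[str, str], top_n: int = 3) -> Dict[str, Dict[str, float]]:
    """Return each document's profile: its top_n words with their TF-IDF scores.

    TF_ij = f_ij / max_k f_kj,  IDF_i = ln(N / n_i),  w_ij = TF_ij * IDF_i.
    Tokenization is a naive lower-case whitespace split (an assumption).
    """
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n_docs = len(docs)
    # n_i: number of documents that mention term i
    doc_freq = Counter(term for c in counts.values() for term in c)

    profiles: Dict[str, Dict[str, float]] = {}
    for name, c in counts.items():
        if not c:  # empty document: empty profile
            profiles[name] = {}
            continue
        max_f = max(c.values())
        weights = {term: (f / max_f) * math.log(n_docs / doc_freq[term])
                   for term, f in c.items()}
        # Keep the highest-scoring words together with their scores
        best = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        profiles[name] = dict(best)
    return profiles


docs = {  # hypothetical mini-corpus
    "d1": "space robots space war",
    "d2": "robots love story",
    "d3": "love story in paris",
}
print(tf_idf_profiles(docs, top_n=2)["d1"])
```

In d1, "space" appears twice and occurs in no other document, so it gets the top score. "robots" also occurs in d2, which lowers its IDF. A term that appears in every document gets an IDF of 0.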

User profile:

The user profile is a vector that describes the user's preferences. To create it, we use a utility matrix that describes the relationship between users and items, for example the ratings users gave to items. Given this information, a reasonable estimate of which items the user likes is some aggregation of the profiles of the items the user has rated, such as their (weighted) average.
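Here is a minimal sketch of this aggregation. It assumes item profiles are numeric feature vectors and uses the rating-weighted average of the rated items' profiles. That is one common choice; a plain average or mean-centered weights are alternatives. Unrated items are then ranked by cosine similarity to the user profile. The feature layout and all values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def user_profile(ratings: Dict[str, float], item_vectors: Dict[str, Vector]) -> Vector:
    """Rating-weighted average of the profiles of the items the user rated."""
    dim = len(next(iter(item_vectors.values())))
    total = sum(ratings.values())
    profile = [0.0] * dim
    if total == 0:  # no ratings: empty (all-zero) profile
        return profile
    for item, rating in ratings.items():
        for j, value in enumerate(item_vectors[item]):
            profile[j] += rating * value / total
    return profile


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two dense vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings: Dict[str, float], item_vectors: Dict[str, Vector],
              n: int = 1) -> List[Tuple[str, float]]:
    """Rank the items the user has not rated by similarity to the user profile."""
    profile = user_profile(ratings, item_vectors)
    candidates = [(item, cosine(profile, vec))
                  for item, vec in item_vectors.items() if item not in ratings]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical features: [drama, comedy, sci-fi]
item_vectors = {"A": [1, 0, 0], "B": [1, 0, 1], "C": [0, 1, 0], "D": [0, 0, 1]}
ratings = {"A": 5, "B": 3}
print(user_profile(ratings, item_vectors))  # [1.0, 0.0, 0.375]
print(recommend(ratings, item_vectors))
```

The user rated a drama and a drama/sci-fi item, so the sci-fi item D is recommended over the comedy C.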

Advantages and Disadvantages:

Advantages:
- No need for data on other users.
- Able to recommend to users with unique tastes.
- Able to recommend new and unpopular items.
- Can explain recommendations by listing the item features that led to them.

Disadvantages:
- Finding the appropriate features is hard.
- Does not recommend items outside the user's profile.

Collaborative Filtering: Collaborative filtering is based on the idea that similar people (based on the data) generally tend to like similar things. It predicts which items a user will like based on the item preferences of other, similar users.

Collaborative filtering uses a user-item matrix to generate recommendations. This matrix contains the values that indicate a user’s preference towards a given item. These values can represent either explicit feedback (direct user ratings) or implicit feedback (indirect user behavior such as listening, purchasing, watching).
- Explicit feedback: data that users provide deliberately, for example ratings. Users often choose not to provide it, so this data is scarce and can be costly to obtain.
- Implicit feedback: we track user behavior to infer their preferences.
- Consider a user x. We find other users whose ratings are similar to x's ratings, and then estimate x's rating of an item from those users' ratings.
- Consider a matrix of the ratings that different users (labeled A, B, C, ...) have given to different movies.
- Consider two users x and y with rating vectors r_x and r_y. We need a similarity measure sim(x, y) to compare them. There are many options, such as Jaccard similarity, cosine similarity, and Pearson similarity. Here, we use centered cosine (Pearson) similarity. We normalize each user's ratings by subtracting that user's mean rating, treat missing ratings as 0, and take the cosine of the centered vectors.
- For example, sim(A, B) = cos(r_A, r_B) = 0.09 and sim(A, C) = -0.56. Since sim(A, B) > sim(A, C), user B is more similar to user A than user C is.
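
The similarity computation can be checked with a short Python sketch. The original rating matrix figure is not available, so the ratings below are an assumed illustrative matrix (users A to D; movie labels such as HP1, TW, SW1 are hypothetical). Its values reproduce the 0.09 and -0.56 quoted above. Missing ratings are simply left out of each user's dictionary, which is equivalent to treating them as 0 after centering.

```python
import math
from typing import Dict

# Assumed illustrative ratings: user -> {movie: rating}; unrated movies are absent.
ratings: Dict[str, Dict[str, float]] = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}


def centered(user_ratings: Dict[str, float]) -> Dict[str, float]:
    """Subtract the user's mean rating; unrated items stay implicitly 0."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return {item: r - mean for item, r in user_ratings.items()}


def centered_cosine(rx: Dict[str, float], ry: Dict[str, float]) -> float:
    """Centered cosine (Pearson-style) similarity of two users."""
    cx, cy = centered(rx), centered(ry)
    dot = sum(v * cy[item] for item, v in cx.items() if item in cy)
    nx = math.sqrt(sum(v * v for v in cx.values()))
    ny = math.sqrt(sum(v * v for v in cy.values()))
    # A user who gives every item the same rating has a zero vector after centering
    return dot / (nx * ny) if nx and ny else 0.0


print(round(centered_cosine(ratings["A"], ratings["B"]), 2))  # 0.09
print(round(centered_cosine(ratings["A"], ratings["C"]), 2))  # -0.56
```

Centering is what makes the A-C similarity negative: A liked the movie C disliked and vice versa, which plain cosine on raw ratings would not show.
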
Rating Predictions
- Let r_x be the vector of user x's ratings, and let N be the set of the k users most similar to x who have also rated item i. The predicted rating of user x for item i is then the similarity-weighted average of their ratings:

$$r_{xi} = \frac{\sum_{y \in N} s_{xy} \, r_{yi}}{\sum_{y \in N} s_{xy}}, \qquad s_{xy} = sim(x, y)$$
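Here is a minimal sketch of the prediction formula. It assumes the similarities sim(x, y) have already been computed, for example with centered cosine. The function name `predict_rating` and the numbers in the usage example are hypothetical.

```python
from typing import Dict, Optional


def predict_rating(sims: Dict[str, float], item_ratings: Dict[str, float],
                   k: int) -> Optional[float]:
    """Predict r_xi as the similarity-weighted average over the neighbourhood N.

    sims:         sim(x, y) for the other users y
    item_ratings: r_yi for the users y who rated item i
    N:            the k users with the highest sim(x, y) among those who rated i
    """
    raters = [y for y in item_ratings if y in sims]
    neighbourhood = sorted(raters, key=lambda y: sims[y], reverse=True)[:k]
    numerator = sum(sims[y] * item_ratings[y] for y in neighbourhood)
    denominator = sum(sims[y] for y in neighbourhood)
    return numerator / denominator if denominator else None


sims = {"B": 0.8, "C": 0.5, "D": 0.1}     # sim(x, y), hypothetical
item_ratings = {"B": 4, "C": 2, "D": 5}   # r_yi, hypothetical
print(round(predict_rating(sims, item_ratings, k=2), 2))  # 3.23
```

With k = 2, only B and C form N, so the prediction is (0.8 * 4 + 0.5 * 2) / (0.8 + 0.5), about 3.23. If similarities in N can be negative, the denominator can approach zero; keeping only positive similarities is one possible safeguard.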

Advantages and Disadvantages:
- Advantages:
  - No domain knowledge is needed, because the embeddings are learned automatically.
  - Can capture subtle characteristics inherent in the data.
- Disadvantages:
  - Cannot handle fresh items, due to the cold-start problem.
  - Hard to add new features that might improve the quality of the model.
  
Creating a Recommendation System with Python
  
Collaborative filtering is one of the simplest ways to build recommendations or predictions in recommender systems. The main assumption is that people who give similar ratings to items, or choose the same things, will also behave similarly in the future. With collaborative filtering you can, for example, recommend books based on what similar people have already read, or promote services. Let's try to compare several objects using Python.
  
Nowadays, large companies, including banks, want to attract as many customers as possible. Analyzing every customer manually to decide what to offer them is slow and tedious, so computers do this job.
  
There are several collaborative filtering schemes:
1. Find the people who share the chosen person's judgments, then use the ratings of the most like-minded people found in the first step to build a prediction.
2. First build a matrix describing the relationships between pairs of items in order to find similar items, then use this matrix and information about the person to build a prediction.
(Note: the second approach was invented at Amazon.)
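The second (item-to-item) scheme can be sketched in plain Python. This is a minimal sketch under several assumptions. The ratings are hypothetical. Item similarity is the cosine of raw rating columns; mean-centering is a common refinement. The k = 2 most similar items the person has rated are used for the prediction.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # person -> {item: rating}

# Hypothetical ratings, for illustration only
ratings: Ratings = {
    "Ann": {"book1": 5, "book2": 4, "book3": 1},
    "Bob": {"book1": 4, "book2": 5},
    "Cara": {"book2": 2, "book3": 5},
}


def item_columns(r: Ratings) -> Ratings:
    """Invert person -> item ratings into item -> {person: rating}."""
    cols: Ratings = {}
    for person, person_ratings in r.items():
        for item, value in person_ratings.items():
            cols.setdefault(item, {})[person] = value
    return cols


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(x * v[key] for key, x in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def item_similarity_matrix(r: Ratings) -> Dict[str, Dict[str, float]]:
    """Step 1: similarity between every pair of items."""
    cols = item_columns(r)
    return {a: {b: cosine(cols[a], cols[b]) for b in cols if b != a} for a in cols}


def predict(person: str, item: str, r: Ratings,
            sim: Dict[str, Dict[str, float]], k: int = 2) -> Optional[float]:
    """Step 2: weighted average of the person's ratings of the k items most similar to `item`."""
    if item not in sim:  # nobody has rated this item yet
        return None
    rated = r[person]
    neighbours = sorted((j for j in rated if j != item),
                        key=lambda j: sim[item][j], reverse=True)[:k]
    den = sum(sim[item][j] for j in neighbours)
    if not den:
        return None
    return sum(sim[item][j] * rated[j] for j in neighbours) / den


sim = item_similarity_matrix(ratings)
print(round(predict("Bob", "book3", ratings, sim), 2))
```

The item-item matrix is built once and can be reused for every person, which is the main attraction of this scheme when there are many more people than items.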
  
There are also three main types of collaborative filtering:
1. Neighborhood-based (selects a group of similar people)
2. Model-based (uses machine learning methods)
3. Hybrid (combines the first and second types)

Now let's implement the first type in Python.
  
To start building predictions, we need a dataset containing users (objects) and their parameters (age, gender, and so on); an example is shown in Figure 1.
  
First, we import the pandas library; the whole method is implemented with it.
  
Second, we load our dataset (mine is named test) into a DataFrame, set the column named «Объект» ("Object") as the index, transpose the table so that each object becomes a column, and create a new DataFrame for the recommendations.
  
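```python
import pandas as pd

recomendations = pd.DataFrame()  # will hold the results
df = pd.read_excel('test.xlsx')
# Use the 'Объект' column as the index and transpose,
# so that every object becomes a column of df
df = df.set_index('Объект').T
# Transposing mixed data leaves object dtype, but corrwith needs numbers.
# Non-numeric values become NaN, so text parameters (e.g. gender)
# should be encoded as numbers in the dataset beforehand.
df = df.apply(pd.to_numeric, errors='coerce')
recList = list()
```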
  
Third, we loop over all objects in the DataFrame (after the transpose, each object is a column). For each one, we find the single most similar object by correlating its parameters with those of every other object; more can be found by changing the variable k. The object itself is dropped first, so that it cannot come out as its own best match, and the result is written to the recommendations.
  
```python
for row in df:  # after the transpose, each column is one object
    print(row)
    k = 1  # number of most similar objects to keep
    # Pearson correlation of every object's parameters with the current object
    corrMatr = df.corrwith(df[row])
    # Remove the object itself, otherwise it would be its own best match
    tempMatr = corrMatr.drop(row)
    # Take the k highest correlations (NaN values are skipped)
    for name, value in tempMatr.nlargest(k).items():
        recList.append([row, name, value])

# DataFrame.append was removed in pandas 2.0, so build the table from the list
recomendations = pd.DataFrame(recList, columns=['object', 'similar_object', 'correlation'])
recomendations.to_excel('result.xlsx')
```
  
As an example, the output file contains three columns (Figure 2): the object, the most similar object, and their correlation.
  
As a result, we have found similar objects, and this information can then be used however you like. For example, you can look for overlaps in training courses, books, ratings, movies, or music, and recommend only what the selected object does not already have, so as not to suggest something it has already listened to or watched.
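Here is a minimal sketch of that last filtering step. It assumes you also know which items each object already has. All data below is hypothetical, including the `owned` mapping and the `min_corr` threshold. The (object, similar object, correlation) rows have the same shape as those written to result.xlsx.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: what each object (e.g. a client) already has ...
owned: Dict[str, Set[str]] = {
    "client1": {"course A", "course B"},
    "client2": {"course A", "course C"},
    "client3": {"course D"},
}
# ... and (object, similar object, correlation) rows like those in result.xlsx
similar_pairs: List[Tuple[str, str, float]] = [
    ("client1", "client2", 0.93),
    ("client2", "client1", 0.93),
    ("client3", "client1", 0.41),
]


def recommend_missing(pairs: List[Tuple[str, str, float]],
                      owned: Dict[str, Set[str]],
                      min_corr: float = 0.0) -> Dict[str, List[str]]:
    """Suggest what similar objects have and the object itself does not."""
    recs: Dict[str, Set[str]] = {}
    for obj, similar, corr in pairs:
        if corr < min_corr:  # ignore weakly similar objects
            continue
        recs.setdefault(obj, set()).update(owned[similar] - owned[obj])
    return {obj: sorted(items) for obj, items in recs.items()}


print(recommend_missing(similar_pairs, owned))
# {'client1': ['course C'], 'client2': ['course B'], 'client3': ['course A', 'course B']}
print(recommend_missing(similar_pairs, owned, min_corr=0.5))
# {'client1': ['course C'], 'client2': ['course B']}
```

Raising `min_corr` trades coverage for confidence: client3's only match is weak, so it receives no suggestions at 0.5.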
  