Category encoders python install

A library of sklearn compatible categorical variable encoders

Categorical Encoding Methods

A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques.

Ordinal [2][3]
One-Hot [2][3]
Binary [5]
Helmert Contrast [2][3]
Sum Contrast [2][3]
Polynomial Contrast [2][3]
Backward Difference Contrast [2][3]
Hashing [1]
BaseN [6]
LeaveOneOut [4]
Target Encoding [7]

The package ships as a single module with an estimator. Before installing it, you will need numpy, statsmodels, and scipy.

To install the module execute:

pip install category_encoders

conda install -c conda-forge category_encoders

import category_encoders as ce

encoder = ce.BackwardDifferenceEncoder(cols=[...])
encoder = ce.BinaryEncoder(cols=[...])
encoder = ce.HashingEncoder(cols=[...])
encoder = ce.HelmertEncoder(cols=[...])
encoder = ce.OneHotEncoder(cols=[...])
encoder = ce.OrdinalEncoder(cols=[...])
encoder = ce.SumEncoder(cols=[...])
encoder = ce.PolynomialEncoder(cols=[...])
encoder = ce.BaseNEncoder(cols=[...])
encoder = ce.TargetEncoder(cols=[...])
encoder = ce.LeaveOneOutEncoder(cols=[...])

All of these are fully compatible sklearn transformers, so they can be used in pipelines or in your existing scripts. If the cols parameter isn’t passed, every non-numeric column will be encoded. Please see the docs for transformer-specific configuration options.

from category_encoders import *
import pandas as pd
# note: load_boston was removed in scikit-learn 1.2, so this snippet needs an older release
from sklearn.datasets import load_boston

# prepare some data
bunch = load_boston()
y = bunch.target
X = pd.DataFrame(bunch.data, columns=bunch.feature_names)

# use binary encoding to encode two categorical features
enc = BinaryEncoder(cols=['CHAS', 'RAD']).fit(X, y)

# transform the dataset
numeric_dataset = enc.transform(X)

In the examples directory, there is an example script used to benchmark different encoding techniques on various datasets.

The datasets used in the examples are the car, mushroom, and splice datasets from the UCI dataset repository.

Category encoders is under active development. If you’d like to be involved, we’d love to have you. Check out the CONTRIBUTING.md file or open an issue on the GitHub project to get started.

[1] Kilian Weinberger; Anirban Dasgupta; John Langford; Alex Smola; Josh Attenberg (2009). Feature Hashing for Large Scale Multitask Learning. Proc. ICML.
[2] Contrast Coding Systems for categorical variables. UCLA: Statistical Consulting Group. From https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
[3] Gregory Carey (2003). Coding Categorical Variables. From http://psych.colorado.edu/~carey/Courses/PSYC5741/handouts/Coding%20Categorical%20Variables%202006-03-03.pdf
[4] Strategies to encode categorical variables with many categories. From https://www.kaggle.com/c/caterpillar-tube-pricing/discussion/15748#143154
[5] Beyond One-Hot: an exploration of categorical variables. From http://www.willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/
[6] BaseN Encoding and Grid Search in categorical variables. From http://www.willmcginnis.com/2016/12/18/basen-encoding-grid-search-category_encoders/
[7] A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. From https://kaggle2.blob.core.windows.net/forum-message-attachments/225952/7441/high%20cardinality%20categoricals.pdf

Categorical Encoding Methods

A set of scikit-learn-style transformers for encoding categorical variables into numeric by means of different techniques.

Important Links

Encoding Methods

Unsupervised:

Backward Difference Contrast [2][3]
BaseN [6]
Binary [5]
Gray [14]
Count [10]
Hashing [1]
Helmert Contrast [2][3]
Ordinal [2][3]
One-Hot [2][3]
Rank Hot [15]
Polynomial Contrast [2][3]
Sum Contrast [2][3]

Supervised:

CatBoost [11]
Generalized Linear Mixed Model [12]
James-Stein Estimator [9]
LeaveOneOut [4]
M-estimator [7]
Target Encoding [7]
Weight of Evidence [8]
Quantile Encoder [13]
Summary Encoder [13]

Installation

The package requires numpy, statsmodels, and scipy.

To install the package, execute:

pip install category_encoders

conda install -c conda-forge category_encoders

To install the development version, you may use:

pip install --upgrade git+https://github.com/scikit-learn-contrib/category_encoders
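
After installing, a quick check can confirm which version was picked up. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration, and it returns None when the distribution cannot be found.

```python
from importlib import metadata


def installed_version(dist_name="category_encoders"):
    """Return the installed version string, or None if the distribution is missing.

    `installed_version` is a hypothetical helper, not part of category_encoders.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```

If this prints None right after a successful install, the interpreter running the check is probably not the one pip or conda installed into.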

Usage

All of the encoders are fully compatible sklearn transformers, so they can be used in pipelines or in your existing scripts. Supported input formats include numpy arrays and pandas dataframes. If the cols parameter isn’t passed, all columns with object or pandas categorical data type will be encoded. Please see the docs for transformer-specific configuration options.

Examples

There are two types of encoders: unsupervised and supervised. An unsupervised example:

When transforming the training data with supervised methods, use fit_transform() rather than fit().transform(), because the two do not necessarily produce the same result. The difference can be observed with the LeaveOneOut encoder, which performs a nested cross-validation on the training data in fit_transform() (to reduce overfitting of the downstream model) but uses all the training data for scoring in transform() (to get estimates that are as accurate as possible).

Furthermore, you may benefit from the following wrappers:

PolynomialWrapper, which extends supervised encoders to support polynomial targets
NestedCVWrapper, which helps to prevent overfitting

Additional examples and benchmarks can be found in the examples directory.

Contributing

Category encoders is under active development. If you’d like to be involved, we’d love to have you. Check out the CONTRIBUTING.md file or open an issue on the GitHub project to get started.

References

[1] Kilian Weinberger; Anirban Dasgupta; John Langford; Alex Smola; Josh Attenberg (2009). Feature Hashing for Large Scale Multitask Learning. Proc. ICML.
[2] Contrast Coding Systems for categorical variables. UCLA: Statistical Consulting Group. From https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
[3] Gregory Carey (2003). Coding Categorical Variables. From http://psych.colorado.edu/~carey/Courses/PSYC5741/handouts/Coding%20Categorical%20Variables%202006-03-03.pdf
[4] Owen Zhang. Leave One Out Encoding. From https://datascience.stackexchange.com/questions/10839/what-is-difference-between-one-hot-encoding-and-leave-one-out-encoding
[5] Beyond One-Hot: an exploration of categorical variables. From http://www.willmcginnis.com/2015/11/29/beyond-one-hot-an-exploration-of-categorical-variables/
[6] BaseN Encoding and Grid Search in categorical variables. From http://www.willmcginnis.com/2016/12/18/basen-encoding-grid-search-category_encoders/
[7] Daniele Micci-Barreca (2001). A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. SIGKDD Explor. Newsl. 3, 1. From http://dx.doi.org/10.1145/507533.507538
[8] Weight of Evidence (WOE) and Information Value Explained. From https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
[9] Empirical Bayes for multiple sample sizes. From http://chris-said.io/2017/05/03/empirical-bayes-for-multiple-sample-sizes/
[10] Simple Count or Frequency Encoding. From https://www.datacamp.com/community/tutorials/encoding-methodologies
[11] Transforming categorical features to numerical features. From https://tech.yandex.com/catboost/doc/dg/concepts/algorithm-main-stages_cat-to-numberic-docpage/
[12] Andrew Gelman and Jennifer Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. From https://faculty.psau.edu.sa/filedownload/doc-12-pdf-a1997d0d31f84d13c1cdc44ac39a8f2c-original.pdf
[13] Carlos Mougan, David Masip, Jordi Nin and Oriol Pujol (2021). Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems. Modeling Decisions for Artificial Intelligence, 2021. Springer International Publishing. From https://link.springer.com/chapter/10.1007%2F978-3-030-85529-1_14
[14] Gray Encoding. From https://en.wikipedia.org/wiki/Gray_code
[15] Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow: Thermometer Encoding: One Hot Way To Resist Adversarial Examples. From https://openreview.net/forum?id=S18Su--CW
[16] Carlos Mougan, Jose Alvarez, Salvatore Ruggieri, and Steffen Staab. Fairness implications of encoding protected categorical attributes. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’21. From https://arxiv.org/abs/2201.11358
