Pandas profiling to html

eefalco/pandas-profiling

Create HTML profiling reports from pandas DataFrame objects

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends the pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding.
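For comparison, here is everything df.describe() gives you on a purely numeric DataFrame: eight summary rows per column, with nothing about types, missing values, correlations or duplicates. This sketch uses only pandas and NumPy; the random DataFrame is a placeholder.

```python
import numpy as np
import pandas as pd

# Placeholder data: 100 rows of random floats in five columns
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

# For numeric columns, describe() returns only these eight statistics
summary = df.describe()
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```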

For each column, the following information (whenever relevant for the column type) is presented in an interactive HTML report:

Type inference: detect the types of columns in a DataFrame
Essentials: type, unique values, indication of missing values
Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range
Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
Most frequent and extreme values
Histograms: categorical and numerical
Correlations: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik)
Missing values: through counts, matrix, heatmap and dendrograms
Duplicate rows: list of the most common duplicated rows
Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
File and Image analysis: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
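To make the quantile and descriptive statistics above concrete, here is a minimal pandas-only sketch that computes them for a single numeric column. It is not pandas-profiling's own implementation: the helper name numeric_summary is hypothetical, and conventions such as the standard deviation's degrees of freedom (pandas defaults to ddof=1) or the kurtosis definition (pandas reports excess kurtosis) may differ from what the report shows.

```python
import pandas as pd


def numeric_summary(s: pd.Series) -> dict:
    """Hypothetical helper: quantile and descriptive statistics for one numeric column."""
    s = s.dropna()  # statistics are computed on non-missing values only
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])  # linear interpolation (pandas default)
    mean = s.mean()
    std = s.std()  # sample standard deviation, ddof=1
    return {
        # Quantile statistics
        "min": s.min(),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "IQR": q3 - q1,
        # Descriptive statistics
        "mean": mean,
        "mode": s.mode().tolist(),  # may contain several values
        "std": std,
        "sum": s.sum(),
        "MAD": (s - median).abs().median(),  # median absolute deviation
        "CV": std / mean if mean != 0 else float("nan"),  # coefficient of variation
        "kurtosis": s.kurt(),  # excess kurtosis (normal distribution -> 0)
        "skewness": s.skew(),
        # Most frequent values and their counts
        "top_values": s.value_counts().head(3).to_dict(),
    }


stats = numeric_summary(pd.Series([1, 2, 2, 3, 4, 10]))
print(stats["IQR"], stats["MAD"])  # 1.75 1.0
```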

The report contains three additional sections:

Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
Alerts: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
Reproduction: technical details about the analysis (time, version and configuration)
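The Overview figures and one kind of alert can be approximated with plain pandas, as in the sketch below. Both helper names are hypothetical, the 0.9 threshold is an assumed cut-off rather than the package's default, and only the Pearson, Spearman and Kendall coefficients that pandas computes are covered (Cramér’s V and Phik are not).

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: dataset-level figures similar to the Overview section."""
    n_cells = df.size
    return {
        "n_records": len(df),
        "n_variables": df.shape[1],
        # Share of all cells that are missing, in percent
        "missing_cells_pct": float(100 * df.isna().sum().sum() / n_cells) if n_cells else 0.0,
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True also counts memory held by Python objects such as strings
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


def high_correlation_alerts(df: pd.DataFrame, threshold: float = 0.9, method: str = "pearson") -> list:
    """Hypothetical helper: numeric column pairs whose |correlation| exceeds an assumed threshold."""
    corr = df.select_dtypes("number").corr(method=method).abs()
    cols = corr.columns
    alerts = []
    # Visit each unordered pair once (upper triangle, diagonal excluded)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                alerts.append((cols[i], cols[j], round(float(corr.iloc[i, j]), 3)))
    return alerts


df = pd.DataFrame({"x": [1, 2, 3, 4, 4], "y": [2, 4, 6, 8, 8], "z": [5, None, 1, 3, 3]})
print(overview(df))
print(high_correlation_alerts(df))  # [('x', 'y', 1.0)]
```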

⚡ Looking for a Spark backend to profile large datasets? It’s a work in progress.

⌛ Interested in uncovering temporal patterns? Check out popmon.

Start by loading your pandas DataFrame as you normally would, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

To generate the standard profiling report, merely run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Using inside Jupyter Notebooks

There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report.

The first is achieved by simply displaying the report as a set of widgets. In a Jupyter Notebook, run:

profile.to_widgets()

The HTML report can be directly embedded in a cell in a similar fashion:

profile.to_notebook_iframe()

Exporting the report to a file

To generate an HTML report file, save the ProfileReport to an object and use the to_file() method:

profile.to_file("your_report.html")

Alternatively, the report’s data can be obtained as JSON, either as a string or as a file:

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Using in the command line

For standard-formatted CSV files (which can be read directly by pandas without additional settings), the pandas_profiling executable can be used from the command line. The example below processes a data.csv dataset and writes a report titled Example Profiling Report to report.html, using a configuration file called default.yaml.

pandas_profiling --title "Example Profiling Report" --config_file default.yaml data.csv report.html
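Before calling the CLI, you can check whether a file is "standard formatted" in the sense above, i.e. whether pandas reads it with default settings. The sketch below is a heuristic under that assumption: the helper looks_like_standard_csv is hypothetical, and it treats a single parsed column as a sign of a non-comma delimiter, so a genuinely single-column CSV is also reported as not ready.

```python
import io

import pandas as pd


def looks_like_standard_csv(source) -> bool:
    """Hypothetical heuristic: can pandas parse `source` with default read_csv settings?"""
    try:
        df = pd.read_csv(source)  # defaults: comma separator, first row as header
    except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError):
        return False
    # A single column usually means the real delimiter was not a comma
    return df.shape[1] > 1


comma = io.StringIO("a,b\n1,2\n3,4\n")
semicolon = io.StringIO("a;b\n1;2\n3;4\n")
print(looks_like_standard_csv(comma))      # True
print(looks_like_standard_csv(semicolon))  # False
```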

Additional details on the CLI are available in the documentation.

The following example reports showcase the package's capabilities across a wide range of datasets and data types:

Census Income (US Adult Census data relating income with other demographic properties)
NASA Meteorites (comprehensive set of meteorite landing — object properties and locations)
Titanic (the «Wonderwall» of datasets)
NZA (open data from the Dutch Healthcare Authority)
Stata Auto (1978 Automobile data)
Colors (a simple colors dataset)
Vektis (Vektis Dutch Healthcare data)
UCI Bank Dataset (marketing dataset from a bank)
Russian Vocabulary (100 most common Russian words, showcasing unicode text analysis)
Website Inaccessibility (website accessibility analysis, showcasing support for URL data)
Orange prices and Coal prices (simple pricing evolution datasets, showcasing the theming options)

Additional details, including information about widget support, are available in the documentation.

You can install using the pip package manager by running:

pip install -U pandas-profiling[notebook]

You can install using the conda package manager by running:

conda install -c conda-forge pandas-profiling

Download the source code by cloning the repository, or click Download ZIP to get the latest stable version.

Install it by navigating to the proper directory and running:

The profiling report is written in HTML and CSS, which means a modern browser is required.

You need Python 3 to run the package. Other dependencies can be found in the requirements files:

Filename	Requirements
requirements.txt	Package requirements
requirements-dev.txt	Requirements for development
requirements-test.txt	Requirements for testing
setup.py	Requirements for widgets etc.

The documentation includes guides, tips and tricks for tackling common use cases:

Use case	Description
Profiling large datasets	Tips on how to prepare data and configure pandas-profiling for working with large datasets
Handling sensitive data	Generating reports which are mindful about sensitive data in the input dataset
Dataset metadata and data dictionaries	Complementing the report with dataset details and column-specific data dictionaries
Customizing the report’s appearance	Changing the appearance of the report’s page and of the contained visualizations
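For the first two use cases, one common preparation step is to drop sensitive columns and downsample very large DataFrames before building the report. The helper below is a hypothetical pandas-only sketch of that idea, not the recipe from the documentation guides; keep in mind that statistics computed on a sample (distinct counts, duplicates, extreme values) can differ from those of the full dataset.

```python
import numpy as np
import pandas as pd


def prepare_for_profiling(df: pd.DataFrame, max_rows: int = 10_000, drop_columns=(), seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: drop sensitive columns, then sample rows if the frame is large."""
    # Columns listed here never reach the report; unknown names are ignored
    out = df.drop(columns=list(drop_columns), errors="ignore")
    # Downsample without replacement; random_state makes the sample reproducible
    if len(out) > max_rows:
        out = out.sample(n=max_rows, random_state=seed)
    return out


n = 50_000
big = pd.DataFrame({
    "id": np.arange(n),
    "email": [f"user{i}@example.com" for i in range(n)],  # stands in for sensitive data
    "value": np.random.rand(n),
})
small = prepare_for_profiling(big, max_rows=5_000, drop_columns=["email"])
print(small.shape)  # (5000, 2)
```

The resulting frame can then be passed to ProfileReport in the usual way.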

To maximize its usefulness in real-world contexts, pandas-profiling has a set of implicit and explicit integrations with a variety of other tools in the Data Science ecosystem:

Integration type	Description
Other DataFrame libraries	How to compute the profiling of data stored in libraries other than pandas
Great Expectations	Generating Great Expectations expectation suites directly from a profiling report
Interactive applications	Embedding profiling reports in Streamlit, Dash or Panel applications
Pipelines	Integration with DAG workflow execution tools like Airflow or Kedro
Cloud services	Using pandas-profiling in hosted computation services like Lambda, Google Cloud or Kaggle
IDEs	Using pandas-profiling directly from integrated development environments such as PyCharm

Need help? Want to share a perspective? Report a bug? Ideas for collaborations? Reach out via the following channels:

Stack Overflow: ideal for asking questions on how to use the package
GitHub Issues: bugs, proposals for changes, feature requests
Slack: general chat, questions, collaborations
Email: project collaborations or sponsoring

❗ Before reporting an issue on GitHub, check out Common Issues.

Learn how to get involved in the Contribution Guide.

A low-threshold place to ask questions or start contributing is the Data Centric AI Community’s Slack.
