Boxplot python pandas title

pandas.DataFrame.boxplot

Make a box-and-whisker plot from DataFrame columns, optionally grouped by some other columns. A box plot is a method for graphically depicting groups of numerical data through their quartiles. The box extends from the Q1 to Q3 quartile values of the data, with a line at the median (Q2). The whiskers extend from the edges of the box to show the range of the data. By default, they extend no more than 1.5 * IQR (IQR = Q3 - Q1) from the edges of the box, ending at the farthest data point within that interval. Outliers are plotted as separate dots.
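The whisker rule described above can be checked by hand. The following is a minimal sketch, assuming linear-interpolation percentiles (NumPy's default) and a made-up sample; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample; 30.0 is deliberately far from the rest.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# The fences sit 1.5 * IQR beyond the edges of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Each whisker ends at the farthest data point still inside its fence.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the fences are the ones drawn as separate dots.
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Note that the upper whisker stops at 8.0, the last point inside the fence, rather than at the fence value itself.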

For further details see Wikipedia’s entry for boxplot.

Parameters

column str or list of str, optional

Column name or list of names, or vector. Can be any valid input to pandas.DataFrame.groupby().

by str or array-like, optional

Column in the DataFrame to pandas.DataFrame.groupby(). One box-plot will be done per value of columns in by.

ax object of class matplotlib.axes.Axes, optional

The matplotlib axes to be used by boxplot.

fontsize float or str

Tick label font size in points or as a string (e.g., large).

rot float, default 0

The rotation angle of labels (in degrees) with respect to the screen coordinate system.

grid bool, default True

Setting this to True will show the grid.

figsize A tuple (width, height) in inches

The size of the figure to create in matplotlib.

layout tuple (rows, columns), optional

For example, (3, 5) will display the subplots using 3 rows and 5 columns, starting from the top-left.

return_type {‘axes’, ‘dict’, ‘both’} or None, default ‘axes’

The kind of object to return. The default is axes.

  • ‘axes’ returns the matplotlib axes the boxplot is drawn on.
  • ‘dict’ returns a dictionary whose values are the matplotlib Lines of the boxplot.
  • ‘both’ returns a namedtuple with the axes and dict.
  • when grouping with by, a Series mapping columns to return_type is returned. If return_type is None, a NumPy array of axes with the same shape as layout is returned.

backend str, default None

Backend to use instead of the backend specified in the option plotting.backend. For instance, ‘matplotlib’. Alternatively, to specify the plotting.backend for the whole session, set pd.options.plotting.backend.

**kwargs

All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().

See also

matplotlib.pyplot.boxplot : Matplotlib equivalent plot.

The return type depends on the return_type parameter:

  • ‘axes’ : object of class matplotlib.axes.Axes
  • ‘dict’ : dict of matplotlib.lines.Line2D objects
  • ‘both’ : a namedtuple with structure (ax, lines)

For data grouped with by, a Series of the above is returned, or a NumPy array of axes when return_type is None.

Use return_type='dict' when you want to tweak the appearance of the lines after plotting. In this case a dict containing the Lines making up the boxes, caps, fliers, medians, and whiskers is returned.
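As an illustration of that workflow, here is a minimal sketch that restyles the medians and whiskers after plotting. It assumes the Matplotlib backend; the Agg backend is selected only so the snippet runs without a display, and the styling choices are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 2), columns=["Col1", "Col2"])

# return_type='dict' hands back the Line2D artists, grouped by their role.
lines = df.boxplot(column=["Col1", "Col2"], return_type="dict")

# Restyle after plotting: thicker red medians and dashed whiskers.
for median in lines["medians"]:
    median.set_color("red")
    median.set_linewidth(2)
for whisker in lines["whiskers"]:
    whisker.set_linestyle("--")

print(sorted(lines))
```

Each box contributes one median line and two whiskers, so two columns yield two entries under 'medians' and four under 'whiskers'.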

Examples

Boxplots can be created for every column in the DataFrame with df.boxplot(), or for selected columns by passing them in column:

>>> np.random.seed(1234)
>>> df = pd.DataFrame(np.random.randn(10, 4),
...                   columns=['Col1', 'Col2', 'Col3', 'Col4'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2', 'Col3'])


Boxplots of variable distributions grouped by the values of a third variable can be created using the option by. For instance:

>>> df = pd.DataFrame(np.random.randn(10, 2),
...                   columns=['Col1', 'Col2'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
...                      'B', 'B', 'B', 'B', 'B'])
>>> boxplot = df.boxplot(by='X')


A list of strings (e.g. ['X', 'Y']) can be passed to by in order to group the data by each combination of those variables along the x-axis:

>>> df = pd.DataFrame(np.random.randn(10, 3),
...                   columns=['Col1', 'Col2', 'Col3'])
>>> df['X'] = pd.Series(['A', 'A', 'A', 'A', 'A',
...                      'B', 'B', 'B', 'B', 'B'])
>>> df['Y'] = pd.Series(['A', 'B', 'A', 'B', 'A',
...                      'B', 'A', 'B', 'A', 'B'])
>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by=['X', 'Y'])


The layout of the boxplot can be adjusted by passing a tuple to layout:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      layout=(2, 1))


Additional formatting can be applied to the boxplot, like suppressing the grid (grid=False), rotating the labels on the x-axis (e.g. rot=45) or changing the font size (e.g. fontsize=15):

>>> boxplot = df.boxplot(grid=False, rot=45, fontsize=15) 


The parameter return_type can be used to select the type of element returned by boxplot. When return_type='axes' is selected, the matplotlib axes on which the boxplot is drawn are returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], return_type='axes')
>>> type(boxplot)

When grouping with by, a Series mapping columns to return_type is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type='axes')
>>> type(boxplot)

If return_type is None, a NumPy array of axes with the same shape as layout is returned:

>>> boxplot = df.boxplot(column=['Col1', 'Col2'], by='X',
...                      return_type=None)
>>> type(boxplot)
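The return types described above can be verified without relying on printed class names, which vary between library versions. This is a small sketch under the assumption that the Matplotlib backend is in use; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

cols = ["Col1", "Col2"]

plt.figure()
single = df.boxplot(column=cols, return_type="axes")   # one Axes object

plt.figure()
both = df.boxplot(column=cols, return_type="both")     # namedtuple (ax, lines)

grouped = df.boxplot(column=cols, by="X", return_type="axes")   # Series
axes_array = df.boxplot(column=cols, by="X", return_type=None)  # ndarray of Axes

print(isinstance(single, matplotlib.axes.Axes))
print(isinstance(both.ax, matplotlib.axes.Axes), isinstance(both.lines, dict))
print(isinstance(grouped, pd.Series), isinstance(axes_array, np.ndarray))
```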


How to Plot the Boxplot from DataFrame?

The Pandas DataFrame boxplot() function is used to make a box plot from the given DataFrame columns. A boxplot, also called a box-and-whisker plot, shows the range of values in your data set and highlights any outliers in a format that is easier to understand than the raw data.

In the graph drawn by boxplot(), the x-axis lists the columns (or groups) being plotted and the y-axis shows their values. In this article, I will explain how to plot a boxplot from a DataFrame. A boxplot function is also available in the Matplotlib library.

1. Quick Examples of Creating a Boxplot from a DataFrame

If you are in a hurry, below are some quick examples of how to create a box plot using boxplot().

    # Below are the quick examples
    import matplotlib.pyplot as plot
    import numpy as np
    import pandas as pd

    # Create DataFrame
    np.random.seed(10)
    df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])

    # Example 1: Plot the box plot of a single column of the DataFrame
    b_plot = df.boxplot(column='Num1')
    plot.show()

    # Example 2: Create a box plot for multiple columns
    b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
    plot.show()

    # Example 3: Customize the boxplot color
    b_plot = df.boxplot(column='Num1', color='orange')
    plot.show()

    # Example 4: Add a title to the boxplot
    b_plot = df.boxplot(column='Num1')
    plot.title('Random Numbers')
    plot.show()

    # Example 5: Customize the font size of the boxplot
    b_plot = df.boxplot(column='Num1', fontsize=15)
    plot.show()

2. Syntax of Pandas boxplot()

Following is the syntax of boxplot().

    # Syntax of boxplot()
    DataFrame.boxplot(column=None, by=None, ax=None, fontsize=None, rot=0,
                      grid=True, figsize=None, layout=None, return_type=None,
                      backend=None, **kwargs)

2.1 Parameters of the boxplot()

  • column : (str or list of str) Column name or list of names.
  • by : (str or array-like) Column in the DataFrame to group by.
  • ax : (matplotlib.axes.Axes) The matplotlib axes to be used by the boxplot.
  • fontsize : (float or str) Tick label font size in points or as a string (e.g., large).
  • rot : (float) The angle in degrees by which the labels should be rotated.
  • grid : (bool) Whether or not to show the grid.
  • figsize : tuple (width, height) The size of the figure in inches.
  • layout : tuple (rows, columns) The arrangement of the subplots.
  • return_type : ('axes', 'dict', 'both' or None) The kind of object to return.
  • backend : (str) Plotting backend to use instead of the one set in the option plotting.backend.
  • **kwargs : All other plotting keyword arguments to be passed to matplotlib.pyplot.boxplot().
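To show how several of these parameters combine, here is a minimal sketch that draws two boxplots side by side on axes created beforehand. The figure size, the names left and right, and the y-axis label are illustrative choices, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=["Num1", "Num2", "Num3"])

# Create the figure ourselves, then point each boxplot at one of its axes.
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 4))
df.boxplot(column="Num1", ax=left, grid=False, fontsize=12)
df.boxplot(column=["Num2", "Num3"], ax=right, rot=45)

left.set_ylabel("value")
fig.tight_layout()
```

Passing ax keeps control of the figure layout in your own code, which is useful when a boxplot is only one panel of a larger figure. In an interactive session, call plt.show() afterwards to display the result.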

2.2 Return Value

  • axes : Returns the matplotlib axes that the boxplot is drawn on.
  • dict : Returns a dictionary whose values are the matplotlib Lines of the boxplot.
  • both : Returns a named tuple with the axes and dict.
  • Grouping with by : A Series mapping columns to return_type is returned.
  • None : When grouping with by, a NumPy array of axes with the same shape as layout is returned.

3. Usage of boxplot()

A boxplot summarizes a distribution with the following values:

  • Median : The value in the middle of the distribution.
  • Lower quartile : The value below which 25% of the data falls (the median of the lower half).
  • Upper quartile : The value below which 75% of the data falls (the median of the upper half).
  • Lower boundary : The lowest value in the distribution that is not an outlier.
  • Higher boundary : The highest value in the distribution that is not an outlier.
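These summary values can be computed directly with Series.quantile(). The sketch below uses a made-up sample and pandas' default linear interpolation; the labels assigned to the index are illustrative.

```python
import pandas as pd

# Hypothetical sample with no outliers.
values = pd.Series([2, 4, 4, 5, 7, 8, 9, 11, 12])

# Minimum, lower quartile, median, upper quartile, maximum.
summary = values.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
summary.index = ["min", "lower quartile", "median", "upper quartile", "max"]

print(summary)
```

When the data contains no outliers, as here, the minimum and maximum coincide with the lower and higher boundaries drawn by the whiskers.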

4. Pandas Boxplot Single Column

We can visualize a DataFrame as a box plot chart by using the boxplot() function, which draws a summary of the given data in the form of a boxplot. Let’s create a Pandas DataFrame with columns of randomly generated numbers using the np.random.rand() function. To get the same random numbers on every run, we seed the generator with np.random.seed().

    # Imports
    import matplotlib.pyplot as plot
    import pandas as pd
    import numpy as np

    # Create DataFrame
    np.random.seed(10)
    df = pd.DataFrame(np.random.rand(10, 3), columns=['Num1', 'Num2', 'Num3'])
    print(df)

    # Output:
           Num1      Num2      Num3
    0  0.771321  0.020752  0.633648
    1  0.748804  0.498507  0.224797
    2  0.198063  0.760531  0.169111
    3  0.088340  0.685360  0.953393
    4  0.003948  0.512192  0.812621
    5  0.612526  0.721755  0.291876
    6  0.917774  0.714576  0.542544
    7  0.142170  0.373341  0.674134
    8  0.441833  0.434014  0.617767
    9  0.513138  0.650397  0.601039

Using the above DataFrame, we can plot a boxplot of the random numbers. In the boxplot, the bottom line marks the smallest value and the top line marks the largest value, excluding outliers. Between the bottom and top, the three lines of the box mark the 1st quartile, the median, and the 3rd quartile respectively.

Let’s create a boxplot for a single column of the DataFrame using the boxplot() function. It will generate a boxplot from the column 'Num1'.

    # Plot the box plot of a single column of the DataFrame
    b_plot = df.boxplot(column='Num1')
    plot.show()


5. Pandas Boxplot Multiple Columns

Passing several column names to boxplot() draws one box per column, here for 'Num1', 'Num2', and 'Num3'. Boxplots are not limited to depicting single columns; a major use case for boxplots is to compare related distributions. For example,

    # Create a box plot for multiple columns
    b_plot = df.boxplot(column=['Num1', 'Num2', 'Num3'])
    plot.show()


From the above, you can see the distribution of the random numbers in each column and how the columns compare with one another. You can also see an outlier in the 'Num2' distribution, drawn as a circle outside the whiskers.
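The outlier in 'Num2' can be confirmed numerically with the 1.5 * IQR rule. This is a minimal sketch that copies the Num2 values from the DataFrame printed earlier and assumes pandas' default linear-interpolation quantiles; the variable names are illustrative.

```python
import pandas as pd

# Num2 values copied from the DataFrame printed earlier.
num2 = pd.Series(
    [0.020752, 0.498507, 0.760531, 0.685360, 0.512192,
     0.721755, 0.714576, 0.373341, 0.434014, 0.650397],
    name="Num2",
)

q1, q3 = num2.quantile(0.25), num2.quantile(0.75)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box edges count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = num2[(num2 < lower_fence) | (num2 > upper_fence)]

print(f"fences: {lower_fence:.4f} to {upper_fence:.4f}")
print(outliers)
```

Only the first row, 0.020752, falls below the lower fence, which matches the single point drawn outside the whiskers.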

6. Pandas Boxplot Customizations

The pandas library provides multiple keyword arguments for providing customization of boxplots. Let’s see some of them and how they work with boxplots.

6.1 Customize the Color of Boxplot

We can change the look of the boxplot by passing the color argument to boxplot(), which draws the plot in the given color.

    # Customize the boxplot color
    b_plot = df.boxplot(column='Num1', color='orange')
    plot.show()


6.2 Pandas Boxplot Title

A title helps readers quickly understand what they are seeing. You can add a title to your boxplot by using the Matplotlib title() function.

    # Add a title to the boxplot
    b_plot = df.boxplot(column='Num1')
    plot.title('Random Numbers')
    plot.show()
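The title can also be set on the returned axes, and grouped boxplots need one extra step because pandas adds its own figure-level title. The following is a minimal sketch assuming the Matplotlib backend; the Group column and the title strings are hypothetical, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(10)
df = pd.DataFrame(np.random.rand(10, 3), columns=["Num1", "Num2", "Num3"])
df["Group"] = ["A"] * 5 + ["B"] * 5  # hypothetical grouping column

# Ungrouped: set the title directly on the axes the boxplot was drawn on.
fig1, ax1 = plt.subplots()
df.boxplot(column="Num1", ax=ax1)
ax1.set_title("Random Numbers")

# Grouped: pandas adds a figure-level "Boxplot grouped by ..." title and
# uses the column name as the axes title, so override both.
fig2, ax2 = plt.subplots()
df.boxplot(column="Num1", by="Group", ax=ax2)
fig2.suptitle("")
ax2.set_title("Num1 by Group")
```

Setting the title on the axes rather than through the pyplot state avoids titling the wrong plot when several figures are open.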


6.3 Pandas Boxplot Label Font Size

We can change the default font size of the tick labels by passing the fontsize argument to this function. A larger font can make the boxplot clearer and easier to read.

    # Customize the font size of the boxplot
    b_plot = df.boxplot(column='Num1', fontsize=15)
    plot.show()


7. Conclusion

In this article, I have explained the boxplot() function and how to use it to plot the data in a DataFrame as a boxplot. I also explained how to customize the boxplot using various keyword arguments.
