Python Libraries Every Programming Beginner Should Know
Python is a popular general-purpose programming language. Its clear syntax makes it ideal for beginners to learn. One of this language’s advantages is the large number of open-source libraries available. A Python library is a group of related code modules. You can use these modules in your own programs to make coding simpler and faster – e.g. instead of writing your own function to open an Excel file, you can use one from the pandas library.
Many libraries and modules come with the standard installation of Python, while others need to be downloaded separately. Once they are installed, they can be easily imported into your project, giving you direct access to additional functionality.
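As a minimal sketch of what importing looks like: the math module ships with Python and can be imported right away, while a third-party library such as NumPy can only be imported after it has been installed (commonly with pip install numpy). The helper name is_installed is hypothetical and chosen only for this illustration.

```python
import importlib.util
import math  # part of the standard library, so no installation is needed


def is_installed(module_name):
    """Return True if Python can find an importable module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Once imported, a module's functions are available through its name.
print(math.sqrt(16))

# Third-party libraries must be installed before they can be imported.
for name in ("math", "numpy"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```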
Previously, we discussed the 13 Top Python Libraries You Should Know in 2020 and the Most Popular Python Packages in 2021. In this article, we will show you some of the most important and useful libraries you should know as a beginning Python programmer.
For those of you who are new to Python, you can get started using our Python Basics track. If you’re interested in working with data, take a look at the Introduction to Python for Data Science course; both of these options include interactive exercises to accelerate your learning. If you’re looking for some more advanced material, we have a Data Processing with Python track.
Now that you know what libraries are and where to begin learning Python, let’s talk about the Python libraries beginners should know.
7 Best Python Libraries for New Programmers
Here’s our list of Python libraries every beginner should know. You may not want to learn all of them in detail, but you should get familiar with what they can do.
1. NumPy
NumPy is one of the most widely used Python libraries. It contains functionality for fast and efficient numerical computations, but its real strength lies in working with arrays. NumPy arrays can contain integers, floats, strings, or even complex numbers. For example, a 2-dimensional NumPy array can be created as follows:
>>> import numpy as np
>>> ar = np.array([[1, 2, 3], [4, 5, 6]])
>>> print(ar)
[[1 2 3]
 [4 5 6]]
We can also efficiently calculate the square root of each element or do some other calculations for all data at the same time:
>>> print(np.sqrt(ar))
[[1.         1.41421356 1.73205081]
 [2.         2.23606798 2.44948974]]
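Other whole-array calculations follow the same pattern. The sketch below rebuilds the same small array and shows a few common operations; the variable names are our own and only serve the illustration.

```python
import numpy as np

ar = np.array([[1, 2, 3], [4, 5, 6]])

doubled = ar * 2              # element-wise multiplication, no loop needed
col_sums = ar.sum(axis=0)     # sum down each column
row_means = ar.mean(axis=1)   # mean across each row
large = ar[ar > 3]            # boolean mask keeps only elements greater than 3

print(doubled)
print(col_sums)
print(row_means)
print(large)
```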
If you need to work with large amounts of numerical data, look no further than NumPy. We have a dedicated article on An Introduction to NumPy in Python. Check it out for more examples.
2. pandas
The pandas library has become the backbone of data analysis in Python; it’s a must-know for those of you wanting to learn how to work with data. With pandas, you can read in data from a file, do some exploratory data analysis and visualizations, manipulate the data, calculate statistics, and much more.
To learn more about reading files into Python, take a look at some of our courses. They’ll teach you everything you need to know about reading in Excel files, CSV files, and JSON files in Python.
Let’s say you have an Excel file containing students’ grades in different subjects. We can use pandas to easily read in the file:
>>> import pandas as pd
>>> df = pd.read_excel('grades.xlsx', index_col='name')
>>> print(df)
       physics  geography  french
name
Sam         68         81      78
Aiko        91         84      88
Lisa        62         73      74
Jonas       72         57      60
Now we can calculate the average grade for each student:
>>> print(df.mean(axis=1))
name
Sam      75.666667
Aiko     87.666667
Lisa     69.666667
Jonas    63.000000
dtype: float64
This could then be merged with data from students in another class and filtered to find the students with the highest grades. Then the results could be written out to a new file – all with just a few lines of code.
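Here is a rough sketch of those steps. To keep it self-contained, the grades are built in code rather than read from Excel, the second class is invented purely for illustration, the 80-point threshold is arbitrary, and the result is written as a CSV file to a temporary directory. Since both tables have the same columns, the sketch stacks them with pd.concat; pd.merge would be the tool for joining tables on a shared key.

```python
import os
import tempfile

import pandas as pd

# Grades from the example above, built in code instead of read from Excel.
class_a = pd.DataFrame(
    {
        "physics": [68, 91, 62, 72],
        "geography": [81, 84, 73, 57],
        "french": [78, 88, 74, 60],
    },
    index=pd.Index(["Sam", "Aiko", "Lisa", "Jonas"], name="name"),
)

# Hypothetical second class, invented purely for illustration.
class_b = pd.DataFrame(
    {"physics": [85, 55], "geography": [90, 65], "french": [80, 70]},
    index=pd.Index(["Maya", "Omar"], name="name"),
)

# Stack the two classes into one table and add each student's average.
all_students = pd.concat([class_a, class_b])
all_students["average"] = all_students.mean(axis=1)

# Keep students averaging at least 80 (an arbitrary threshold), best first.
top_students = all_students[all_students["average"] >= 80].sort_values(
    "average", ascending=False
)

# Write the result out to a new file.
output_path = os.path.join(tempfile.gettempdir(), "top_students.csv")
top_students.to_csv(output_path)
print(top_students)
```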
3. matplotlib
Producing nice visualizations quickly and easily is a key skill for many Python projects. Although there are many libraries for producing different kinds of plots in Python, matplotlib should be your first stop. This is due to its ease of use and mature documentation with an excellent assortment of examples.
We have a few articles that cover the whole process, from reading in the data, through manipulating and preparing it, to producing nice plots in matplotlib. Check them out at How to Plot a Running Average in Python Using matplotlib and How to Visualize Sound in Python.
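To give a feel for the basic workflow, here is a minimal sketch of a bar chart. It reuses Sam’s grades from the pandas section; the file name, the labels, and the choice of the non-interactive Agg backend (so the script runs without a display) are our own assumptions.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

subjects = ["physics", "geography", "french"]
sam_grades = [68, 81, 78]  # Sam's grades from the pandas example

# Create a figure with one set of axes and draw one bar per subject.
fig, ax = plt.subplots()
ax.bar(subjects, sam_grades)
ax.set_xlabel("Subject")
ax.set_ylabel("Grade")
ax.set_title("Sam's grades")

# Save the plot as an image file.
output_path = os.path.join(tempfile.gettempdir(), "grades.png")
fig.savefig(output_path)
```

In an interactive session, you would typically call plt.show() instead of saving to a file.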
4. os
This one isn’t as glamorous as some of the other Python libraries on this list. However, it comes in handy when you need to interact with the operating system from within your Python program. For example, you can get the current directory of your project and list files in the directory like this:
>>> import os
>>> root_directory = os.getcwd()
>>> file_list = os.listdir(root_directory)
After you have your list of files, you could select some and move them to another directory using os.rename() or os.replace(), and change their permissions using os.chmod(). Check out Python’s official documentation for more information. If you want some relevant learning material, take a look at our Working with Files and Directories in Python course.
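The move-and-change-permissions step might look like the sketch below. It works inside a temporary directory so nothing on your machine is touched; the file and directory names are made up for the example. Note that on Windows, os.chmod() can only toggle the read-only flag.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root_directory:
    # Create a sample file and a destination directory to work with.
    source_path = os.path.join(root_directory, "report.txt")
    with open(source_path, "w") as f:
        f.write("hello")
    archive_directory = os.path.join(root_directory, "archive")
    os.mkdir(archive_directory)

    # Move every .txt file into the archive directory.
    for file_name in os.listdir(root_directory):
        if file_name.endswith(".txt"):
            old_path = os.path.join(root_directory, file_name)
            new_path = os.path.join(archive_directory, file_name)
            os.replace(old_path, new_path)
            # Allow only the file's owner to read and write it.
            os.chmod(new_path, stat.S_IRUSR | stat.S_IWUSR)

    moved_files = os.listdir(archive_directory)
    remaining = os.listdir(root_directory)
    print(moved_files)
    print(remaining)
```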
5. datetime
If you need to work with dates or times in your Python project, look no further than the datetime library. With it, you can define dates and times as objects and manipulate them, e.g. by adding a certain number of days or calculating the time between two datetime objects. This saves you from worrying about the different numbers of days in each month or about leap years. You can even make your datetime objects timezone aware. Here’s how to calculate how much time has passed since the day you were born:
>>> import datetime as dt
>>> date_of_birth = dt.datetime(1990, 3, 12)
>>> print(dt.datetime.today() - date_of_birth)
11818 days, 17:17:27.865661
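The other operations mentioned above, adding days and making an object timezone aware, can be sketched in a similar way. The specific dates and variable names here are only examples.

```python
import datetime as dt

date_of_birth = dt.date(1990, 3, 12)

# Adding a timedelta takes month lengths and leap years into account.
ten_thousand_days_old = date_of_birth + dt.timedelta(days=10000)

# Subtracting two dates gives a timedelta; 2020 is a leap year,
# so February has 29 days.
days_in_february_2020 = (dt.date(2020, 3, 1) - dt.date(2020, 2, 1)).days

# A timezone-aware datetime carries its UTC offset with it.
now_utc = dt.datetime.now(dt.timezone.utc)

print(ten_thousand_days_old)
print(days_in_february_2020)
print(now_utc.tzinfo)
```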
If you want some more examples, take a look at our article on How to Work with Date and Time in Python.
6. statsmodels
Doing statistics is an important part of scientific programming. You have a few options to choose from (such as using NumPy or pandas for calculating averages). However, statsmodels takes it further and provides functions for estimating many different statistical models and conducting statistical tests.
This library is built on NumPy and SciPy (another great library for scientific computing). With statsmodels, you can easily fit a regression model to some data and print out a results summary containing the model parameters, R-squared, F-statistic, and much more. Here’s the statsmodels documentation if you’re looking for more details.
7. scikit-learn
If you want to get into machine learning, this library should be at the top of your list. It contains functionality for every step of the machine learning pipeline:
- Load, clean, and prepare data.
- Split data into train and test sets.
- Calculate features.
- Train a supervised or unsupervised algorithm.
- Evaluate an algorithm’s performance.
You can use scikit-learn for many machine learning projects, regardless of whether you’re working with numerical data, text, or images.
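The steps listed above can be sketched end to end in a few lines. This is one possible arrangement, not the only one: the data is synthetic (generated with make_classification), and the choice of a k-nearest-neighbors classifier, the 25% test split, and feature scaling as the preparation step are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Generate synthetic data: 200 samples, 4 features, 2 classes.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

# Split the data into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Prepare the features: fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a supervised algorithm.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_scaled, y_train)

# Evaluate its performance on data it has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"Test accuracy: {accuracy:.2f}")
```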
The scikit-learn library even contains some test data sets which you can directly import into Python. This means you don’t have to find your own data if you want to get a little experience in machine learning. Here’s how to load the famous iris data set:
>>> from sklearn.datasets import load_iris
>>> data = load_iris()
>>> X = data.data
>>> y = data.target
From here, you could do some exploratory data analysis by calculating average values using NumPy or visualizing the data using matplotlib. You could even put the X and y arrays into a pandas DataFrame to practice manipulating the data. For more of a challenge, you could use scikit-learn to do a cluster analysis. If you can manage this, you’re well on your way to becoming a machine learning master!
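As a starting point for those exercises, here is a minimal sketch that puts the iris data into a pandas DataFrame and runs a simple cluster analysis. K-means with three clusters is our own choice for the example; other clustering algorithms and settings would work too.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

data = load_iris()

# Put the measurements into a DataFrame and add the species names.
df = pd.DataFrame(data.data, columns=data.feature_names)
df["species"] = data.target_names[data.target]

# Exploratory step: average measurements per species.
print(df.groupby("species").mean())

# Cluster analysis: group the flowers into 3 clusters without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(data.data)

# Compare the clusters found with the true species.
print(pd.crosstab(df["species"], df["cluster"]))
```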
Want More Python Libraries?
Many of the libraries we discussed here rightfully earned their place in the Top 15 Python Libraries for Data Science. But there are so many fantastic Python libraries out there that it’s hard to narrow them down to the most important. Regardless of what you want to do in Python, chances are there’s a library for it; it’s always a good idea to see if something already exists. This will make your life easier and your programs more robust. If all else fails, then how about Writing a Custom Module in Python?