How does Python find packages?
I just ran into a situation where I compiled and installed Python 2.7.9 from source on Ubuntu, but Python could not find the packages I had previously installed. This naturally raises the question: how does Python know where to find packages when it executes an import statement? This post applies specifically to Python 2.7.9, but I’m guessing Python 3.x works very similarly.
In this post I first describe how Python finds packages, and then I’ll finish with the discovery I made regarding the default Python that ships with Ubuntu and how it differs from vanilla Python in how it finds packages.
sys.path
Python imports work by searching the directories listed in sys.path .
Using my default Ubuntu 14.04 Python:
    >>> import sys
    >>> print '\n'.join(sys.path)
    /usr/lib/python2.7
    /usr/lib/python2.7/plat-x86_64-linux-gnu
    /usr/lib/python2.7/lib-tk
    /usr/lib/python2.7/lib-old
    /usr/lib/python2.7/lib-dynload
    /usr/local/lib/python2.7/dist-packages
    /usr/lib/python2.7/dist-packages
So Python will find any packages that have been installed to those locations.
How sys.path gets populated
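As the docs explain, sys.path is initialized with the directory containing the script being run (or an empty string, meaning the current working directory, when Python is started interactively), followed by the directories listed in your PYTHONPATH environment variable, followed by installation-dependent default paths. The site module then adds the site-specific package directories.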
You can read more about sys.path in the Python docs.
Assuming your PYTHONPATH environment variable is not set, sys.path will consist of the script directory (or the current working directory), the installation-dependent defaults, and any manipulations made to it by the site module.
The site module is automatically imported when you start Python. You can read more about how it manipulates your sys.path in the Python docs.
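One rough way to watch PYTHONPATH feed into sys.path is to start a child interpreter with the variable set and print what it sees. This is only a sketch, written for Python 3 rather than the 2.7 used in this post; the helper name child_sys_path and the temporary directory are made up for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a fresh interpreter with PYTHONPATH=extra_dir and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    out = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(out.decode())


# A scratch directory stands in for a real package location.
extra_dir = os.path.realpath(tempfile.mkdtemp())
paths = child_sys_path(extra_dir)

# PYTHONPATH entries normally appear near the front of the list,
# ahead of the installation-dependent defaults.
print(paths[:3])
print(extra_dir in paths)
```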
You can manipulate sys.path
You can manipulate sys.path during a Python session and this will change how Python finds modules. For example:
    import sys, os

    # This won't work - there is no hi module
    import hi
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: No module named hi

    # Create a hi module in your home directory.
    home_dir = os.path.expanduser("~")
    my_module_file = os.path.join(home_dir, "hi.py")
    with open(my_module_file, 'w') as f:
        f.write('print "hi"\n')
        f.write('a=10\n')

    # Add the home directory to sys.path
    sys.path.append(home_dir)

    # Now this works, and prints hi!
    import hi
    print hi.a
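The session above is Python 2. A rough Python 3 equivalent might look like the sketch below. It writes the module into a temporary directory rather than the home directory, and the module name hi_demo is a made-up name chosen to avoid clashing with anything already installed.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module into a scratch directory (hypothetical name: hi_demo).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "hi_demo.py"), "w") as f:
    f.write('print("hi")\n')
    f.write("a = 10\n")

# Before the directory is on sys.path, the import fails.
try:
    import hi_demo
except ImportError as exc:
    print("import failed:", exc)

# After appending the directory, the same import succeeds.
sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the freshly written file is noticed
import hi_demo

print(hi_demo.a)
```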
The module __file__ attribute
When you import a module, you usually can check the __file__ attribute of the module to see where the module is in your filesystem:
    >>> import numpy
    >>> numpy.__file__
    '/usr/local/lib/python2.7/dist-packages/numpy/__init__.pyc'

The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
So, for example this doesn’t work:
    >>> import sys
    >>> sys.__file__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'module' object has no attribute '__file__'

It makes sense that the sys module is statically linked to the interpreter: it is essentially part of the interpreter!
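Since __file__ can be missing, code that reports module locations is safer looking the attribute up defensively. The sketch below (Python 3; module_location is a hypothetical helper, not a standard function) also checks sys.builtin_module_names, which lists the modules compiled into the interpreter.

```python
import json
import sys


def module_location(module):
    """Return the module's __file__, or None when the attribute is missing."""
    return getattr(module, "__file__", None)


print(module_location(json))  # a path inside the standard library
print(module_location(sys))   # None: sys is compiled into the interpreter
print("sys" in sys.builtin_module_names)
```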
The imp module
Python exposes much of the machinery behind the import statement through the imp module. It’s pretty cool that all of this stuff is exposed for us to abuse, if we wanted to.
imp.find_module can be used to find a module:
    >>> import imp
    >>> imp.find_module('numpy')
    (None, '/usr/local/lib/python2.7/dist-packages/numpy', ('', '', 5))

You can also import an arbitrary Python source file as a module using imp.load_source . This is the same example as before, except that it imports our module using imp instead of manipulating sys.path :

    import sys, os, imp

    # Create a hi module in your home directory.
    home_dir = os.path.expanduser("~")
    my_module_file = os.path.join(home_dir, "hi.py")
    with open(my_module_file, 'w') as f:
        f.write('print "hi"\n')
        f.write('a=10\n')

    # Load the hi module using imp
    hi = imp.load_source('hi', my_module_file)

    # Now this works, and prints hi!
    import hi
    print hi.a # a is 10!
    print type(hi) # it's a module!
Passing ‘hi’ to imp.load_source simply sets the __name__ attribute of the module.
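On Python 3 the imp module is deprecated (and removed as of 3.12), so the example above will not run on recent versions. A rough stand-in can be built from importlib.util, as sketched below. The function name load_source simply mirrors imp.load_source for readability, and the file name hi_source_demo.py is made up for illustration.

```python
import importlib.util
import os
import sys
import tempfile


def load_source(name, path):
    """Load the file at `path` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register it, so a later `import name` reuses it
    spec.loader.exec_module(module)
    return module


# Same idea as above, but the file lives in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "hi_source_demo.py")
with open(path, "w") as f:
    f.write("a = 10\n")

hi = load_source("hi_source_demo", path)
print(hi.a, hi.__name__, type(hi))
```

As with imp.load_source, the first argument only decides the module's __name__; the file itself can live anywhere.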
Ubuntu Python
Now back to the issue of missing packages after installing a new version of Python compiled from source. By comparing the sys.path from both the Ubuntu Python, which resides at /usr/bin/python , and the newly installed Python, which resides at /usr/local/bin/python , I could sort things out:
Ubuntu Python ( /usr/bin/python ):
    >>> import sys
    >>> print '\n'.join(sys.path)
    /usr/lib/python2.7
    /usr/lib/python2.7/plat-x86_64-linux-gnu
    /usr/lib/python2.7/lib-tk
    /usr/lib/python2.7/lib-old
    /usr/lib/python2.7/lib-dynload
    /usr/local/lib/python2.7/dist-packages
    /usr/lib/python2.7/dist-packages
Python compiled from source ( /usr/local/bin/python )
    >>> import sys
    >>> print '\n'.join(sys.path)
    /usr/local/lib/python27.zip
    /usr/local/lib/python2.7
    /usr/local/lib/python2.7/plat-linux2
    /usr/local/lib/python2.7/lib-tk
    /usr/local/lib/python2.7/lib-old
    /usr/local/lib/python2.7/lib-dynload
    /usr/local/lib/python2.7/site-packages

Turns out what mattered for me was dist-packages vs. site-packages . Using Ubuntu’s Python, my packages were installed to /usr/local/lib/python2.7/dist-packages , whereas the new Python I installed expects packages to be installed to /usr/local/lib/python2.7/site-packages . I just had to set the PYTHONPATH environment variable to point to dist-packages in order to gain access to the previously installed packages with the newly installed version of Python.
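The difference is easier to spot when the two lists are filtered down to their package directories. The sketch below just copies the two sys.path listings from above into Python lists; the helper name package_dirs is made up for illustration.

```python
# sys.path as printed by Ubuntu's /usr/bin/python (copied from above).
ubuntu_path = [
    "/usr/lib/python2.7",
    "/usr/lib/python2.7/plat-x86_64-linux-gnu",
    "/usr/lib/python2.7/lib-tk",
    "/usr/lib/python2.7/lib-old",
    "/usr/lib/python2.7/lib-dynload",
    "/usr/local/lib/python2.7/dist-packages",
    "/usr/lib/python2.7/dist-packages",
]

# sys.path as printed by the source build in /usr/local/bin/python.
source_path = [
    "/usr/local/lib/python27.zip",
    "/usr/local/lib/python2.7",
    "/usr/local/lib/python2.7/plat-linux2",
    "/usr/local/lib/python2.7/lib-tk",
    "/usr/local/lib/python2.7/lib-old",
    "/usr/local/lib/python2.7/lib-dynload",
    "/usr/local/lib/python2.7/site-packages",
]


def package_dirs(paths):
    """Keep only the entries where third-party packages get installed."""
    return [p for p in paths if p.endswith("-packages")]


print(package_dirs(ubuntu_path))
print(package_dirs(source_path))
```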
How did Ubuntu manipulate the sys.path ?
So how does the Ubuntu distribution of Python know to use /usr/local/lib/python2.7/dist-packages in sys.path ? It’s hardcoded into their site module! First, find where the site module code lives:
    >>> import site
    >>> site.__file__
    '/usr/lib/python2.7/site.pyc'

Here is an excerpt from Ubuntu Python’s site.py , which I peeked at by opening /usr/lib/python2.7/site.py in a text editor. First, a comment at the top:
For Debian and derivatives, this sys.path is augmented with directories for packages distributed within the distribution. Local addons go into /usr/local/lib/python<version>/dist-packages, Debian addons install into /usr/{lib,share}/python<version>/dist-packages. /usr/lib/python<version>/site-packages is not used.
OK so there you have it. They explain how the Debian distribution of Python is different.
And now, for the code that implements this change:

    def getsitepackages():
        """Returns a list containing all global site-packages directories
        (and possibly site-python).

        For each directory present in the global ``PREFIXES``, this function
        will find its `site-packages` subdirectory depending on the system
        environment, and will return a list of full paths.
        """
        sitepackages = []
        seen = set()

        for prefix in PREFIXES:
            if not prefix or prefix in seen:
                continue
            seen.add(prefix)

            if sys.platform in ('os2emx', 'riscos'):
                sitepackages.append(os.path.join(prefix, "Lib", "site-packages"))
            elif os.sep == '/':
                sitepackages.append(os.path.join(prefix, "local/lib",
                                            "python" + sys.version[:3],
                                            "dist-packages"))
                sitepackages.append(os.path.join(prefix, "lib",
                                            "python" + sys.version[:3],
                                            "dist-packages"))
            else:
                sitepackages.append(prefix)
                sitepackages.append(os.path.join(prefix, "lib", "site-packages"))
            if sys.platform == "darwin":
                # for framework builds *only* we add the standard Apple
                # locations.
                from sysconfig import get_config_var
                framework = get_config_var("PYTHONFRAMEWORK")
                if framework:
                    sitepackages.append(
                            os.path.join("/Library", framework,
                                sys.version[:3], "site-packages"))
        return sitepackages
It’s all there, if you are crazy enough to dig this deep.
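You do not have to open site.py to see what it decided: the function can be called directly. The sketch below (Python 3 syntax) prints whatever the running interpreter reports and checks each directory against sys.path. The getattr lookup is a precaution, on the assumption that some environments ship a site.py without getsitepackages, as older virtualenv versions did.

```python
import site
import sys

# Look the function up defensively in case this site.py does not define it.
get_dirs = getattr(site, "getsitepackages", None)
site_dirs = get_dirs() if get_dirs else []

for directory in site_dirs:
    status = "on sys.path" if directory in sys.path else "not on sys.path"
    print(directory, "->", status)
```

On a Debian or Ubuntu system Python the printed directories should end in dist-packages; on a source build they should end in site-packages.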
© Lee Mendelowitz
Where does Python look for modules?
Let’s say we have written a Python module and saved it as a_module.py , in a directory called code .
We also have a script called a_script.py in a directory called scripts .
We want to be able to import the code in a_module.py to use in a_script.py . So, we want to be able to put this line in a_script.py :

    import a_module
The module and script might look like this:
    # code/a_module.py
    def func():
        print("Running useful function")

    # scripts/a_script.py
    import a_module
    a_module.func()
At the moment, a_script.py will fail with:
    $ python3 scripts/a_script.py
    Traceback (most recent call last):
      File "scripts/a_script.py", line 1, in <module>
        import a_module
    ModuleNotFoundError: No module named 'a_module'
When Python hits the line import a_module , it tries to find a package or a module called a_module . A package is a directory containing modules, but we will only consider modules for now. A module is a file with a matching extension, such as .py . So, Python is looking for a file a_module.py , and not finding it.
You will see the same effect at the interactive Python console, or in IPython:
    >>> import a_module
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'a_module'

Python looks for modules in “sys.path”
Python has a simple algorithm for finding a module with a given name, such as a_module . It looks for a file called a_module.py in the directories listed in the variable sys.path .
    >>> import sys
    >>> type(sys.path)
    <class 'list'>
    >>> for path in sys.path:
    ...     print(path)
    ...
    /Users/brettmz-admin/dev_trees/psych-214-fall-2016/sphinxext
    /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python37.zip
    /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7
    /usr/local/Cellar/python/3.7.2_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/lib-dynload
    /Users/brettmz-admin/Library/Python/3.7/lib/python/site-packages
    /Users/brettmz-admin/dev_trees/grin
    /Users/brettmz-admin/dev_trees/rmdex
    /usr/local/lib/python3.7/site-packages
The a_module.py file is in the code directory, and this directory is not in the sys.path list.
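The lookup described above can be imitated in a few lines. This is a deliberately simplified sketch, not what the interpreter actually runs: the real import system also handles packages, compiled extensions, zip files and import hooks. The function name find_module_file and the scratch directory are made up for illustration.

```python
import os
import sys
import tempfile


def find_module_file(name, search_path=None):
    """Return the first `name`.py found along the search path, or None."""
    if search_path is None:
        search_path = sys.path
    for directory in search_path:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(directory or os.curdir, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


# Put a copy of a_module.py in a scratch directory to search for.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a_module.py"), "w") as f:
    f.write("def func():\n    print('Running useful function')\n")

print(find_module_file("a_module", [demo_dir]))  # path to the file
print(find_module_file("a_module", []))          # None: nowhere to look
```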
Because sys.path is just a Python list, like any other, we can make the import work by appending the code directory to the list.
    >>> import sys
    >>> sys.path.append('code')
    >>> # Now the import will work
    >>> import a_module
There are various ways of making sure a directory is always on the Python sys.path list when you run Python, including:
- put the directory into the contents of the PYTHONPATH environment variable – see: Using PYTHONPATH.
- make the module part of an installable package, and install it – see: making a Python package.
As a crude hack, you can also put your code directory on the Python sys.path at the top of the files that need it:
    import sys
    sys.path.append('code')

    import a_module

    a_module.func()

    $ python3 scripts/a_script_with_hack.py
    Running useful function
The simple append above will only work when running the script from a directory containing the code subdirectory. For example:
    $ mkdir another_dir
    $ cd another_dir
    $ python3 ../scripts/a_script_with_hack.py
    Traceback (most recent call last):
      File "../scripts/a_script_with_hack.py", line 4, in <module>
        import a_module
    ModuleNotFoundError: No module named 'a_module'
This is because the directory code that we specified is a relative path, and therefore Python looks for the code directory in the current working directory.
To make the hack work when running the code from any directory, you could use some path manipulation on the “__file__” variable:

    from os.path import dirname, abspath, join
    import sys

    # Find code directory relative to our directory
    THIS_DIR = dirname(__file__)
    CODE_DIR = abspath(join(THIS_DIR, '..', 'code'))
    sys.path.append(CODE_DIR)

    import a_module

    a_module.func()
Now the module import does work from another_dir :
    $ python3 ../scripts/a_script_with_better_hack.py
    Running useful function
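The same path manipulation can be written with pathlib, which some people find easier to read. This is only a sketch of one possible variant: add_sibling_dir is a hypothetical helper name, and the literal script path is there so the snippet runs on its own.

```python
import sys
from pathlib import Path


def add_sibling_dir(script_file, dirname="code"):
    """Put <script's directory>/../<dirname> on sys.path and return it as a string."""
    target = str(Path(script_file).resolve().parent.parent / dirname)
    if target not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(target)
    return target


# In a_script.py this would be called as add_sibling_dir(__file__);
# a literal path is used here so the snippet runs on its own.
print(add_sibling_dir("scripts/a_script.py"))
```

Because the path is resolved to an absolute one before it is appended, the result does not depend on the directory the script is started from.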