The Definitive Guide to Python import Statements

Summary / Key Points


Basic Definitions


Packages


Python has only one type of module object, and all modules are of this type, regardless of whether the module is implemented in Python, C, or something else. To help organize modules and provide a naming hierarchy, Python has a concept of packages.

You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system. For the purposes of this documentation, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages, as well as regular modules.

It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.
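As a quick check of the rule above, the sketch below tests for a __path__ attribute on two standard-library modules. The helper name is_package is only illustrative and is not part of any library.

```python
import email
import sys


def is_package(module):
    """Return True if the module object has a __path__ attribute."""
    return hasattr(module, "__path__")


print(is_package(email))  # email is a package
print(is_package(sys))    # sys is a plain (built-in) module
```

Both objects are ordinary module objects; only the presence of __path__ sets the package apart.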

All modules have a name. Subpackage names are separated from their parent package name by a dot, akin to Python’s standard attribute access syntax. Thus you might have a module called sys and a package called email, which in turn has a subpackage called email.mime and a module within that subpackage called email.mime.text.

What is an import


When a module is imported, Python runs all of the code in the module file. When a package is imported, Python runs all of the code in the package’s __init__.py file, if such a file exists. All of the objects defined in the module or the package’s __init__.py file are made available to the importer.
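To see this behaviour in isolation, here is a minimal sketch that writes a throwaway module to a temporary folder and imports it. The module name import_demo_greeter and its contents are made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name, chosen to avoid clashing with real modules.
MODULE_NAME = "import_demo_greeter"
SOURCE = 'print("module body is running")\nGREETING = "hello"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, tmp_dir)  # make the folder searchable
    try:
        importlib.invalidate_caches()
        # Importing executes every statement in the module file.
        greeter = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(tmp_dir)

# Names defined by the module are now attributes of the module object.
print(greeter.GREETING)
```

The print call inside the module runs during the import, and GREETING is then available to the importer as greeter.GREETING.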

Basics of the Python import and sys.path


According to Python documentation, here is how an import statement searches for the correct module or package to import:

Technically, Python’s documentation is incomplete. The interpreter will not only look for a file (i.e., module) named spam.py, it will also look for a folder (i.e., package) named spam.

Note that the Python interpreter first searches through the list of built-in modules, modules that are compiled directly into the Python interpreter. This list of built-in modules is installation-dependent and can be found in sys.builtin_module_names (Python 2 and 3).
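A small sketch for inspecting that list on your own installation. Apart from sys, which is always compiled in, the entries you see will vary.

```python
import sys

# 'sys' is always built in; the rest of the tuple depends on the installation.
print("sys" in sys.builtin_module_names)
print(len(sys.builtin_module_names), "built-in modules in this interpreter")
```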

The function pkgutil.iter_modules (Python 2 and 3) can be used to get a list of all importable modules from a given path:

import pkgutil
search_path = ['.'] # set to None to see all modules importable from sys.path
all_modules = [x[1] for x in pkgutil.iter_modules(path=search_path)]
print(all_modules)

More on sys.path

To see what is in sys.path, run the following in the interpreter or as a script:

import sys
print(sys.path)

# Output on my computer:
['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages']

The documentation for Python’s command line interface adds the following about running scripts from the command line. Specifically, when running python <script>.py, then…

Example Directory Structure


test/                     # root folder
├── packA/                # package packA
│   ├── __init__.py
│   ├── a1.py
│   ├── a2.py
│   └── subA/             # subpackage subA
│       ├── __init__.py
│       ├── sa1.py
│       └── sa2.py
├── packB/                # package packB (implicit namespace package)
│   ├── b1.py
│   └── b2.py
├── math.py
├── random.py
├── other.py
└── start.py

Note that we do not place a __init__.py file in our root test/ folder.

To recap, Python searches for a module to import in the following order:


  1. built-in modules from the Python Standard Library (e.g. sys, math)

  2. modules or packages in a directory specified by sys.path:

    • If the Python interpreter is run interactively, sys.path[0] is the empty string ''. This tells Python to search the current working directory from which you launched the interpreter, i.e., the output of pwd on Unix systems.
    • If we run a script with python <script>.py, sys.path[0] is the directory containing <script>.py.
    • directories in the PYTHONPATH environment variable
    • default sys.path locations, including remaining Python Standard Library modules which are not built-in
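The search order above can be sketched with the finder classes in importlib.machinery. This is only an approximation of what the import statement does: it asks the built-in finder about sys, then asks the path-based finder to search a temporary folder that stands in for sys.path[0] and contains a local random.py. The function name find_origins is made up for this sketch.

```python
import importlib.machinery
import os
import tempfile


def find_origins():
    """Return (origin of built-in 'sys', origin of a local 'random.py')."""
    # Built-in modules are resolved by their own finder, before sys.path is used.
    builtin_spec = importlib.machinery.BuiltinImporter.find_spec("sys")
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(os.path.join(tmp_dir, "random.py"), "w") as f:
            f.write("# local file that shadows the standard library module\n")
        # Search only this folder, as if it were the first entry of sys.path.
        local_spec = importlib.machinery.PathFinder.find_spec(
            "random", path=[tmp_dir]
        )
        return builtin_spec.origin, local_spec.origin


builtin_origin, local_origin = find_origins()
print(builtin_origin)
print(os.path.basename(local_origin))
```

This is why the random.py in our test/ folder would be picked up ahead of the standard library's random module when test/ is the first entry of sys.path, while a built-in module such as sys cannot be shadowed this way.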

All about __init__.py


An __init__.py file has 2 functions.

  • convert a folder of scripts into an importable package of modules (before Python 3.3)

  • run package initialization code

Converting a folder of scripts into an importable package of modules

In order to import a module or package from a directory that is not in the same directory as the script we are writing (or the directory from which we run the Python interactive interpreter), that module needs to be in a package.

As defined above, any directory with a file named __init__.py is a Python package. This file can be empty. For example, when running Python 2.7, start.py can import the package packA but not packB because there is no __init__.py file in the test/packB/ directory.

In Python 3.3 and above, a folder without an __init__.py file can still be imported as an implicit namespace package. packB in our example is such a package because it doesn’t have a __init__.py file in the folder. If we start a Python 3.6 interactive interpreter in the test/ directory, then we get the following output:

>>> import packB
>>> packB
<module 'packB' (namespace)>
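The same behaviour can be reproduced without the example project. The sketch below creates an empty folder (no __init__.py) in a temporary location and imports it; the name ns_demo_packB is hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package name used only for this demonstration.
PACKAGE_NAME = "ns_demo_packB"

with tempfile.TemporaryDirectory() as tmp_dir:
    os.mkdir(os.path.join(tmp_dir, PACKAGE_NAME))  # note: no __init__.py inside
    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module(PACKAGE_NAME)
        has_path = hasattr(pkg, "__path__")
        init_file = getattr(pkg, "__file__", None)
    finally:
        sys.path.remove(tmp_dir)

print(has_path)   # a namespace package still has __path__
print(init_file)  # but no __init__.py file backs it
```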

Running package initialization code

The first time that a package or one of its modules is imported, Python will execute the __init__.py file in the root folder of the package if the file exists. All objects and functions defined in __init__.py are considered part of the package namespace.

Consider the following example.

test/packA/a1.py:

def a1_func():
    print("running a1_func()")

test/packA/__init__.py:

# this import makes a1_func directly accessible as packA.a1_func
from packA.a1 import a1_func

def packA_func():
    print("running packA_func()")

test/start.py:

import packA  # "import packA.a1" will work just the same

packA.packA_func()
packA.a1_func()
packA.a1.a1_func()

output of running python start.py:

~] python start.py
running packA_func()
running a1_func()
running a1_func()
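The "first time" part of the rule can be checked with a small experiment. The sketch below builds a throwaway package (the name init_demo_pkg is made up), imports one of its modules and then the package itself, and counts how often __init__.py printed its message.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

PACKAGE_NAME = "init_demo_pkg"  # hypothetical name, not part of the example project

captured = io.StringIO()
with tempfile.TemporaryDirectory() as tmp_dir:
    pkg_dir = os.path.join(tmp_dir, PACKAGE_NAME)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write('print("running __init__.py")\n')
    with open(os.path.join(pkg_dir, "mod.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        with contextlib.redirect_stdout(captured):
            # Importing a submodule first still runs the package's __init__.py.
            mod = importlib.import_module(PACKAGE_NAME + ".mod")
            # The package is now cached in sys.modules, so nothing runs again.
            pkg = importlib.import_module(PACKAGE_NAME)
    finally:
        sys.path.remove(tmp_dir)

init_runs = captured.getvalue().count("running __init__.py")
print(init_runs)
```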

Using Objects from the Imported Module or Package


Example: start.py needs to import the helloWorld() function in sa1.py

  • Solution 1: from packA.subA.sa1 import helloWorld

    • we can call the function directly by name: x = helloWorld()
  • Solution 2: from packA.subA import sa1 or equivalently import packA.subA.sa1 as sa1

    • we have to prefix the function name with the name of the module: x = sa1.helloWorld()
    • This is sometimes preferred over Solution 1 in order to make it explicit that we are calling the helloWorld function from the sa1 module.
  • Solution 3: import packA.subA.sa1

    • we need to use the full path: x = packA.subA.sa1.helloWorld()
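The three styles can be compared side by side using a module that ships with Python. Here email.mime.text stands in for packA.subA.sa1 and the MIMEText class stands in for helloWorld, so the snippet runs without the example project.

```python
# Style 1: import the object itself
from email.mime.text import MIMEText

# Style 2: import the module and refer to it by its short name
from email.mime import text

# Style 3: import the full dotted path
import email.mime.text

# All three spellings reach the very same object.
same_object = MIMEText is text.MIMEText is email.mime.text.MIMEText
print(same_object)
```

The choice between them only affects which names end up in the importer's namespace, not which object is loaded.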

Use dir() to examine the contents of an imported module

After importing a module, use the dir() function to get a list of accessible names from the module. For example, suppose I import sa1. If sa1.py defines a helloWorld() function, then dir(sa1) would include helloWorld.

>>> from packA.subA import sa1
>>> dir(sa1)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'helloWorld']
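Most of that list is bookkeeping that Python adds to every module. A small helper, sketched below with the standard json module standing in for sa1, filters it down to the names the module itself exposes. The helper name public_names is an assumption of this sketch.

```python
import json


def public_names(module):
    """Names in dir(module) that do not start with an underscore."""
    return [name for name in dir(module) if not name.startswith("_")]


print(public_names(json))
```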

Importing Packages

Importing a package is conceptually equivalent to importing the package’s __init__.py file as a module. Indeed, this is what Python treats the package as:

>>> import packA
>>> packA
<module 'packA' from 'packA\__init__.py'>

Only objects declared in the imported package’s __init__.py are accessible to the importer. For example, since packB lacks a __init__.py file, calling import packB (in Python 3.3+) has very little use because no objects in the packB package are made available. A subsequent call to packB.b1 would fail because it has not been imported yet.
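Here is a minimal sketch of that failure and its fix, using a temporary folder in place of packB. The package name attr_demo_packB is hypothetical.

```python
import importlib
import os
import sys
import tempfile

PACKAGE_NAME = "attr_demo_packB"  # hypothetical stand-in for packB

with tempfile.TemporaryDirectory() as tmp_dir:
    pkg_dir = os.path.join(tmp_dir, PACKAGE_NAME)
    os.mkdir(pkg_dir)  # no __init__.py, just like packB
    with open(os.path.join(pkg_dir, "b1.py"), "w") as f:
        f.write("NAME = 'b1'\n")

    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module(PACKAGE_NAME)
        # Importing the package alone does not load its modules.
        b1_before = hasattr(pkg, "b1")
        # Importing the submodule explicitly binds it on the package.
        importlib.import_module(PACKAGE_NAME + ".b1")
        b1_after = hasattr(pkg, "b1")
    finally:
        sys.path.remove(tmp_dir)

print(b1_before, b1_after)
```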

Absolute vs. Relative Import


The Python documentation says the following about Python 3’s handling of relative imports:

The only acceptable syntax for relative imports is from .[module] import name. All import forms not starting with . are interpreted as absolute imports.

Source: What’s New in Python 3.0

For example, suppose we are running start.py which imports a1 which in turn imports other, a2, and sa1. Then the import statements in a1.py would look as follows:

absolute imports:

import other
import packA.a2
import packA.subA.sa1

explicit relative imports:

import other
from . import a2
from .subA import sa1

implicit relative imports (NOT SUPPORTED IN PYTHON 3):

import other
import a2
import subA.sa1
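To see which absolute name an explicit relative import refers to, the standard library offers importlib.util.resolve_name. The sketch below only resolves names as strings; it does not import anything, so it runs without the example project.

```python
import importlib.util

# Relative names as seen from packA/a1.py, whose package is "packA"
a2_name = importlib.util.resolve_name(".a2", "packA")
sa1_name = importlib.util.resolve_name(".subA.sa1", "packA")

# From inside packA.subA, two dots climb one level up to packA
a1_name = importlib.util.resolve_name("..a1", "packA.subA")

print(a2_name)   # packA.a2
print(sa1_name)  # packA.subA.sa1
print(a1_name)   # packA.a1
```

Each leading dot is resolved against the package of the importing module, which is why relative imports only make sense inside a package.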

Case Examples


Case 1: sys.path is known ahead of time

If you only ever call python start.py or python other.py, then it is very easy to set up the imports for all of the modules. In this case, sys.path will always include test/ in its search path. Therefore, all of the import statements can be written relative to the test/ folder.

Ex: a file in the test project needs to import the helloWorld() function in sa1.py

  • Solution: from packA.subA.sa1 import helloWorld (or any of the other equivalent import syntaxes demonstrated above)

Case 2: sys.path could change

Often, we want to be flexible in how we use a Python script, whether run directly on the command line or imported as a module into another script. As shown below, this is where we run into problems, especially on Python 3.

Example: Suppose start.py needs to import a2 which needs to import sa2. Assume that start.py is always run directly, never imported. We also want to be able to run a2 on its own.

Seems easy enough, right? After all, we just need 2 import statements total: 1 in start.py and another in a2.py.

Problem: This is clearly a case where sys.path changes. When we run start.py, sys.path contains test/. When we run a2.py, sys.path contains test/packA/.

The import statement in start.py is easy. Since start.py is always run directly and never imported, we know that test/ will always be in sys.path when it is run. Then importing a2 is simply import packA.a2.

The import statement in a2.py is trickier. When we run start.py directly, sys.path contains test/, so a2.py should call from packA.subA import sa2. However, if we instead run a2.py directly, then sys.path contains test/packA/. Now the import would fail because packA is not a folder inside test/packA/.

Instead, we could try from subA import sa2. This corrects the problem when we run a2.py directly. But now we have a problem when we run start.py directly. Under Python 3, this fails because subA is not in sys.path. (This is OK in Python 2, thanks to its support for implicit relative imports.)

Let’s summarize our findings about the import statement in a2.py:

Run      | from packA.subA import sa2      | from subA import sa2
start.py | OK                              | Py2 OK, Py3 fail (subA not in test/)
a2.py    | fail (packA not in test/packA/) | OK

For completeness’ sake, I also tried using relative imports: from .subA import sa2. This matches the result of from packA.subA import sa2.

Solutions (Workarounds): I am unaware of a clean solution to this problem. Here are some workarounds:

  • Use absolute imports rooted at the test/ directory (i.e., middle column in the table above). This guarantees that running start.py directly will always work. In order to run a2.py directly, run it as an imported module instead of as a script:

    1. change directories to test/ in the console
    2. python -m packA.a2
  • Use absolute imports rooted at the test/ directory (i.e., middle column in the table above). This guarantees that running start.py directly will always work. In order to run a2.py directly, we can modify sys.path in a2.py to include test/, before sa2 is imported.

import os, sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))

# now this works, even when a2.py is run directly
from packA.subA import sa2

NOTE: This method usually works. However, under some Python installations, the __file__ variable might not be correct. In this case, we would need to use the inspect module from the Python standard library. See this StackOverflow answer for instructions.

  • Only use Python 2, and use implicit relative imports (i.e., the right column in the table above).

  • Use absolute imports rooted at the test/ directory, and add test/ to the PYTHONPATH environment variable.

    1. This solution is not portable, so I recommend against it.
    2. instructions here: Permanently add a directory to PYTHONPATH
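The difference between running a2.py as a script and running it with python -m can be reproduced with a small experiment. The sketch below rebuilds a stripped-down copy of the example tree in a temporary folder and launches both variants in a subprocess. The function name run_a2_both_ways and the file contents are made up for the illustration, and it assumes Python 3.7+ for subprocess.run(capture_output=True).

```python
import os
import subprocess
import sys
import tempfile


def run_a2_both_ways():
    """Build a tiny copy of the example tree; run a2 as a script and as a module."""
    with tempfile.TemporaryDirectory() as root:
        pack_dir = os.path.join(root, "packA")
        sub_dir = os.path.join(pack_dir, "subA")
        os.makedirs(sub_dir)
        for folder in (pack_dir, sub_dir):
            open(os.path.join(folder, "__init__.py"), "w").close()
        with open(os.path.join(sub_dir, "sa2.py"), "w") as f:
            f.write("NAME = 'sa2'\n")
        with open(os.path.join(pack_dir, "a2.py"), "w") as f:
            f.write("from packA.subA import sa2\nprint('imported', sa2.NAME)\n")

        # Drop PYTHONPATH so that only sys.path[0] decides the outcome.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONPATH"}

        # python packA/a2.py: sys.path[0] is the packA/ folder, so packA is not found
        as_script = subprocess.run(
            [sys.executable, os.path.join("packA", "a2.py")],
            cwd=root, env=env, capture_output=True, text=True,
        )
        # python -m packA.a2: the current directory (the root) is searched
        as_module = subprocess.run(
            [sys.executable, "-m", "packA.a2"],
            cwd=root, env=env, capture_output=True, text=True,
        )
        return as_script.returncode, as_module.returncode, as_module.stdout.strip()


script_rc, module_rc, module_out = run_a2_both_ways()
print(script_rc, module_rc, module_out)
```

A non-zero return code for the script run and a zero return code for the -m run correspond to the "fail" and "OK" cells for the absolute import in the table above.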

Case 3: Importing from Parent Directory

If we do not modify PYTHONPATH and avoid modifying sys.path programmatically, then Python imports have a major limitation: a script that is run directly cannot import anything from its parent directory.

For example, if I were to run python sa1.py, then it is impossible for sa1.py to import anything from a1.py without resorting to a PYTHONPATH or sys.path workaround.

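At first, it may seem that relative imports (e.g. from .. import a1) could work around this limitation. However, the script that is being run (in this case sa1.py) is considered the "top-level module" and does not belong to any package, so there is no parent package for the relative import to refer to. The import fails with an error whose wording depends on the Python version, for example ImportError: attempted relative import with no known parent package on recent Python 3 releases. The related error ValueError: attempted relative import beyond top-level package (an ImportError since Python 3.9) appears when a relative import inside a package reaches above that package’s top level.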
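The failure is easy to reproduce. The sketch below recreates the relevant part of the example tree in a temporary folder, puts from .. import a1 into sa1.py, and runs python sa1.py in a subprocess. The function name and file contents are assumptions of this sketch, and the exact error text differs between Python versions, so the code only looks at the return code and the error output.

```python
import os
import subprocess
import sys
import tempfile


def run_script_with_relative_import():
    """Run sa1.py directly while it tries a relative import from its parent."""
    with tempfile.TemporaryDirectory() as root:
        pack_dir = os.path.join(root, "packA")
        sub_dir = os.path.join(pack_dir, "subA")
        os.makedirs(sub_dir)
        for folder in (pack_dir, sub_dir):
            open(os.path.join(folder, "__init__.py"), "w").close()
        with open(os.path.join(pack_dir, "a1.py"), "w") as f:
            f.write("NAME = 'a1'\n")
        with open(os.path.join(sub_dir, "sa1.py"), "w") as f:
            f.write("from .. import a1\n")

        # Equivalent to: cd packA/subA && python sa1.py
        result = subprocess.run(
            [sys.executable, "sa1.py"],
            cwd=sub_dir, capture_output=True, text=True,
        )
        return result.returncode, result.stderr


return_code, error_output = run_script_with_relative_import()
print(return_code)
print(error_output.strip().splitlines()[-1])  # last line of the traceback
```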

My approach is to avoid writing scripts that have to import from the parent directory. In cases where this must happen, the preferred workaround is to modify sys.path.

Python 2 vs. Python 3


The most important differences between how Python 2 and Python 3 treat import statements have been documented above. They are restated here, along with some other less important differences.

  • Python 2 supports implicit relative imports. Python 3 does not.

  • Python 2 requires __init__.py files inside a folder in order for the folder to be considered a package and made importable. In Python 3.3 and above, thanks to its support of implicit namespace packages, any folder can be imported as a package regardless of the presence of a __init__.py file.

  • In Python 2, one could write from <module> import * within a function. In Python 3, the from <module> import * syntax is only allowed at the module level, no longer inside functions.
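The last point can be verified without writing a file: compiling a function body that contains a star import is enough to trigger the error under Python 3. The source string below is a made-up example.

```python
# A function that tries a star import, kept as a string so that this file
# itself still compiles.
SOURCE = "def load():\n    from os import *\n"

try:
    compile(SOURCE, "<demo>", "exec")
    star_import_allowed = True
except SyntaxError as error:
    star_import_allowed = False
    print("SyntaxError:", error.msg)

print(star_import_allowed)
```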

Miscellaneous topics and readings not covered here, but worth exploring


  • using __all__ variable in __init__.py for specifying what gets imported by from <module> import *

  • documentation for Python 2 and 3

  • using if __name__ == '__main__' to check if a script is imported or run directly

  • documentation for Python 2 and 3

  • installing a project as a package (in developer mode) with pip install -e <project> to add the project root directory to sys.path

  • How to run tests without installing package?

  • from <module> import * does not import names from <module> that begin with an underscore _

  • documentation for Python 2 and 3