How to Validate Object Attributes in Python

2021-12-06

Python Development

Generally speaking, Python handles type checking and value checking in a flexible and implicit way. Since Python 3.5, the typing module has provided runtime support for type hints. For value checking, however, there is no unified approach, because the possible rules are so varied.

One of the scenarios where we need value checking is when we initialize a class instance. We want to make sure the input attributes are valid from the start: for example, an email address should have the correct format (xxx@xxxxx.com), an age should not be negative, a surname should not exceed 20 characters, and so on.

In this article, I want to demonstrate 7 out of many options to validate class attributes, using either Python built-in modules or third-party libraries. I’m curious which option you prefer; please tell me in the comments. If you know other good options, you are welcome to share them as well.

Create validation functions

We start with the most straightforward solution: creating a validation function for each requirement. Here we have 3 methods that validate name, email, and age individually. The attributes are validated in sequence; any failed validation immediately raises a ValueError exception and stops the program.

import re

class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = self._is_valid_name(name)
        self.email = self._is_valid_email(email)
        self.age = self._is_valid_age(age)

    def _is_valid_name(self, name):
        if len(name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        return name

    def _is_valid_email(self, email):
        # Raw string, so that \. and \w are not treated as string escapes
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, email):
            raise ValueError("It's not an email address.")
        return email

    def _is_valid_age(self, age):
        if age < 0:
            raise ValueError("Age cannot be negative.")
        return age


citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.

This option is simple, but on the other hand, it's probably not the most Pythonic solution you've ever seen, and many people prefer to keep __init__ as clean as possible.

This approach also validates only at initialization, so an attribute can still be updated with an invalid value afterwards.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.email = "john_smith@gmail.c"
citizen_ok.email
'john_smith@gmail.c'

# This email is not valid, but still accepted by the code

Python @property

The second option uses a built-in: @property. It works as a decorator applied to the methods that manage an attribute. According to the Python documentation:

@property

A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function.

At first glance, it creates more code than the first option, but on the other hand, it takes the validation responsibility away from __init__. Each attribute (except for id) has 2 methods, one decorated with @property and the other with the setter. Whenever an attribute is retrieved, as in citizen.name, the method with @property is called. When an attribute value is set during initialization or an update, as in citizen.name="John Smith", the setter method is called.

import re

class Citizen:
    def __init__(self, id, name, email, age):
        self._id = id
        self.name = name
        self.email = email
        self.age = age

    @property
    def id(self):
        return self._id

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        self._name = value

    @property
    def email(self):
        return self._email

    @email.setter
    def email(self, value):
        # Raw string, so that \. and \w are not treated as string escapes
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        self._email = value

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        self._age = value



citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.

This option moves validation logic to the setter method of each attribute and therefore keeps __init__ very clean. Besides, the validation also applies to every update of each attribute after initialization. So the code in the previous example is not accepted anymore.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_ok.email = "john_smith@gmail.c"
# ValueError: It's not an email address.

citizen_ok.email
'john_smith@gmail.com'

Attribute id is an exception here because it doesn’t have a setter method. This is because I want to tell the client that this attribute is not supposed to be updated after initialization. If you try to do that, you will get an AttributeError exception.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)
citizen_ok.id
'id1'

citizen_ok.id = 'id2'
# AttributeError: can't set attribute

With @property, an attribute cannot be updated with an invalid value after initialization.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_ok.age = -4
# Traceback (most recent call last):
# ValueError: Age cannot be negative.

Use Python Descriptors

The third option makes use of Python descriptors, a powerful but often overlooked feature. Perhaps the community has noticed this problem: since Python 3.9, examples of using descriptors to validate attributes have been included in the documentation.

A descriptor is an object which defines the methods __get__(), __set__() or __delete__(). It changes the default behaviour of retrieving, setting, or deleting attributes.

Here is the code using descriptors. Every attribute becomes a descriptor, which is a class with the methods __get__ and __set__. When the attribute value is set, as in self.name=name, __set__ is called. When the attribute is retrieved, as in print(self.name), __get__ is called.

import re


class Name:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return obj._name

    def __set__(self, obj, value):
        if len(value) > 20:
            raise ValueError("Name cannot exceed 20 characters.")
        # Store the value on the instance; storing it on the descriptor
        # would share one value across all instances.
        obj._name = value


class Email:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._email

    def __set__(self, obj, value):
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, value):
            raise ValueError("It's not an email address.")
        obj._email = value


class Age:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj._age

    def __set__(self, obj, value):
        if value < 0:
            raise ValueError("Age cannot be negative.")
        obj._age = value

class Citizen:

    name = Name()
    email = Email()
    age = Age()

    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age



citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.

This solution is comparable to @property. It works better when the descriptors can be reused in multiple classes. For example, in an Employee class, we can simply reuse the previous descriptors without writing much boilerplate code.

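class Salary:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        return obj._salary

    def __set__(self, obj, value):
        if value < 1000:
            raise ValueError("Salary cannot be lower than 1000.")
        # Keep the value on the instance so employees do not share it
        obj._salary = value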
        
class Employee:
    name = Name()
    email = Email()
    age = Age()
    salary = Salary()

    def __init__(self, id, name, email, age, salary):
        self.id = id
        self.name = name
        self.email = email
        self.age = age
        self.salary = salary
        
emp = Employee("id1", "John Smith", "john_smith@gmail.com", 27, 1000)

emp = Employee("id1", "John Smith", "john_smith@gmail.com", 27, 900)
# ValueError: Salary cannot be lower than 1000.

Python descriptors do not allow an attribute to be updated with an invalid value after initialization.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_ok.age = -4
# ValueError: Age cannot be negative.
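
When several attributes need the same kind of rule, one descriptor class can serve all of them. The sketch below is one possible way to do that, not the only one: __set_name__ (available since Python 3.6) records the attribute name, and the value is stored on the instance so that objects do not share state. The names MaxLength, Person and limit are hypothetical.

```python
class MaxLength:
    """Descriptor that rejects strings longer than `limit` (illustrative)."""

    def __init__(self, limit):
        self.limit = limit

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created
        self.public_name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if len(value) > self.limit:
            raise ValueError(
                f"{self.public_name} cannot exceed {self.limit} characters."
            )
        setattr(obj, self.private_name, value)


class Person:
    name = MaxLength(20)
    city = MaxLength(30)

    def __init__(self, name, city):
        self.name = name
        self.city = city


first = Person("John Smith", "Amsterdam")
second = Person("Jane Doe", "Utrecht")
print(first.name, second.name)  # each instance keeps its own value
```

The same class is used for two attributes with different limits, which is where descriptors tend to pay off compared to writing one @property pair per attribute.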

Combine Decorator and Descriptor

A variant of option 3 is to combine a decorator with a descriptor. The end result looks like the following, where the rules are encapsulated in the decorators.

def email(attr):
    def decorator(cls):
        setattr(cls, attr, Email())
        return cls
    return decorator

def age(attr):
    def decorator(cls):
        setattr(cls, attr, Age())
        return cls
    return decorator

def name(attr):
    def decorator(cls):
        setattr(cls, attr, Name())
        return cls
    return decorator

@email("email")
@age("age")
@name("name")
class Citizen:
    def __init__(self, id, name, email, age):
        self.id = id
        self.name = name
        self.email = email
        self.age = age

These decorators can be extended quite easily. For example, you can have more generic rules with multiple attributes applied such as @positive_number(attr1,attr2).
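
As a rough sketch of that idea, a class decorator can accept any number of attribute names and attach one descriptor per name. The names positive_number, PositiveNumber and Product below are illustrative, not part of any library. Because the descriptor is attached after the class has been created, __set_name__ is not called automatically, so the attribute name is passed in explicitly.

```python
class PositiveNumber:
    """Descriptor that only accepts values greater than zero (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be a positive number.")
        setattr(obj, self.private_name, value)


def positive_number(*attrs):
    """Class decorator that applies PositiveNumber to every listed attribute."""
    def decorator(cls):
        for attr in attrs:
            setattr(cls, attr, PositiveNumber(attr))
        return cls
    return decorator


@positive_number("price", "quantity")
class Product:
    def __init__(self, price, quantity):
        self.price = price
        self.quantity = quantity


product = Product(9.99, 3)
try:
    product.quantity = 0
except ValueError as error:
    print(error)  # quantity must be a positive number.
```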

So far, we have gone through 4 options using only built-in features. In my opinion, what Python offers out of the box is already powerful enough to cover what we often need for data validation. But let’s also look at the standard-library @dataclass and at a third-party library.

Object Validation in Python @dataclass

Another way to create a class in Python is using @dataclass. The dataclasses module, part of the standard library since Python 3.7, provides a decorator that automatically generates the __init__() method.

Besides, @dataclass also introduces a special method called __post_init__(), which is invoked from the generated __init__(). __post_init__ is the place to initialize a field based on other fields or to include validation rules.

from dataclasses import dataclass
import re

@dataclass
class Citizen:

    id: str
    name: str
    email: str
    age: int

    def __post_init__(self):
        if self.age < 0:
            raise ValueError("Age cannot be negative.")
        # Raw string, so that \. and \w are not treated as string escapes
        regex = r"^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$"
        if not re.match(regex, self.email):
            raise ValueError("It's not an email address.")
        if len(self.name) > 20:
            raise ValueError("Name cannot exceed 20 characters.")



citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_err = Citizen("id1", "John Smith1234567890123456789", "john_smith@gmail.com", 27)
# ValueError: Name cannot exceed 20 characters.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.c", 27)
# ValueError: It's not an email address.

citizen_err = Citizen("id1", "John Smith", "john_smith@gmail.com", -27)
# ValueError: Age cannot be negative.

This option has the same effect as option 1, but using @dataclass style. If you prefer using @dataclass rather than the traditional class, then this option could be something for you.

As with option 1, a dataclass still allows an attribute to be updated with an invalid value after initialization, because __post_init__ runs only once.

citizen_ok = Citizen("id1", "John Smith", "john_smith@gmail.com", 27)

citizen_ok.age = -4
citizen_ok
# Citizen(id='id1', name='John Smith', email='john_smith@gmail.com', age=-4)
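
If the object does not need to change after creation, one way to close this gap is @dataclass(frozen=True), which makes every assignment raise dataclasses.FrozenInstanceError. This is a minimal sketch with a reduced, hypothetical FrozenCitizen class; note that it blocks all updates rather than validating them.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class FrozenCitizen:
    id: str
    age: int

    def __post_init__(self):
        # Validation still runs once, right after the generated __init__
        if self.age < 0:
            raise ValueError("Age cannot be negative.")


citizen = FrozenCitizen("id1", 27)
try:
    citizen.age = -4
except FrozenInstanceError as error:
    print(type(error).__name__)  # FrozenInstanceError
```

When updates must remain possible, the @property and descriptor options described earlier are the alternatives that validate on every assignment.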

Use the third-party library — Pydantic

Pydantic is a library similar to Marshmallow. It also follows the idea of creating a schema or model for the object, and it provides many ready-made validation types such as PositiveInt, EmailStr, etc. Compared to Marshmallow, Pydantic integrates validation rules into the object class rather than creating a separate schema class.

Here is how we can achieve the same goal using Pydantic. The code uses the Pydantic 1.x API (validator, Config, schema_json); several of these names changed in Pydantic 2. ValidationError collects all 3 errors found in the object.

import re
from datetime import datetime
from pydantic import BaseModel, ValidationError, validator, PositiveInt, EmailStr


class HomeAddress(BaseModel):
    postcode: str
    city: str
    country: str

    class Config:
        anystr_strip_whitespace = True

    @validator('postcode')
    def dutch_postcode(cls, v):
        if not re.match(r"^\d{4}\s?\w{2}$", v):
            raise ValueError(r"must follow regex ^\d{4}\s?\w{2}$")
        return v


class Citizen(BaseModel):
    id: str
    name: str
    birthday: str
    email: EmailStr
    age: PositiveInt
    address: HomeAddress

    @validator('birthday')
    def valid_date(cls, v):
        try:
            datetime.strptime(v, "%Y-%m-%d")
            return v
        except ValueError:
            raise ValueError("date must be in YYYY-MM-DD format.")


try:
    citizen = Citizen(
        id="1234",
        name="john_smith_1234567889901234567890",
        birthday="1998-01-32",
        email="john_smith@gmail.",
        age=0,
        address=HomeAddress(
            postcode="1095AB", city=" Amsterdam", country="NL"
        ),
    )
    print(citizen)
except ValidationError as e:
    print(e)
 
   
# 3 validation errors for Citizen
# birthday
#   date must be in YYYY-MM-DD format. (type=value_error)
# email
#   value is not a valid email address (type=value_error.email)
# age
#   ensure this value is greater than 0 (type=value_error.number.not_gt; limit_value=0)

In fact, Pydantic can do much more than that. For example, it can export a schema via the schema_json method.

print(Citizen.schema_json(indent=2))

{
  "title": "Citizen",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "string"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "birthday": {
      "title": "Birthday",
      "type": "string"
    },
    "email": {
      "title": "Email",
      "type": "string",
      "format": "email"
    },
    "age": {
      "title": "Age",
      "exclusiveMinimum": 0,
      "type": "integer"
    },
    "address": {
      "$ref": "#/definitions/HomeAddress"
    }
  },
  "required": [
    "id",
    "name",
    "birthday",
    "email",
    "age",
    "address"
  ],
  "definitions": {
    "HomeAddress": {
      "title": "HomeAddress",
      "type": "object",
      "properties": {
        "postcode": {
          "title": "Postcode",
          "type": "string"
        },
        "city": {
          "title": "City",
          "type": "string"
        },
        "country": {
          "title": "Country",
          "type": "string"
        }
      },
      "required": [
        "postcode",
        "city",
        "country"
      ]
    }
  }
}

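The schema is compliant with JSON Schema Core, JSON Schema Validation and OpenAPI.

By default, Pydantic allows an attribute to be updated with an invalid value after initialization. In Pydantic 1.x, setting validate_assignment = True in the model's Config turns on validation for assignments as well.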

citizen = Citizen(
    id="1",
    name="John Smith",
    birthday="1990-01-01",
    email="john_smith@gmail.com",
    age=28,
    address=HomeAddress(
        postcode="109505", city=" Amsterdam", country="NL"
    ),
)

citizen
# Citizen(id='1', name='John Smith', birthday='1990-01-01', email='john_smith@gmail.com', age=28, address=HomeAddress(postcode='109505', city='Amsterdam', country='NL'))

citizen.age = -4
citizen
# Citizen(id='1', name='John Smith', birthday='1990-01-01', email='john_smith@gmail.com', age=-4, address=HomeAddress(postcode='109505', city='Amsterdam', country='NL'))