How to Create Fake Data with Faker

Faker is an open-source Python library that lets you create your own dataset, i.e., generate random data with random attributes such as name, age, and location. It supports many locales and languages, which is useful for generating data specific to a locality.

Let’s say you want to create data with certain data types (bool, float, text, integers) and special characteristics (names, addresses, colors, emails, phone numbers, locations) to test some Python libraries or a specific implementation. Finding that specific kind of data takes time. You wonder: is there a quick way to create your own data?

This can be done with Faker, a Python package that generates fake data for you, letting you control the data type, the specific characteristics of the data, and even its origin or language. Let’s discover how we can use Faker to create fake data.

Basics of Faker

Start by installing the package:

pip install Faker

Some basic methods of Faker:

>>> from faker import Faker
>>> fake = Faker()
>>> fake.color_name()    # returns a random color name
>>> fake.name()
'Kyle Johnson'
>>> fake.address()
'0891 Chloe Manors Apt. 227\nSavagechester, MI 27550'
>>> fake.date_of_birth(minimum_age=25)
datetime.date(..., 9, 16)
>>> fake.job()
'Call centre manager'
>>> fake.city()
'Lake Jim'

But What If I Need the Information to Be Specific to One Location?

Luckily, we can also specify the locale of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create instances of her friends. Since you are from the US, it is difficult for you to come up with information relevant to that location. This is easily taken care of by passing a locale code such as 'it_IT' to the Faker class.

>>> fake = Faker('it_IT')
>>> for _ in range(10):
...     print(fake.name())
Gianmarco Falloppio
Goffredo Toscani
Dott. Filippa Musatti
Adelasia Pontecorvo
Eleanora Giannotti-Solari
Dina Tremonti
Dott. Gastone Poerio
Flavia Moschino
Pompeo Guglielmi
Rosa Cafarchia

Or create information from multiple locales:

>>> fake = Faker(['it_IT', 'en_US', 'es_ES'])
>>> for _ in range(5):
...     print(fake.city())
Lake Mary
Quarto Fernanda
Filippini umbro
Falier sardo
East Gwendolyn

Create Random Text

We can create random text with:

>>> fake.text()
'Sport southern with per support mouth. Girl real resource product. Character make record think rich charge could. Computer special employee allow body director action.\nBoy like behind environmental.'

Try it with the Japanese language:

>>> fake = Faker('ja_JP')
>>> fake.text()    # returns a paragraph of Japanese placeholder text

Create Text from Selected Words

We can also generate sentences from our own word list: pass the list as ext_word_list, and Faker will build fake sentences using only those words.

>>> from faker import Faker
>>> fake = Faker()    
>>> my_words = ['My', 'dog', 'is', '3', 'years', 'old', 'and', 'his', 'name', 'is', 'Jessie']
>>> fake.sentence(ext_word_list=my_words)
'Is years name My.'
>>> fake.sentence(ext_word_list=my_words)
'Name old his My 3.'
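Conceptually, a sentence built from a custom word list is a random draw from that list, with the first word capitalized and a period appended. The sketch below is a simplified standard-library approximation for illustration only, not Faker's actual implementation; Faker's sentence() also varies the number of words from call to call by default, which is why the outputs above differ in length. The helper name sentence_from_words is hypothetical.

```python
import random


def sentence_from_words(words, nb_words=4, rng=random):
    """Illustrative approximation of a sentence built from a custom word list."""
    # Draw nb_words words from the list, with replacement.
    picked = rng.choices(words, k=nb_words)
    # Capitalize the first word and end with a period, like the outputs above.
    picked[0] = picked[0].capitalize()
    return " ".join(picked) + "."


my_words = ['My', 'dog', 'is', '3', 'years', 'old', 'and', 'his', 'name', 'is', 'Jessie']
s = sentence_from_words(my_words, rng=random.Random(0))
print(s)
```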

Create Quick Profile Data

We can quickly create a profile with:

>>> fake = Faker()
>>> fake.profile()
{'job': 'Learning mentor', 'company': 'Alvarez, Scott and Martinez', 'ssn': '006-95-9713', 'residence': '579 Joshua Glens Suite 372\nLeahland, IA 88987', 'current_location': (Decimal('56.059606'), Decimal('177.275739')), 'blood_group': 'O+', 'website': ['...'], 'username': 'joseph44', 'name': 'Richard Brady', 'sex': 'M', 'address': '646 Dawson Common Apt. 159\nPort Rachel, VT 57481', 'mail': '...', 'birthdate': datetime.date(..., 4, 9)}

Or with specific fields:

>>> fake.profile(fields=['name', 'job', 'mail'])
{'job': 'Hospital doctor', 'name': 'Alexander Keller', 'mail': '...'}

Create a Fake Dataset Using Faker

Now we will use the Faker object's methods to generate a dataset containing profiles of 100 fake people. Email addresses should contain only ASCII characters, so we also install the unidecode package (pip install unidecode), which transliterates accented characters to plain ASCII. Finally, we use pandas to store these profiles in a DataFrame.

from faker import Faker
import pandas as pd
import unidecode

class User:
    # One Faker instance shared by all users
    f = Faker()

    def __init__(self):
        self.first_name    = User.f.first_name()
        self.last_name     = User.f.last_name()
        self.name          = "{} {}".format(self.first_name, self.last_name)
        self.age           = User.f.pyint(min_value=18, max_value=65)
        # unidecode turns any accented characters into plain ASCII for the address
        self.private_email = unidecode.unidecode(
            "{}.{}@{}".format(self.first_name, self.last_name, User.f.free_email_domain()).lower()
        )

# Build 100 users and turn each one's attributes into a DataFrame row
data = [User().__dict__ for _ in range(100)]
df = pd.DataFrame(data)
print(df.head())


  first_name last_name               name  age                private_email
0       Eric     James         Eric James   53
1   Michelle   Gardner   Michelle Gardner   47
2      Julie    Oliver       Julie Oliver   53
3     Olivia  Delacruz    Olivia Delacruz   62
4   Kimberly  Williams  Kimberly Williams   57
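If you would rather not add the unidecode dependency, a rough stand-in can be built from the standard library's unicodedata module. This is a swapped-in technique, not what the code above uses: NFKD normalization splits an accented letter into its base letter plus a combining mark, and encoding to ASCII with errors set to "ignore" drops the marks. Unlike unidecode, it silently discards characters that have no decomposition, such as 'Ł'. The helper name to_ascii is hypothetical.

```python
import unicodedata


def to_ascii(text):
    """Approximate ASCII transliteration using only the standard library."""
    # NFKD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with 'ignore' discards the combining marks
    # (and any character that has no ASCII decomposition at all).
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("José.Müller@example.com".lower()))
```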

Instead of x.__dict__, it is more Pythonic to use the built-in vars(x) function, so we can write:
data = [vars(User()) for _ in range(100)]
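The two forms are interchangeable because vars() returns the very same dictionary object as the instance's __dict__ attribute. A tiny sketch using a hypothetical stand-in class, FixedUser, whose values are hard-coded from the first row of the table above so that it runs without Faker:

```python
class FixedUser:
    """Hypothetical stand-in for User with hard-coded values."""

    def __init__(self):
        self.first_name = "Eric"
        self.last_name = "James"


u = FixedUser()
# vars(u) hands back u.__dict__ itself, not a copy.
print(vars(u))
print(vars(u) is u.__dict__)
```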