How to Create Fake Data with Faker

Faker is an open-source Python library that lets you create your own dataset: it generates random data with attributes such as name, age, and location. It supports many locales (language and region combinations), which is useful for generating data that fits a particular locality.

Let’s say you want to create data of certain types (bool, float, text, integers) with special characteristics (names, addresses, colors, emails, phone numbers, locations) to test some Python libraries or a specific implementation. Finding exactly that kind of data takes time. Is there a quick way to create your own?

This can be done with Faker, a Python package that generates fake data for you: you can choose the data type, its specific characteristics, and even the origin or language of the data. Let’s discover how we can use Faker to create fake data.

Basics of Faker


Start by installing the package:

pip install Faker

Some basic methods of Faker:

>>> from faker import Faker

>>> fake = Faker()
>>> fake.color_name()
'Red'

>>> fake.name()
'Kyle Johnson'

>>> fake.address()
'0891 Chloe Manors Apt. 227\nSavagechester, MI 27550'

>>> fake.date_of_birth(minimum_age=25)
datetime.date(1951, 9, 16)

>>> fake.job()
'Call centre manager'

>>> fake.city()
'Lake Jim'

But What If I Need the Information to Be Specific to One Location?


Luckily, we can also specify the locale of the data we want to fake. Say the character you want to create is from Italy, and you also want to create her friends. If you are from the US, it can be hard to come up with realistic Italian names and places yourself. That is easily handled by passing a locale code such as 'it_IT' to the Faker constructor:

>>> fake = Faker('it_IT')
>>> for _ in range(10):
...     print(fake.name())
... 
Gianmarco Falloppio
Goffredo Toscani
Dott. Filippa Musatti
Adelasia Pontecorvo 
Eleanora Giannotti-Solari
Dina Tremonti
Dott. Gastone Poerio 
Flavia Moschino
Pompeo Guglielmi
Rosa Cafarchia 

Or create information from multiple locations:

>>> fake = Faker(['it_IT', 'en_US', 'es_ES'])
>>> for _ in range(10):
...     print(fake.city())
... 
Lake Mary
Quarto Fernanda
Filippini umbro
Navarra
Cassandramouth
Cáceres
Falier sardo
Robertfort
León
East Gwendolyn

Create Random Text


We can create random text with:

>>> fake.text()
'Sport southern with per support mouth. Girl real resource product. Character make record think rich charge could. Computer special employee allow body director action.\nBoy like behind environmental.'

Try it with the Japanese locale:

>>> fake = Faker('ja')
>>> fake.text()       
'賞賛する月花嫁タワー協力犯罪者器官。ヒット索引今緩む意図。\n明らかにする教会保持する販売装置バーゲンリフト大統領。トス運リハビリ。楽しんで主人ささやき鉱山。\n参加する編組リンク追放する。パーセント教授意図合計。\n残る本質的な柔らかいトス賞賛するコミュニティ。持ってるバナーブランチないストレージ必要。\n創傷野球は埋め込む緩む主人。あなた自身スペルじぶんの合計。尊敬するピック器官職人ささやき催眠術。'

Create Text from Selected Words


We can also create sentences from our own word list: pass the list as ext_word_list, and Faker will build sentences using only those words.

>>> from faker import Faker
>>> fake = Faker()    
>>> my_words = ['My', 'dog', 'is', '3', 'years', 'old', 'and', 'his', 'name', 'is', 'Jessie']
>>> fake.sentence(ext_word_list=my_words)
'Is years name My.'
>>> fake.sentence(ext_word_list=my_words)
'Name old his My 3.'

Create Quick Profile Data


We can quickly create a profile with:

>>> fake = Faker()
>>> fake.profile()
{'job': 'Learning mentor', 'company': 'Alvarez, Scott and Martinez', 'ssn': '006-95-9713', 'residence': '579 Joshua Glens Suite 372\nLeahland, IA 88987', 'current_location': (Decimal('56.059606'), Decimal('177.275739')), 'blood_group': 'O+', 'website': ['http://www.solis-weiss.org/'], 'username': 'joseph44', 'name': 'Richard Brady', 'sex': 'M', 'address': '646 Dawson Common Apt. 159\nPort Rachel, VT 57481', 'mail': 'harrisonmaria@hotmail.com', 'birthdate': datetime.date(1960, 4, 9)}

or with specific fields:

>>> fake.profile(fields=['name', 'job', 'mail'])
{'job': 'Hospital doctor', 'name': 'Alexander Keller', 'mail': 'nvasquez@gmail.com'}

Create a Fake Dataset Using Faker


Now we will use the Faker object's methods to generate a dataset of 100 fake people. Email addresses should contain only ASCII characters, so we also need the unidecode package (pip install unidecode) to transliterate accented names. We will use pandas to store these profiles in a DataFrame.

from faker import Faker
import pandas as pd
import unidecode

class User:
    # One Faker instance shared by every User
    f = Faker()

    def __init__(self):
        self.first_name    = User.f.first_name()
        self.last_name     = User.f.last_name()
        self.name          = f"{self.first_name} {self.last_name}"
        self.age           = User.f.pyint(min_value=18, max_value=65)
        # Build first.last@domain, lowercase it, then transliterate to ASCII
        self.private_email = unidecode.unidecode(
            f"{self.first_name}.{self.last_name}@{User.f.free_email_domain()}".lower()
        )


# Generate 100 fake users and load their attributes into a DataFrame
data = [User().__dict__ for _ in range(100)]
df = pd.DataFrame(data)

df.head()

# OUTPUT:
  first_name last_name               name  age                private_email
0       Eric     James         Eric James   53         eric.james@yahoo.com
1   Michelle   Gardner   Michelle Gardner   47   michelle.gardner@yahoo.com
2      Julie    Oliver       Julie Oliver   53     julie.oliver@hotmail.com
3     Olivia  Delacruz    Olivia Delacruz   62    olivia.delacruz@gmail.com
4   Kimberly  Williams  Kimberly Williams   57  kimberly.williams@gmail.com

Instead of x.__dict__, it is more Pythonic to use the built-in vars(x) function, so we can write:
data = [vars(User()) for _ in range(100)]