Building Python Real World Application - part 01 - Interactive Dictionary

Building Python Real World Application - part 01 - Interactive Dictionary

The first application we are going to build is a dictionary. An interactive dictionary. Now, what the dictionary will do? It will retrieve the definition for the word which user has entered, that’s what dictionaries do, right? In addition to that, if the user has made a typo while entering a word, our program will suggest the closest word saying ‘did you mean this instead?’, and if the word has more than one definition, retrieve all of them. Doesn’t seem that easy now, huh? Let’s find out.

Step 1 — The Data


To know how the dictionary will work, we must understand what data it will use to perform those actions. So, let’s get to it. The Data here is in JSON format. If you already know what JSON is, feel free to skip through the next several lines. However, on the other hand, if you have heard the word JSON for the first time or you need a little refresher, I’ve got you covered. I encourage you to look at the data which we are going to use for a better understanding of the JSON format. You will find the data here .

JSON, or JavaScript Object Notation, is a minimal, readable (for both — humans as well as computers)format for structuring data. It mainly has two things, a key and a value associated with the key. Let’s take example from our data only, below is a key/value pair. The key is "abandoned industrial site", the word and the value is "Site that cannot be used for any purpose, being contaminated by pollutants", the definition. To understand more about JSON, refer to this article .

"abandoned industrial site": ["Site that cannot be used for any purpose, being contaminated by pollutants."]

Let us start with the code now. First, we import the JSON library, followed by that we use the load method of that library to load our data which is in .json format. The important thing here is that we load data in .json format but it will be stored in "data" variable as python ‘dictionary’. If you are not aware of python dictionaries, think of it as a storage. It is almost the same as a JSON format and it follows the same key-value ritual.

#Import library
import json

#Loading the json data as python dictionary
#Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("data.json"))

#Function for retriving definition
def retrive_definition(word):
    return data[word]

#Input from user
word_user = input("Enter a word: ")

#Retrive the definition using function and print the result
print(retrive_definition(word_user))

Output

user1@srv1:~$ python3 interactive_dictionary_01.py 
Enter a word: rain
['Precipitation in the form of liquid water drops with diameters greater than 0.5 millimetres.', 'To fall from the clouds in drops of water.']
user1@srv1:~$ python3 interactive_dictionary_01.py 
Enter a word: dictionary
['A reference book containing an explanatory alphabetical list of words, identifying usually, the phonetic, grammatical, and semantic value of each word, often with etymology, citations, and usage guidance and other information.']

Step 2 — Check for non-existing words


Usage of the basic if-else statement will help you check for the non-existing words. If the word is not present in the data, simply let the user know. In our case, it will print "The word doesn’t exist, please double check it".

#Import library
import json

#Loading the json data as python dictionary
#Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("dictionary.json"))

#Function for retriving definition
def retrive_definition(word):
    #Check for non existing words
    if word in data:
        return data[word]
    else:
        return ("The word doesn't exist, please double check it.")

#Input from user
word_user = input("Enter a word: ")

#Retrive the definition using function and print the result
print(retrive_definition(word_user))

Output

user1@srv1:~$ python3 interactive_dictionary_02.py 
Enter a word: haha
The word doesn't exist, please double check it.
user1@srv1:~$ python3 interactive_dictionary_02.py 
Enter a word: hoho
The word doesn't exist, please double check it.
user1@srv1:~$ python3 interactive_dictionary_02.py 
Enter a word: rain
['Precipitation in the form of liquid water drops with diameters greater than 0.5 millimetres.', 'To fall from the clouds in drops of water.']

Step 3 — The case-sensitivity


Each user has his or her own way of writing. While some may write in all lower-case, some of them might want to write the same word in title-case, our objective is that the output to the word must remain the same. For example 'Rain' and 'rain' will give the same output. In order to do that, we are going to convert the word which the user has entered to all lower case because our data has the same format. We will do that with Python’s own lower() method.

  • Case 1 — To make sure the program returns the definition of words that start with a capital letter (e.g. Delhi, Texas) we will also check for the title case letters in else-if condition.

  • Case 2 — To make sure the program returns the definition of acronyms (e.g. USA, NATO) we will check for the uppercase as well.

#Import library
import json

#Loading the json data as python dictionary
#Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("dictionary.json"))

#Function for retriving definition
def retrive_definition(word):
    #Removing the case-sensitivity from the program
    #For example 'Rain' and 'rain' will give same output
    #Converting all letters to lower because out data is in that format
    word = word.lower()

    #Check for non existing words
    #1st elif: To make sure the program return the definition of words that start with a capital letter (e.g. Delhi, Texas)
    #2nd elif: To make sure the program return the definition of acronyms (e.g. USA, NATO)
    if word in data:
        return data[word]
    elif word.title() in data:
        return data[word.title()]
    elif word.upper() in data:
        return data[word.upper()]

#Input from user
word_user = input("Enter a word: ")

#Retrive the definition using function and print the result
print(retrive_definition(word_user))

Output:

user1@srv1:~$ python3 interactive_dictionary_03.py  
Enter a word: Texas
['The 28th state of the United States of America, located in the southern US.']
user1@srv1:~$ python3 interactive_dictionary_03.py 
Enter a word: NATO
['An international organization created in 1949 by the North Atlantic Treaty for purposes of collective security.']

Your dictionary is ready to serve its basic purpose, retrieve the definition. But let us take it to a step further. Do you find it cool when Google suggest you the correct word while you make a typing error in the search bar?

google incorrect word google incorrect word

What if we can do the same with our dictionary? Cool, right? Before doing that it Step 5, let’s understand the mechanism behind it in Step 4.

Step 4 — Closest Match


Now, to if the user has made a typo while entering a word, you might want to suggest the closest word and ask them if they want the meaning of this word instead. We can do that with Python’s library difflib. There are two methods to do that, we will understand both and then use the effective one.

Method 1 — Sequence Matcher


Let’s understand this method. First, we will import the library and fetch method from it. The SequenceMatcher() function takes total 3 parameters.

  • First one is junk, which means if the word has white spaces or blank lines, in our case that’s none.

  • Second and third parameters are the words between which you want to find similarities.

And appended ratio method will give you the result in the number.

#Import library
import json
# This is a python library for 'Text Processing Serveices', as the offcial site suggests.
import difflib
from difflib import SequenceMatcher

#Let's load the same data again
data = json.load(open("dictionary.json"))

#Run a Sequence Matcher
#First parameter is 'Junk' which includes white spaces, blank lines and so onself.
#Second and third parameters are the words you want to find similarities in-between.
#Ratio is used to find how close those two words are in numerical terms
value = SequenceMatcher(None, "rainn", "rain").ratio()

#Print out the value
print(value)

Output:

user1@srv1:~$ python3 interactive_dictionary_04-1.py 
0.8888888888888888

As you can see, the similarity between the word "rainn" and "rain" is 0.89, which is 89%. This is one way to do it. However; there is another method in the same library which fetches close match to the word directly, no numbers involved.

Method 2 — Get Close Matches


The method works as follows:

  • the first parameter is, of course, the word for which you want to find close matches.
  • The second parameter is a list of words to match against.
  • The third one indicates how many matches do you want as an output.
  • And the last one is cut off. Do you remember we got a number in the previous method? 0.89? Cutoff uses that number to know when to stop considering a word as a close match (0.99 being the closest to the word). You can set that according to your criteria.
#Import library
import json
# This is a python library for 'Text Processing Serveices', as the offcial site suggests.
import difflib
from difflib import get_close_matches

#Let's load the same data again
data = json.load(open("dictionary.json"))

#Before you dive in, the basic template of this function is as follows
#get_close_matches(word, posibilities, n=3, cutoff=0.66)
#First parameter is of course the word for which you want to find close matches
#Second is a list of sequences against which to match the word
#[optional]Third is maximum number of close matches
#[optional]where to stop considering a word as a match (0.99 being the closest to word while 0.0 being otherwise)

output = get_close_matches("rain", ["help","mate","rainy"], n=1, cutoff = 0.75)

# Print out output, any guesses?
print(output)

Output:

user1@srv1:~$ python3 interactive_dictionary_04-2.py 
['rainy']

I don’t think I need to explain the output, right? The closest word of all three is rainy, hence ['rainy']. If you have come this far, appreciate yourself because you’ve learned the hard part. Now, we just have to insert this into our code to get the output.

Step 5 — Did you mean this instead?


For ease of readability, I’ve just added if-else part of the code. You are familiar with the first two else-if statements, let’s understand the third one. It is first checking for the length of the close matches it got because we can print only if the word has 1 or more close matches. Get close matches function takes the word the user has entered as the first parameter and our whole data set to match against that word. Here, the key is the words in our data and value is their definition, as we learned it earlier. The [0] in return statement indicates the first close match of all matches.

# Import library
import json
# This is a python library for 'Text Processing Serveices', as the offcial site suggests.
import difflib
from difflib import get_close_matches


# Loading the json data as python dictionary
# Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("dictionary.json"))

# Function for retriving definition
def retrive_definition(word):
    #Removing the case-sensitivity from the program
    #For example 'Rain' and 'rain' will give same output
    #Converting all letters to lower because out data is in that format
    word = word.lower()

    # Check for non existing words
    # 1st elif: To make sure the program return the definition of words that start with a capital letter (e.g. Delhi, Texas)
    # 2nd elif: To make sure the program return the definition of acronyms (e.g. USA, NATO)
    if word in data:
        return data[word]
    elif word.title() in data:
        return data[word.title()]
    elif word.upper() in data:
        return data[word.upper()]
    # 3rd elif: To find a similar word
    # -- len > 0 because we can print only when the word has 1 or more close matches
    # -- In the return statement, the last [0] represents the first element from the list of close matches
    elif len(get_close_matches(word, data.keys())) > 0:
        return ("Did you mean %s instead?" % get_close_matches(word, data.keys())[0])

# Input from user
word_user = input("Enter a word: ")

# Retrive the definition using function and print the result
print(retrive_definition(word_user))

Output:

user1@srv1:~$ python3 interactive_dictionary_05.py 
Enter a word: rainn
Did you mean rain instead?

Yes, yes, that’s what I meant. So, what now? you can’t just leave that with a question. If that’s what user meant, you must retrieve the definition of that word. That’s what we are going to do in the next step.

Step 6 — Retrieving the definition


One more input from the user and one more layer of if-else and there it is. The definition of the suggested word.

#Import library
import json
from difflib import get_close_matches

#Loading the json data as python dictionary
#Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("data.json"))

#Function for retriving definition
def retrive_definition(word):
    #Removing the case-sensitivity from the program
    #For example 'Rain' and 'rain' will give same output
    #Converting all letters to lower because out data is in that format
    word = word.lower()

    #Check for non existing words
    #1st elif: To make sure the program return the definition of words that start with a capital letter (e.g. Delhi, Texas)
    #2nd elif: To make sure the program return the definition of acronyms (e.g. USA, NATO)
    #3rd elif: To find a similar word
    #-- len > 0 because we can print only when the word has 1 or more close matches
    #-- In the return statement, the last [0] represents the first element from the list of close matches
    if word in data:
        return data[word]
    elif word.title() in data:
        return data[word.title()]
    elif word.upper() in data:
        return data[word.upper()]
    elif len(get_close_matches(word, data.keys())) > 0:
        action = input("Did you mean %s instead? [y or n]: " % get_close_matches(word, data.keys())[0])
        #-- If the answers is yes, retrive definition of suggested word
        if (action == "y"):
            return data[get_close_matches(word, data.keys())[0]]
        elif (action == "n"):
            return ("The word doesn't exist, yet.")
        else:
            return ("We don't understand your entry. Apologies.")

#Input from user
word_user = input("Enter a word: ")

#Retrive the definition using function and print the result
print(retrive_definition(word_user))

Output:

user1@srv1:~$ python3 interactive_dictionary_06.py 
Enter a word: rainny
Did you mean rainy instead? [y or n]: y
['Abounding with rain.']

Step 7 — The Icing on the cake


Sure it gets us the definition of rain, but there are square braces and all around it, doesn’t look good huh? Let’s remove them and give it a more cleaner look. The word rain has more than one definition, did you notice? There are several words with more than one definition, so we will iterate through the output of those words having more than one definition and simply print out the ones with a single definition.

#Import library
import json
from difflib import get_close_matches

#Loading the json data as python dictionary
#Try typing "type(data)" in terminal after executing first two line of this snippet
data = json.load(open("data.json"))

#Function for retriving definition
def retrive_definition(word):
    #Removing the case-sensitivity from the program
    #For example 'Rain' and 'rain' will give same output
    #Converting all letters to lower because out data is in that format
    word = word.lower()

    #Check for non existing words
    #1st elif: To make sure the program return the definition of words that start with a capital letter (e.g. Delhi, Texas)
    #2nd elif: To make sure the program return the definition of acronyms (e.g. USA, NATO)
    #3rd elif: To find a similar word
    #-- len > 0 because we can print only when the word has 1 or more close matches
    #-- In the return statement, the last [0] represents the first element from the list of close matches
    if word in data:
        return data[word]
    elif word.title() in data:
        return data[word.title()]
    elif word.upper() in data:
        return data[word.upper()]
    elif len(get_close_matches(word, data.keys())) > 0:
        action = input("Did you mean %s instead? [y or n]: " % get_close_matches(word, data.keys())[0])
        #-- If the answers is yes, retrive definition of suggested word
        if (action == "y"):
            return data[get_close_matches(word, data.keys())[0]]
        elif (action == "n"):
            return ("The word doesn't exist, yet.")
        else:
            return ("We don't understand your entry. Apologies.")

#Input from user
word_user = input("Enter a word: ")

#Retrive the definition using function and print the result
output = retrive_definition(word_user)

#If a word has more than one definition, print them recursively
if type(output) == list:
    for item in output:
        print("-",item)
#For words having single definition
else:
    print("-",output)

Output:

user1@srv1:~$ python3 interactive_dictionary_07.py 
Enter a word: rain
- Precipitation in the form of liquid water drops with diameters greater than 0.5 millimetres.
- To fall from the clouds in drops of water.
user1@srv1:~$ python3 interactive_dictionary_07.py 
Enter a word: telephone
- An electronic device used for calling people.
- To contact someone using the telephone.
- To speak with a person by telephone.

End Notes


You learn a lot while you are involved in it. Today, you learned about the JSON data, basic functionalities of python, a new library called ‘difflib’ and equally important how to write clean code. Pick your different data set and apply all your skills, because that’s the only way you can become a master in Data Science field.

Sources


https://towardsdatascience.com/master-python-through-building-real-world-applications-part-1-b040b2b7faad

SUBSCRIBE FOR NEW ARTICLES

@
comments powered by Disqus