Python Sorting

Python Sorting

Sorting HOW TO

Python lists have a built-in list.sort() method that modifies the list in-place. There is also a sorted() built-in function that builds a new sorted list from an iterable.

In this article, we explore the various techniques for sorting data using Python.

sort() function

sort(*, key=None, reverse=False)

The sort() method has two optional parameters: the key parameter and reverse parameter. The key parameter takes in a function that takes a single argument and returns a key to use for sorting. By default, the sort() method will sort a list of numbers by their values and a list of strings alphabetically. The reverse parameter accepts a boolean value of True or False. The default value for reverse is False, meaning it sorts in ascending order. To sort in descending order, we would set reverse=True. These parameters will make much more sense as we look at some examples below.

Let’s say that we have a list of numbers and we want to sort it in ascending order.

num_list = [1,-5,3,-9,25,10]

num_list.sort()

print(num_list)
# [-9,-5,1,3,10,25]

num_list.sort(reverse=True)
print(num_list)
# [25, 10, 3, 1, -5, -9]

So we have a list of numbers, or num_list. We call the sort() method on this list. Notice how we didn’t pass in a value for the key parameter. Thus, it just sorted this list of numbers by their actual values. And since we didn’t set reverse = True, it sorted in ascending order. The sort() method modified our num_list.

Remember that the sort() method returns None. Thus, if we set the output, or return value, of the sort() method to a new variable, we get None as follows:

new_list = num_list.sort(key = absolute_value)

print(new_list)
# None

The key argument for sort() function is explained below in sorted() function

sorted() function

sorted(iterable, *, key=None, reverse=False)
  • Return a new sorted list from the items in iterable.
  • Has two optional arguments which must be specified as keyword arguments.
  • key specifies a function of one argument that is used to extract a comparison key from each element in iterable (for example, key=str.lower). The default value is None (compare the elements directly).
  • reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed.

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).

For sorting examples and a brief sorting tutorial, see Sorting HOW TO

The sorted() function can accept three parameters: the iterable, the key, and reverse. In other words, the sort() method only works on lists, but the sorted() function can work on any iterable, such as lists, tuples, dictionaries, and others. However, unlike the sort() method which returns None and modifies the original list, the sorted() function returns a new list while leaving the original object unchanged.

The easiest way to sort is with the sorted(iterable) function, which takes a iterable and returns a new list with those elements in sorted order. The original iterable is not changed.

a = [5, 1, 4, 3]
print(sorted(a))
# output: [1, 3, 4, 5] 
print(a)
# output: [5, 1, 4, 3]
reverse=True argument

The sorted() function can be customized through optional arguments. The sorted() optional argument reverse=True, e.g. sorted(list, reverse=True), makes it sort backwards.

example

mylist =  [4, 5, 3, 2, 1]
print(sorted(mylist))
# output: [1, 2, 3, 4, 5]
print(sorted(mylist, reverse=True))
# output: [5, 4, 3, 2, 1]

Key Functions

Both list.sort() and sorted() have a key parameter to specify a function to be called on each list element prior to making comparisons.

For more complex custom sorting, sorted() takes an optional key= specifying a key function that transforms each element before comparison. The key function takes in 1 value and returns 1 value, and the returned proxy value is used for the comparisons within the sort.

For example with a list of strings, specifying key=len (the built in len() function) sorts the strings by length, from shortest to longest. The sort calls len() for each string to get the list of proxy length values, and then sorts with those proxy values.

strs = ['ccc', 'aaaa', 'd', 'bb']
print(sorted(strs, key=len))
# output: ['d', 'bb', 'ccc', 'aaaa']

# or the same with lambda function
print(sorted(strs, key=lambda x: len(x)))
# output: ['d', 'bb', 'ccc', 'aaaa']

python sorted() key argument python sorted() key argument

another example:

strs = ['aa', 'BB', 'zz', 'CC']
print(sorted(strs, key=str.lower)) ## "key" argument specifying str.lower function to use for sorting proxy values ['aa', 'bb', 'cc', 'zz']
# output: ['aa', 'BB', 'CC', 'zz']
Sorting a List of Tuples

Let’s say that we have a list of tuples. Each element of the list is a tuple that contains three elements: the name, age, and salary.

list_of_tuples = [
    ('john', 27, 45000),
    ('jane', 25, 65000),
    ('beth', 31, 70000)
]

We can sort this list alphabetically, by age, or by salary. We can specify which we want to use with the key parameter.

To sort by age, we can use the following code:

sorted_age_list = sorted(list_of_tuples, key = lambda age: age[1])

print(sorted_age_list) 
# [('jane', 25, 65000), ('john', 27, 45000), ('beth', 31, 70000)]

Each element of the list_of_tuples is passed in to the lambda function as the age parameter. The element at index 1 of each tuple is returned. That is the value that is used to sort the list, which is the age.

itemgetter() function

Instead of using a lambda expression to access the name, age, or salary elements from the tuples, we can instead use the itemgetter() function that is available in the operator module. We can specify which element to access in the tuple by passing in the index. Each tuple of the list is passed in to the itemgetter() function, and the specific element from that tuple is returned based on the index specified.

import operator 

# sort by name
sorted(list_of_tuples, key = operator.itemgetter(0))
# [('beth', 31, 70000), ('jane', 25, 65000), ('john', 27, 45000)]

# sort by age
sorted(list_of_tuples, key = operator.itemgetter(1))
# [('jane', 25, 65000), ('john', 27, 45000), ('beth', 31, 70000)]

# sort by salary
sorted(list_of_tuples, key = operator.itemgetter(2))
# [('john', 27, 45000), ('jane', 25, 65000), ('beth', 31, 70000)]

The itemgetter() function allows multiple levels of sorting. For example, let’s say we had this list of tuples instead:

list_of_tuples = [
 ('john', 27, 45000),
 ('jane', 25, 65000),
 ('joe', 25, 35000),
 ('beth', 31, 70000)
]

Notice how jane and joe both have the same age. So if we wanted to sort this list by age first, and by salary second, we can pass in two values to the itemgetter() function:

import operator

print(sorted(list_of_tuples, key=operator.itemgetter(1,2)))
[('joe', 25, 35000), ('jane', 25, 65000), ('john', 27, 45000), ('beth', 31, 70000)]

Since the index for age is the first value passed in, that will be used to sort the elements first. If the age is the same, salary will be used to order the elements.

Note: The operator module also has the attrgetter() function that can be used to sort objects with named attributes. For example, if we wrote our own class and instantiated objects of that class, we can order those objects using a specific named attribute by using the attrgetter() function. We would just pass in the name of the attribute to the attrgetter() function, and then pass that function into the key parameter of the sorted() function. For example, to order the objects by age, we would pass in the following to the key parameter: key = attrgetter('age').

How to Order Dict Output in Python

Dicts are awesome, even for a beginner like me. What isn't so awesome is trying to figure out how to list out their contents for the first time! Lists are easy enough but how on earth do you list out the key/value contents of a dict, let alone in any sort of order?

Listing the Contents of a Dict

Let's start by simply listing out the dict contents. In the below example I have a dict stored in the ages variable.

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
for k, v in ages.items():
    print(k, v)

# output:
anthony 95
zack 3
julian 20
bob 23
daniel 41
Using a Lambda as Key Argument to Order the Output in Alphabetical Order

If you're unsure of what a Lambda is, I strongly urge you to read this article by Dan Bader . It was my source for learning what they were and how to use them. It's a great post!

The previous output is great but what if I wanted to print the ages data in alphabetical order? Not only do I need to sort it by the letter but also make sure I point my sorting method at the key in the dict. I can do this with a lambda!

First, let's sort it alphabetically with the help of a lambda:

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
sorted(ages.items(), key=lambda x: x[0])
# output: [('anthony', 95), ('bob', 23), ('daniel', 41), ('julian', 20), ('zack', 3)]
  • First, note that we're going to use sorted. This will sort everything between the () in ascending order. Run help(sorted) to see the available options to sorted. You'll see that we can specify a key function to help sort the data.
  • ages.items() is called to break the ages dict up into the five individual items. Note that these "items" I'm referring to are actually tuples!
  • We then use a lambda function as the key to help sort. lambda x at this point will be the individual item in ages.items().
  • The function of lambda x is to sort by x[0] The contents of x[] is the key/value pair in the dict. For example, {'julian', 20}. The 0 indicates the first position in the pair, the key, which in this case is the name 'julian'.
  • The output is then sorted by the key position in ascending, alphabetical order.

example 2: Sort based on lenght of the name

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
sorted(ages.items(), key=lambda x: len(x[0]))
# output: [('bob', 23), ('zack', 3), ('julian', 20), ('daniel', 41), ('anthony', 95)]
Sorting the Output in Numerical Order

Now for the flip side. What if I wanted to sort it in numerical order which would be by the value in this case?

Identical as the above sort with one tiny change:

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
sorted(ages.items(), key=lambda x: x[1])
# output: [('zack', 3), ('julian', 20), ('bob', 23), ('daniel', 41), ('anthony', 95)]

Yep! All we do is change the lambda x function to point at position x[1], the value.

Storing the Sorted Output in a Dict

You'll have noticed that we still have the output in a list and haven't used print() yet. There's a reason for that.

The thing is, it's a lot harder and less Pythonic to print the output of a dict as a list, then iterate over that to get our friendlier print() output.

It'd be much better to iterate over the output like we did at the start of this post but to do that, our sorted() output would need to be a dict. How do we do that if we know sorted() always returns a list?

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
dict(sorted(ages.items(), key=lambda x: x[0]))
# output: {'anthony': 95, 'zack': 3, 'julian': 20, 'bob': 23, 'daniel': 41}
Bonus: Printing the Highest/Lowest Dict Item

Okay this is way out of scope for this post but I got playing and figured I'd add it in for good measure.

What if I wanted to list out the oldest chap in this list? Well, we don't need to sort anything, we just need to know the max number/age right?

We use max and min python built-in functions.

ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
max(ages.items(), key=lambda x: x[0])
# output: ('zack', 3)
max(ages.items(), key=lambda x: x[1])
# output: ('anthony', 95)
ages = {'julian': 20, 'bob': 23, 'zack': 3, 'anthony': 95, 'daniel': 41}
min(ages.items(), key=lambda x: x[0])   
# output: ('anthony', 95)
min(ages.items(), key=lambda x: x[1])
# output: ('zack', 3)

Another Examples


my_list = [('a2', ['e', 2]),
           ('a4', ['s', 2]),
           ('a3', ['h', 3]),
           ('a1', ['g', 6]),
           ('a6', ['y', 7]),
           ('a5', ['j', 9])]

sorted(my_list, key=lambda kv: kv[1][0])
[('a2', ['e', 2]), ('a1', ['g', 6]), ('a3', ['h', 3]), ('a5', ['j', 9]), ('a4', ['s', 2]), ('a6', ['y', 7])]

sorted(my_list, key=lambda kv: kv[1][1])        
[('a2', ['e', 2]), ('a4', ['s', 2]), ('a3', ['h', 3]), ('a1', ['g', 6]), ('a6', ['y', 7]), ('a5', ['j', 9])]

sorted(my_list, key=lambda kv: kv[0])   
[('a1', ['g', 6]), ('a2', ['e', 2]), ('a3', ['h', 3]), ('a4', ['s', 2]), ('a5', ['j', 9]), ('a6', ['y', 7])]
Sort unordered dictionary
import json #only for pretty print with indent

my_dict = {
            'item1': {'age' : 54, 'name' : 'x', 'input' : 2000, 'output' : 4000},
            'item4': {'name' : 'a', 'age' : 23, 'output' : 6000, 'input' : 5000},
            'item2': {'age' : 43, 'input' : 6000, 'name' : 'r', 'output' : 2000},
            'item3': {'output' : 1000, 'name' : 'f', 'input' : 4000, 'age' : 23}
          }

# sort based only on 'age' key
ordered_dict = dict(sorted(my_dict.items(), key=lambda kv: kv[1]['age']))
print(json.dumps(ordered_dict, indent=4))
{
    "item4": {
        "name": "a",
        "age": 23,
        "output": 6000,
        "input": 5000
    },
    "item3": {
        "output": 1000,
        "name": "f",
        "input": 4000,
        "age": 23
    },
    "item2": {
        "age": 43,
        "input": 6000,
        "name": "r",
        "output": 2000
    },
    "item1": {
        "age": 54,
        "name": "x",
        "input": 2000,
        "output": 4000
    }
}

# sort based on 'age' and then on 'output' key
ordered_dict = dict(sorted(my_dict.items(), key=lambda kv: (kv[1]['age'], kv[1]['output'])))
print(json.dumps(ordered_dict, indent=4))                                                   
{
    "item3": {
        "output": 1000,
        "name": "f",
        "input": 4000,
        "age": 23
    },
    "item4": {
        "name": "a",
        "age": 23,
        "output": 6000,
        "input": 5000
    },
    "item2": {
        "age": 43,
        "input": 6000,
        "name": "r",
        "output": 2000
    },
    "item1": {
        "age": 54,
        "name": "x",
        "input": 2000,
        "output": 4000
    }
}

Sources

https://developers.google.com/edu/python/sorting
https://pybit.es/dict-ordering.html
https://docs.python.org/3/howto/sorting.html
https://docs.python.org/3/library/functions.html
https://dbader.org/blog/python-lambda-functions

SUBSCRIBE FOR NEW ARTICLES

@
comments powered by Disqus