Sequence Types

Motivation of composite data type

The following code calculates the average of five numbers:

def average_five_numbers(n1, n2, n3, n4, n5):
    return (n1 + n2 + n3 + n4 + n5) / 5


average_five_numbers(1, 2, 3, 4, 5)

3.0

What about using the above function to compute the average household income in Hong Kong?
The labor force in Hong Kong is close to 4 million.

Should we create a variable to store the income of each individual?
Should we recursively apply the function to groups of five numbers?

What we need is

a composite data type that can keep a variable number of items, so that
we can then define a function that takes an object of the composite data type,
and returns the average of all items in the object.

How to store a sequence of items in Python?

We learned a composite data type that stores a sequence of characters. What is it?

tuple and list are two other built-in sequence types for ordered collections of objects. Unlike string, they can store items of possibly different types.

Indeed, we have already used tuples and lists before.

%%mytutor -h 300
a_list = "1 2 3".split()
a_tuple = (lambda *args: args)(1, 2, 3)
a_list[0] = 0
a_tuple[0] = 0

What is the difference between tuple and list?

List is mutable so programmers can change its items.
Tuple is immutable like int, float, and str, so
- programmers can be certain the content stays unchanged, and
- Python can preallocate a fixed amount of memory to store its content.
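As a minimal sketch of this difference, relying only on built-in behaviour, the following assigns to the first item of a list and of a tuple. The variable names are illustrative; the tuple assignment is wrapped in try/except so that the example runs to completion.

```python
a_list = [1, 2, 3]
a_tuple = (1, 2, 3)

a_list[0] = 0  # allowed: lists are mutable

try:
    a_tuple[0] = 0  # not allowed: tuples are immutable
except TypeError as error:
    tuple_error = error  # keep the exception for inspection

print(a_list)
print(a_tuple)
print(type(tuple_error).__name__)
```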

Constructing sequences

How to create tuple/list?

Mathematicians often represent a set of items in two different ways:

Roster notation, which enumerates the elements in the sequence, e.g.,

\[ \{0, 1, 4, 9, 16, 25, 36, 49, 64, 81\} \]

Set-builder notation, which describes the content using a rule for constructing the elements, e.g.,

\[ \{x^2| x\in \mathbb{N}, x< 10 \}, \]

namely the set of perfect squares less than 100.

Python also provides two corresponding ways to create a tuple/list:

How to create a tuple/list by enumerating its items?

To create a tuple, we enclose a comma separated sequence by parentheses:

%%mytutor -h 450
empty_tuple = ()
singleton_tuple = (0,)   # why not (0)?
heterogeneous_tuple = (singleton_tuple, (1, 2.0), print)
enclosed_starred_tuple = (*range(2), *"23")

Note that:

If the enclosed sequence has one term, there must be a comma after the term.
The elements of a tuple can have different types.
The unpacking operator * can unpack an iterable into a sequence in an enclosure.
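The first and third notes can be checked directly. The sketch below uses illustrative variable names and compares the types of (0) and (0,), then unpacks two iterables into one tuple.

```python
not_a_tuple = (0)  # just the integer 0 in parentheses
singleton_tuple = (0,)  # the trailing comma makes it a tuple
unpacked = (*range(2), *"23")  # items of both iterables, in order

print(type(not_a_tuple).__name__, type(singleton_tuple).__name__)
print(unpacked)
```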

To create a list, we use square brackets to enclose a comma separated sequence of objects.

%%mytutor -h 450
empty_list = []
singleton_list = [0]  # no need to write [0,]
heterogeneous_list = [singleton_list, (1, 2.0), print]
enclosed_starred_list = [*range(2), *"23"]

We can also create a tuple/list from other iterables using the constructors tuple/list as well as addition and multiplication similar to str.

%%mytutor -h 950
str2list = list("Hello")
str2tuple = tuple("Hello")
range2list = list(range(5))
range2tuple = tuple(range(5))
tuple2list = list((1, 2, 3))
list2tuple = tuple([1, 2, 3])
concatenated_tuple = (1,) + (2, 3)
concatenated_list = [1, 2] + [3]
duplicated_tuple = (1,) * 2
duplicated_list = 2 * [1]

Exercise Explain the difference between the following two expressions. Why must a singleton tuple have a comma after the item?

print((1 + 2) * 2, (1 + 2,) * 2, sep="\n")

6
(3, 3)

(1+2)*2 evaluates to 6 but (1+2,)*2 evaluates to (3,3).

The parentheses in (1+2) only group the expression so that the addition is performed first, but
the trailing comma in (1+2,) creates a tuple.

Hence, a singleton tuple must have a comma after the item to differentiate these two use cases.

How to use a rule to construct a tuple/list?

We can specify the rule using a comprehension,
which we have used in a generator expression.
E.g., the following is a Python one-liner that returns a generator for prime numbers.

all?
prime_sequence = lambda stop: (
    x for x in range(2, stop) if all(x % divisor for divisor in range(2, x))
)
print(*prime_sequence(100))

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

There are two comprehensions used:

In all(x % divisor for divisor in range(2, x)), the comprehension passes a generator of remainders to the function all, which returns True if all the remainders are non-zero and False otherwise.
In the return value (x for x in range(2, stop) if ...) of the anonymous function, the comprehension creates a generator of numbers from 2 to stop-1 that satisfy the condition of the if clause.
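To see what the two comprehensions do, it may help to write them out as explicit loops. The following is only a sketch: the name prime_sequence_loop is made up for illustration, and the inner loop with break plays the role of all, which also stops at the first zero remainder.

```python
def prime_sequence_loop(stop):
    """Yield the primes below stop using explicit loops (illustrative name)."""
    for x in range(2, stop):
        has_divisor = False
        for divisor in range(2, x):
            if x % divisor == 0:  # zero remainder: divisor divides x
                has_divisor = True
                break
        if not has_divisor:  # same condition as all(x % divisor for ...)
            yield x


print(*prime_sequence_loop(30))
```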

Exercise Use comprehension to define a function composite_sequence that takes a non-negative integer stop and returns a generator of composite numbers strictly smaller than stop. Use any instead of all to check if a number is composite.

any?
### BEGIN SOLUTION
composite_sequence = lambda stop: (
    x for x in range(2, stop) if any(x % divisor == 0 for divisor in range(2, x))
)
### END SOLUTION

print(*composite_sequence(100))

4 6 8 9 10 12 14 15 16 18 20 21 22 24 25 26 27 28 30 32 33 34 35 36 38 39 40 42 44 45 46 48 49 50 51 52 54 55 56 57 58 60 62 63 64 65 66 68 69 70 72 74 75 76 77 78 80 81 82 84 85 86 87 88 90 91 92 93 94 95 96 98 99

We can construct a list instead of a generator using list comprehension:

[x ** 2 for x in range(10)]  # Enclose comprehension by brackets

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Is the list comprehension the same as applying list to a generator expression?

list(x ** 2 for x in range(10))  # Apply list to a generator expression

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

List comprehension is more efficient as it does not need to create a generator first:

%%timeit
[x ** 2 for x in range(10)]

1.96 µs ± 1.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
list(x ** 2 for x in range(10))

2.15 µs ± 16 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Exercise The following are two different ways to use comprehension to construct a tuple. Which one is faster? Try predicting the results before running them.

%%timeit
tuple(x for x in range(100))

3.67 µs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
tuple([x for x in range(100)])

2.49 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The second method is often faster because the list of items can be created faster with a list comprehension than with a generator expression. This benefit appears to outweigh the cost of converting the list to a tuple.
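Outside a notebook, where the %%timeit magic is not available, a similar comparison can be sketched with the standard timeit module. The number of repetitions below is an arbitrary choice, and the timings depend on the machine and the Python version, so no particular numbers should be expected.

```python
from timeit import timeit

# Both expressions build the same tuple; only the intermediate object differs.
from_generator = tuple(x for x in range(100))
from_list = tuple([x for x in range(100)])
print(from_generator == from_list)

# Total time in seconds for 10,000 runs of each statement.
t_generator = timeit("tuple(x for x in range(100))", number=10_000)
t_list = timeit("tuple([x for x in range(100)])", number=10_000)
print(f"generator: {t_generator:.4f} s, list comprehension: {t_list:.4f} s")
```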

With list comprehension, we can simulate a sequence of biased coin flips.

from random import random as rand

p = rand()  # unknown bias
coin_flips = ["H" if rand() <= p else "T" for i in range(1000)]
print("Chance of head:", p)
print("Coin flips:", *coin_flips)

Chance of head: 0.5861498409916134
Coin flips: H T H H H T H T T H H H H H H H H H T H H H T T H H H T H H H T H T H H H T T T H H T H H H T H H T H H T H T T T H H T H T T H H T H H H T H H H T T H H H H H H T H T T H H T H T T H H H H H H H H H T T H T H T H T T H T H H H H T H H H T H T H T H T T T H H T H H H H T T T T T H H H H T T H T T H H H T H H T H T H H T T T H H H H T T H T H H H T H H T H T H H T H H H H T H T T T T H T H H H H T H T H H H H T T T T H T H T T H T H T T T H H T H T H H T H H H H H H H T T T H H H H T H H H T T H T T T T T T H T H T T T H T T H H H H T H T T T T T H T T H H H H T T H H T H T T H T H T T H H T H H T H H H H H H H H T H T T H H H H T H H T H T H H H T H T H H H T T H T H T T H T T H T T T H T H H H H H H H T H H H H H H T H H H H T H H T T H T H T T H H T T T T H H H T H H T T H H H T T T T H T H H T H T H H T T T H H T T T H T H T T H T T T T H H H H H H H T H T H H H H T H H T H H H H T T T T H T H T H T H H H H H H T H H H H H T H T T T T H T T H H T T T H H T H T T H T H H T T H H T H T T H T H H H T H T H H T T H T T H T T T H H T T H T H T H H T T T H H H H H H T H T T T H T H T T T T T H H H T H H H H H H T T H T H H H H H H T T H H H H T T H H H T H H H T T H H T T T T H H H T T H T T H H H H T T H T H T H T H H T H H H H T H H T H H H T T T T T T H T H H H H H H H H H T H H H H H H T H H T T T H H H T H T H H T T H H H T T H T T H H T H H H T H T T H H T T H T H T H H H T H H T H T H T T H T T T H H H T H T H H T T H H T T T H T H H H H T T H H T H H H H T H H T T H H T H H H T T T T T H H T H T H H T T H T H T H H H H T H H T H H H T H T H T H H T H T H T H T H T T H T T H H H H T H T H H H H T T T T T H H T H T H T H H H T H T H H H H H H H H H H T H T T T H T H H T H H H H H T H H H T H T H H H T H H H T T H T H H H H T T H T H H T H H T H T H H T T H T H H T T T T T H H H T T T H T T H T H T H H H H T T H H H H T T T T H T H T T T H H T H H H H T T H H H T H H H T H T H H H T H T T H T H H H H T T H H H T H T T T T H H T T T T H H T T H H T H T 
H H H H H T

We can then estimate the bias by the fraction of heads coming up.

def average(seq):
    return sum(seq) / len(seq)


head_indicators = [1 if outcome == "H" else 0 for outcome in coin_flips]
fraction_of_heads = average(head_indicators)
print("Fraction of heads:", fraction_of_heads)

Fraction of heads: 0.576

Note that sum and len return the sum and length of the sequence.
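One consequence worth checking is that average needs an object with a length. The sketch below repeats the definition of average so that it runs on its own, and shows that a generator, which has no length, causes a TypeError. The variable names are illustrative.

```python
def average(seq):
    return sum(seq) / len(seq)


squares_list = [x ** 2 for x in range(4)]  # a list has a length
print(average(squares_list))

squares_generator = (x ** 2 for x in range(4))  # a generator has no length
try:
    average(squares_generator)
except TypeError as error:
    generator_error = error
    print("TypeError:", error)
```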

Exercise Define a function variance that takes in a sequence seq and returns the variance of the sequence.

def variance(seq):
    ### BEGIN SOLUTION
    return sum(i ** 2 for i in seq) / len(seq) - average(seq) ** 2
    ### END SOLUTION


delta = (variance(head_indicators) / len(head_indicators)) ** 0.5
print("95% confidence interval: [{:.2f},{:.2f}]".format(p - 2 * delta, p + 2 * delta))

95% confidence interval: [0.55,0.62]
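The solution above computes the variance as the mean of the squares minus the square of the mean. As a rough sanity check, the result can be compared with pvariance from the standard statistics module, which computes the population variance. The data below is made up for illustration, and the two functions are repeated so that the sketch is self-contained.

```python
from statistics import pvariance


def average(seq):
    return sum(seq) / len(seq)


def variance(seq):
    # mean of the squares minus the square of the mean
    return sum(i ** 2 for i in seq) / len(seq) - average(seq) ** 2


data = [1, 0, 1, 1, 0, 1, 0, 1]  # made-up head indicators
print(variance(data), pvariance(data))
```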

Selecting items in a sequence

How to traverse a tuple/list?

Instead of calling dunder methods such as __iter__ and __next__ directly, we can use a for loop to iterate over all the items in order.

a = (*range(5),)
for item in a:
    print(item, end=" ")

0 1 2 3 4

To do it in reverse, we can use the reversed function.

reversed?
a = [*range(5)]
for item in reversed(a):
    print(item, end=" ")

4 3 2 1 0

We can also traverse multiple tuples/lists simultaneously by zipping them.

zip?
a = (*range(5),)
b = reversed(a)
for item1, item2 in zip(a, b):
    print(item1, item2)
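To inspect what zip produces, we can collect the pairs into a list. The sketch below also zips with a shorter iterable to show that zip stops as soon as the shortest one is exhausted. The variable names are illustrative.

```python
a = (*range(5),)
pairs = list(zip(a, reversed(a)))  # pair each item with its mirror image
print(pairs)

# zip stops as soon as the shortest iterable is exhausted.
short_pairs = list(zip(a, "xy"))
print(short_pairs)
```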

How to select an item in a sequence?

Sequence objects such as str/tuple/list implement the getter method __getitem__ to return their items.

We can select an item of a sequence a by subscription

a[i]

where a is a sequence and i is an integer index.
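The link between subscription and __getitem__ can be sketched as follows. The class Squares is hypothetical and not part of these notes; it only illustrates that any object defining __getitem__ supports the a[i] syntax.

```python
a = (*range(10),)
print(a[3], a.__getitem__(3))  # both select the item at index 3


class Squares:
    """Hypothetical class for illustration: item i is i squared."""

    def __getitem__(self, i):
        return i ** 2


squares = Squares()
print(squares[4])  # calls Squares.__getitem__(squares, 4)
```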

A non-negative index indicates the distance from the beginning.

\[\boldsymbol{a} = (a_0, ... , a_{n-1})\]

a = (*range(10),)
print(a)
print("Length:", len(a))
print("First element:", a[0])
print("Second element:", a[1])
print("Last element:", a[len(a) - 1])
print(a[len(a)])  # IndexError

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
Length: 10
First element: 0
Second element: 1
Last element: 9

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-2b19badaedfe> in <module>
      5 print("Second element:", a[1])
      6 print("Last element:", a[len(a) - 1])
----> 7 print(a[len(a)])  # IndexError

IndexError: tuple index out of range

a[i] with i >= len(a) results in an IndexError.

A negative index represents a negative offset from an imaginary element one past the end of the sequence.

\[\begin{split}\begin{aligned} \boldsymbol{a} &= (a_0, ... , a_{n-1})\\ & = (a_{-n}, ..., a_{-1}) \end{aligned}\end{split}\]

a = [*range(10)]
print(a)
print("Last element:", a[-1])
print("Second last element:", a[-2])
print("First element:", a[-len(a)])
print(a[-len(a) - 1])  # IndexError

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Last element: 9
Second last element: 8
First element: 0

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-738e893c8a70> in <module>
      4 print("Second last element:", a[-2])
      5 print("First element:", a[-len(a)])
----> 6 print(a[-len(a) - 1])  # IndexError

IndexError: list index out of range

a[i] with i < -len(a) results in an IndexError.
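The two indexing conventions are related: for a sequence of length n, the negative index -k and the non-negative index n - k select the same item. A small check, with illustrative names:

```python
a = [*range(10)]
n = len(a)

# For k = 1, ..., n, the index -k selects the same item as the index n - k.
for k in range(1, n + 1):
    assert a[-k] == a[n - k]

print(a[-1], a[n - 1])
```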

How to select multiple items?

We can use slicing to select a range of items as follows:

a[start:stop]
a[start:stop:step]

The selected items correspond to those indexed using range:

(a[i] for i in range(start, stop))
(a[i] for i in range(start, stop, step))

a = (*range(10),)
print(a[1:4])
print(a[1:4:2])

(1, 2, 3)
(1, 3)
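The correspondence with range can be verified directly when the parameters are non-negative and within the length of the sequence. The values of start, stop and step below are arbitrary choices for illustration.

```python
a = (*range(10),)
start, stop, step = 1, 8, 3

sliced = a[start:stop:step]
indexed = tuple(a[i] for i in range(start, stop, step))
print(sliced, indexed)
```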

Unlike range, the parameters for slicing take their default values if missing or equal to None:

a = [*range(10)]
print(a[:4])  # start defaults to 0
print(a[1:])  # stop defaults to len(a)
print(a[1:4:])  # step defaults to 1

[0, 1, 2, 3]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3]
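Writing None explicitly has the same effect as leaving a parameter out. The sketch below also uses the built-in slice object, which is what the subscript a[start:stop:step] creates behind the scenes.

```python
a = [*range(10)]

print(a[None:4:None] == a[:4])  # None means "use the default"
print(a[slice(None, 4)] == a[:4])  # a slice object can be used as the subscript
print(a[slice(1, None)] == a[1:])
```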

The parameters can also take negative values:

print(a[-1:])
print(a[:-1])
print(a[::-1])  # What are the default values used here?

[9]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

A mixture of negative and positive values is also okay:

print(a[-1:1])      # equal [a[-1], a[0]]?
print(a[1:-1])      # equal []?
print(a[1:-1:-1])   # equal [a[1], a[0]]?
print(a[-100:100])  # result in IndexError like subscription?

[]
[1, 2, 3, 4, 5, 6, 7, 8]
[]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Exercise (Challenge) Complete the following function to return a tuple (start, stop, step) such that range(start, stop, step) gives the non-negative indexes of the sequence of elements selected by a[i:j:k].

Hint: See notes 3 to 5 in the Python documentation.

def sss(a, i=None, j=None, k=None):
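    ### BEGIN SOLUTION
    l = len(a)
    step = 1 if k is None else k
    # Clipping bounds: [0, l] for a positive step, [-1, l - 1] for a negative step.
    lo, hi = (0, l) if step > 0 else (-1, l - 1)
    clip = lambda x: max(lo, min(x + l if x < 0 else x, hi))
    # Missing values default to one end of the sequence, depending on the sign of step.
    start = (lo if step > 0 else hi) if i is None else clip(i)
    stop = (hi if step > 0 else lo) if j is None else clip(j)
    ### END SOLUTION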
    return start, stop, step


a = [*range(10)]
assert sss(a, -1, 1) == (9, 1, 1)
assert sss(a, 1, -1) == (1, 9, 1)
assert sss(a, 1, -1, -1) == (1, 9, -1)
assert sss(a, -100, 100) == (0, 10, 1)
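A solution to this exercise can be cross-checked against the built-in method slice.indices, which takes the length of a sequence and returns the corresponding (start, stop, step) tuple. The sketch below only calls that method for a sequence of length 10; it does not depend on the function sss.

```python
n = 10  # length of the list a used above

print(slice(-1, 1).indices(n))
print(slice(1, -1).indices(n))
print(slice(1, -1, -1).indices(n))
print(slice(-100, 100).indices(n))
print(slice(None, None, -1).indices(n))  # defaults used by a[::-1]
```

Note that for a negative step the stop value can be -1, which is a bound for range rather than an index into the sequence.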

Exercise With slicing, we can now implement a practical sorting algorithm called quicksort to sort a sequence. Explain how the code works:

import random


def quicksort(seq):
    """Return a sorted list of items from seq."""
    if len(seq) <= 1:
        return list(seq)
    i = random.randint(0, len(seq) - 1)
    pivot, others = seq[i], [*seq[:i], *seq[i + 1 :]]
    left = quicksort([x for x in others if x < pivot])
    right = quicksort([x for x in others if x >= pivot])
    return [*left, pivot, *right]


seq = [random.randint(0, 99) for i in range(10)]
print(seq, quicksort(seq), sep="\n")

[28, 5, 42, 34, 18, 71, 17, 92, 0, 52]
[0, 5, 17, 18, 28, 34, 42, 52, 71, 92]

The above recursion creates a sorted list as [*left, pivot, *right] where

pivot is a randomly selected item in seq,
left is the sorted list of items smaller than pivot, and
right is the sorted list of items no smaller than pivot.

The base case happens when seq contains at most one item, in which case seq is already sorted.
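A single step of the recursion can be traced with a fixed pivot index instead of a random one. The helper partition below is hypothetical and only isolates the splitting done inside quicksort; the input list is made up.

```python
def partition(seq, i):
    """Split seq around the pivot seq[i] (hypothetical helper for illustration)."""
    pivot, others = seq[i], [*seq[:i], *seq[i + 1:]]
    left = [x for x in others if x < pivot]  # items smaller than the pivot
    right = [x for x in others if x >= pivot]  # items no smaller than the pivot
    return left, pivot, right


seq = [28, 5, 42, 34, 18]
left, pivot, right = partition(seq, 3)  # fixed pivot index instead of a random one
print(left, pivot, right)
```

Here left and right are not yet sorted; quicksort sorts each of them recursively before joining them around the pivot.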

CS1302
