Sequence Types

City University of Hong Kong

CS1302 Introduction to Computer Programming


import random
%reload_ext divewidgets

Motivation of composite data type

The following code calculates the average of five numbers:

def average_five_numbers(n1, n2, n3, n4, n5):
    return (n1 + n2 + n3 + n4 + n5) / 5


average_five_numbers(1, 2, 3, 4, 5)

What about using the above function to compute the average household income in Hong Kong.
The labor size in Hong Kong is close to 4 million.

  • Should we create a variable to store the income of each individual?
  • Should we recursively apply the function to groups of five numbers?

What we need is

  • a composite data type that can keep a variable number of items, so that
  • we can then define a function that takes an object of the composite data type,
  • and returns the average of all items in the object.

How to store a sequence of items in Python?

We learned a composite data type that stores a sequence of characters. What is it?

tuple and list are two other built-in sequence types for ordered collections of objects. Unlike string, they can store items of possibly different types.

Indeed, we have already used tuples and lists before.

%%optlite -h 400
a_list = "1 2 3".split()
a_tuple = (lambda *args: args)(1, 2, 3)
a_list[0] = 0
a_tuple[0] = 0

What is the difference between tuple and list?

Constructing sequences

How to create tuple/list?

Mathematicians often represent a set of items in two different ways:

  1. Roster notation, which enumerates the elements in the sequence, e.g.,
{0,1,4,9,16,25,36,49,64,81}\{0, 1, 4, 9, 16, 25, 36, 49, 64, 81\}
  1. Set-builder notation, which describes the content using a rule for constructing the elements, e.g.,
{x2xN,x<10},\{x^2| x\in \mathbb{N}, x< 10 \},

namely the set of perfect squares less than 100.

How to create a tuple/list by enumerating its items?

To create a tuple, we enclose a comma separated sequence by parentheses:

%%optlite -h 450
empty_tuple = ()
singleton_tuple = (0,)   # why not (0)?
heterogeneous_tuple = (singleton_tuple, (1, 2.0), print)
enclosed_starred_tuple = (*range(2), *"23")

To create a list, we use square brackets to enclose a comma separated sequence of objects.

%%optlite -h 400
empty_list = []
singleton_list = [0]  # no need to write [0,]
heterogeneous_list = [singleton_list, (1, 2.0), print]
enclosed_starred_list = [*range(2), *"23"]

We can also create a tuple/list from other iterables using the constructors tuple/list as well as addition and multiplication similar to str.

%%optlite -l -h 900
str2list = list("Hello")
str2tuple = tuple("Hello")
range2list = list(range(5))
range2tuple = tuple(range(5))
tuple2list = list((1, 2, 3))
list2tuple = tuple([1, 2, 3])
concatenated_tuple = (1,) + (2, 3)
concatenated_list = [1, 2] + [3]
duplicated_tuple = (1,) * 2
duplicated_list = 2 * [1]
print((1 + 2) * 2, (1 + 2,) * 2, sep="\n")
Solution to Exercise 1

(1+2)*2 evaluates to 6 but (1+2,)*2 evaluates to (3,3).

  • The parentheses in (1+2) indicate the addition needs to be performed first, but
  • the parentheses in (1+2,) creates a tuple.

Hence, singleton tuple must have a comma after the item to differentiate these two use cases.

How to use a rule to construct a tuple/list?

We can specify the rule using a comprehension,
which we have used in a generator expression.
E.g., the following is a python one-liner that returns a generator for prime numbers.

all?
prime_sequence = lambda stop: (
    x for x in range(2, stop) if all(x % divisor for divisor in range(2, x))
)
print(*prime_sequence(100))

There are two comprehensions used:

  • In all(x % divisor for divisor in range(2, x)), the comprehension creates a generator of remainders to the function all, which returns True if all the remainders are non-zero else False.
  • In the return value (x for x in range(2, stop) if ...) of the anonymous function, the comprehension creates a generator of numbers from 2 to stop-1 that satisfy the condition of the if clause.
any?
### BEGIN SOLUTION
composite_sequence = lambda stop: (
    x for x in range(2, stop) if any(x % divisor == 0 for divisor in range(2, x))
)
### END SOLUTION

print(*composite_sequence(100))

We can construct a list instead of a generator using list comprehension:

[x ** 2 for x in range(10)]  # Enclose comprehension by brackets

Is the list comprehension the same as applying list to a generator expression?

list(x ** 2 for x in range(10))  # Enclose comprehension by brackets

List comprehension is more efficient as it does not need to create generator first:

%%timeit
[x ** 2 for x in range(10)]
%%timeit
list(x ** 2 for x in range(10))
%%timeit
tuple(x for x in range(100))
%%timeit
tuple([x for x in range(100)])
Solution to Exercise 3

The second method is often faster because the list of items can be created faster with list comprehension instead of generator expression. This benefits appear to out-weigh the cost in converting a list to a tuple.

With list comprehension, we can simulate a sequence of biased coin flips.

from random import random as rand

p = rand()  # unknown bias
coin_flips = ["H" if rand() <= p else "T" for i in range(1000)]
print("Chance of head:", p)
print("Coin flips:", *coin_flips)

We can then estimate the bias by the fraction of heads coming up.

def average(seq):
    return sum(seq) / len(seq)


head_indicators = [1 if outcome == "H" else 0 for outcome in coin_flips]
fraction_of_heads = average(head_indicators)
print("Fraction of heads:", fraction_of_heads)
def variance(seq):
    ### BEGIN SOLUTION
    return sum(i ** 2 for i in seq) / len(seq) - average(seq) ** 2
    ### END SOLUTION


delta = (variance(head_indicators) / len(head_indicators)) ** 0.5
print("95% confidence interval: [{:.2f},{:.2f}]".format(p - 2 * delta, p + 2 * delta))

Selecting items in a sequence

How to traverse a tuple/list?

Instead of calling the dunder method directly, we can use a for loop to iterate over all the items in order.

a = (*range(5),)
for item in a:
    print(item, end=" ")

To do it in reverse, we can use the reversed function.

reversed?
a = [*range(5)]
for item in reversed(a):
    print(item, end=" ")

We can also traverse multiple tuples/lists simultaneously by zipping them.

zip?
a = (*range(5),)
b = reversed(a)
for item1, item2 in zip(a, b):
    print(item1, item2)

How to select an item in a sequence?

We can select an item of a sequence a by subscription

a[i]

where a is a list and i is an integer index.

A non-negative index indicates the distance from the beginning.

a=(a0,...,an1)\boldsymbol{a} = (a_0, ... , a_{n-1})
%%optlite -h 500
a = (*range(10),)
print(a)
print("Length:", len(a))
print("First element:", a[0])
print("Second element:", a[1])
print("Last element:", a[len(a) - 1])
print(a[len(a)])  # IndexError

A negative index represents a negative offset from an imaginary element one past the end of the sequence.

a=(a0,...,an1)=(an,...,a1)\begin{aligned} \boldsymbol{a} &= (a_0, ... , a_{n-1})\\ & = (a_{-n}, ..., a_{-1}) \end{aligned}
%%optlite -h 500
a = [*range(10)]
print(a)
print("Last element:", a[-1])
print("Second last element:", a[-2])
print("First element:", a[-len(a)])
print(a[-len(a) - 1])  # IndexError

How to select multiple items?

We can use slicing to select a range of items as follows:

a[start:stop]
a[start:stop:step]

The selected items corresponds to those indexed using range:

(a[i] for i in range(start, stop))
(a[i] for i in range(start, stop, step))
a = (*range(10),)
print(a[1:4])
print(a[1:4:2])

Unlike range, the parameters for slicing take their default values if missing or equal to None:

a = [*range(10)]
print(a[:4])  # start defaults to 0
print(a[1:])  # stop defaults to len(a)
print(a[1:4:])  # step defaults to 1

The parameters can also take negative values:

print(a[-1:])
print(a[:-1])
print(a[::-1])  # What are the default values used here?

A mixture of negative and postive values are also okay:

print(a[-1:1])      # equal [a[-1], a[0]]?
print(a[1:-1])      # equal []?
print(a[1:-1:-1])   # equal [a[1], a[0]]?
print(a[-100:100])  # result in IndexError like subscription?
def sss(a, i=None, j=None, k=None):
    ### BEGIN SOLUTION
    l = len(a)
    step = 1 if k is None else k
    m = l if step > 0 else l - 1
    start = 0 if i is None else min(i if i > 0 else max(i + l, 0), m)
    stop = l if j is None else min(j if j > 0 else max(j + l, 0), m)
    ### END SOLUTION
    return start, stop, step


a = [*range(10)]
assert sss(a, -1, 1) == (9, 1, 1)
assert sss(a, 1, -1) == (1, 9, 1)
assert sss(a, 1, -1, -1) == (1, 9, -1)
assert sss(a, -100, 100) == (0, 10, 1)
def quicksort(seq):
    """Return a sorted list of items from seq."""
    if len(seq) <= 1:
        return list(seq)
    i = random.randint(0, len(seq) - 1)
    pivot, others = seq[i], [*seq[:i], *seq[i + 1 :]]
    left = quicksort([x for x in others if x < pivot])
    right = quicksort([x for x in others if x >= pivot])
    return [*left, pivot, *right]


seq = [random.randint(0, 99) for i in range(10)]
print(seq, quicksort(seq), sep="\n")
Solution to Exercise 6

The above recursion creates a sorted list as [*left, pivot, *right] where

  • pivot is a randomly selected item in seq,
  • left is the sorted list of items smaller than pivot, and
  • right is the sorted list of items no smaller than pivot.

The base case happens when seq contains at most one item, in which case seq is already sorted.