QSEN #39 – TypeIs

Hello!

In scientific software, it is not uncommon to write a function that accepts more than one input type for one or more of its arguments. For instance, one might write a function that can transform either a single number or a list of numbers. In most cases, dispatch on different types is implemented either with singledispatch or with if statements and isinstance checks. However, when structural subtyping is involved, one cannot use singledispatch, and careless use of if statements can lead to code that runs fine but triggers type-checker errors. In this newsletter, we will show you how to deal with this problem using a fairly recent Python feature called TypeIs.
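For the nominal case mentioned above, a single number versus a list of numbers, a minimal singledispatch sketch could look like this. The function name double is hypothetical and only serves as an illustration.

```python
from functools import singledispatch


@singledispatch
def double(value):
    # Fallback for types that have no registered implementation.
    raise TypeError(f"Unsupported type: {type(value).__name__}")


@double.register(int)
@double.register(float)
def _double_number(value):
    # Chosen when the runtime class of the argument is int or float.
    return 2 * value


@double.register(list)
def _double_list(value):
    # Chosen when the runtime class of the argument is list.
    return [2 * item for item in value]


print(double(3))       # 6
print(double([1, 2]))  # [2, 4]
```

The implementation is selected from the runtime class of the first argument alone, which is why this approach depends on the inputs having different nominal types.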

Nominal vs structural subtyping

Let us start with a short reminder of the difference between nominal and structural subtyping in Python. Nominal subtyping describes the usual relationship between types based on the class hierarchy. For instance, (3, 4) and [3, 4] have different nominal types: the first one is a tuple and the second one is a list. You can reveal the nominal type at runtime using the type function.
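As a quick illustration, the snippet below inspects nominal types at runtime. The variable names are arbitrary.

```python
pair_tuple = (3, 4)
pair_list = [3, 4]

print(type(pair_tuple))  # <class 'tuple'>
print(type(pair_list))   # <class 'list'>

# isinstance follows the class hierarchy: bool is a subclass of int.
print(isinstance(True, int))          # True
print(isinstance(pair_tuple, list))   # False
```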

In contrast, structural subtyping refers to an object’s properties and behaviors, without necessarily relying on the class hierarchy. It is primarily used for static analysis, such as type checking with mypy. As an example, (3, 4) can be described as tuple[int, int] and (3, 4, 5) as tuple[int, int, int]. Both are also Iterable[int] and Sequence[int]. Nevertheless, at runtime, type((3, 4)) and type((3, 4, 5)) give the same output: the tuple class.
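A small sketch of what this means at runtime: both tuples share one class, and the parameterized form tuple[int, int] cannot be used in an isinstance check at all. The variable names are arbitrary.

```python
from collections.abc import Iterable, Sequence

short = (3, 4)
longer = (3, 4, 5)

# Both objects have the same runtime class.
print(type(short) is type(longer) is tuple)  # True

# Checks against the unparameterized abstract classes work at runtime.
print(isinstance(short, Sequence))  # True
print(isinstance(short, Iterable))  # True

# Parameterized generics are rejected by isinstance with a TypeError.
try:
    isinstance(short, tuple[int, int])
    parameterized_check_failed = False
except TypeError:
    parameterized_check_failed = True

print(parameterized_check_failed)  # True
```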

Structural subtyping can make our code more expressive and, combined with static analysis tools like mypy or language servers, more resistant to bugs, many of which can be identified before the code is even run. However, it also has some drawbacks.

The main drawback of structural subtyping is that it doesn’t play well with dispatching execution based on input types at runtime. Distinguishing between int and str is simple and can be done using a single isinstance check. Distinguishing between Sequence[str] and Sequence[str | float] is more involved, and would require inspecting all elements of the sequence. Not all cases are as severe, but the bottom line is: dispatching based on structural subtypes usually takes more work. And sometimes, attempting to do it will confuse the type checker, which will be the main focus of today’s toy problem.
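As a rough sketch of that difference, the first check below needs one isinstance call, while the second has to look at every element. The helper name is_sequence_of_str is hypothetical and not part of any library.

```python
from collections.abc import Sequence


def is_sequence_of_str(items: Sequence[object]) -> bool:
    """Return True only if every element of items is a str."""
    return all(isinstance(item, str) for item in items)


# Nominal check: a single isinstance call is enough.
print(isinstance("spam", str))  # True

# Element-wise check: cost grows with the length of the sequence.
print(is_sequence_of_str(["a", "b"]))  # True
print(is_sequence_of_str(["a", 1.5]))  # False
```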

The problem

Suppose we want to write a function mean that can accept one of two possible types of inputs:

A sequence of numbers, in which case it should return their arithmetic mean.
A sequence of (weight, number) pairs, in which case it should return their weighted arithmetic mean.

The signature of our function could look something like this:

from collections.abc import Sequence


def mean(data: Sequence[float] | Sequence[tuple[float, float]]) -> float:
    ...

We can immediately see that we cannot use singledispatch or an isinstance(data, some_type) check to implement the desired behavior. That’s because at runtime, both types can be represented by the same sequence class, like list or tuple. We have to inspect the elements of the sequence to determine what to do. Assuming the input is otherwise correct, we only need to inspect the first element. Therefore, our function could look something like this:


from collections.abc import Sequence


def mean(data: Sequence[float] | Sequence[tuple[float, float]]) -> float:
    if not data:
        msg = "Cannot compute mean of an empty sequence."
        raise ValueError(msg)
    if isinstance(data[0], tuple):
        sum_of_weights = sum(item[0] for item in data)
        return sum(weight / sum_of_weights * x for weight, x in data)
    else:
        return sum(data) / len(data)

It works fine, but running mypy on our function reports several errors. Assuming we saved our code exactly as above in a file called means.py, this is what the output looks like:

means.py:9: error: Value of type "float | tuple[float, float]" is not indexable  [index]
means.py:9: error: Generator has incompatible item type "Any | float"; expected "bool"  [misc]
means.py:10: error: "float" object is not iterable  [misc]
means.py:10: error: Generator has incompatible item type "Any | float"; expected "bool"  [misc]
means.py:12: error: Argument 1 to "sum" has incompatible type "Sequence[float] | Sequence[tuple[float, float]]"; expected "Iterable[bool]"  [arg-type]

The first error warns us that we try to access item[0], but item can be either a float or a tuple[float, float], and therefore it is, in general, not indexable. All the other errors have the same root cause: mypy cannot tell that checking the first item in data should narrow the type of the whole sequence.

Of course, having a working implementation is more important than strict type checking, but in this case we can get mypy to accept our code, while also making the function itself more readable.

The solution: using TypeIs

Python 3.13 added TypeIs[T] to the typing module. It can be used as the return type of a boolean function to tell type checkers that, when the function returns True, its argument is of type T. If you are on an older version of Python, you can install the typing_extensions package and import TypeIs from there instead. It might sound confusing, so it’s best to look at an example.

from collections.abc import Sequence
from typing import TypeIs

def is_sequence_of_pairs(
    data: Sequence[float] | Sequence[tuple[float, float]]
) -> TypeIs[Sequence[tuple[float, float]]]:
    return isinstance(data[0], tuple)

Let’s examine what’s going on here. The function accepts the same thing as our mean function does, which is either a sequence of numbers or a sequence of pairs of numbers. It returns True if the first element of the sequence is a tuple, and False otherwise. Therefore, assuming the input is correct, this function answers the question “Is this sequence a sequence of pairs?”. From the logical standpoint, it also answers the complementary question “Is this sequence a sequence of numbers?”. The TypeIs return type tells type checkers to treat it exactly like this. If the function returns True, the input is considered to be of type Sequence[tuple[float, float]]; otherwise, it is narrowed to whatever remains of the original input type, which in our case happens to be Sequence[float]. Let’s now combine it with our function and see what mypy has to say!

from collections.abc import Sequence
from typing import TypeIs


def is_sequence_of_pairs(
    data: Sequence[float] | Sequence[tuple[float, float]]
) -> TypeIs[Sequence[tuple[float, float]]]:
    return isinstance(data[0], tuple)


def mean(data: Sequence[float] | Sequence[tuple[float, float]]) -> float:
    if not data:
        msg = "Cannot compute mean of an empty sequence."
        raise ValueError(msg)
    if is_sequence_of_pairs(data):
        sum_of_weights = sum(item[0] for item in data)
        return sum(weight / sum_of_weights * x for weight, x in data)
    else:
        return sum(data) / len(data)

Running mypy on our file now reveals no errors. Also, our function is a bit more readable!

Before we let you go and test TypeIs in your own code, we want to warn you about one caveat: TypeIs is very flexible, to the point that you can do a lot of silly things with it. Suppose we add a third element to our input union, like this:

def is_sequence_of_pairs(
    data: Sequence[float] | Sequence[tuple[float, float]] | Sequence[tuple[float, float, float]]
) -> TypeIs[Sequence[tuple[float, float]]]:
    return isinstance(data[0], tuple)

Would the rest of the code still pass type checking? Yes, and it actually is correct in this context, because the input type to mean cannot be a sequence of triples anyway. But what if we used it in other contexts where the triples can appear? Let’s find out.

from collections.abc import Sequence
from typing import TypeIs


def is_sequence_of_pairs(
    data: Sequence[float] | Sequence[tuple[float, float]] | Sequence[tuple[float, float, float]]
) -> TypeIs[Sequence[tuple[float, float]]]:
    return isinstance(data[0], tuple)


def print_data_type(data: Sequence[tuple[float, float]] | Sequence[tuple[float, float, float]]) -> None:
    if is_sequence_of_pairs(data):
        print("Got sequence of pairs.")
    else:
        print("Got sequence of triples.")


print_data_type([(2, 3, 4), (4, 5, 6)])

This example would pass type checking, but it would incorrectly print “Got sequence of pairs.”, because the guard only checks that the first element is a tuple, not how many items it holds. The flaw in the logic is obvious here, but the point we are trying to make is: if you make mistakes in your TypeIs guards, you might introduce hard-to-find bugs that are related to type dispatch, and mypy would be unable to help you find them.

And that’s it for today! We’re curious to know if you use TypeIs in your projects, or if you just found out about it!

Konrad & Michał
