Testing, Formatting, Linting, Type-Checking, and Continuous Integration¶
Structure¶
We will talk a tiny bit about each topic, but the main focus is showing how to add each of these processes to your code and how they work.
In between each section I will let everyone try to add the step we just discussed to their own code. If you run into issues or are confused, let us know. We can then answer that question for the whole group.
Motivation¶
⚠️ ⚠️ ANECDOTE ⚠️ ⚠️
This is somewhat 'expected knowledge' for software devs but not typically for scientists. If you understand testing and continuous integration (CI) configuration well enough, it will increase your salary offers in industry.
Regardless of industry or academia, CI systems can be used for tons of different tasks including whole processing pipelines (for free***)
*** Please don't abuse CI systems; I am trying to get a job at GitHub research and I don't want them to be angry at me.
Testing¶
Testing consists of writing small functions that check:
- whether small units of your code function as expected
- whether these small units integrate well together
- whether your code takes care of edge cases
- whether your code's inputs and outputs are correctly treated
Running your tests lets you check that new changes to the code base did not break anything in your code (at least, anything you are testing for!).
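As a minimal sketch of what such tests can look like, here is a small function and two tests for it: one for the expected behaviour and one for an edge case. The function `mean` and the test names are hypothetical, made up for illustration. The tests use plain `assert` statements, so they can be called directly or, assuming you use a test runner such as pytest, collected automatically.

```python
def mean(values: list[float]) -> float:
    """Return the arithmetic mean (hypothetical function under test)."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values() -> None:
    # A small unit of code behaves as expected on a known input.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_input() -> None:
    # Edge case: an empty list should raise a clear error.
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

If a later change breaks `mean`, one of these tests fails and points at the behaviour that changed.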
As you write tests, you get to experience what it takes to use your library and this might lead you to refactor parts of your code; refactoring is an important part of a software life-cycle!
Formatting¶
Formatting does two things:
- Stops all the arguments about: tabs vs spaces, line length, new lines in the middle of operations, etc. NO MORE STUPID ARGUMENTS!
MORE IMPORTANTLY
- Keeps a consistent style across the whole project. Whether you wrote the code or someone else, it will look the same.
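To illustrate that formatting changes only how code looks, here is a small sketch comparing a messy snippet with a tidied one. The tidied version is written by hand in the style a formatter would likely produce; it is an assumption, not the output of any particular tool. The standard library `ast` module confirms that both parse to the same program.

```python
import ast

# The same function written with inconsistent and consistent spacing.
messy_source = "def add( a,b ):\n    return a+b\n"
tidy_source = "def add(a, b):\n    return a + b\n"

# ast.dump ignores whitespace and layout, so equal dumps mean the
# two sources describe the same program.
same_program = ast.dump(ast.parse(messy_source)) == ast.dump(ast.parse(tidy_source))
print(same_program)
```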
Linting¶
Formatting focuses on directly changing the code to some "standard" style. Linting looks for common problems in the code, common bugs for example.
A nice example of the difference is that formatting won't change the following:
from package_a import foo
from package_b import bar
from package_a import foo, baz
But linting should inform you (if not automatically fix it) that `foo` from `package_a` is imported twice, and a nicer import block should look like:
from package_a import baz, foo # alphabetical
from package_b import bar
- It will also alert you to unused variables that you may have forgotten to use or accidentally left in from debugging, etc.
- It will remind you about function and module documentation standards.
- It will help you fix code which is doing extra bits of work (e.g. accidental double looping / possible standard library replacements).
I won't list all of the rules it checks against but there are A LOT.
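As a rough sketch of the kinds of problems listed above, the first function below contains a leftover debugging variable and an accidental double loop, and the second does the same job with the standard library. Both functions are hypothetical. Which of these issues a given linter reports depends on the tool and the rules you enable.

```python
from collections import Counter


def count_words_verbose(words: list[str]) -> dict[str, int]:
    debug_total = 0  # assigned but never used: left over from debugging
    counts = {}
    for word in words:
        occurrences = 0
        for other in words:  # accidental double loop over the same list
            if other == word:
                occurrences += 1
        counts[word] = occurrences
    return counts


def count_words(words: list[str]) -> dict[str, int]:
    # Standard library replacement: one pass, no leftovers.
    return dict(Counter(words))
```

Both return the same counts; the second is shorter and avoids the repeated work.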
Type Checking¶
Type checking is one of the most thorough analyses of your code you can do before you even run it.
Python added optional type hints in version 3.5. At first they were simply for analyzing code before running it, to catch more bugs and possible error cases. The standard interpreter still ignores them at runtime, but some tools (for example the mypyc compiler) can now use them to speed up your programs.
They can be annoying to add and work with. Very annoying, because they are optional in the language so there are a lot of hacky ways they can be manipulated, but in my opinion they are entirely worth it.
Typing is handled by annotations in the code, e.g.:
def example(a):
    print(f"Hello {a}")


def example_typed(a: str) -> None:
    print(f"Hello {a}")
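A small sketch of why a separate type checker is needed: Python stores the annotations but does not enforce them when the code runs. The function `greet` is hypothetical and mirrors the example above.

```python
def greet(name: str) -> str:
    return f"Hello {name}"


# Passing an int contradicts the annotation. A type checker would be
# expected to flag this call, but Python itself runs it without complaint.
result = greet(42)

# The annotations are only stored on the function for tools to read.
annotations = greet.__annotations__
print(result)
print(annotations)
```

The mistake only shows up before runtime if you run a type checker over the code.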
And they can get complicated...
import random
# yay only a single value
var_int: int = 5
var_float: float = 0.5
var_bool: bool = False
var_string: str = "wow!"
# oh no multiple values
list_of_strings: list[str] = ["hello", "world"]
list_of_mixed: list[str | int | bool] = ["hello", 3, True]
# lists don't have any constraints on size but tuples do
tuple_of_two_values: tuple[str, int] = ("number", 10)
tuple_of_n_values: tuple[str, ...] = tuple("python why did you add this" for _ in range(random.randint(1, 5)))
print(tuple_of_n_values)
# sub-objects
dict_of_dicts: dict[str, dict[str, str | int]] = {
    "eva": {"name": "eva maxfield brown", "age": 28},
    "bob": {"name": "Bob Boberson"},
}
('python why did you add this',)
An example with our favourite pattern, the factory:
from abc import ABC, abstractmethod
from typing import Type


class Localizer(ABC):
    @abstractmethod
    def localize(self, text: str) -> str:
        raise NotImplementedError()


class EnglishLocalizer(Localizer):
    def localize(self, text: str) -> str:
        return "Hello world"


localizers = {
    "en": EnglishLocalizer,
}


def get_localizer_not_initialized(lang: str) -> Type[Localizer]:
    return localizers[lang]


def get_localizer_initialized(lang: str) -> Localizer:
    return localizers[lang]()
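A usage sketch of the factory above, repeated here so the block runs on its own. It shows the practical difference between the two return types: `Type[Localizer]` means you get a class back and still have to call it, while `Localizer` means you get a ready-to-use instance. The variable names at the bottom are made up for illustration.

```python
from abc import ABC, abstractmethod
from typing import Type


class Localizer(ABC):
    @abstractmethod
    def localize(self, text: str) -> str:
        raise NotImplementedError()


class EnglishLocalizer(Localizer):
    def localize(self, text: str) -> str:
        return "Hello world"


localizers: dict[str, Type[Localizer]] = {"en": EnglishLocalizer}


def get_localizer_not_initialized(lang: str) -> Type[Localizer]:
    return localizers[lang]


def get_localizer_initialized(lang: str) -> Localizer:
    return localizers[lang]()


localizer_class = get_localizer_not_initialized("en")  # a class, not yet created
localizer = get_localizer_initialized("en")  # an instance, ready to use

greeting_from_class = localizer_class().localize("hi")  # must instantiate first
greeting_from_instance = localizer.localize("hi")
```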
Some tips / don't worry too much...
from dataclasses import dataclass

import numpy as np

# numpy has types, pandas has types, etc.
var_array: np.ndarray = np.random.random((3, 2))

# list of dict of list etc etc can get hard...
# instead of...
dict_of_dicts: dict[str, dict[str, str | int]] = {
    "eva": {"name": "eva maxfield brown", "age": 28},
    "bob": {"name": "Bob Boberson"},
}


# use `dataclass`
@dataclass
class PersonDetails:
    name: str
    age: int | None = None


# easier to follow typing because sub-objects are separately defined
dict_of_person_details: dict[str, PersonDetails] = {
    "eva": PersonDetails(name="Eva Maxfield Brown", age=28),
    "bob": PersonDetails(name="Bob Boberson"),
}
dict_of_person_details["eva"]
PersonDetails(name='Eva Maxfield Brown', age=28)
It is hard to learn but do practice using it. It will make writing code easier the more you do it as it will check a lot of assumptions for you.
Some resources to continue learning / lookup in the future:
Continuous Integration¶
Continuous Integration (CI) is meant to reduce or remove bugs entering code over time, whether the code base is changing (new features, bug fixes, etc.) or upstream dependencies are changing (new releases).
The main idea is that by checking your tests, formatting, linting, and types for each commit / PR, you know that nothing is breaking.
If something does break, you know the exact commit / PR which broke something.
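The idea of running every check on each commit can be sketched as a small script that runs a set of commands and records which ones passed. This is an illustration only: the function `run_checks` is hypothetical, and the commands are stand-ins that run anywhere. In a real project they would be whatever test runner, formatter, linter, and type checker you use.

```python
import subprocess
import sys


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each command and record whether it exited with status 0."""
    results: dict[str, bool] = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True)
        results[name] = completed.returncode == 0
    return results


# Stand-in commands (assumptions): one that succeeds, one that fails.
checks = {
    "tests": [sys.executable, "-c", "assert 1 + 1 == 2"],
    "lint": [sys.executable, "-c", "raise SystemExit(1)"],
}
results = run_checks(checks)
all_passed = all(results.values())
print(results, all_passed)
```

A CI system does essentially this for you on every commit / PR and reports the failing step.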
Further, it allows you to test on more machine setups than your own. For example, I have a macOS laptop and a Linux desktop and I use Python 3.11 on both. But I have users who use Windows and Python 3.9.
CI systems (GitHub Actions, GitLab CI/CD, Azure Pipelines, etc.) allow you to run the same suite of tests across all of these machine setups.
Fortunately, once you have testing, formatting, linting, and type-checking set up locally, it is pretty easy to add CI to your repo!
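Running the same suite across machine setups is usually described as a "matrix" of operating systems and Python versions. The sketch below builds such a matrix in plain Python to show how quickly the combinations add up. The operating system labels and version list are assumptions for illustration; a real CI system expresses this in its own configuration file.

```python
from itertools import product

# Hypothetical setups to cover: every OS paired with every Python version.
operating_systems = ["ubuntu", "macos", "windows"]
python_versions = ["3.9", "3.10", "3.11"]

matrix = [
    {"os": os_name, "python": version}
    for os_name, version in product(operating_systems, python_versions)
]

for job in matrix:
    print(f"run the checks on {job['os']} with Python {job['python']}")
```

Three operating systems and three Python versions already mean nine runs per commit, which is far more than most people test by hand.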
Everything At Once¶
A pull request (PR) which has everything we just talked about together: https://github.com/evamaxfield/winter-school-lectures/pull/1
Credit¶
Much of the written content came directly from https://pydev-guide.github.io/.
It is still under development but some tutorials are already available. I highly recommend starring it / checking back to it every few months.