Diving Deeper

Matrix math, from scratch

Adapted from:

Matrix notation and playing with data

Get the data


download_mnist

 download_mnist (path_gz=Path('data/mnist.pkl.gz'))

download_file

 download_file (url, destination)
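
The bodies of these helpers aren't shown above; here is a minimal sketch of what they might look like (the MNIST_URL value is a placeholder, and the exact behaviour is an assumption):

from pathlib import Path
from urllib.request import urlretrieve

MNIST_URL = "https://example.com/mnist.pkl.gz"  # placeholder, not the real URL


def download_file(url, destination):
    "Fetch `url` to `destination`, skipping the download if the file already exists."
    destination = Path(destination)
    if not destination.exists():
        destination.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, destination)
    return destination


def download_mnist(path_gz=Path("data/mnist.pkl.gz")):
    "Download the pickled MNIST data and return its local path."
    return download_file(MNIST_URL, path_gz)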
data_fp = download_mnist()
data_fp
PosixPath('data/mnist.pkl.gz')
import gzip
import pickle

with gzip.open(data_fp, "rb") as f:
    data = pickle.load(f, encoding="latin-1")
    ((X_trn, y_trn), (X_vld, y_vld), _) = data
X_trn.shape

Note that \(784 = 28^2\)

fig, ax = plt.subplots(1, 1)
ax.imshow(X_trn[0].reshape(28, 28))

Writing a matrix class

from dataclasses import dataclass
from typing import List


@dataclass
class Matrix:
    xs: List[List[float]]

    def __getitem__(self, idxs):
        # Index with a (row, column) tuple, e.g. m[1, 2]
        x, y = idxs
        return self.xs[x][y]
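
As a quick usage check (not in the original notebook), indexing with a (row, column) tuple:

m = Matrix([[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]])
m[1, 2]  # 6.0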

PyTorch's torch.Tensor already implements this kind of indexing (along with all the auto-differentiation machinery).

torch.Tensor?
Init signature: torch.Tensor(self, /, *args, **kwargs)
Docstring:      <no docstring>
File:           ~/miniforge3/envs/slowai/lib/python3.9/site-packages/torch/__init__.py
Type:           _TensorMeta
Subclasses:     Parameter, UninitializedBuffer, FakeTensor, MaskedTensor
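
For instance, torch.Tensor supports the same tuple indexing as the toy Matrix above (a quick illustration, not from the original notebook):

import torch

t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
t[1, 2]  # tensor(6.)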

History of Tensor Programming

This goes back to the invention of the APL language. APL started as a mathematical notation devised by Kenneth Iverson, which he and Adin Falkoff turned into a programming language in the 1960s; the ideas connect to the tensor analysis used in physics.

You can try it here.

APL functionality

Defining a tensor (an "array", in APL's own terminology), a.

a ← 3 5 6

Multiplying by a scalar

a × 3

Element-wise division

b ← 7 8 9
a ÷ b

NumPy was influenced by APL, and PyTorch was influenced by NumPy.

One difference is that in APL a scalar is just another array (of rank 0), whereas scalars have their own special semantics in NumPy.
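
As a rough NumPy translation of the APL snippets above:

import numpy as np

a = np.array([3, 5, 6])
a * 3        # scalar multiplication: array([ 9, 15, 18])
b = np.array([7, 8, 9])
a / b        # element-wise division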

Random numbers

A typical computer has no way to generate truly random numbers on its own; true randomness has to be harvested from natural phenomena.

Generally, all we need is pseudo-randomness, for example from the Wichmann-Hill algorithm.

from typing import Optional


@dataclass
class RNG:
    # Wichmann-Hill keeps three small integer states
    x: Optional[int] = None
    y: Optional[int] = None
    z: Optional[int] = None

    def seed(self, a):
        # Spread the seed value across the three states
        a, x = divmod(a, 30268)
        a, y = divmod(a, 30306)
        a, z = divmod(a, 30322)
        self.x = int(x) + 1
        self.y = int(y) + 1
        self.z = int(z) + 1

    def random(self):
        # Advance each state with its own small linear congruential step,
        # then combine the three streams into a single float in [0, 1)
        self.x = (171 * self.x) % 30269
        self.y = (172 * self.y) % 30307
        self.z = (170 * self.z) % 30323
        return (self.x / 30268 + self.y / 30306 + self.z / 30322) % 1


rng = RNG()
rng.seed(42)
rng.random(), rng.random(), rng.random()
(0.25421176102342913, 0.4689255225976794, 0.19544471247365425)
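
As a quick sanity check (not part of the original output), identical seeds reproduce identical streams:

rng_a, rng_b = RNG(), RNG()
rng_a.seed(42)
rng_b.seed(42)
# Two generators with the same seed walk through the same sequence
assert all(rng_a.random() == rng_b.random() for _ in range(5))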

It is important to keep Unix process semantics in mind when working with random numbers. What happens if we fork the process?

if os.fork():
    print(f"parent process: {rng.random():.2f}")
else:
    print(f"child process: {rng.random():.2f}")
parent process: 0.86
child process: 0.86

These are the same number! This is because the RNG's internal state is copied along with the process when it forks. You need to reseed the RNG in each process, like so.

if os.fork():
    rng.seed(42)
    print(f"parent process: {rng.random():.2f}")
else:
    rng.seed(43)
    print(f"child process: {rng.random():.2f}")
parent process: 0.25
child process: 0.26
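
Hard-coding a different seed in each branch does not scale to many workers; a common pattern (a sketch, with the PID-based scheme as just one possible choice) is to derive each process's seed from something process-specific:

import os

pid = os.fork()
# Mix the process id into the seed so the parent and child streams diverge
rng.seed(1000 + os.getpid())
role = "parent" if pid else "child"
print(f"{role} process: {rng.random():.2f}")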