aesthetics()
Investigating the loss kink while training a diffusion U-Net
Why does the loss hover around 1?
dls = get_fashion_dls(bs=512)
m = TUnet(
    color_channels=1,
    nfs=(32, 64, 128, 256, 384),
    n_blocks=(3, 2, 1, 1, 1, 1),
)
blocks = [*m.downblocks, *m.upblocks]
stats = StoreModuleStatsCB(
    sum([b.convs for b in blocks], nn.ModuleList()),
    hook_kwargs={"periodicity": 1},
)
train(
    m,
    dls,
    lr=4e-3,
    n_epochs=2,
    extra_cbs=[stats],
)
stats.mean_std_plot()
loss | epoch | train |
---|---|---|
0.121 | 0 | train |
0.074 | 0 | eval |
0.044 | 1 | train |
0.040 | 1 | eval |
CPU times: user 19min 6s, sys: 4min 29s, total: 23min 35s
Wall time: 6min 56s
class KTUnet(TUnet, KaimingMixin):
    ...
m = KTUnet.kaiming(
    color_channels=1,
    nfs=(32, 64, 128, 256, 384),
    n_blocks=(3, 2, 1, 1, 1, 1),
)
blocks = [*m.downblocks, *m.upblocks]
stats = StoreModuleStatsCB(
    sum([b.convs for b in blocks], nn.ModuleList()),
    hook_kwargs={"periodicity": 1},
)
train(
    m,
    dls,
    lr=1e-3,
    n_epochs=2,
    extra_cbs=[stats],
)
stats.mean_std_plot()
loss | epoch | train |
---|---|---|
0.866 | 0 | train |
0.754 | 0 | eval |
0.128 | 1 | train |
0.097 | 1 | eval |
CPU times: user 17min 40s, sys: 7.59 s, total: 17min 48s
Wall time: 4min 10s
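This experiment suggests that the issue lies with Kaiming initialization. With the same architecture, the Kaiming-initialized model finishes its first epoch with a training loss of 0.866, versus 0.121 for the default initialization (the runs also differ in learning rate, 1e-3 versus 4e-3). I analyze this further in a post on the FastAI forums.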
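As a rough reference point for why a loss near 1 matters, here is a minimal sketch using only the Python standard library. It assumes the usual diffusion setup, which the runs above do not spell out: the model regresses unit-variance Gaussian noise under a mean squared error loss. Under that assumption, a model that ignores its input and always predicts zero scores an MSE close to 1, so a loss stuck near 1 means the network is doing no better than this trivial baseline. The `mse` helper and the variable names are illustrative and are not part of the training code above.

```python
import random


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


random.seed(0)
n = 100_000

# Assumption: the regression target is standard Gaussian noise (mean 0, variance 1).
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# A "dead" model that ignores its input and always predicts zero.
zero_preds = [0.0] * n

# The MSE of the zero prediction estimates the variance of the noise, i.e. about 1.
baseline_loss = mse(zero_preds, noise)
print(f"MSE of an all-zero prediction: {baseline_loss:.3f}")
```

Read against this baseline, the 0.866 training loss in the first epoch of the Kaiming-initialized run is only slightly better than predicting zero everywhere, while the 0.121 of the default-initialized run is far below it.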