word_drop_embedding

Embedding layer with word-drop regularization.

swem.models.word_drop_embedding.EmbeddingConfig

Configuration for embedding layers.

swem.models.word_drop_embedding.WordDropEmbedding

Embedding layer with word-drop regularization.

EmbeddingConfig

class swem.models.word_drop_embedding.EmbeddingConfig(embedding_dim: int, num_embeddings: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, p: float | None = None, type: Literal['Embedding', 'WordDropEmbedding'] = 'Embedding')

Configuration for embedding layers.

Parameters
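  • embedding_dim (int) – Size of each embedding vector.

  • num_embeddings (int) – Number of entries in the embedding table (vocabulary size).

  • padding_idx (int | None) – If given, the vector at this index is not updated during training (as for the nn.Embedding argument of the same name).

  • max_norm (float | None) – If given, embedding vectors whose norm exceeds this value are renormalized to it (as for nn.Embedding).

  • norm_type (float) – Order of the norm used with max_norm (as for nn.Embedding).

  • scale_grad_by_freq (bool) – If True, gradients are scaled by the inverse frequency of the token ids in the mini-batch (as for nn.Embedding).

  • sparse (bool) – If True, the gradient with respect to the weight matrix is a sparse tensor (as for nn.Embedding).

  • p (float | None) – Word-drop probability; see WordDropEmbedding.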

  • type (Literal['Embedding', 'WordDropEmbedding']) – Which layer class the configuration describes.

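The following is a minimal sketch of how a configuration with these fields could be turned into constructor arguments for either layer type. It does not use swem itself: `EmbeddingConfigSketch` and `constructor_kwargs` are hypothetical names introduced here for illustration, and the rule that `p` is required for `'WordDropEmbedding'` is an assumption, not documented behavior.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Literal, Optional


@dataclass
class EmbeddingConfigSketch:
    """Hypothetical stand-in that mirrors the documented fields and defaults."""

    embedding_dim: int
    num_embeddings: int
    padding_idx: Optional[int] = None
    max_norm: Optional[float] = None
    norm_type: float = 2.0
    scale_grad_by_freq: bool = False
    sparse: bool = False
    p: Optional[float] = None
    type: Literal["Embedding", "WordDropEmbedding"] = "Embedding"


def constructor_kwargs(config: EmbeddingConfigSketch) -> Dict[str, Any]:
    """Return keyword arguments for the layer class named by config.type."""
    kwargs = asdict(config)
    layer_type = kwargs.pop("type")
    p = kwargs.pop("p")
    if layer_type == "WordDropEmbedding":
        # Assumption: a word-drop layer needs an explicit drop probability.
        if p is None:
            raise ValueError("p must be set when type is 'WordDropEmbedding'")
        kwargs["p"] = p
    return kwargs


config = EmbeddingConfigSketch(
    embedding_dim=2, num_embeddings=10, p=0.3, type="WordDropEmbedding"
)
kwargs = constructor_kwargs(config)
```

Passing the result by keyword matters: the configuration lists embedding_dim before num_embeddings, whereas nn.Embedding takes num_embeddings first when called positionally.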

WordDropEmbedding

class swem.models.word_drop_embedding.WordDropEmbedding(*args, p: float, **kwargs)

Embedding layer with word-drop regularization.

During training, this layer drops certain words (token ids) entirely from a batch by zeroing the corresponding vectors. It can be used as a drop-in replacement for the usual nn.Embedding.

Parameters
  • p (float) – Probability with which to drop words. If 0, this layer behaves just like a usual embedding layer.

  • args/kwargs – All other positional and keyword arguments are handled by nn.Embedding.

Examples:

>>> import torch
>>> from swem.models.word_drop_embedding import WordDropEmbedding
>>> emb = WordDropEmbedding(10, 2, p=0.3)
>>> token_ids = torch.arange(0, 10, dtype=torch.int64)
>>> output = emb(token_ids)
>>> print(output)
tensor([[-0.8393,  1.3216],
        [-0.3652,  0.2879],
        [ 0.1899,  2.2358],
        [ 1.7776, -1.8437],
        [-0.6406,  0.7939],
        [ 1.0874, -3.2290],
        [ 0.0000, -0.0000],
        [ 0.0000,  0.0000],
        [-0.9666,  1.0100],
        [-0.2444, -2.2618]], grad_fn=<DivBackward0>)
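
To make the mechanism concrete, here is a minimal sketch of one way word-drop could be implemented on top of nn.Embedding. It is not the swem implementation: `WordDropSketch` is a hypothetical name, and both the per-token-id sampling and the dropout-style rescaling by 1 / (1 - p) are assumptions (the `DivBackward0` in the output above hints at a division, but the source does not say what it is).

```python
import torch
import torch.nn as nn


class WordDropSketch(nn.Embedding):
    """Illustrative stand-in for a word-drop embedding (not the swem class)."""

    def __init__(self, *args, p: float, **kwargs):
        super().__init__(*args, **kwargs)
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in the range [0, 1)")
        self.p = p

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vectors = super().forward(token_ids)
        if not self.training or self.p == 0.0:
            # Outside training, or with p == 0, behave like nn.Embedding.
            return vectors
        # Sample one keep/drop decision per vocabulary entry, so a dropped
        # token id is zeroed at every position where it occurs in the batch.
        keep = torch.rand(self.num_embeddings, device=self.weight.device) >= self.p
        mask = keep[token_ids].unsqueeze(-1).to(vectors.dtype)
        # Assumption: rescale the surviving vectors as standard dropout does.
        return vectors * mask / (1.0 - self.p)


emb = WordDropSketch(10, 2, p=0.3)
token_ids = torch.tensor([[1, 2, 1], [2, 1, 3]])
output = emb(token_ids)  # shape (2, 3, 2); rows for dropped ids are zero
```

Because the mask is indexed by token id rather than by position, repeated occurrences of the same id are either all kept or all dropped within one forward pass, which is what distinguishes this from applying ordinary dropout to the embedding output.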