word_drop_embedding
Embedding layer with word-drop regularization.
EmbeddingConfig | Configuration for embedding layers.
WordDropEmbedding | Embedding layer with word-drop regularization.
EmbeddingConfig
- class swem.models.word_drop_embedding.EmbeddingConfig(embedding_dim: int, num_embeddings: int, padding_idx: int | None = None, max_norm: float | None = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, p: float | None = None, type: Literal['Embedding', 'WordDropEmbedding'] = 'Embedding')
Configuration for embedding layers.
- Parameters
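embedding_dim (int) – Size of each embedding vector.
num_embeddings (int) – Number of entries in the embedding table (the vocabulary size).
padding_idx (int | None) – Index of the padding token, as in nn.Embedding; its vector receives no gradient updates.
max_norm (float | None) – If set, vectors whose norm exceeds this value are renormalized to it, as in nn.Embedding.
norm_type (float) – The p of the p-norm used together with max_norm.
scale_grad_by_freq (bool) – If True, gradients are scaled by the inverse frequency of each token in the mini-batch, as in nn.Embedding.
sparse (bool) – If True, the gradient with respect to the weight matrix is a sparse tensor, as in nn.Embedding.
p (float | None) – Word-drop probability; see WordDropEmbedding.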
type (Literal['Embedding', 'WordDropEmbedding']) – Selects which of the two layer types the configuration describes.
- Return type
None
WordDropEmbedding
- class swem.models.word_drop_embedding.WordDropEmbedding(*args, p: float, **kwargs)
Embedding layer with word-drop regularization.
During training, this layer drops randomly chosen words (token ids) from each batch entirely by zeroing the corresponding embedding vectors. It can be used as a drop-in replacement for the usual nn.Embedding.
- Parameters
p (float) – Probability with which each word is dropped. If p is 0, this layer behaves exactly like a regular embedding layer.
args/kwargs – All other positional and keyword arguments are passed on to nn.Embedding.
Examples:
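>>> import torch
>>> from swem.models.word_drop_embedding import WordDropEmbedding
>>> emb = WordDropEmbedding(10, 2, p=0.3)
>>> input = torch.arange(0, 10, dtype=torch.int64)
>>> output = emb(input)
>>> print(output)
tensor([[-0.8393,  1.3216],
        [-0.3652,  0.2879],
        [ 0.1899,  2.2358],
        [ 1.7776, -1.8437],
        [-0.6406,  0.7939],
        [ 1.0874, -3.2290],
        [ 0.0000, -0.0000],
        [ 0.0000,  0.0000],
        [-0.9666,  1.0100],
        [-0.2444, -2.2618]], grad_fn=<DivBackward0>)

The exact values differ from run to run because the weights and the dropped words are random. Here the rows for token ids 6 and 7 are zero because those two words were dropped.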
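
To make the word-drop idea more concrete, the following is a minimal sketch of how such a layer could be written on top of nn.Embedding. It is not the swem implementation: the class name WordDropSketch, the per-batch sampling of one keep/drop decision per vocabulary entry, and the rescaling by 1 / (1 - p) (borrowed from inverted dropout) are all assumptions made for illustration.

```python
import torch
from torch import nn


class WordDropSketch(nn.Embedding):
    """Hypothetical stand-in for a word-drop embedding (not swem's code)."""

    def __init__(self, *args, p: float, **kwargs):
        # All other arguments go straight to nn.Embedding.
        super().__init__(*args, **kwargs)
        if not 0.0 <= p < 1.0:
            raise ValueError("p must lie in [0, 1)")
        self.p = p

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        vectors = super().forward(token_ids)
        # In eval mode, or with p == 0, behave like a plain embedding.
        if not self.training or self.p == 0.0:
            return vectors
        # One keep/drop decision per vocabulary entry, shared by the whole
        # batch, so a dropped word disappears from every position.
        keep = torch.rand(self.num_embeddings, device=vectors.device) >= self.p
        # Look up the decision for each position and zero dropped vectors.
        mask = keep[token_ids].unsqueeze(-1).to(vectors.dtype)
        # Assumption: rescale the survivors as in inverted dropout.
        return vectors * mask / (1.0 - self.p)


# Usage: token id 1 occurs three times and is kept or dropped as a unit.
layer = WordDropSketch(10, 2, p=0.5)
batch = torch.tensor([[1, 2, 1], [3, 1, 2]])
layer.train()
train_out = layer(batch)   # some rows may be all zeros
layer.eval()
eval_out = layer(batch)    # identical to a plain embedding lookup
```

Sampling the mask over the vocabulary rather than over positions is what distinguishes this from ordinary dropout on the embedding output: every occurrence of a dropped token id within the batch is zeroed together.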