rau.models¶
- rau.models.get_unidirectional_transformer_encoder(input_vocabulary_size, output_vocabulary_size, tie_embeddings, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, shared_embeddings=None, positional_encoding_cacher=None, tag=None)¶
Construct a causally-masked transformer encoder.
- Return type:
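A minimal usage sketch follows (hedged): the keyword arguments mirror the signature above, the concrete values are purely illustrative, and the assumption that the returned module maps a batch of token ids to per-position outputs over the output vocabulary is inferred from the vocabulary-size arguments rather than stated here.

```python
import torch
from rau.models import get_unidirectional_transformer_encoder

# Illustrative hyperparameters; the keyword names follow the signature above.
model = get_unidirectional_transformer_encoder(
    input_vocabulary_size=1000,
    output_vocabulary_size=1000,
    tie_embeddings=True,
    num_layers=4,
    d_model=256,
    num_heads=8,
    feedforward_size=1024,
    dropout=0.1,
    use_padding=False,
)

# Assumption: the module accepts a B x n tensor of token ids and produces
# per-position outputs (see the Unidirectional forward() documented below).
tokens = torch.randint(0, 1000, (2, 10))
outputs = model(tokens)
```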
- class rau.models.UnidirectionalTransformerEncoderLayers¶
Bases: Unidirectional
- class State¶
Bases: State
- __init__(encoder, previous_inputs)¶
- fastforward(input_sequence)¶
Feed a sequence of inputs to this state and return the resulting state.
- forward(input_sequence, return_state, include_first)¶
Like Unidirectional.forward(), but start with this state as the initial state. This can often be done more efficiently than using next() iteratively.
- Parameters:
  - input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.
  - return_state (bool) – Whether to return the last State of the module.
  - include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.
- Return type:
- Returns:
  See Unidirectional.forward().
- next(input_tensor)¶
Feed an input to this hidden state and produce the next hidden state.
- output()¶
Get the output associated with this state.
For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.
The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.
- Return type:
- Returns:
A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.
- states(input_sequence, include_first)¶
Feed a sequence of inputs to this state and generate all the states produced after each input.
- Parameters:
- Return type:
- Returns:
Sequence of states produced by reading input_sequence.
- transform_tensors(func)¶
Return a copy of this state with all tensors passed through a function.
- __init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward(input_sequence, is_padding_mask=None, initial_state=None, return_state=False, include_first=True)¶
Run this module on an entire sequence of inputs all at once.
This can often be done more efficiently than processing each input one by one.
- Parameters:
  - input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor representing a sequence of \(n\) input tensors.
  - initial_state (State | None) – An optional initial state to use instead of the default initial state created by initial_state().
  - return_state (bool) – Whether to return the last State of the module as an additional output. This state can be used to initialize a subsequent run.
  - include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input. If include_first is true, then the length of the output tensor will be \(n + 1\). Otherwise, it will be \(n\).
  - args – Extra arguments passed to initial_state().
  - kwargs – Extra arguments passed to initial_state().
- Return type:
  Tensor | ForwardResult
- Returns:
  A Tensor or a ForwardResult that contains the output tensor. The output tensor will be of size \(B \times (n + 1) \times \cdots\) if include_first is true and \(B \times n \times \cdots\) otherwise. If Unidirectional.State.output() returns extra outputs at each timestep, then they will be aggregated over all timesteps and returned as lists in ForwardResult.extra_outputs. If return_state is true, then the final State will be returned in ForwardResult.state. If there are no extra outputs and there is no state to return, just the output tensor is returned.
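To make the forward() contract above concrete, here is a hedged sketch. It constructs the layer stack directly with the __init__ parameters documented above and assumes (this is not stated explicitly here) that the layers consume \(B \times n \times d_\mathrm{model}\) float tensors; the commented shapes follow the include_first behavior in the Returns section, and the last lines use the State API documented earlier.

```python
import torch
from rau.models import UnidirectionalTransformerEncoderLayers

# Illustrative hyperparameters; see __init__ above.
layers = UnidirectionalTransformerEncoderLayers(
    num_layers=2, d_model=64, num_heads=4,
    feedforward_size=256, dropout=0.1, use_final_layer_norm=True,
)

x = torch.randn(2, 5, 64)               # B x n x d_model (assumed input shape)
y = layers(x)                           # B x (n + 1) x ... with include_first=True (default)
y = layers(x, include_first=False)      # B x n x ...

result = layers(x, return_state=True)   # a ForwardResult containing the output tensor
state = result.state                    # final State (ForwardResult.state, as above)
state = state.next(torch.randn(2, 64))  # feed one more input through the State API
y_next = state.output()                 # prediction for the following position
```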
- rau.models.get_transformer_encoder(vocabulary_size, shared_embeddings, positional_encoding_cacher, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, tag=None)¶
Construct a bidirectional transformer [Vaswani et al., 2017] encoder.
The transformer uses pre-norm instead of post-norm [Nguyen and Salazar, 2019, Wang et al., 2019].
- Parameters:
  - vocabulary_size (int) – The size of the input vocabulary.
  - shared_embeddings (Tensor | None) – An optional matrix of input embeddings that can be shared elsewhere.
  - positional_encoding_cacher (SinusoidalPositionalEncodingCacher | None) – Optional cache for computing positional encodings.
  - num_layers (int) – Number of layers.
  - d_model (int) – The size of the vector representations used in the model, or \(d_\mathrm{model}\).
  - num_heads (int) – Number of attention heads per layer.
  - feedforward_size (int) – Number of hidden units in each feedforward sublayer.
  - dropout (float) – Dropout rate used throughout the transformer. Dropout is applied to the same places as in [Vaswani et al., 2017] and also to the hidden units of feedforward sublayers and the attention probabilities of the attention mechanism.
  - use_padding (bool) – Whether to add a reserved padding index automatically.
  - tag (str | None) – An optional tag to add to the inner TransformerEncoderLayers for argument routing.
- Return type:
- Returns:
A module. Unless tag is given, it accepts the same arguments as TransformerEncoderLayers.
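A hedged construction sketch using the parameters documented above; the values are illustrative, and passing a SinusoidalPositionalEncodingCacher is shown only as one possible use of that argument.

```python
from rau.models import get_transformer_encoder, SinusoidalPositionalEncodingCacher

# Illustrative values; see the parameter list above for their meanings.
cacher = SinusoidalPositionalEncodingCacher()
encoder = get_transformer_encoder(
    vocabulary_size=1000,
    shared_embeddings=None,
    positional_encoding_cacher=cacher,
    num_layers=6,
    d_model=512,
    num_heads=8,
    feedforward_size=2048,
    dropout=0.1,
    use_padding=True,
)
```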
- class rau.models.TransformerEncoderLayers¶
Bases: Module
A cascade of transformer layers.
- __init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- rau.models.get_transformer_decoder(input_vocabulary_size, output_vocabulary_size, shared_embeddings, positional_encoding_cacher, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, tag=None)¶
Construct a transformer decoder.
- Return type:
- class rau.models.TransformerDecoderLayers¶
Bases: Unidirectional
- class State¶
Bases: State
- __init__(decoder, encoder_sequence, encoder_is_padding_mask, previous_inputs)¶
- fastforward(input_sequence)¶
Feed a sequence of inputs to this state and return the resulting state.
- forward(input_sequence, return_state, include_first)¶
Like Unidirectional.forward(), but start with this state as the initial state. This can often be done more efficiently than using next() iteratively.
- Parameters:
  - input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.
  - return_state (bool) – Whether to return the last State of the module.
  - include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.
- Return type:
- Returns:
  See Unidirectional.forward().
- next(input_tensor)¶
Feed an input to this hidden state and produce the next hidden state.
- output()¶
Get the output associated with this state.
For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.
The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.
- Return type:
- Returns:
A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.
- states(input_sequence, include_first)¶
Feed a sequence of inputs to this state and generate all the states produced after each input.
- Parameters:
- Return type:
- Returns:
Sequence of states produced by reading input_sequence.
- transform_tensors(func)¶
Return a copy of this state with all tensors passed through a function.
- decoder: TransformerDecoderLayers¶
- __init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward(input_sequence, encoder_sequence, input_is_padding_mask=None, encoder_is_padding_mask=None, initial_state=None, return_state=False, include_first=True)¶
- Parameters:
  - input_sequence (Tensor) – The target sequence that is given as input to the decoder.
  - encoder_sequence (Tensor) – The output sequence of the encoder.
  - input_is_padding_mask (Tensor | None) – A boolean tensor indicating which positions in the input to the decoder correspond to padding symbols that should be ignored. Important note: If padding only occurs at the end of a sequence, then providing this mask is not necessary, because the attention mechanism is causally masked anyway.
- Return type:
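A hedged sketch of the wiring implied by this signature: the decoder layers read their own input sequence and attend to the encoder's output sequence. The construction mirrors __init__ above, and the assumption that both sequences are \(d_\mathrm{model}\)-sized vector sequences is an inference, not something stated here.

```python
import torch
from rau.models import TransformerDecoderLayers

# Illustrative construction; the parameters follow __init__ above.
decoder = TransformerDecoderLayers(
    num_layers=2, d_model=64, num_heads=4,
    feedforward_size=256, dropout=0.1, use_final_layer_norm=True,
)

target_inputs = torch.randn(2, 7, 64)    # B x n_target x d_model (assumed shape)
encoder_outputs = torch.randn(2, 9, 64)  # B x n_source x d_model (assumed shape)

# Cross-attention over encoder_outputs; include_first follows Unidirectional.forward().
y = decoder(target_inputs, encoder_outputs, include_first=False)
```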
- rau.models.get_transformer_encoder_decoder(source_vocabulary_size, target_input_vocabulary_size, target_output_vocabulary_size, tie_embeddings, num_encoder_layers, num_decoder_layers, d_model, num_heads, feedforward_size, dropout, use_source_padding=True, use_target_padding=True)¶
Construct a transformer encoder-decoder.
- class rau.models.TransformerEncoderDecoder¶
Bases: Module
- __init__(encoder, decoder)¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward(source_sequence, target_sequence, source_is_padding_mask=None, target_is_padding_mask=None)¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type:
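A hedged end-to-end sketch: the constructor arguments mirror the get_transformer_encoder_decoder() signature above, and the call at the end assumes the constructed module is a TransformerEncoderDecoder with the forward() just documented, taking batches of source and target token ids.

```python
import torch
from rau.models import get_transformer_encoder_decoder

# Illustrative hyperparameters.
model = get_transformer_encoder_decoder(
    source_vocabulary_size=1000,
    target_input_vocabulary_size=1000,
    target_output_vocabulary_size=1000,
    tie_embeddings=True,
    num_encoder_layers=6,
    num_decoder_layers=6,
    d_model=512,
    num_heads=8,
    feedforward_size=2048,
    dropout=0.1,
)

source = torch.randint(0, 1000, (2, 9))  # B x n_source token ids (illustrative)
target = torch.randint(0, 1000, (2, 7))  # B x n_target token ids (illustrative)
outputs = model(source, target)          # forward(source_sequence, target_sequence)
```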
- class rau.models.SinusoidalPositionalEncodingCacher¶
Bases: Module
A module that caches a tensor of sinusoidal positional encodings.
Note that it is highly recommended to set a maximum size up-front before training to avoid CUDA memory fragmentation.
- __init__()¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- clear()¶
- get_encodings(sequence_length, d_model)¶
- set_allow_reallocation(value)¶
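Given the note above about setting a maximum size up-front, one plausible pattern (an assumption, since the intended protocol is not spelled out here) is to request encodings for the largest size you expect once and then disable reallocation:

```python
from rau.models import SinusoidalPositionalEncodingCacher

cacher = SinusoidalPositionalEncodingCacher()

# Assumption: requesting the maximum size once allocates the cache up-front,
# and set_allow_reallocation(False) then guards against silent regrowth.
cacher.get_encodings(sequence_length=512, d_model=256)
cacher.set_allow_reallocation(False)

# Later requests for smaller sizes can presumably be served from the cache.
encodings = cacher.get_encodings(128, 256)
```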
- class rau.models.SimpleRNN¶
Bases: UnidirectionalBuiltinRNN
A simple RNN wrapped in the Unidirectional API.
- __init__(input_size, hidden_units, layers=1, dropout=None, nonlinearity='tanh', bias=True, learned_hidden_state=False, use_extra_bias=False)¶
- Parameters:
  - input_size (int) – The size of the input vectors to the RNN.
  - hidden_units (int) – The number of hidden units in each layer.
  - layers (int) – The number of layers in the RNN.
  - dropout (float | None) – The amount of dropout applied in between layers. If layers is 1, then this value is ignored.
  - nonlinearity (Literal['tanh', 'relu']) – The non-linearity applied to hidden units. Either 'tanh' or 'relu'.
  - bias (bool) – Whether to use bias terms.
  - learned_hidden_state (bool) – Whether the initial hidden state should be a learned parameter. If true, the initial hidden state will be the result of passing learned parameters through the activation function. If false, the initial state will be zeros.
  - use_extra_bias (bool) – The built-in PyTorch implementation of the RNN includes redundant bias terms, resulting in more parameters than necessary. If this is true, the extra bias terms are kept. Otherwise, they are removed.
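A hedged construction sketch; the hyperparameters are illustrative, and the call at the end assumes the wrapper exposes the Unidirectional forward() documented earlier in this section, consuming B x n x input_size tensors.

```python
import torch
from rau.models import SimpleRNN

rnn = SimpleRNN(
    input_size=64,
    hidden_units=128,
    layers=2,
    dropout=0.1,
    nonlinearity='tanh',
    learned_hidden_state=True,
)

x = torch.randn(2, 10, 64)         # B x n x input_size (illustrative)
y = rnn(x, include_first=False)    # Unidirectional.forward(); one output per input
```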
- class rau.models.LSTM¶
Bases: UnidirectionalBuiltinRNN
An LSTM wrapped in the Unidirectional API.
- __init__(input_size, hidden_units, layers=1, dropout=None, bias=True, learned_hidden_state=False, use_extra_bias=False)¶
- Parameters:
  - input_size (int) – The size of the input vectors to the LSTM.
  - hidden_units (int) – The number of hidden units in each layer.
  - layers (int) – The number of layers in the LSTM.
  - dropout (float | None) – The amount of dropout applied in between layers. If layers is 1, then this value is ignored.
  - bias (bool) – Whether to use bias terms.
  - learned_hidden_state (bool) – Whether the initial hidden state should be a learned parameter. If true, the initial hidden state will be the result of passing learned parameters through the tanh activation function. If false, the initial state will be zeros. The initial memory cell is always zeros.
  - use_extra_bias (bool) – The built-in PyTorch implementation of the LSTM includes redundant bias terms, resulting in more parameters than necessary. If this is true, the extra bias terms are kept. Otherwise, they are removed.
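The LSTM wrapper follows the same Unidirectional API as SimpleRNN above; a brief hedged sketch, with return_state assumed to behave as documented for Unidirectional.forward() earlier in this section:

```python
import torch
from rau.models import LSTM

lstm = LSTM(
    input_size=64,
    hidden_units=128,
    layers=2,
    dropout=0.1,
    learned_hidden_state=True,  # learned initial hidden state; the memory cell stays zeros
    use_extra_bias=False,       # drop PyTorch's redundant bias terms
)

x = torch.randn(2, 10, 64)                                # B x n x input_size (illustrative)
result = lstm(x, include_first=False, return_state=True)  # ForwardResult with final State
state = result.state                                      # reusable for a subsequent run
```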