rau.models

rau.models.get_unidirectional_transformer_encoder(input_vocabulary_size, output_vocabulary_size, tie_embeddings, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, shared_embeddings=None, positional_encoding_cacher=None, tag=None)

Construct a causally-masked transformer encoder.

Return type:

Unidirectional
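
For example, a small causal transformer might be constructed and run as follows. This is a minimal sketch: the vocabulary and layer sizes are illustrative, and it assumes the returned module can be called directly on a batch of integer token IDs.

    import torch
    from rau.models import get_unidirectional_transformer_encoder

    model = get_unidirectional_transformer_encoder(
        input_vocabulary_size=1000,
        output_vocabulary_size=1000,
        tie_embeddings=True,
        num_layers=4,
        d_model=256,
        num_heads=8,
        feedforward_size=1024,
        dropout=0.1,
        use_padding=False
    )

    # A batch of 2 sequences of 10 token IDs (assumed input format).
    tokens = torch.randint(0, 1000, (2, 10))
    logits = model(tokens)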

class rau.models.UnidirectionalTransformerEncoderLayers

Bases: Unidirectional

class State

Bases: State

__init__(encoder, previous_inputs)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.
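
As an illustration, next() and output() can be used together to step through a sequence one input at a time. This is only a sketch: model and x are placeholder names, and the input format is whatever the concrete module expects.

    # Assume `model` is a Unidirectional module and `x` is a B x n x ...
    # batch of inputs in the format that module expects (illustrative).
    state = model.initial_state(batch_size=x.size(0))
    outputs = []
    for t in range(x.size(1)):
        state = state.next(x[:, t])      # feed one timestep
        outputs.append(state.output())   # output produced after that input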

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

states(input_sequence, include_first)

Feed a sequence of inputs to this state and generate all the states produced after each input.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include self as the first state in the returned sequence of states.

Return type:

Iterable[State]

Returns:

Sequence of states produced by reading input_sequence.

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

encoder: UnidirectionalTransformerEncoderLayers
previous_inputs: Tensor
__init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_sequence, is_padding_mask=None, initial_state=None, return_state=False, include_first=True)

Run this module on an entire sequence of inputs all at once.

This can often be done more efficiently than processing each input one by one.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor representing a sequence of \(n\) input tensors.

  • is_padding_mask (Tensor | None) – A Boolean tensor indicating which positions in the input should be treated as padding symbols and ignored.

  • initial_state (State | None) – An optional initial state to use instead of the default initial state created by initial_state().

  • return_state (bool) – Whether to return the last State of the module as an additional output. This state can be used to initialize a subsequent run.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input. If include_first is true, then the length of the output tensor will be \(n + 1\). Otherwise, it will be \(n\).

  • args – Extra arguments passed to initial_state().

  • kwargs – Extra arguments passed to initial_state().

Return type:

Tensor | ForwardResult

Returns:

A Tensor or a ForwardResult that contains the output tensor. The output tensor will be of size \(B \times (n+1) \times \cdots\) if include_first is true and \(B \times n \times \cdots\) otherwise. If Unidirectional.State.output() returns extra outputs at each timestep, then they will be aggregated over all timesteps and returned as lists in ForwardResult.extra_outputs. If return_state is true, then the final State will be returned in ForwardResult.state. If there are no extra outputs and there is no state to return, just the output tensor is returned.
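
For example, return_state and initial_state can be combined to process a long sequence in two chunks, resuming the second call from the state saved by the first. A sketch, assuming layers is an instance of this class and chunk_1 and chunk_2 are appropriately shaped input tensors:

    first = layers(chunk_1, return_state=True, include_first=False)
    # first.state holds the final State after reading chunk_1.
    second = layers(chunk_2, initial_state=first.state,
                    return_state=True, include_first=False)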

initial_state(batch_size)

Get the initial state of this module.

Parameters:
  • batch_size (int) – Batch size.

  • args – Extra arguments passed from forward().

  • kwargs – Extra arguments passed from forward().

Return type:

State

Returns:

A state.

rau.models.get_transformer_encoder(vocabulary_size, shared_embeddings, positional_encoding_cacher, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, tag=None)

Construct a bidirectional transformer encoder [Vaswani et al., 2017].

The transformer uses pre-norm instead of post-norm [Nguyen and Salazar, 2019, Wang et al., 2019].

Parameters:
  • vocabulary_size (int) – The size of the input vocabulary.

  • shared_embeddings (Tensor | None) – An optional matrix of input embeddings that can be shared elsewhere.

  • positional_encoding_cacher (SinusoidalPositionalEncodingCacher | None) – Optional cache for computing positional encodings.

  • num_layers (int) – Number of layers.

  • d_model (int) – The size of the vector representations used in the model, or \(d_\mathrm{model}\).

  • num_heads (int) – Number of attention heads per layer.

  • feedforward_size (int) – Number of hidden units in each feedforward sublayer.

  • dropout (float) – Dropout rate used throughout the transformer. Dropout is applied to the same places as in [Vaswani et al., 2017] and also to the hidden units of feedforward sublayers and the attention probabilities of the attention mechanism.

  • use_padding (bool) – Whether to add a reserved padding index automatically.

  • tag (str | None) – An optional tag to add to the inner TransformerEncoderLayers for argument routing.

Return type:

Module

Returns:

A module. Unless tag is given, it accepts the same arguments as TransformerEncoderLayers.
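
For example, a bidirectional encoder might be constructed and applied like this. A minimal sketch: the sizes are illustrative, and it assumes the returned module accepts a batch of integer token IDs (with an optional is_padding_mask, as documented for TransformerEncoderLayers below).

    import torch
    from rau.models import get_transformer_encoder

    encoder = get_transformer_encoder(
        vocabulary_size=1000,
        shared_embeddings=None,
        positional_encoding_cacher=None,
        num_layers=6,
        d_model=256,
        num_heads=8,
        feedforward_size=1024,
        dropout=0.1,
        use_padding=True
    )

    tokens = torch.randint(0, 1000, (2, 15))   # B x n token IDs (assumed format)
    encoded = encoder(tokens)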

class rau.models.TransformerEncoderLayers

Bases: Module

A cascade of transformer layers.

__init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(source_sequence, is_padding_mask=None)
Parameters:
  • source_sequence (Tensor) – Input tensor.

  • is_padding_mask (Tensor | None) – A Boolean tensor indicating which positions in the input should be treated as padding symbols and ignored.

Return type:

Tensor

rau.models.get_transformer_decoder(input_vocabulary_size, output_vocabulary_size, shared_embeddings, positional_encoding_cacher, num_layers, d_model, num_heads, feedforward_size, dropout, use_padding, tag=None)

Construct a transformer decoder.

Return type:

Unidirectional

class rau.models.TransformerDecoderLayers

Bases: Unidirectional

class State

Bases: State

__init__(decoder, encoder_sequence, encoder_is_padding_mask, previous_inputs)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

states(input_sequence, include_first)

Feed a sequence of inputs to this state and generate all the states produced after each input.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include self as the first state in the returned sequence of states.

Return type:

Iterable[State]

Returns:

Sequence of states produced by reading input_sequence.

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

decoder: TransformerDecoderLayers
encoder_sequence: Tensor
encoder_is_padding_mask: Tensor
previous_inputs: Tensor
__init__(num_layers, d_model, num_heads, feedforward_size, dropout, use_final_layer_norm)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_sequence, encoder_sequence, input_is_padding_mask=None, encoder_is_padding_mask=None, initial_state=None, return_state=False, include_first=True)
Parameters:
  • input_sequence (Tensor) – The target sequence that is given as input to the decoder.

  • encoder_sequence (Tensor) – The output sequence of the encoder.

  • input_is_padding_mask (Tensor | None) – A Boolean tensor indicating which positions in the input to the decoder correspond to padding symbols that should be ignored. Important note: if padding only occurs at the end of a sequence, then providing this mask is not necessary, because the attention mechanism is causally masked anyway.

  • encoder_is_padding_mask (Tensor | None) – A Boolean tensor indicating which positions in encoder_sequence correspond to padding symbols that should be ignored.

Return type:

Tensor | ForwardResult

initial_state(batch_size, encoder_sequence, encoder_is_padding_mask)

Get the initial state of this module.

Parameters:
  • batch_size (int) – Batch size.

  • args – Extra arguments passed from forward().

  • kwargs – Extra arguments passed from forward().

Return type:

State

Returns:

A state.
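
As an illustration, a decoder state can be created from a batch of encoder outputs and then advanced one step at a time. This is a sketch: decoder_layers, encoder_states, and x_t are placeholder names, and passing None for encoder_is_padding_mask is assumed to mean that there is no padding.

    # encoder_states: a B x m x d_model tensor of encoder outputs (illustrative).
    state = decoder_layers.initial_state(
        batch_size=encoder_states.size(0),
        encoder_sequence=encoder_states,
        encoder_is_padding_mask=None
    )
    state = state.next(x_t)   # one decoder input for a single timestep
    y = state.output()        # output for that position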

rau.models.get_transformer_encoder_decoder(source_vocabulary_size, target_input_vocabulary_size, target_output_vocabulary_size, tie_embeddings, num_encoder_layers, num_decoder_layers, d_model, num_heads, feedforward_size, dropout, use_source_padding=True, use_target_padding=True)

Construct a transformer encoder-decoder.
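
For example (a minimal sketch with illustrative sizes):

    from rau.models import get_transformer_encoder_decoder

    model = get_transformer_encoder_decoder(
        source_vocabulary_size=1000,
        target_input_vocabulary_size=1000,
        target_output_vocabulary_size=1000,
        tie_embeddings=True,
        num_encoder_layers=3,
        num_decoder_layers=3,
        d_model=256,
        num_heads=8,
        feedforward_size=1024,
        dropout=0.1
    )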

class rau.models.TransformerEncoderDecoder

Bases: Module

__init__(encoder, decoder)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(source_sequence, target_sequence, source_is_padding_mask=None, target_is_padding_mask=None)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the forward pass must be defined within this function, the module instance itself should be called rather than this method, since calling the instance runs any registered hooks while calling forward() directly silently ignores them.

Return type:

Tensor

initial_decoder_state(source_sequence, source_is_padding_mask)
Return type:

State
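
For example, the encoder-decoder can be run on whole sequences for training, or the decoder can be stepped one position at a time via initial_decoder_state(). A sketch, assuming the model built above accepts batches of integer token IDs and that next() takes one target-side input per step:

    import torch

    source = torch.randint(0, 1000, (2, 12))   # B x m source token IDs
    target = torch.randint(0, 1000, (2, 9))    # B x n target token IDs

    # Training-style call: encode the source and decode the whole target.
    logits = model(source, target)

    # Stepwise decoding: encode once, then advance the decoder state.
    state = model.initial_decoder_state(source, None)
    state = state.next(target[:, 0])
    y = state.output()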

class rau.models.SinusoidalPositionalEncodingCacher

Bases: Module

A module that caches a tensor of sinusoidal positional encodings.

Note that it is highly recommended to set a maximum size up-front before training to avoid CUDA memory fragmentation.

__init__()

Initialize internal Module state, shared by both nn.Module and ScriptModule.

clear()
get_encodings(sequence_length, d_model)
set_allow_reallocation(value)
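
Following the recommendation above, the cache might be filled once before training and then frozen. A sketch; the exact semantics of set_allow_reallocation() are an assumption based on its name.

    from rau.models import SinusoidalPositionalEncodingCacher

    cacher = SinusoidalPositionalEncodingCacher()
    # Allocate encodings for the longest sequence length used in training.
    cacher.get_encodings(sequence_length=512, d_model=256)
    # Presumably prevents the cache from growing (and fragmenting CUDA
    # memory) later on.
    cacher.set_allow_reallocation(False)
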
class rau.models.SimpleRNN

Bases: UnidirectionalBuiltinRNN

A simple RNN wrapped in the Unidirectional API.

RNN_CLASS

alias of RNN

__init__(input_size, hidden_units, layers=1, dropout=None, nonlinearity='tanh', bias=True, learned_hidden_state=False, use_extra_bias=False)
Parameters:
  • input_size (int) – The size of the input vectors to the RNN.

  • hidden_units (int) – The number of hidden units in each layer.

  • layers (int) – The number of layers in the RNN.

  • dropout (float | None) – The amount of dropout applied in between layers. If layers is 1, then this value is ignored.

  • nonlinearity (Literal['tanh', 'relu']) – The non-linearity applied to hidden units. Either 'tanh' or 'relu'.

  • bias (bool) – Whether to use bias terms.

  • learned_hidden_state (bool) – Whether the initial hidden state should be a learned parameter. If true, the initial hidden state will be the result of passing learned parameters through the activation function. If false, the initial state will be zeros.

  • use_extra_bias (bool) – The built-in PyTorch implementation of the RNN includes redundant bias terms, resulting in more parameters than necessary. If this is true, the extra bias terms are kept. Otherwise, they are removed.
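
For example (a sketch with illustrative sizes; the output shape noted in the comment follows from the default include_first=True, and the per-step output is assumed to be the hidden state):

    import torch
    from rau.models import SimpleRNN

    rnn = SimpleRNN(input_size=64, hidden_units=128, layers=2, dropout=0.1)
    x = torch.randn(8, 20, 64)   # B x n x input_size
    y = rnn(x)                   # presumably B x (n + 1) x hidden_units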

class rau.models.LSTM

Bases: UnidirectionalBuiltinRNN

An LSTM wrapped in the Unidirectional API.

RNN_CLASS

alias of LSTM

__init__(input_size, hidden_units, layers=1, dropout=None, bias=True, learned_hidden_state=False, use_extra_bias=False)
Parameters:
  • input_size (int) – The size of the input vectors to the LSTM.

  • hidden_units (int) – The number of hidden units in each layer.

  • layers (int) – The number of layers in the LSTM.

  • dropout (float | None) – The amount of dropout applied in between layers. If layers is 1, then this value is ignored.

  • bias (bool) – Whether to use bias terms.

  • learned_hidden_state (bool) – Whether the initial hidden state should be a learned parameter. If true, the initial hidden state will be the result of passing learned parameters through the tanh activation function. If false, the initial state will be zeros. The initial memory cell is always zeros.

  • use_extra_bias (bool) – The built-in PyTorch implementation of the LSTM includes redundant bias terms, resulting in more parameters than necessary. If this is true, the extra bias terms are kept. Otherwise, they are removed.

rau.models.get_shared_embeddings(tie_embeddings, input_vocabulary_size, output_vocabulary_size, embedding_size, use_padding)

Construct a matrix of embedding vectors that can be used as both input embeddings and output embeddings.

Return type:

Tensor | None
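
For example, a single embedding matrix might be constructed once and passed to both an encoder and a decoder through their shared_embeddings arguments. A sketch with illustrative sizes:

    from rau.models import get_shared_embeddings

    shared = get_shared_embeddings(
        tie_embeddings=True,
        input_vocabulary_size=1000,
        output_vocabulary_size=1000,
        embedding_size=256,
        use_padding=True
    )
    # `shared` can then be passed as shared_embeddings to, e.g.,
    # get_transformer_encoder and get_transformer_decoder so that input
    # and output embeddings are shared.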