rau.unidirectional

class rau.unidirectional.Unidirectional

Bases: Module

An API for unidirectional sequential neural networks (including RNNs and transformer decoders).

Let \(B\) be batch size, and \(n\) be the length of the input sequence.

class State

Bases: object

Represents the hidden state of the module after processing a certain number of inputs.

batch_size()

Get the batch size of the tensors in this state.

Return type:

int

detach()

Return a copy of this state with all tensors detached.

Return type:

State

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State
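The relationship between next() and fastforward() can be illustrated with a minimal pure-Python sketch. ToyState is a hypothetical stand-in that keeps a running sum per batch element and uses nested lists in place of tensors; it is not part of rau and only mimics the documented contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a State: a running sum per batch element."""
    totals: tuple  # one float per batch element (length B)

    def next(self, input_tensor):
        # input_tensor: a length-B list, one input per batch element.
        return ToyState(tuple(t + x for t, x in zip(self.totals, input_tensor)))

    def fastforward(self, input_sequence):
        # input_sequence: B rows of n inputs each; feed it timestep by timestep.
        state = self
        for step in zip(*input_sequence):
            state = state.next(list(step))
        return state


initial = ToyState((0.0, 0.0))      # B = 2
sequence = [[1.0, 2.0, 3.0],        # batch element 0, n = 3
            [10.0, 20.0, 30.0]]     # batch element 1
stepwise = initial.next([1.0, 10.0]).next([2.0, 20.0]).next([3.0, 30.0])
jumped = initial.fastforward(sequence)
```

In this sketch both routes end in the same state, and the initial state is never mutated because each call returns a new object.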

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.
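The aggregation of extra outputs described above might look roughly like the following sketch. The helper aggregate_outputs is hypothetical and works on plain Python values rather than tensors; it only shows how per-timestep tuples could be regrouped into one list per extra output.

```python
def aggregate_outputs(per_step_outputs):
    """Regroup per-timestep outputs (hypothetical helper).

    Each item is either a bare main output or a tuple
    (main, extra_1, ..., extra_k). Returns the list of main outputs and
    k lists, one per extra output, each spanning all timesteps.
    """
    mains, extras = [], None
    for out in per_step_outputs:
        main, *rest = out if isinstance(out, tuple) else (out,)
        mains.append(main)
        if extras is None:
            extras = [[] for _ in rest]
        for bucket, value in zip(extras, rest):
            bucket.append(value)
    return mains, extras or []


steps = [([0.1], "attn0"), ([0.2], "attn1"), ([0.3], "attn2")]
mains, extras = aggregate_outputs(steps)
```

With one extra output per timestep, extras holds a single list whose length equals the number of timesteps.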

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

slice_batch(s)

Return a copy of this state with only certain batch elements included, determined by the slice s.

Parameters:

s (slice) – The slice object used to determine which batch elements to keep.

Return type:

State

states(input_sequence, include_first)

Feed a sequence of inputs to this state and generate all the states produced after each input.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include self as the first state in the returned sequence of states.

Return type:

Iterable[State]

Returns:

Sequence of states produced by reading input_sequence.
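The effect of include_first on the number of states can be sketched with a plain generator. The function below is a hypothetical illustration that omits the batch dimension and takes the transition as an explicit step argument; it is not the library's implementation.

```python
def states(initial, input_sequence, include_first, step):
    """Yield the state after each input; optionally yield `initial` first."""
    state = initial
    if include_first:
        yield state
    for x in input_sequence:
        state = step(state, x)
        yield state


def running_sum(state, x):
    # Toy transition: the state is the sum of all inputs read so far.
    return state + x


with_first = list(states(0, [1, 2, 3], True, running_sum))
without_first = list(states(0, [1, 2, 3], False, running_sum))
```

For an input of length n, the sketch yields n + 1 states when include_first is true and n otherwise.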

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State
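A sketch of how transform_tensors() can serve as a building block for methods such as slice_batch(). ToyState and its two fields are hypothetical, nested lists stand in for tensors, and whether rau implements slice_batch() this way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyState:
    """Hypothetical state holding two 'tensors' with B rows each."""
    hidden: list
    cell: list

    def transform_tensors(self, func):
        # Return a copy; the original state is left untouched.
        return ToyState(func(self.hidden), func(self.cell))

    def slice_batch(self, s):
        # Keep only the batch elements selected by the slice `s`.
        return self.transform_tensors(lambda t: t[s])


state = ToyState(hidden=[[1.0], [2.0], [3.0]], cell=[[4.0], [5.0], [6.0]])
first_two = state.slice_batch(slice(0, 2))
doubled = state.transform_tensors(lambda t: [[2 * v for v in row] for row in t])
```

The same pattern would apply to any per-tensor operation, for example moving every tensor to another device.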

__init__(tags=None)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

as_composable()
Return type:

Composable

forward(input_sequence, *args, initial_state=None, return_state=False, include_first=True, **kwargs)

Run this module on an entire sequence of inputs all at once.

This can often be done more efficiently than processing each input one by one.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor representing a sequence of \(n\) input tensors.

  • initial_state (State | None) – An optional initial state to use instead of the default initial state created by initial_state().

  • return_state (bool) – Whether to return the last State of the module as an additional output. This state can be used to initialize a subsequent run.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input. If include_first is true, then the length of the output tensor will be \(n + 1\). Otherwise, it will be \(n\).

  • args (Any) – Extra arguments passed to initial_state().

  • kwargs (Any) – Extra arguments passed to initial_state().

Return type:

Tensor | ForwardResult

Returns:

A Tensor or a ForwardResult that contains the output tensor. The output tensor will be of size \(B \times (n+1) \times \cdots\) if include_first is true and \(B \times n \times \cdots\) otherwise. If Unidirectional.State.output() returns extra outputs at each timestep, then they will be aggregated over all timesteps and returned as lists in ForwardResult.extra_outputs. If return_state is true, then the final State will be returned in ForwardResult.state. If there are no extra outputs and there is no state to return, just the output tensor is returned.
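The return-value rules above can be sketched in plain Python. ToyForwardResult and toy_forward are hypothetical stand-ins: the "module" is a running sum, the batch dimension is omitted, and there are never any extra outputs.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ToyForwardResult:
    """Hypothetical mirror of the documented result fields."""
    output: list
    extra_outputs: Sequence[Sequence[Any]]
    state: Optional[Any]


def toy_forward(input_sequence, return_state=False, include_first=True):
    """Running sum over one sequence (batch dimension omitted for brevity)."""
    state, outputs = 0, []
    if include_first:
        outputs.append(state)       # output of the initial state
    for x in input_sequence:
        state += x
        outputs.append(state)
    if return_state:
        return ToyForwardResult(outputs, [], state)
    return outputs                  # no extras, no state: plain output


plain = toy_forward([1, 2, 3])
shorter = toy_forward([1, 2, 3], include_first=False)
result = toy_forward([1, 2, 3], return_state=True)
```

Here plain has length n + 1, shorter has length n, and result wraps the same output together with the final state.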

initial_state(batch_size, *args, **kwargs)

Get the initial state of the module.

Parameters:
  • batch_size (int) – Batch size.

  • args (Any) – Extra arguments passed from forward().

  • kwargs (Any) – Extra arguments passed from forward().

Return type:

State

Returns:

A state.

main()
Return type:

Unidirectional

tag(tag)
Return type:

Unidirectional

class rau.unidirectional.ForwardResult

Bases: object

The output of a call to Unidirectional.forward() or Unidirectional.State.forward().

__init__(output, extra_outputs, state)
output: Tensor

The main output tensor of the module.

extra_outputs: Sequence[Sequence[Any]]

A list of extra outputs returned alongside the main output.

state: State | None

An optional state representing the updated state of the module after reading the inputs.
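Because forward() may return either a bare tensor or a ForwardResult, calling code sometimes needs to normalize the two cases. The sketch below uses a hypothetical ToyForwardResult with the three documented fields and plain lists in place of tensors; with the real library, the isinstance check would presumably target ForwardResult itself.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ToyForwardResult:
    """Hypothetical mirror of the three documented fields."""
    output: Any
    extra_outputs: Sequence[Sequence[Any]]
    state: Optional[Any]


def unpack(result):
    """Normalize a bare output or a result object to (output, extras, state)."""
    if isinstance(result, ToyForwardResult):
        return result.output, result.extra_outputs, result.state
    return result, [], None


bare = unpack([0.5, 0.7])
full = unpack(ToyForwardResult([0.5, 0.7], [["a", "b"]], "final"))
```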

class rau.unidirectional.SimpleUnidirectional

Bases: Unidirectional

A sequential module that has no temporal recurrence, but applies some function to every timestep.

class State

Bases: State

__init__(parent, input_tensor, batch_size, args, kwargs)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

parent: SimpleUnidirectional
input_tensor: Tensor | None
args: list[Any]
kwargs: dict[str, Any]
forward_sequence(input_sequence, *args, **kwargs)

Transform a sequence of tensors.

Parameters:

input_sequence (Tensor) – A tensor of size \(B \times n \times \cdots\) representing a sequence of tensors.

Return type:

Tensor

Returns:

A tensor of size \(B \times n \times \cdots\).

forward_single(input_tensor, *args, **kwargs)

Transform an input tensor for a single timestep.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\) representing a tensor for a single timestep.

Return type:

Tensor

Returns:

A tensor of size \(B \times \cdots\).

initial_output(batch_size, *args, **kwargs)

Get the output of the initial state. By default, this simply raises an error.

Parameters:

batch_size (int) – Batch size.

Return type:

Tensor

Returns:

A tensor of size \(B \times \cdots\).
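The three hooks forward_sequence(), forward_single() and initial_output() can be illustrated with a toy class. ToySimple is a hypothetical stand-in that doubles every value and uses nested lists instead of tensors; raising NotImplementedError in initial_output() is an assumption, since the text only says the default raises an error.

```python
class ToySimple:
    """Hypothetical stand-in: applies the same function at every timestep."""

    def forward_single(self, input_tensor):
        # input_tensor: length-B list of scalars for one timestep.
        return [2 * x for x in input_tensor]

    def forward_sequence(self, input_sequence):
        # input_sequence: B rows of n scalars; nothing carries across timesteps.
        return [[2 * x for x in row] for row in input_sequence]

    def initial_output(self, batch_size):
        # Assumed error type; there is no output before the first input.
        raise NotImplementedError("no output is defined before the first input")


layer = ToySimple()
sequence = [[1, 2, 3], [4, 5, 6]]                      # B = 2, n = 3
batched = layer.forward_sequence(sequence)
# Apply forward_single to each timestep (column), then restore B x n layout.
per_step = [layer.forward_single(list(col)) for col in zip(*sequence)]
stepwise = [list(row) for row in zip(*per_step)]
```

Because there is no temporal recurrence, processing the whole sequence at once and processing it one timestep at a time give the same result in this sketch.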

initial_state(batch_size, *args, **kwargs)

Get the initial state of the module.

Parameters:
  • batch_size (int) – Batch size.

  • args (Any) – Extra arguments passed from forward().

  • kwargs (Any) – Extra arguments passed from forward().

Return type:

State

Returns:

A state.

transform_args(args, func)
Return type:

list[Any]

transform_kwargs(kwargs, func)
Return type:

dict[str, Any]

class rau.unidirectional.SimpleLayerUnidirectional

Bases: SimpleUnidirectional

__init__(func)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward_sequence(input_sequence, *args, **kwargs)

Transform a sequence of tensors.

Parameters:

input_sequence (Tensor) – A tensor of size \(B \times n \times \cdots\) representing a sequence of tensors.

Return type:

Tensor

Returns:

A tensor of size \(B \times n \times \cdots\).

forward_single(input_tensor, *args, **kwargs)

Transform an input tensor for a single timestep.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\) representing a tensor for a single timestep.

Return type:

Tensor

Returns:

A tensor of size \(B \times \cdots\).

class rau.unidirectional.SimpleReshapingLayerUnidirectional

Bases: SimpleLayerUnidirectional

forward_single(input_tensor, *args, **kwargs)

Transform an input tensor for a single timestep.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\) representing a tensor for a single timestep.

Return type:

Tensor

Returns:

A tensor of size \(B \times \cdots\).

class rau.unidirectional.PositionalUnidirectional

Bases: Unidirectional

class State

Bases: State

__init__(parent, position, input_tensor)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

parent: PositionalUnidirectional
position: int
input_tensor: Tensor | None
forward_at_position(input_tensor, position)

Compute the output for a single input at a certain position.

Parameters:
  • input_tensor (Tensor) – A tensor of size \(B \times \cdots\) representing an input tensor for a single timestep.

  • position (int) – An index indicating the current timestep. The first timestep has index 0.

Return type:

Tensor

Returns:

A tensor of size \(B \times \cdots\) representing the output tensor corresponding to the input tensor.

forward_from_position(input_sequence, position)

Compute the outputs for a sequence of inputs, starting at a certain position.

Parameters:
  • input_sequence (Tensor) – A tensor of size \(B \times n \times \cdots\) representing a sequence of input tensors.

  • position (int) – An index indicating the timestep corresponding to the first input of input_sequence. The first timestep has index 0.

Return type:

Tensor

Returns:

A tensor of size \(B \times n' \times \cdots\) representing a sequence of output tensors.
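The relationship between forward_at_position() and forward_from_position() can be sketched with a toy class that adds the timestep index to each input. ToyPositional is hypothetical and uses nested lists instead of tensors; in this sketch the output length equals the input length, which the real method does not necessarily guarantee.

```python
class ToyPositional:
    """Hypothetical stand-in: the output depends on the input and its position."""

    def forward_at_position(self, input_tensor, position):
        # input_tensor: length-B list of scalars for the timestep `position`.
        return [x + position for x in input_tensor]

    def forward_from_position(self, input_sequence, position):
        # input_sequence: B rows of n scalars; the first column sits at
        # `position`, the next at `position + 1`, and so on.
        return [[x + position + i for i, x in enumerate(row)]
                for row in input_sequence]


layer = ToyPositional()
sequence = [[10, 20, 30]]                              # B = 1, n = 3
batched = layer.forward_from_position(sequence, 5)
# The same computation done one position at a time.
per_step = [layer.forward_at_position(list(col), 5 + i)
            for i, col in enumerate(zip(*sequence))]
stepwise = [list(row) for row in zip(*per_step)]
```

Starting at position 5, the three inputs are processed at positions 5, 6 and 7, whichever of the two methods is used.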

initial_state(batch_size, *args, **kwargs)

Get the initial state of the module.

Parameters:
  • batch_size (int) – Batch size.

  • args (Any) – Extra arguments passed from forward().

  • kwargs (Any) – Extra arguments passed from forward().

Return type:

State

Returns:

A state.

class rau.unidirectional.ComposedUnidirectional

Bases: Unidirectional

Stacks one unidirectional model on another, so that the outputs of the first are fed as inputs to the second.
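The stacking idea can be sketched with plain functions. Everything below is hypothetical: each "model" is a step function mapping (state, input) to (new_state, output), and the composed state is simply the pair of the two component states, loosely mirroring the first_state and second_state fields.

```python
def compose(first_step, second_step):
    """Build a step function that feeds the first model's output to the second."""
    def step(state, x):
        first_state, second_state = state
        first_state, first_out = first_step(first_state, x)
        second_state, second_out = second_step(second_state, first_out)
        return (first_state, second_state), second_out
    return step


def running_sum(state, x):
    # Stateful toy model: outputs the sum of all inputs read so far.
    state = state + x
    return state, state


def double(state, x):
    # Stateless toy model: ignores its state and doubles the input.
    return state, 2 * x


stacked = compose(running_sum, double)
state, outputs = (0, None), []
for x in [1, 2, 3]:
    state, out = stacked(state, x)
    outputs.append(out)
```

Each output is the doubled running sum, and the final composed state records the state of both components.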

class State

Bases: State

State(first_state: rau.unidirectional.unidirectional.Unidirectional.State, second_state: rau.unidirectional.unidirectional.Unidirectional.State)

__init__(first_state, second_state)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

detach()

Return a copy of this state with all tensors detached.

Return type:

State

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

slice_batch(s)

Return a copy of this state with only certain batch elements included, determined by the slice s.

Parameters:

s (slice) – The slice object used to determine which batch elements to keep.

Return type:

State

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

first_state: State
second_state: State
__init__(first, second)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_sequence, *args, initial_state=None, return_state=False, include_first=True, tag_kwargs=None, **kwargs)

Run this module on an entire sequence of inputs all at once.

This can often be done more efficiently than processing each input one by one.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor representing a sequence of \(n\) input tensors.

  • initial_state (State | None) – An optional initial state to use instead of the default initial state created by initial_state().

  • return_state (bool) – Whether to return the last State of the module as an additional output. This state can be used to initialize a subsequent run.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input. If include_first is true, then the length of the output tensor will be \(n + 1\). Otherwise, it will be \(n\).

  • args (Any) – Extra arguments passed to initial_state().

  • kwargs (Any) – Extra arguments passed to initial_state().

Return type:

Tensor | ForwardResult

Returns:

A Tensor or a ForwardResult that contains the output tensor. The output tensor will be of size \(B \times (n+1) \times \cdots\) if include_first is true and \(B \times n \times \cdots\) otherwise. If Unidirectional.State.output() returns extra outputs at each timestep, then they will be aggregated over all timesteps and returned as lists in ForwardResult.extra_outputs. If return_state is true, then the final State will be returned in ForwardResult.state. If there are no extra outputs and there is no state to return, just the output tensor is returned.

initial_state(batch_size, *args, tag_kwargs=None, **kwargs)

Get the initial state of the module.

Parameters:
  • batch_size – Batch size.

  • args – Extra arguments passed from forward().

  • kwargs – Extra arguments passed from forward().

Returns:

A state.

class rau.unidirectional.DropoutUnidirectional

Bases: SimpleLayerUnidirectional

__init__(dropout)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

class rau.unidirectional.EmbeddingUnidirectional

Bases: SimpleLayerUnidirectional

__init__(vocabulary_size, output_size, use_padding, shared_embeddings=None)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

class rau.unidirectional.OutputUnidirectional

Bases: SimpleLayerUnidirectional

__init__(input_size, vocabulary_size, shared_embeddings=None, bias=True)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

class rau.unidirectional.ResidualUnidirectional

Bases: Unidirectional

class State

Bases: State

State(input_tensor: torch.Tensor | None, wrapped_state: rau.unidirectional.unidirectional.Unidirectional.State)

__init__(input_tensor, wrapped_state)
batch_size()

Get the batch size of the tensors in this state.

Return type:

int

fastforward(input_sequence)

Feed a sequence of inputs to this state and return the resulting state.

Parameters:

input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

Return type:

State

Returns:

Updated state after reading input_sequence.

forward(input_sequence, return_state, include_first)

Like Unidirectional.forward(), but start with this state as the initial state.

This can often be done more efficiently than using next() iteratively.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • return_state (bool) – Whether to return the last State of the module.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input.

Return type:

Tensor | ForwardResult

Returns:

See Unidirectional.forward().

next(input_tensor)

Feed an input to this hidden state and produce the next hidden state.

Parameters:

input_tensor (Tensor) – A tensor of size \(B \times \cdots\), representing an input for a single timestep.

Return type:

State

output()

Get the output associated with this state.

For example, this can be the hidden state vector itself, or the hidden state passed through an affine transformation.

The return value is either a tensor or a tuple whose first element is a tensor. The other elements of the tuple can be used to return extra outputs.

Return type:

Tensor | tuple[Tensor, ...]

Returns:

A \(B \times \cdots\) tensor, or a tuple whose first element is a tensor. The other elements of the tuple can contain extra outputs. If there are any extra outputs, then the output of forward() and Unidirectional.forward() will contain the same number of extra outputs, where each extra output is a list containing all the outputs across all timesteps.

outputs(input_sequence, include_first)

Like states(), but return the states’ outputs.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include the output of self as the first output.

Return type:

Iterable[Tensor] | Iterable[tuple[Tensor, ...]]

states(input_sequence, include_first)

Feed a sequence of inputs to this state and generate all the states produced after each input.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor, representing \(n\) input tensors.

  • include_first (bool) – Whether to include self as the first state in the returned sequence of states.

Return type:

Iterable[State]

Returns:

Sequence of states produced by reading input_sequence.

transform_tensors(func)

Return a copy of this state with all tensors passed through a function.

Parameters:

func (Callable[[Tensor], Tensor]) – A function that will be applied to all tensors in this state.

Return type:

State

input_tensor: Tensor | None
wrapped_state: State
__init__(module)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input_sequence, initial_state=None, return_state=False, include_first=True, **kwargs)

Run this module on an entire sequence of inputs all at once.

This can often be done more efficiently than processing each input one by one.

Parameters:
  • input_sequence (Tensor) – A \(B \times n \times \cdots\) tensor representing a sequence of \(n\) input tensors.

  • initial_state (State | None) – An optional initial state to use instead of the default initial state created by initial_state().

  • return_state (bool) – Whether to return the last State of the module as an additional output. This state can be used to initialize a subsequent run.

  • include_first (bool) – Whether to prepend an extra tensor to the beginning of the output corresponding to a prediction for the first element in the input. If include_first is true, then the length of the output tensor will be \(n + 1\). Otherwise, it will be \(n\).

  • args – Extra arguments passed to initial_state().

  • kwargs (Any) – Extra arguments passed to initial_state().

Return type:

Tensor | ForwardResult

Returns:

A Tensor or a ForwardResult that contains the output tensor. The output tensor will be of size \(B \times (n+1) \times \cdots\) if include_first is true and \(B \times n \times \cdots\) otherwise. If Unidirectional.State.output() returns extra outputs at each timestep, then they will be aggregated over all timesteps and returned as lists in ForwardResult.extra_outputs. If return_state is true, then the final State will be returned in ForwardResult.state. If there are no extra outputs and there is no state to return, just the output tensor is returned.

initial_state(batch_size, *args, **kwargs)

Get the initial state of the module.

Parameters:
  • batch_size (int) – Batch size.

  • args (Any) – Extra arguments passed from forward().

  • kwargs (Any) – Extra arguments passed from forward().

Return type:

State

Returns:

A state.