You're reading the documentation for a development version. For the latest released version, please have a look at v0.2.0.
rau.vocab
This module provides tools for mapping token types to integer IDs.
- class rau.vocab.Vocabulary
Bases: object
An abstract base class that represents a mapping between token types and integer IDs.
- class rau.vocab.VocabularyBuilder
Bases: Generic[V]
An abstract base class that can be used for constructing Vocabulary objects of a certain type.
- catchall(token)
Build a vocabulary that maps all token types to a single token type. This implements the behavior of an UNK token.
- content(tokens)
Build a vocabulary that assigns consecutive integer IDs to a list of token strings. These are “content” tokens in the sense that they come from a corpus and are not special tokens.
- reserved(tokens)
Build a vocabulary that assigns consecutive integer IDs to a list of special reserved tokens. The token strings of these special tokens are for display purposes only and will never conflict with content tokens.
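The three kinds of entries described above can be illustrated with a small, self-contained sketch. It does not use rau itself: `ToyToIntVocab` and `toy_build` are hypothetical names invented for this example, and the ordering of the ID ranges (content, then catch-all, then reserved) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyToIntVocab:
    """Hypothetical stand-in for a token-to-ID vocabulary (not rau's class)."""
    content: Dict[str, int] = field(default_factory=dict)   # corpus token -> ID
    reserved: Dict[str, int] = field(default_factory=dict)  # display name -> ID
    catchall_name: Optional[str] = None                     # display name only
    catchall_id: Optional[int] = None                       # ID for unknown tokens
    size: int = 0

    def to_int(self, token: str) -> int:
        # Only content tokens are looked up by string. Reserved names are
        # ignored here, so they can never collide with corpus text.
        if token in self.content:
            return self.content[token]
        if self.catchall_id is not None:
            return self.catchall_id
        raise KeyError(token)


def toy_build(content_tokens: List[str], catchall_name: str,
              reserved_names: List[str]) -> ToyToIntVocab:
    """Assign consecutive IDs: content first, then catch-all, then reserved."""
    vocab = ToyToIntVocab()
    for token in content_tokens:
        vocab.content[token] = vocab.size
        vocab.size += 1
    vocab.catchall_name = catchall_name
    vocab.catchall_id = vocab.size
    vocab.size += 1
    for name in reserved_names:
        vocab.reserved[name] = vocab.size
        vocab.size += 1
    return vocab


vocab = toy_build(["a", "b"], "<unk>", ["<eos>"])
print(vocab.to_int("b"))         # known content token
print(vocab.to_int("zzz"))       # unknown token falls through to the catch-all
print(vocab.to_int("<eos>"))     # the literal string "<eos>" is just unknown text
print(vocab.reserved["<eos>"])   # the reserved token has its own ID
```

In this sketch the string "<eos>" appearing in a corpus is treated like any other unknown token, while the reserved token of the same display name keeps a separate ID that is reachable only through `reserved`.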
- rau.vocab.build_to_int_vocabulary(func)
- Return type:
- class rau.vocab.ToIntVocabulary
Bases: Vocabulary
- class rau.vocab.ToIntVocabularyBuilder
Bases: VocabularyBuilder[ToIntVocabulary]
- catchall(token)
- Return type:
- content(tokens)
- Return type:
- reserved(tokens)
- Return type:
- rau.vocab.build_to_string_vocabulary(func)
- Return type:
- class rau.vocab.ToStringVocabulary
Bases: Vocabulary
- __init__(reserved_names)
- class rau.vocab.ToStringVocabularyBuilder
Bases: VocabularyBuilder[ToStringVocabulary]
- catchall(token)
- Return type:
- content(tokens)
- Return type:
- reserved(tokens)
- Return type:
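One plausible reading of the generic builder design is that a single vocabulary description can be written once and then evaluated with different builders, yielding either a token-to-ID or an ID-to-string vocabulary. The sketch below illustrates that idea only. Every name in it (`ToyBuilder`, `ToyToInt`, `ToyToIntBuilder`, `ToyToStringBuilder`, `describe`) is hypothetical, and combining parts with `+` is an assumption of this sketch, not a documented feature of rau.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Optional, TypeVar

V = TypeVar("V")


class ToyBuilder(ABC, Generic[V]):
    """Hypothetical interface mirroring the three documented builder methods."""

    @abstractmethod
    def content(self, tokens: List[str]) -> V: ...

    @abstractmethod
    def catchall(self, token: str) -> V: ...

    @abstractmethod
    def reserved(self, tokens: List[str]) -> V: ...


class ToyToInt:
    """Toy token-to-ID vocabulary part; parts are concatenated with +."""

    def __init__(self, ids: Dict[str, int], catchall: Optional[int], size: int):
        self.ids, self.catchall, self.size = ids, catchall, size

    def __add__(self, other: "ToyToInt") -> "ToyToInt":
        # Shift the right-hand part's IDs past the end of the left-hand part.
        ids = dict(self.ids)
        for token, i in other.ids.items():
            ids[token] = self.size + i
        catchall = self.catchall
        if other.catchall is not None:
            catchall = self.size + other.catchall
        return ToyToInt(ids, catchall, self.size + other.size)

    def to_int(self, token: str) -> int:
        if token in self.ids:
            return self.ids[token]
        if self.catchall is None:
            raise KeyError(token)
        return self.catchall


class ToyToIntBuilder(ToyBuilder[ToyToInt]):
    def content(self, tokens):
        return ToyToInt({t: i for i, t in enumerate(tokens)}, None, len(tokens))

    def catchall(self, token):
        return ToyToInt({}, 0, 1)  # one ID that absorbs every unknown token

    def reserved(self, tokens):
        return ToyToInt({}, None, len(tokens))  # IDs exist, strings not indexed


class ToyToStringBuilder(ToyBuilder[List[str]]):
    # A plain list already maps ID -> string and concatenates with +.
    def content(self, tokens):
        return list(tokens)

    def catchall(self, token):
        return [token]

    def reserved(self, tokens):
        return list(tokens)


def describe(b):
    """One description, usable with either builder."""
    return b.content(["a", "b"]) + b.catchall("<unk>") + b.reserved(["<eos>"])


to_int = describe(ToyToIntBuilder())
to_str = describe(ToyToStringBuilder())
print(to_int.to_int("zzz"), to_str[to_int.to_int("zzz")])
```

Because both vocabularies come from the same description, their ID layouts agree by construction, which is the main appeal of parameterizing the builder by the vocabulary type.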