SentencePieceTokenizer.CountTokens Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
| CountTokens(String, Boolean, Boolean, Boolean, Boolean, String, Int32, Int32) | Get the number of tokens that the input text will be encoded to. |
| CountTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean, String, Int32, Int32) | Get the number of tokens that the input text will be encoded to. |
| CountTokens(String, Boolean, Boolean, Boolean, Boolean) | Get the number of tokens that the input text will be encoded to. |
| CountTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean) | Get the number of tokens that the input text will be encoded to. |
| CountTokens(String, ReadOnlySpan<Char>, EncodeSettings) | Get the number of tokens that the input text will be encoded to. |
CountTokens(String, Boolean, Boolean, Boolean, Boolean, String, Int32, Int32)
- Source:
- SentencePieceTokenizer.cs
Get the number of tokens that the input text will be encoded to.
public int CountTokens(string text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization, bool considerNormalization, out string? normalizedText, out int charsConsumed, int maxTokenCount = 2147483647);
override this.CountTokens : string * bool * bool * bool * bool * string * int * int -> int
Public Function CountTokens (text As String, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, considerPreTokenization As Boolean, considerNormalization As Boolean, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional maxTokenCount As Integer = 2147483647) As Integer
Parameters
- text
- String
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
- normalizedText
- String
If the tokenizer's normalization is enabled or considerNormalization is false, this will be set to text in its normalized form; otherwise, this value will be set to null.
- charsConsumed
- Int32
The length of the text, in characters, that the counted tokens cover when the count is capped at maxTokenCount.
- maxTokenCount
- Int32
The maximum number of tokens to encode.
Returns
The number of tokens that the input text will be encoded to.
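As an illustration of how maxTokenCount and charsConsumed work together, the following minimal sketch trims a string so that its encoding fits a token budget. The `TokenBudget` class and `TruncateToBudget` method are hypothetical helpers, not part of the library, and the tokenizer instance is assumed to have been created elsewhere. The sketch also assumes that charsConsumed is an offset into the normalized text when one is returned, and into the original text otherwise.

```csharp
#nullable enable
using System;
using Microsoft.ML.Tokenizers;

public static class TokenBudget
{
    // Hypothetical helper (not a library API): returns the leading part of
    // the text that the tokenizer reports as covered by at most maxTokens tokens.
    public static string TruncateToBudget(
        SentencePieceTokenizer tokenizer, string text, int maxTokens)
    {
        // The return value is the number of tokens counted, capped at maxTokens.
        // It is not needed here; only the two out values are used.
        _ = tokenizer.CountTokens(
            text,
            addBeginningOfSentence: true,
            addEndOfSentence: false,
            considerPreTokenization: true,
            considerNormalization: true,
            normalizedText: out string? normalizedText,
            charsConsumed: out int charsConsumed,
            maxTokenCount: maxTokens);

        // Assumption: charsConsumed refers to the normalized text if the
        // tokenizer produced one, and to the original text otherwise.
        string source = normalizedText ?? text;

        // Clamp defensively so Substring cannot throw if the assumption is off.
        int length = Math.Min(charsConsumed, source.Length);
        return source.Substring(0, length);
    }
}
```

A caller might use this to cut a prompt down before sending it to a model with a fixed context size. Because the special tokens requested through addBeginningOfSentence and addEndOfSentence count toward the total, the same flags should be used here as in the later encoding call.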
CountTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean, String, Int32, Int32)
- Source:
- SentencePieceTokenizer.cs
Get the number of tokens that the input text will be encoded to.
public int CountTokens(ReadOnlySpan<char> text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization, bool considerNormalization, out string? normalizedText, out int charsConsumed, int maxTokenCount = 2147483647);
override this.CountTokens : ReadOnlySpan<char> * bool * bool * bool * bool * string * int * int -> int
Public Function CountTokens (text As ReadOnlySpan(Of Char), addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, considerPreTokenization As Boolean, considerNormalization As Boolean, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional maxTokenCount As Integer = 2147483647) As Integer
Parameters
- text
- ReadOnlySpan<Char>
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
- normalizedText
- String
If the tokenizer's normalization is enabled or considerNormalization is false, this will be set to text in its normalized form; otherwise, this value will be set to null.
- charsConsumed
- Int32
The length of the text, in characters, that the counted tokens cover when the count is capped at maxTokenCount.
- maxTokenCount
- Int32
The maximum number of tokens to encode.
Returns
The number of tokens that the input text will be encoded to.
CountTokens(String, Boolean, Boolean, Boolean, Boolean)
- Source:
- SentencePieceTokenizer.cs
Get the number of tokens that the input text will be encoded to.
public int CountTokens(string text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization = true, bool considerNormalization = true);
override this.CountTokens : string * bool * bool * bool * bool -> int
Public Function CountTokens (text As String, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As Integer
Parameters
- text
- String
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
Returns
The number of token IDs that the input text will be encoded to.
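To show the effect of addBeginningOfSentence and addEndOfSentence on the result, the sketch below counts the same text twice and returns the difference. `SpecialTokenOverhead` and `Measure` are hypothetical names introduced for this example, and the tokenizer instance is assumed to have been created elsewhere from a SentencePiece model.

```csharp
using Microsoft.ML.Tokenizers;

public static class SpecialTokenOverhead
{
    // Hypothetical helper (not a library API): reports how many extra tokens
    // the beginning-of-sentence and end-of-sentence markers add for this text.
    public static int Measure(SentencePieceTokenizer tokenizer, string text)
    {
        // Count with no sentence markers; the optional parameters
        // considerPreTokenization and considerNormalization keep their
        // default value of true.
        int bare = tokenizer.CountTokens(
            text, addBeginningOfSentence: false, addEndOfSentence: false);

        // Count again with both sentence markers requested.
        int wrapped = tokenizer.CountTokens(
            text, addBeginningOfSentence: true, addEndOfSentence: true);

        return wrapped - bare;
    }
}
```

Passing the two flags by name keeps the call readable, since four adjacent Boolean parameters are easy to transpose.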
CountTokens(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean)
- Source:
- SentencePieceTokenizer.cs
Get the number of tokens that the input text will be encoded to.
public int CountTokens(ReadOnlySpan<char> text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization = true, bool considerNormalization = true);
override this.CountTokens : ReadOnlySpan<char> * bool * bool * bool * bool -> int
Public Function CountTokens (text As ReadOnlySpan(Of Char), addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As Integer
Parameters
- text
- ReadOnlySpan<Char>
The text to encode.
- addBeginningOfSentence
- Boolean
Indicates whether to emit the beginning-of-sentence token during encoding.
- addEndOfSentence
- Boolean
Indicates whether to emit the end-of-sentence token during encoding.
- considerPreTokenization
- Boolean
Indicates whether to apply pre-tokenization before tokenization.
- considerNormalization
- Boolean
Indicates whether to apply normalization before tokenization.
Returns
The number of token IDs that the input text will be encoded to.
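The ReadOnlySpan<Char> overload is useful when only part of a larger string needs to be counted, because a span can refer to a slice without copying it. The following sketch assumes an existing tokenizer instance; `SliceCounter` and `CountInRange` are hypothetical names, not library members.

```csharp
using System;
using Microsoft.ML.Tokenizers;

public static class SliceCounter
{
    // Hypothetical helper (not a library API): counts the tokens in
    // document[start .. start + length) without allocating a substring.
    public static int CountInRange(
        SentencePieceTokenizer tokenizer, string document, int start, int length)
    {
        // AsSpan throws ArgumentOutOfRangeException if the range is invalid.
        ReadOnlySpan<char> slice = document.AsSpan(start, length);

        // No sentence markers are requested, so only the slice's own tokens
        // are counted.
        return tokenizer.CountTokens(
            slice, addBeginningOfSentence: false, addEndOfSentence: false);
    }
}
```

Counts obtained for separate slices are not guaranteed to add up to the count for the whole string, since token boundaries can depend on the surrounding text.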
CountTokens(String, ReadOnlySpan<Char>, EncodeSettings)
- Source:
- SentencePieceTokenizer.cs
Get the number of tokens that the input text will be encoded to.
protected override int CountTokens(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);
override this.CountTokens : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> int
Protected Overrides Function CountTokens (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As Integer
Parameters
- text
- String
The text to encode.
- textSpan
- ReadOnlySpan<Char>
The span of the text to encode, which is used if text is null.
- settings
- EncodeSettings
The settings used to encode the text.
Returns
The number of token IDs that the input text will be encoded to.