SentencePieceTokenizer.EncodeToIds Method

Definition

Namespace:: Microsoft.ML.Tokenizers

Assembly:: Microsoft.ML.Tokenizers.dll

Package:: Microsoft.ML.Tokenizers v1.0.1

Package:: Microsoft.ML.Tokenizers v0.22.0

Package:: Microsoft.ML.Tokenizers v2.0.0-preview.1.25125.4

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Overloads

EncodeToIds(String, Boolean, Boolean, Int32, String, Int32, Boolean, Boolean)	Encodes input text to token Ids up to maximum number of tokens.
EncodeToIds(String, Boolean, Boolean, Boolean, Boolean)	Encodes input text to token Ids.
EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Int32, String, Int32, Boolean, Boolean)	Encodes input text to token Ids up to maximum number of tokens.
EncodeToIds(String, ReadOnlySpan<Char>, EncodeSettings)	Encodes input text to token Ids.
EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean)	Encodes input text to token Ids.

EncodeToIds(String, Boolean, Boolean, Int32, String, Int32, Boolean, Boolean)

Source:: SentencePieceTokenizer.cs

Encodes input text to token Ids up to maximum number of tokens.

public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, bool addBeginningOfSentence, bool addEndOfSentence, int maxTokenCount, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToIds : string * bool * bool * int * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>

Public Function EncodeToIds (text As String, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, maxTokenCount As Integer, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)

Parameters

text: String

The text to encode.

addBeginningOfSentence: Boolean

Indicates whether to emit the beginning-of-sentence token during encoding.

addEndOfSentence: Boolean

Indicates whether to emit the end-of-sentence token during encoding.

maxTokenCount: Int32

The maximum number of tokens to encode.

normalizedText: String

If the tokenizer's normalization is enabled and considerNormalization is true, this will be set to text in its normalized form; otherwise, this value will be set to null.

charsConsumed: Int32

The number of characters of the text covered by the returned tokens, which number at most maxTokenCount.

considerPreTokenization: Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization: Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<Int32>

The list of encoded Ids.
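
The following sketch shows one way this overload might be used to keep an encoding within a token budget. It assumes a `SentencePieceTokenizer` instance has already been created elsewhere (for example, loaded from a model file); the class and method names `TruncationExample` and `EncodeWithBudget` are hypothetical and exist only for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class TruncationExample
{
    // `tokenizer` is assumed to be an already-constructed SentencePieceTokenizer.
    public static IReadOnlyList<int> EncodeWithBudget(
        SentencePieceTokenizer tokenizer, string text, int maxTokenCount)
    {
        IReadOnlyList<int> ids = tokenizer.EncodeToIds(
            text,
            addBeginningOfSentence: true,
            addEndOfSentence: false,
            maxTokenCount: maxTokenCount,
            normalizedText: out string? normalizedText,
            charsConsumed: out int charsConsumed);

        // Assumption: when a normalized form is returned, charsConsumed is
        // measured against it rather than against the original text.
        string source = normalizedText ?? text;
        Console.WriteLine(
            $"Encoded {ids.Count} ids from {charsConsumed} of {source.Length} characters.");

        return ids;
    }
}
```

If `charsConsumed` is smaller than the length of the text that was tokenized, the input did not fit within `maxTokenCount` and the remainder was left unencoded.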

EncodeToIds(String, Boolean, Boolean, Boolean, Boolean)

Source:: SentencePieceTokenizer.cs

Encodes input text to token Ids.

public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(string text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToIds : string * bool * bool * bool * bool -> System.Collections.Generic.IReadOnlyList<int>

Public Function EncodeToIds (text As String, addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)

Parameters

text: String

The text to encode.

addBeginningOfSentence: Boolean

Indicates whether to emit the beginning-of-sentence token during encoding.

addEndOfSentence: Boolean

Indicates whether to emit the end-of-sentence token during encoding.

considerPreTokenization: Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization: Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<Int32>

The list of encoded Ids.
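
As a rough illustration of the two sentence-marker flags, the sketch below encodes the same string with and without them and prints the resulting counts. It assumes an existing `SentencePieceTokenizer` instance; `SentenceMarkerExample` and `CompareSentenceMarkers` are hypothetical names. No particular counts are implied, since they depend on the loaded model.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class SentenceMarkerExample
{
    // `tokenizer` is assumed to be an already-constructed SentencePieceTokenizer.
    public static void CompareSentenceMarkers(SentencePieceTokenizer tokenizer, string text)
    {
        // Encode without any sentence markers.
        IReadOnlyList<int> bare = tokenizer.EncodeToIds(
            text, addBeginningOfSentence: false, addEndOfSentence: false);

        // Encode again, requesting both markers.
        IReadOnlyList<int> wrapped = tokenizer.EncodeToIds(
            text, addBeginningOfSentence: true, addEndOfSentence: true);

        Console.WriteLine($"Without markers: {bare.Count} ids");
        Console.WriteLine($"With markers:    {wrapped.Count} ids");
    }
}
```

Naming the Boolean arguments, as done here, keeps the call readable and avoids confusing the marker flags with the optional `considerPreTokenization` and `considerNormalization` parameters.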

EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Int32, String, Int32, Boolean, Boolean)

Source:: SentencePieceTokenizer.cs

Encodes input text to token Ids up to maximum number of tokens.

public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, bool addBeginningOfSentence, bool addEndOfSentence, int maxTokenCount, out string? normalizedText, out int charsConsumed, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToIds : ReadOnlySpan<char> * bool * bool * int * string * int * bool * bool -> System.Collections.Generic.IReadOnlyList<int>

Public Function EncodeToIds (text As ReadOnlySpan(Of Char), addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, maxTokenCount As Integer, ByRef normalizedText As String, ByRef charsConsumed As Integer, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)

Parameters

text: ReadOnlySpan<Char>

The text to encode.

addBeginningOfSentence: Boolean

Indicates whether to emit the beginning-of-sentence token during encoding.

addEndOfSentence: Boolean

Indicates whether to emit the end-of-sentence token during encoding.

maxTokenCount: Int32

The maximum number of tokens to encode.

normalizedText: String

If the tokenizer's normalization is enabled and considerNormalization is true, this will be set to text in its normalized form; otherwise, this value will be set to null.

charsConsumed: Int32

The number of characters of the text covered by the returned tokens, which number at most maxTokenCount.

considerPreTokenization: Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization: Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<Int32>

The list of encoded Ids.
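
One possible use of the span overload with `maxTokenCount` is to split a long input into windows of bounded token length without allocating substrings. The sketch below is written under stated assumptions: it takes an existing `SentencePieceTokenizer`, passes `considerNormalization: false` so that `charsConsumed` can be applied directly to the original text, and uses the hypothetical names `ChunkingExample` and `EncodeInChunks`. Disabling normalization may change the tokenization for some models, so treat this as an illustration rather than a recommended setting.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class ChunkingExample
{
    // `tokenizer` is assumed to be an already-constructed SentencePieceTokenizer.
    public static List<IReadOnlyList<int>> EncodeInChunks(
        SentencePieceTokenizer tokenizer, string text, int maxTokensPerChunk)
    {
        var chunks = new List<IReadOnlyList<int>>();
        ReadOnlySpan<char> remaining = text.AsSpan();

        while (!remaining.IsEmpty)
        {
            IReadOnlyList<int> ids = tokenizer.EncodeToIds(
                remaining,
                addBeginningOfSentence: false,
                addEndOfSentence: false,
                maxTokenCount: maxTokensPerChunk,
                normalizedText: out _,
                charsConsumed: out int charsConsumed,
                considerPreTokenization: true,
                considerNormalization: false);

            // Stop if no characters were consumed, to avoid looping forever.
            if (charsConsumed <= 0)
            {
                break;
            }

            chunks.Add(ids);

            // Advance past the characters covered by this chunk.
            remaining = remaining.Slice(Math.Min(charsConsumed, remaining.Length));
        }

        return chunks;
    }
}
```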

EncodeToIds(String, ReadOnlySpan<Char>, EncodeSettings)

Source:: SentencePieceTokenizer.cs

Encodes input text to token Ids.

protected override Microsoft.ML.Tokenizers.EncodeResults<int> EncodeToIds(string? text, ReadOnlySpan<char> textSpan, Microsoft.ML.Tokenizers.EncodeSettings settings);

override this.EncodeToIds : string * ReadOnlySpan<char> * Microsoft.ML.Tokenizers.EncodeSettings -> Microsoft.ML.Tokenizers.EncodeResults<int>

Protected Overrides Function EncodeToIds (text As String, textSpan As ReadOnlySpan(Of Char), settings As EncodeSettings) As EncodeResults(Of Integer)

Parameters

text: String

The text to encode.

textSpan: ReadOnlySpan<Char>

The span of the text to encode, which is used if text is null.

settings: EncodeSettings

The settings used to encode the text.

Returns

EncodeResults<Int32>

The encoded results containing the list of encoded Ids.

EncodeToIds(ReadOnlySpan<Char>, Boolean, Boolean, Boolean, Boolean)

Source:: SentencePieceTokenizer.cs

Encodes input text to token Ids.

public System.Collections.Generic.IReadOnlyList<int> EncodeToIds(ReadOnlySpan<char> text, bool addBeginningOfSentence, bool addEndOfSentence, bool considerPreTokenization = true, bool considerNormalization = true);

override this.EncodeToIds : ReadOnlySpan<char> * bool * bool * bool * bool -> System.Collections.Generic.IReadOnlyList<int>

Public Function EncodeToIds (text As ReadOnlySpan(Of Char), addBeginningOfSentence As Boolean, addEndOfSentence As Boolean, Optional considerPreTokenization As Boolean = true, Optional considerNormalization As Boolean = true) As IReadOnlyList(Of Integer)

Parameters

text: ReadOnlySpan<Char>

The text to encode.

addBeginningOfSentence: Boolean

Indicates whether to emit the beginning-of-sentence token during encoding.

addEndOfSentence: Boolean

Indicates whether to emit the end-of-sentence token during encoding.

considerPreTokenization: Boolean

Indicates whether to apply pre-tokenization before tokenization.

considerNormalization: Boolean

Indicates whether to apply normalization before tokenization.

Returns

IReadOnlyList<Int32>

The list of encoded Ids.
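
The span overload can be convenient when only part of a larger buffer needs encoding. The following minimal sketch assumes an existing `SentencePieceTokenizer` instance; `SliceExample` and `EncodeSlice` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

public static class SliceExample
{
    // `tokenizer` is assumed to be an already-constructed SentencePieceTokenizer.
    public static IReadOnlyList<int> EncodeSlice(
        SentencePieceTokenizer tokenizer, string document, int start, int length)
    {
        // AsSpan creates a view over the requested range without copying characters.
        ReadOnlySpan<char> slice = document.AsSpan(start, length);

        return tokenizer.EncodeToIds(
            slice, addBeginningOfSentence: true, addEndOfSentence: false);
    }
}
```

`AsSpan(start, length)` throws `ArgumentOutOfRangeException` when the range falls outside the string, so callers may want to validate `start` and `length` first.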
