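How to use character encoding classes in .NET

This article explains how to use the classes that .NET provides for encoding and decoding text by using various encoding schemes. The instructions assume you have read "Introduction to character encoding in .NET".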

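Encoders and decoders

.NET provides encoding classes that encode and decode text by using various encoding systems. For example, the UTF8Encoding class describes the rules for encoding to, and decoding from, UTF-8. .NET uses UTF-16 encoding (represented by the UnicodeEncoding class) for string instances. Encoders and decoders are available for other encoding schemes.

Encoding and decoding can also include validation. For example, the UnicodeEncoding class checks all char instances in the surrogate range to make sure they are in valid surrogate pairs. A fallback strategy determines how an encoder handles invalid characters or how a decoder handles invalid bytes.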
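The validation described above can be sketched as follows. This is a minimal illustration, not part of the original samples: the class name `SurrogateValidationSketch` and the helper `IsWellFormed` are hypothetical. It assumes a UnicodeEncoding instance created with `throwOnInvalidBytes: true`, so that an unpaired surrogate raises an exception instead of being silently replaced.

```csharp
using System;
using System.Text;

public static class SurrogateValidationSketch
{
   // Hypothetical helper: returns true if every surrogate in text
   // belongs to a valid high/low surrogate pair.
   public static bool IsWellFormed(string text)
   {
      // throwOnInvalidBytes: true makes the encoding throw on invalid data
      // instead of substituting a replacement character.
      var strictUtf16 = new UnicodeEncoding(bigEndian: false,
                                            byteOrderMark: false,
                                            throwOnInvalidBytes: true);
      try {
         strictUtf16.GetBytes(text);
         return true;
      }
      catch (EncoderFallbackException) {
         return false;
      }
   }

   public static void Main()
   {
      Console.WriteLine(IsWellFormed("\uD800\uDC05")); // True: valid surrogate pair
      Console.WriteLine(IsWellFormed("\uD800"));       // False: lone high surrogate
   }
}
```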

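Warning

.NET encoding classes provide a way to store and convert character data. They should not be used to store binary data in string form. Depending on the encoding used, converting binary data to string format with the encoding classes can introduce unexpected behavior and produce inaccurate or corrupted data. To convert binary data to a string form, use the Convert.ToBase64String method.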
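As a rough illustration of this warning, the following sketch round-trips an arbitrary byte array once through ASCIIEncoding and once through Base64. The class name, method names, and sample bytes are made up for the illustration; only the Convert and Encoding calls come from .NET.

```csharp
using System;
using System.Linq;
using System.Text;

public static class BinaryToTextSketch
{
   // Misuse: treats binary data as ASCII text. Bytes above 0x7F are not
   // valid ASCII, so they are replaced somewhere in the round trip.
   public static byte[] ViaAscii(byte[] data) =>
      Encoding.ASCII.GetBytes(Encoding.ASCII.GetString(data));

   // Recommended: Base64 represents every byte value losslessly.
   public static byte[] ViaBase64(byte[] data) =>
      Convert.FromBase64String(Convert.ToBase64String(data));

   public static void Main()
   {
      byte[] data = { 0x00, 0x41, 0x80, 0xFF };
      Console.WriteLine($"ASCII round trip intact:  {data.SequenceEqual(ViaAscii(data))}");
      Console.WriteLine($"Base64 round trip intact: {data.SequenceEqual(ViaBase64(data))}");
   }
}
```

With these sample bytes, the ASCII round trip is expected to lose the two values above 0x7F, while the Base64 round trip returns the original array.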

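All character encoding classes in .NET inherit from the System.Text.Encoding class, which is an abstract class that defines the functionality common to all character encodings. To access the individual encoding objects implemented in .NET, do the following:

Use the static properties of the Encoding class, which return objects that represent the standard character encodings available in .NET (ASCII, UTF-7, UTF-8, UTF-16, and UTF-32). For example, the Encoding.Unicode property returns a UnicodeEncoding object. Each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode. For more information, see "Replacement fallback".
Call the encoding's class constructor. Objects for the ASCII, UTF-7, UTF-8, UTF-16, and UTF-32 encodings can be instantiated in this way. By default, each object uses replacement fallback to handle strings that it cannot encode and bytes that it cannot decode, but you can specify that an exception should be thrown instead. For more information, see "Replacement fallback" and "Exception fallback".
Call the Encoding(Int32) constructor and pass it an integer that represents the encoding. (This constructor is protected, so it can be called only from a class derived from Encoding.) Standard encoding objects use replacement fallback, and code page and double-byte character set (DBCS) encoding objects use best-fit fallback to handle strings that they cannot encode and bytes that they cannot decode. For more information, see "Best-fit fallback".
Call the Encoding.GetEncoding method, which returns any standard, code page, or DBCS encoding available in .NET. Overloads let you specify a fallback object for both the encoder and the decoder.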
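The options in the list above can be sketched side by side. This is an illustrative sketch only: the class name `GetEncodingSketch` and the helper `EncodeWithAsterisks` are hypothetical, and the asterisk replacement string is an arbitrary choice.

```csharp
using System;
using System.Text;

public static class GetEncodingSketch
{
   // Hypothetical helper: round-trips text through an ASCII encoding whose
   // fallback objects substitute "*" for anything that cannot be mapped.
   public static string EncodeWithAsterisks(string text)
   {
      Encoding ascii = Encoding.GetEncoding("us-ascii",
                                            new EncoderReplacementFallback("*"),
                                            new DecoderReplacementFallback("*"));
      return ascii.GetString(ascii.GetBytes(text));
   }

   public static void Main()
   {
      // Static property: UTF-16 (little-endian) with replacement fallback.
      Encoding utf16 = Encoding.Unicode;

      // Class constructor: UTF-8 without a byte order mark that throws
      // an exception on invalid data instead of replacing it.
      Encoding strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                             throwOnInvalidBytes: true);

      Console.WriteLine(utf16.WebName);                     // utf-16
      Console.WriteLine(strictUtf8.WebName);                // utf-8
      Console.WriteLine(EncodeWithAsterisks("caf\u00E9"));  // caf*
   }
}
```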

You can retrieve information about all the encodings available in .NET by calling the Encoding.GetEncodings method. .NET supports the character encoding schemes listed in the following table.

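Encoding class	Description
ASCII	Encodes a limited range of characters by using the lower seven bits of a byte. Because this encoding supports only character values from `U+0000` through `U+007F`, in most cases it is inadequate for internationalized applications.
UTF-7	Represents characters as sequences of 7-bit ASCII characters. Non-ASCII Unicode characters are represented by an escape sequence of ASCII characters. UTF-7 supports protocols such as email and newsgroups. However, UTF-7 is not particularly secure or robust. In some cases, changing one bit can radically alter the interpretation of an entire UTF-7 string. In other cases, different UTF-7 strings can encode the same text. For sequences that include non-ASCII characters, UTF-7 requires more space than UTF-8, and encoding and decoding are slower. Consequently, you should use UTF-8 instead of UTF-7 if possible.
UTF-8	Represents each Unicode code point as a sequence of one to four bytes. UTF-8 supports 8-bit data sizes and works well with many existing operating systems. For the ASCII range of characters, UTF-8 is identical to ASCII encoding and allows a broader set of characters. However, for Chinese-Japanese-Korean (CJK) scripts, UTF-8 can require three bytes for each character, and can cause larger data sizes than UTF-16. Sometimes the amount of ASCII data, such as HTML tags, justifies the increased size for the CJK range.
UTF-16	Represents each Unicode code point as a sequence of one or two 16-bit integers. Most common Unicode characters require only one UTF-16 code unit, although Unicode supplementary characters (U+10000 and greater) require two UTF-16 surrogate code units. Both little-endian and big-endian byte orders are supported. UTF-16 encoding is used by the common language runtime to represent Char and String values, and it is used by the Windows operating system to represent `WCHAR` values.
UTF-32	Represents each Unicode code point as a 32-bit integer. Both little-endian and big-endian byte orders are supported. UTF-32 encoding is used when applications want to avoid the surrogate code point behavior of UTF-16 encoding and the larger encoded size is acceptable. A single glyph rendered on a display can still be encoded with more than one UTF-32 character.
ANSI/ISO encodings	Provide support for a variety of code pages. On Windows operating systems, code pages are used to support a specific language or group of languages. For a table that lists the code pages supported by .NET, see the Encoding class. You can retrieve an encoding object for a particular code page by calling the Encoding.GetEncoding(Int32) method. A code page contains 256 code points and is zero-based. In most code pages, code points 0 through 127 represent the ASCII character set, and code points 128 through 255 differ significantly between code pages. For example, code page 1252 provides the characters for Latin writing systems, including English, German, and French. The last 128 code points in code page 1252 contain the accented characters. Code page 1253 provides the character codes that are required in the Greek writing system. The last 128 code points in code page 1253 contain the Greek characters. As a result, an application that relies on ANSI code pages cannot store Greek and German in the same text stream unless it includes an identifier that indicates the referenced code page.
Double-byte character set (DBCS) encodings	Support languages, such as Chinese, Japanese, and Korean, that contain more than 256 characters. In a DBCS, a pair of code points (a double byte) represents each character. The Encoding.IsSingleByte property returns `false` for DBCS encodings. You can retrieve an encoding object for a particular DBCS by calling the Encoding.GetEncoding(Int32) method. When an application handles DBCS data, the first byte of a DBCS character (the lead byte) is processed in combination with the trail byte that immediately follows it. Because a single pair of double-byte code points can represent different characters depending on the code page, this scheme still does not allow two languages, such as Japanese and Chinese, to be combined in the same data stream.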

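These encodings let you work with Unicode characters as well as with the encodings that are most commonly used in legacy applications. In addition, you can create a custom encoding by defining a class that derives from Encoding and overriding its members.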
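To make the size trade-offs between the Unicode encodings concrete, the following sketch prints the number of bytes each one needs for a few sample characters. The class name, the `Describe` helper, and the choice of sample characters are assumptions made for this illustration.

```csharp
using System;
using System.Text;

public static class EncodedSizeSketch
{
   // Hypothetical helper: reports the encoded size of a single code point.
   public static string Describe(string s) =>
      $"U+{char.ConvertToUtf32(s, 0):X4}: " +
      $"UTF-8 = {Encoding.UTF8.GetByteCount(s)}, " +
      $"UTF-16 = {Encoding.Unicode.GetByteCount(s)}, " +
      $"UTF-32 = {Encoding.UTF32.GetByteCount(s)} bytes";

   public static void Main()
   {
      // ASCII letter, accented Latin letter, CJK ideograph, supplementary character.
      string[] samples = { "A", "\u00E9", "\u65E5", "\U0001F600" };
      foreach (string s in samples)
         Console.WriteLine(Describe(s));
   }
}
// The sketch displays the following output:
//    U+0041: UTF-8 = 1, UTF-16 = 2, UTF-32 = 4 bytes
//    U+00E9: UTF-8 = 2, UTF-16 = 2, UTF-32 = 4 bytes
//    U+65E5: UTF-8 = 3, UTF-16 = 2, UTF-32 = 4 bytes
//    U+1F600: UTF-8 = 4, UTF-16 = 4, UTF-32 = 4 bytes
```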

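.NET Core encoding support

By default, .NET Core does not make available any code page encodings other than code page 28591 and the Unicode encodings, such as UTF-8 and UTF-16. However, you can add the code page encodings found in standard Windows apps that target .NET to your app. For more information, see the CodePagesEncodingProvider topic.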
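A minimal sketch of adding the code page encodings might look like the following. The class and method names are hypothetical. Depending on the target framework, the project might also need a reference to the System.Text.Encoding.CodePages package; that requirement is an assumption to verify for your own setup.

```csharp
using System;
using System.Text;

public static class CodePagesSketch
{
   // Hypothetical helper: registers the code page provider, then
   // retrieves the Windows-1252 encoding.
   public static Encoding GetWindows1252()
   {
      // Makes the additional code page encodings visible to Encoding.GetEncoding.
      Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
      return Encoding.GetEncoding(1252);
   }

   public static void Main()
   {
      Encoding cp1252 = GetWindows1252();
      Console.WriteLine(cp1252.CodePage);               // 1252
      byte[] bytes = cp1252.GetBytes("\u00E4");         // LATIN SMALL LETTER A WITH DIAERESIS
      Console.WriteLine(bytes[0].ToString("X2"));       // E4
   }
}
```

Registering the provider once at application startup is usually enough; later calls to Encoding.GetEncoding can then resolve the code page encodings.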

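Choosing an encoding class

If you have the opportunity to choose the encoding that your application uses, you should use a Unicode encoding, preferably either UTF8Encoding or UnicodeEncoding. (.NET also supports a third Unicode encoding, UTF32Encoding.)

If you are planning to use an ASCII encoding (ASCIIEncoding), choose UTF8Encoding instead. The two encodings are identical for the ASCII character set, but UTF8Encoding has the following advantages:

It can represent every Unicode character, whereas ASCIIEncoding supports only the Unicode character values from U+0000 through U+007F.
It provides error detection and better security.
It has been tuned to be as fast as possible and should be faster than any other encoding. Even for content that is entirely ASCII, operations performed with UTF8Encoding are faster than operations performed with ASCIIEncoding.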

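You should consider using ASCIIEncoding only for legacy applications. However, even for legacy applications, UTF8Encoding might be a better choice for the following reasons (assuming default settings):

If your application has content that is not strictly ASCII and encodes it with ASCIIEncoding, each non-ASCII character is encoded as a question mark (?). If the application then decodes this data, the information is lost.
If your application has content that is not strictly ASCII and encodes it with UTF8Encoding, the result seems unintelligible if it is interpreted as ASCII. However, if the application then uses a UTF-8 decoder to decode this data, the data round-trips successfully.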
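The difference between the two cases above can be sketched with a small round-trip helper. The class name, the `RoundTrip` method, and the sample text are illustrative assumptions.

```csharp
using System;
using System.Text;

public static class RoundTripSketch
{
   // Hypothetical helper: encodes text and immediately decodes it
   // with the same encoding.
   public static string RoundTrip(Encoding encoding, string text) =>
      encoding.GetString(encoding.GetBytes(text));

   public static void Main()
   {
      string text = "na\u00EFve caf\u00E9";

      // ASCII replaces each non-ASCII character with '?', so data is lost.
      Console.WriteLine(RoundTrip(Encoding.ASCII, text));        // na?ve caf?

      // UTF-8 can represent every character, so the text survives.
      Console.WriteLine(RoundTrip(Encoding.UTF8, text) == text); // True
   }
}
```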

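In a web application, characters sent to the client in response to a web request should reflect the encoding used on the client. In most cases, you should set the HttpResponse.ContentEncoding property to the value returned by the HttpRequest.ContentEncoding property to display text in the encoding that the user expects.

Using an Encoding object

An encoder converts a string of characters (most commonly, Unicode characters) to its numeric (byte) equivalent. For example, you might use an ASCII encoder to convert Unicode characters to ASCII so that they can be displayed at the console. To perform the conversion, you call the Encoding.GetBytes method. If you want to determine how many bytes are needed to store the encoded characters before performing the encoding, you can call the GetByteCount method.

The following example uses a single byte array to encode strings in two separate operations. It maintains an index that indicates the starting position in the byte array for the next set of ASCII-encoded bytes. It calls the ASCIIEncoding.GetByteCount(String) method to ensure that the byte array is large enough to accommodate the encoded string. It then calls the ASCIIEncoding.GetBytes(String, Int32, Int32, Byte[], Int32) method to encode the characters in the string.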

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      string[] strings= { "This is the first sentence. ",
                          "This is the second sentence. " };
      Encoding asciiEncoding = Encoding.ASCII;

      // Create array of adequate size.
      byte[] bytes = new byte[49];
      // Create index for current position of array.
      int index = 0;

      Console.WriteLine("Strings to encode:");
      foreach (var stringValue in strings) {
         Console.WriteLine($"   {stringValue}");

         int count = asciiEncoding.GetByteCount(stringValue);
         if (count + index >=  bytes.Length)
            Array.Resize(ref bytes, bytes.Length + 50);

         int written = asciiEncoding.GetBytes(stringValue, 0,
                                              stringValue.Length,
                                              bytes, index);

         index = index + written;
      }
      Console.WriteLine("\nEncoded bytes:");
      Console.WriteLine($"{ShowByteValues(bytes, index)}");
      Console.WriteLine();

      // Decode the ASCII-encoded byte array back to a string.
      string newString = asciiEncoding.GetString(bytes, 0, index);
      Console.WriteLine($"Decoded: {newString}");
   }

   private static string ShowByteValues(byte[] bytes, int last )
   {
      string returnString = "   ";
      for (int ctr = 0; ctr <= last - 1; ctr++) {
         if (ctr % 20 == 0)
            returnString += "\n   ";
         returnString += String.Format("{0:X2} ", bytes[ctr]);
      }
      return returnString;
   }
}
// The example displays the following output:
//       Strings to encode:
//          This is the first sentence.
//          This is the second sentence.
//
//       Encoded bytes:
//
//          54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
//          6E 74 65 6E 63 65 2E 20 54 68 69 73 20 69 73 20 74 68 65 20
//          73 65 63 6F 6E 64 20 73 65 6E 74 65 6E 63 65 2E 20
//
//       Decoded: This is the first sentence. This is the second sentence.

Imports System.Text

Module Example
    Public Sub Main()
        Dim strings() As String = {"This is the first sentence. ",
                                    "This is the second sentence. "}
        Dim asciiEncoding As Encoding = Encoding.ASCII

        ' Create array of adequate size.
        Dim bytes(50) As Byte
        ' Create index for current position of array.
        Dim index As Integer = 0

        Console.WriteLine("Strings to encode:")
        For Each stringValue In strings
            Console.WriteLine("   {0}", stringValue)

            Dim count As Integer = asciiEncoding.GetByteCount(stringValue)
            If count + index >= bytes.Length Then
                Array.Resize(bytes, bytes.Length + 50)
            End If
            Dim written As Integer = asciiEncoding.GetBytes(stringValue, 0,
                                                            stringValue.Length,
                                                            bytes, index)

            index = index + written
        Next
        Console.WriteLine()
        Console.WriteLine("Encoded bytes:")
        Console.WriteLine("{0}", ShowByteValues(bytes, index))
        Console.WriteLine()

        ' Decode the ASCII-encoded byte array back to a string.
        Dim newString As String = asciiEncoding.GetString(bytes, 0, index)
        Console.WriteLine("Decoded: {0}", newString)
    End Sub

    Private Function ShowByteValues(bytes As Byte(), last As Integer) As String
        Dim returnString As String = "   "
        For ctr As Integer = 0 To last - 1
            If ctr Mod 20 = 0 Then returnString += vbCrLf + "   "
            returnString += String.Format("{0:X2} ", bytes(ctr))
        Next
        Return returnString
    End Function
End Module
' The example displays the following output:
'       Strings to encode:
'          This is the first sentence.
'          This is the second sentence.
'       
'       Encoded bytes:
'       
'          54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
'          6E 74 65 6E 63 65 2E 20 54 68 69 73 20 69 73 20 74 68 65 20
'          73 65 63 6F 6E 64 20 73 65 6E 74 65 6E 63 65 2E 20
'       
'       Decoded: This is the first sentence. This is the second sentence.

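A decoder converts a byte array that reflects a particular character encoding into a set of characters, either in a character array or in a string. To decode a byte array into a character array, you call the Encoding.GetChars method. To decode a byte array into a string, you call the GetString method. If you want to determine how many characters are needed to store the decoded bytes before performing the decoding, you can call the GetCharCount method.

The following example encodes three strings and then decodes them into a single array of characters. It maintains an index that indicates the starting position in the character array for the next set of decoded characters. It calls the GetCharCount method to ensure that the character array is large enough to accommodate all the decoded characters. It then calls the ASCIIEncoding.GetChars(Byte[], Int32, Int32, Char[], Int32) method to decode the byte array.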

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      string[] strings = { "This is the first sentence. ",
                           "This is the second sentence. ",
                           "This is the third sentence. " };
      Encoding asciiEncoding = Encoding.ASCII;
      // Array to hold encoded bytes.
      byte[] bytes;
      // Array to hold decoded characters.
      char[] chars = new char[50];
      // Create index for current position of character array.
      int index = 0;

      foreach (var stringValue in strings) {
         Console.WriteLine($"String to Encode: {stringValue}");
         // Encode the string to a byte array.
         bytes = asciiEncoding.GetBytes(stringValue);
         // Display the encoded bytes.
         Console.Write("Encoded bytes: ");
         for (int ctr = 0; ctr < bytes.Length; ctr++)
            Console.Write(" {0}{1:X2}",
                          ctr % 20 == 0 ? Environment.NewLine : "",
                          bytes[ctr]);
         Console.WriteLine();

         // Decode the bytes to a single character array.
         int count = asciiEncoding.GetCharCount(bytes);
         if (count + index >=  chars.Length)
            Array.Resize(ref chars, chars.Length + 50);

         int written = asciiEncoding.GetChars(bytes, 0,
                                              bytes.Length,
                                              chars, index);
         index = index + written;
         Console.WriteLine();
      }

      // Instantiate a single string containing the characters.
      string decodedString = new string(chars, 0, index);
      Console.WriteLine("Decoded string: ");
      Console.WriteLine(decodedString);
   }
}
// The example displays the following output:
//    String to Encode: This is the first sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
//    6E 74 65 6E 63 65 2E 20
//
//    String to Encode: This is the second sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 73 65 63 6F 6E 64 20 73
//    65 6E 74 65 6E 63 65 2E 20
//
//    String to Encode: This is the third sentence.
//    Encoded bytes:
//    54 68 69 73 20 69 73 20 74 68 65 20 74 68 69 72 64 20 73 65
//    6E 74 65 6E 63 65 2E 20
//
//    Decoded string:
//    This is the first sentence. This is the second sentence. This is the third sentence.

Imports System.Text

Module Example
    Public Sub Main()
        Dim strings() As String = {"This is the first sentence. ",
                                    "This is the second sentence. ",
                                    "This is the third sentence. "}
        Dim asciiEncoding As Encoding = Encoding.ASCII
        ' Array to hold encoded bytes.
        Dim bytes() As Byte
        ' Array to hold decoded characters.
        Dim chars(50) As Char
        ' Create index for current position of character array.
        Dim index As Integer

        For Each stringValue In strings
            Console.WriteLine("String to Encode: {0}", stringValue)
            ' Encode the string to a byte array.
            bytes = asciiEncoding.GetBytes(stringValue)
            ' Display the encoded bytes.
            Console.Write("Encoded bytes: ")
            For ctr As Integer = 0 To bytes.Length - 1
                Console.Write(" {0}{1:X2}", If(ctr Mod 20 = 0, vbCrLf, ""),
                                            bytes(ctr))
            Next
            Console.WriteLine()

            ' Decode the bytes to a single character array.
            Dim count As Integer = asciiEncoding.GetCharCount(bytes)
            If count + index >= chars.Length Then
                Array.Resize(chars, chars.Length + 50)
            End If
            Dim written As Integer = asciiEncoding.GetChars(bytes, 0,
                                                            bytes.Length,
                                                            chars, index)
            index = index + written
            Console.WriteLine()
        Next

        ' Instantiate a single string containing the characters.
        Dim decodedString As New String(chars, 0, index)
        Console.WriteLine("Decoded string: ")
        Console.WriteLine(decodedString)
    End Sub
End Module
' The example displays the following output:
'    String to Encode: This is the first sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 66 69 72 73 74 20 73 65
'    6E 74 65 6E 63 65 2E 20
'    
'    String to Encode: This is the second sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 73 65 63 6F 6E 64 20 73
'    65 6E 74 65 6E 63 65 2E 20
'    
'    String to Encode: This is the third sentence.
'    Encoded bytes:
'    54 68 69 73 20 69 73 20 74 68 65 20 74 68 69 72 64 20 73 65
'    6E 74 65 6E 63 65 2E 20
'    
'    Decoded string:
'    This is the first sentence. This is the second sentence. This is the third sentence.

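The encoding and decoding methods of a class derived from Encoding are designed to work on a complete set of data; that is, all the data to be encoded or decoded is supplied in a single method call. However, in some cases, data is available in a stream, and the data to be encoded or decoded may be available only from separate read operations. This requires the encoding or decoding operation to remember any saved state from its previous invocation. Methods of classes derived from Encoder and Decoder are able to handle encoding and decoding operations that span multiple method calls.

An Encoder object for a particular encoding is available from that encoding's Encoding.GetEncoder() method. A Decoder object for a particular encoding is available from that encoding's Encoding.GetDecoder() method. For decoding operations, note that classes derived from Decoder include a Decoder.GetChars method, but they do not have a method that corresponds to Encoding.GetString.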
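On the encoding side, the same idea can be sketched with an Encoder that receives its input in fixed-size chunks. The class name, the `EncodeInChunks` helper, and the chunk size are assumptions made for this illustration; the point is that the Encoder holds back a high surrogate at the end of one chunk until the matching low surrogate arrives in the next.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class StatefulEncoderSketch
{
   // Hypothetical helper: encodes text in chunks of chunkSize chars,
   // letting the Encoder carry state from one call to the next.
   public static byte[] EncodeInChunks(Encoding encoding, string text, int chunkSize)
   {
      Encoder encoder = encoding.GetEncoder();
      char[] chars = text.ToCharArray();
      var output = new MemoryStream();

      for (int start = 0; start < chars.Length; start += chunkSize)
      {
         int length = Math.Min(chunkSize, chars.Length - start);
         // Flush the encoder's internal state only on the final chunk.
         bool flush = start + length == chars.Length;

         byte[] buffer = new byte[encoder.GetByteCount(chars, start, length, flush)];
         int written = encoder.GetBytes(chars, start, length, buffer, 0, flush);
         output.Write(buffer, 0, written);
      }
      return output.ToArray();
   }

   public static void Main()
   {
      // With a chunk size of 2, the surrogate pair is split across two chunks.
      string text = "a\uD83D\uDE00b";
      byte[] chunked = EncodeInChunks(Encoding.UTF8, text, 2);
      byte[] whole = Encoding.UTF8.GetBytes(text);
      Console.WriteLine($"Same bytes as a single call: {chunked.SequenceEqual(whole)}");
   }
}
```

Calling Encoding.GetBytes separately on each chunk would instead treat each half of the pair as an invalid lone surrogate and apply the fallback to it.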

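The following example illustrates the difference between using the Encoding.GetString and Decoder.GetChars methods to decode a Unicode byte array. The example encodes a string that contains some Unicode characters to a file, and then uses the two decoding methods to decode them ten bytes at a time. Because the surrogate pair occupies the tenth and eleventh UTF-16 code units of the string, its four bytes straddle the boundary between the second and third 10-byte reads, so the two halves are decoded in separate method calls. As the output shows, the Encoding.GetString method is not able to decode the bytes correctly and instead replaces each half with U+FFFD (REPLACEMENT CHARACTER). In contrast, the Decoder.GetChars method is able to decode the byte array successfully and recover the original string.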

using System;
using System.IO;
using System.Text;

public class Example
{
   public static void Main()
   {
      // Use default replacement fallback for invalid encoding.
      UnicodeEncoding enc = new UnicodeEncoding(true, false, false);

      // Define a string with various Unicode characters.
      string str1 = "AB YZ 19 \uD800\udc05 \u00e4";
      str1 += "Unicode characters. \u00a9 \u010C s \u0062\u0308";
      Console.WriteLine("Created original string...\n");

      // Convert string to byte array.
      byte[] bytes = enc.GetBytes(str1);

      FileStream fs = File.Create(@".\characters.bin");
      BinaryWriter bw = new BinaryWriter(fs);
      bw.Write(bytes);
      bw.Close();

      // Read bytes from file.
      FileStream fsIn = File.OpenRead(@".\characters.bin");
      BinaryReader br = new BinaryReader(fsIn);

      const int count = 10;            // Number of bytes to read at a time.
      byte[] bytesRead = new byte[10]; // Buffer (byte array).
      int read;                        // Number of bytes actually read.
      string str2 = String.Empty;      // Decoded string.

      // Try using Encoding object for all operations.
      do {
         read = br.Read(bytesRead, 0, count);
         str2 += enc.GetString(bytesRead, 0, read);
      } while (read == count);
      br.Close();
      Console.WriteLine("Decoded string using UnicodeEncoding.GetString()...");
      CompareForEquality(str1, str2);
      Console.WriteLine();

      // Use Decoder for all operations.
      fsIn = File.OpenRead(@".\characters.bin");
      br = new BinaryReader(fsIn);
      Decoder decoder = enc.GetDecoder();
      char[] chars = new char[50];
      int index = 0;                   // Next character to write in array.
      int written = 0;                 // Number of chars written to array.
      do {
         read = br.Read(bytesRead, 0, count);
         if (index + decoder.GetCharCount(bytesRead, 0, read) - 1 >= chars.Length)
            Array.Resize(ref chars, chars.Length + 50);

         written = decoder.GetChars(bytesRead, 0, read, chars, index);
         index += written;
      } while (read == count);
      br.Close();
      // Instantiate a string with the decoded characters.
      string str3 = new String(chars, 0, index);
      Console.WriteLine("Decoded string using UnicodeEncoding.Decoder.GetString()...");
      CompareForEquality(str1, str3);
   }

   private static void CompareForEquality(string original, string decoded)
   {
      bool result = original.Equals(decoded);
      Console.WriteLine($"original = decoded: {original.Equals(decoded, StringComparison.Ordinal)}");
      if (! result) {
         Console.WriteLine("Code points in original string:");
         foreach (var ch in original)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
         Console.WriteLine();

         Console.WriteLine("Code points in decoded string:");
         foreach (var ch in decoded)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//    Created original string...
//
//    Decoded string using UnicodeEncoding.GetString()...
//    original = decoded: False
//    Code points in original string:
//    0041 0042 0020 0059 005A 0020 0031 0039 0020 D800 DC05 0020 00E4 0055 006E 0069 0063 006F
//    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
//    0020 0073 0020 0062 0308
//    Code points in decoded string:
//    0041 0042 0020 0059 005A 0020 0031 0039 0020 FFFD FFFD 0020 00E4 0055 006E 0069 0063 006F
//    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
//    0020 0073 0020 0062 0308
//
//    Decoded string using UnicodeEncoding.Decoder.GetString()...
//    original = decoded: True

Imports System.IO
Imports System.Text

Module Example
    Public Sub Main()
        ' Use default replacement fallback for invalid encoding.
        Dim enc As New UnicodeEncoding(True, False, False)

        ' Define a string with various Unicode characters.
        Dim str1 As String = String.Format("AB YZ 19 {0}{1} {2}",
                                           ChrW(&hD800), ChrW(&hDC05), ChrW(&h00e4))
        str1 += String.Format("Unicode characters. {0} {1} s {2}{3}",
                              ChrW(&h00a9), ChrW(&h010C), ChrW(&h0062), ChrW(&h0308))
        Console.WriteLine("Created original string...")
        Console.WriteLine()

        ' Convert string to byte array.                     
        Dim bytes() As Byte = enc.GetBytes(str1)

        Dim fs As FileStream = File.Create(".\characters.bin")
        Dim bw As New BinaryWriter(fs)
        bw.Write(bytes)
        bw.Close()

        ' Read bytes from file.
        Dim fsIn As FileStream = File.OpenRead(".\characters.bin")
        Dim br As New BinaryReader(fsIn)

        Const count As Integer = 10      ' Number of bytes to read at a time. 
        Dim bytesRead(9) As Byte         ' Buffer (byte array).
        Dim read As Integer              ' Number of bytes actually read. 
        Dim str2 As String = ""          ' Decoded string.

        ' Try using Encoding object for all operations.
        Do
            read = br.Read(bytesRead, 0, count)
            str2 += enc.GetString(bytesRead, 0, read)
        Loop While read = count
        br.Close()
        Console.WriteLine("Decoded string using UnicodeEncoding.GetString()...")
        CompareForEquality(str1, str2)
        Console.WriteLine()

        ' Use Decoder for all operations.
        fsIn = File.OpenRead(".\characters.bin")
        br = New BinaryReader(fsIn)
        Dim decoder As Decoder = enc.GetDecoder()
        Dim chars(50) As Char
        Dim index As Integer = 0         ' Next character to write in array.
        Dim written As Integer = 0       ' Number of chars written to array.
        Do
            read = br.Read(bytesRead, 0, count)
            If index + decoder.GetCharCount(bytesRead, 0, read) - 1 >= chars.Length Then
                Array.Resize(chars, chars.Length + 50)
            End If
            written = decoder.GetChars(bytesRead, 0, read, chars, index)
            index += written
        Loop While read = count
        br.Close()
        ' Instantiate a string with the decoded characters.
        Dim str3 As New String(chars, 0, index)
        Console.WriteLine("Decoded string using UnicodeEncoding.Decoder.GetString()...")
        CompareForEquality(str1, str3)
    End Sub

    Private Sub CompareForEquality(original As String, decoded As String)
        Dim result As Boolean = original.Equals(decoded)
        Console.WriteLine("original = decoded: {0}",
                          original.Equals(decoded, StringComparison.Ordinal))
        If Not result Then
            Console.WriteLine("Code points in original string:")
            For Each ch In original
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()

            Console.WriteLine("Code points in decoded string:")
            For Each ch In decoded
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'    Created original string...
'    
'    Decoded string using UnicodeEncoding.GetString()...
'    original = decoded: False
'    Code points in original string:
'    0041 0042 0020 0059 005A 0020 0031 0039 0020 D800 DC05 0020 00E4 0055 006E 0069 0063 006F
'    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
'    0020 0073 0020 0062 0308
'    Code points in decoded string:
'    0041 0042 0020 0059 005A 0020 0031 0039 0020 FFFD FFFD 0020 00E4 0055 006E 0069 0063 006F
'    0064 0065 0020 0063 0068 0061 0072 0061 0063 0074 0065 0072 0073 002E 0020 00A9 0020 010C
'    0020 0073 0020 0062 0308
'    
'    Decoded string using UnicodeEncoding.Decoder.GetString()...
'    original = decoded: True

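Choosing a fallback strategy

When a method tries to encode or decode a character but no mapping exists, it must implement a fallback strategy that determines how the failed mapping should be handled. There are three types of fallback strategies:

Best-fit fallback
Replacement fallback
Exception fallback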

Important

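The most common problems in encoding operations occur when a Unicode character cannot be mapped to a particular code page encoding. The most common problems in decoding operations occur when invalid byte sequences cannot be translated into valid Unicode characters. For these reasons, you should know which fallback strategy a particular encoding object uses. Whenever possible, you should specify the fallback strategy used by an encoding object when you instantiate the object.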
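One way to follow this advice is to pass explicit fallback objects when retrieving the encoding. The following sketch chooses exception fallback for an ASCII encoding so that unmappable characters are reported rather than silently replaced. The class name and the `TryEncodeAscii` helper are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictEncodingSketch
{
   // Hypothetical helper: encodes text as ASCII and reports failure
   // instead of substituting a replacement character.
   public static bool TryEncodeAscii(string text, out byte[] bytes)
   {
      // The fallback strategy is stated explicitly at instantiation time.
      Encoding strictAscii = Encoding.GetEncoding("us-ascii",
                                                  EncoderFallback.ExceptionFallback,
                                                  DecoderFallback.ExceptionFallback);
      try {
         bytes = strictAscii.GetBytes(text);
         return true;
      }
      catch (EncoderFallbackException) {
         bytes = Array.Empty<byte>();
         return false;
      }
   }

   public static void Main()
   {
      Console.WriteLine(TryEncodeAscii("plain text", out _));  // True
      Console.WriteLine(TryEncodeAscii("\u221E", out _));      // False: INFINITY has no ASCII mapping
   }
}
```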

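Best-fit fallback

When a character does not have an exact match in the target encoding, the encoder can try to map it to a similar character. (Best-fit fallback is mostly an encoding rather than a decoding issue. Very few code pages contain characters that cannot be successfully mapped to Unicode.) Best-fit fallback is the default for code page and double-byte character set encodings that are retrieved by the Encoding.GetEncoding(Int32) and Encoding.GetEncoding(String) overloads.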

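Note

In theory, the Unicode encoding classes provided in .NET (UTF8Encoding, UnicodeEncoding, and UTF32Encoding) support every character in every character set, so they can be used to eliminate best-fit fallback issues.

Best-fit strategies vary for different code pages. For example, for some code pages, full-width Latin characters map to the more common half-width Latin characters. For other code pages, this mapping is not made. Even under an aggressive best-fit strategy, there is no imaginable fit for some characters in some encodings. For example, a Chinese ideograph has no reasonable mapping to code page 1252. In this case, a replacement string is used. By default, this string is a single QUESTION MARK (U+003F).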

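Note

Best-fit strategies are not documented in detail. However, several code pages are documented at the Unicode Consortium's website. Review the readme.txt file in that folder for a description of how to interpret the mapping files.

The following example uses code page 1252 (the Windows code page for Western European languages) to illustrate best-fit mapping and its drawbacks. The Encoding.GetEncoding(Int32) method is used to retrieve an encoding object for code page 1252. By default, it uses a best-fit mapping for Unicode characters that it does not support. The example instantiates a string that contains three non-ASCII characters, CIRCLED LATIN CAPITAL LETTER S (U+24C8), SUPERSCRIPT FIVE (U+2075), and INFINITY (U+221E), separated by spaces. As the output from the example shows, when the string is encoded, the three original non-space characters are replaced by QUESTION MARK (U+003F), DIGIT FIVE (U+0035), and DIGIT EIGHT (U+0038). DIGIT EIGHT is a particularly poor replacement for the unsupported INFINITY character, and QUESTION MARK indicates that no mapping was available for the original character.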

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      // Get an encoding for code page 1252 (Western Europe character set).
      Encoding cp1252 = Encoding.GetEncoding(1252);

      // Define and display a string.
      string str = "\u24c8 \u2075 \u221e";
      Console.WriteLine("Original string: " + str);
      Console.Write("Code points in string: ");
      foreach (var ch in str)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode a Unicode string.
      Byte[] bytes = cp1252.GetBytes(str);
      Console.Write("Encoded bytes: ");
      foreach (byte byt in bytes)
         Console.Write("{0:X2} ", byt);
      Console.WriteLine("\n");

      // Decode the string.
      string str2 = cp1252.GetString(bytes);
      Console.WriteLine($"String round-tripped: {str.Equals(str2)}");
      if (! str.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));
      }
   }
}
// The example displays the following output:
//       Original string: Ⓢ ⁵ ∞
//       Code points in string: 24C8 0020 2075 0020 221E
//
//       Encoded bytes: 3F 20 35 20 38
//
//       String round-tripped: False
//       ? 5 8
//       003F 0020 0035 0020 0038

Imports System.Text

Module Example
    Public Sub Main()
        ' Get an encoding for code page 1252 (Western Europe character set).
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' Define and display a string.
        Dim str As String = String.Format("{0} {1} {2}", ChrW(&h24c8), ChrW(&H2075), ChrW(&h221E))
        Console.WriteLine("Original string: " + str)
        Console.Write("Code points in string: ")
        For Each ch In str
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode a Unicode string.
        Dim bytes() As Byte = cp1252.GetBytes(str)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the string.
        Dim str2 As String = cp1252.GetString(bytes)
        Console.WriteLine("String round-tripped: {0}", str.Equals(str2))
        If Not str.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
        End If
    End Sub
End Module
' The example displays the following output:
'       Original string: Ⓢ ⁵ ∞
'       Code points in string: 24C8 0020 2075 0020 221E
'       
'       Encoded bytes: 3F 20 35 20 38
'       
'       String round-tripped: False
'       ? 5 8
'       003F 0020 0035 0020 0038

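Best-fit mapping is the default behavior for an Encoding object that encodes Unicode data into code page data, and there are legacy applications that rely on this behavior. However, most new applications should avoid best-fit behavior for security reasons. For example, applications should not put a domain name through a best-fit encoding.

Note

You can also implement a custom best-fit fallback mapping for an encoding. For more information, see the "Implementing a custom fallback strategy" section.

If best-fit fallback is the default for an encoding object, you can choose another fallback strategy when you retrieve an Encoding object by calling the Encoding.GetEncoding(Int32, EncoderFallback, DecoderFallback) or Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) overload. The following example replaces each character that cannot be mapped to code page 1252 with an asterisk (*).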

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding cp1252r = Encoding.GetEncoding(1252,
                                  new EncoderReplacementFallback("*"),
                                  new DecoderReplacementFallback("*"));

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine();

      byte[] bytes = cp1252r.GetBytes(str1);
      string str2 = cp1252r.GetString(bytes);
      Console.WriteLine($"Round-trip: {str1.Equals(str2)}");
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//       Round-trip: False
//       * * *
//       002A 0020 002A 0020 002A

Imports System.Text

Module Example
    Public Sub Main()
        Dim cp1252r As Encoding = Encoding.GetEncoding(1252,
                                           New EncoderReplacementFallback("*"),
                                           New DecoderReplacementFallback("*"))

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()

        Dim bytes() As Byte = cp1252r.GetBytes(str1)
        Dim str2 As String = cp1252r.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       Round-trip: False
'       * * *
'       002A 0020 002A 0020 002A
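
On .NET Core and .NET 5 or later, code page 1252 is typically not available until the code pages encoding provider has been registered, and a call such as Encoding.GetEncoding(1252, ...) throws NotSupportedException without it. The following is a minimal sketch, assuming the project can reference CodePagesEncodingProvider (older targets may need the System.Text.Encoding.CodePages package). The class name CodePageSetup and the method name GetCp1252WithAsterisk are illustrative only and are not part of the original example.

```csharp
using System;
using System.Text;

public static class CodePageSetup
{
   // Hypothetical helper: returns a code page 1252 encoding that substitutes "*"
   // for characters it cannot encode and for bytes it cannot decode.
   public static Encoding GetCp1252WithAsterisk()
   {
      // Make the Windows code pages visible to Encoding.GetEncoding.
      Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

      return Encoding.GetEncoding(1252,
                                  new EncoderReplacementFallback("*"),
                                  new DecoderReplacementFallback("*"));
   }
}
```

Calling CodePageSetup.GetCp1252WithAsterisk().GetBytes("\u221E") should then produce the single byte 0x2A, as in the example above.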

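Replacement Fallback

When a character has no exact match in the target scheme and there is no appropriate character that it can be mapped to, the application can specify a replacement character or string. This is the default behavior of the Unicode decoder, which replaces any two-byte sequence that it cannot decode with REPLACEMENT_CHARACTER (U+FFFD). It is also the default behavior of the ASCIIEncoding class, which replaces each character that it cannot encode or decode with a question mark. The following example illustrates character replacement for the Unicode string from the previous example. As the output shows, each character that cannot be encoded as an ASCII byte value is replaced by 0x3F, the ASCII code for a question mark.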

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding enc = Encoding.ASCII;

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode the original string using the ASCII encoder.
      byte[] bytes = enc.GetBytes(str1);
      Console.Write("Encoded bytes: ");
      foreach (var byt in bytes)
         Console.Write("{0:X2} ", byt);
      Console.WriteLine("\n");

      // Decode the ASCII bytes.
      string str2 = enc.GetString(bytes);
      Console.WriteLine($"Round-trip: {str1.Equals(str2)}");
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//
//       Encoded bytes: 3F 20 3F 20 3F
//
//       Round-trip: False
//       ? ? ?
//       003F 0020 003F 0020 003F

Imports System.Text

Module Example
    Public Sub Main()
        Dim enc As Encoding = Encoding.Ascii

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = enc.GetBytes(str1)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Dim str2 As String = enc.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       
'       Encoded bytes: 3F 20 3F 20 3F
'       
'       Round-trip: False
'       ? ? ?
'       003F 0020 003F 0020 003F

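.NET includes the EncoderReplacementFallback and DecoderReplacementFallback classes, which substitute a replacement string when a character does not map exactly in an encoding or decoding operation. By default, this replacement string is a question mark, but you can call a class constructor overload to choose a different string. The replacement string is typically a single character, although this is not a requirement. The following example changes the behavior of the code page 1252 encoder by instantiating an EncoderReplacementFallback object that uses an asterisk (*) as the replacement string.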

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding cp1252r = Encoding.GetEncoding(1252,
                                  new EncoderReplacementFallback("*"),
                                  new DecoderReplacementFallback("*"));

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine();

      byte[] bytes = cp1252r.GetBytes(str1);
      string str2 = cp1252r.GetString(bytes);
      Console.WriteLine($"Round-trip: {str1.Equals(str2)}");
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//       Round-trip: False
//       * * *
//       002A 0020 002A 0020 002A

Imports System.Text

Module Example
    Public Sub Main()
        Dim cp1252r As Encoding = Encoding.GetEncoding(1252,
                                           New EncoderReplacementFallback("*"),
                                           New DecoderReplacementFallback("*"))

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()

        Dim bytes() As Byte = cp1252r.GetBytes(str1)
        Dim str2 As String = cp1252r.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       Round-trip: False
'       * * *
'       002A 0020 002A 0020 002A

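Note

You can also implement a replacement class for an encoding. For more information, see the Implementing a Custom Fallback Strategy section.

In addition to QUESTION MARK (U+003F), the Unicode REPLACEMENT CHARACTER (U+FFFD) is commonly used as a replacement string, particularly when decoding byte sequences that cannot be translated into Unicode characters. However, you are free to choose any replacement string, and it can contain multiple characters.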
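
Both choices can be sketched briefly. The following is an illustrative example only: the class name ReplacementDemo, its method names, and the "[?]" marker are assumptions made for this sketch. The first method decodes UTF-8 and substitutes U+FFFD for invalid bytes; the second encodes to ASCII with a multi-character replacement string and decodes the result so that the substitution is visible.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
   // Decodes UTF-8 bytes, substituting U+FFFD for each invalid byte sequence.
   public static string DecodeUtf8Lenient(byte[] bytes)
   {
      Encoding utf8 = Encoding.GetEncoding("utf-8",
                                           new EncoderReplacementFallback("\uFFFD"),
                                           new DecoderReplacementFallback("\uFFFD"));
      return utf8.GetString(bytes);
   }

   // Encodes text to ASCII, substituting the three-character marker "[?]"
   // for each character that ASCII cannot represent, then decodes the bytes.
   public static string EncodeAsciiWithMarker(string text)
   {
      Encoding ascii = Encoding.GetEncoding("us-ascii",
                                            new EncoderReplacementFallback("[?]"),
                                            new DecoderReplacementFallback("?"));
      return ascii.GetString(ascii.GetBytes(text));
   }
}
```

For instance, passing the bytes 0x41 0xFF 0x42 to DecodeUtf8Lenient yields "A", U+FFFD, "B", and passing "a\u221Eb" to EncodeAsciiWithMarker yields "a[?]b".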

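Exception Fallback

Instead of providing a best-fit fallback or a replacement string, an encoder can throw an EncoderFallbackException if it is unable to encode a set of characters, and a decoder can throw a DecoderFallbackException if it is unable to decode a byte array. To throw an exception in encoding and decoding operations, supply an EncoderExceptionFallback object and a DecoderExceptionFallback object, respectively, to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method. The following example illustrates exception fallback with the ASCIIEncoding class.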

using System;
using System.Text;

public class Example
{
   public static void Main()
   {
      Encoding enc = Encoding.GetEncoding("us-ascii",
                                          new EncoderExceptionFallback(),
                                          new DecoderExceptionFallback());

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      foreach (var ch in str1)
         Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

      Console.WriteLine("\n");

      // Encode the original string using the ASCII encoder.
      byte[] bytes = {};
      try {
         bytes = enc.GetBytes(str1);
         Console.Write("Encoded bytes: ");
         foreach (var byt in bytes)
            Console.Write("{0:X2} ", byt);

         Console.WriteLine();
      }
      catch (EncoderFallbackException e) {
         Console.Write("Exception: ");
         if (e.IsUnknownSurrogate())
            Console.WriteLine($"Unable to encode surrogate pair 0x{Convert.ToUInt16(e.CharUnknownHigh):X4} 0x{Convert.ToUInt16(e.CharUnknownLow):X4} at index {e.Index}.");
         else
            Console.WriteLine($"Unable to encode 0x{Convert.ToUInt16(e.CharUnknown):X4} at index {e.Index}.");
         return;
      }
      Console.WriteLine();

      // Decode the ASCII bytes.
      try {
         string str2 = enc.GetString(bytes);
         Console.WriteLine($"Round-trip: {str1.Equals(str2)}");
         if (! str1.Equals(str2)) {
            Console.WriteLine(str2);
            foreach (var ch in str2)
               Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

            Console.WriteLine();
         }
      }
      catch (DecoderFallbackException e) {
         Console.Write("Unable to decode byte(s) ");
         foreach (byte unknown in e.BytesUnknown)
            Console.Write("0x{0:X2} ", unknown);

         Console.WriteLine($"at index {e.Index}");
      }
   }
}
// The example displays the following output:
//       Ⓢ ⁵ ∞
//       24C8 0020 2075 0020 221E
//
//       Exception: Unable to encode 0x24C8 at index 0.

Imports System.Text

Module Example
    Public Sub Main()
        Dim enc As Encoding = Encoding.GetEncoding("us-ascii",
                                                   New EncoderExceptionFallback(),
                                                   New DecoderExceptionFallback())

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&h24C8), ChrW(&h2075), ChrW(&h221E))
        Console.WriteLine(str1)
        For Each ch In str1
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = {}
        Try
            bytes = enc.GetBytes(str1)
            Console.Write("Encoded bytes: ")
            For Each byt In bytes
                Console.Write("{0:X2} ", byt)
            Next
            Console.WriteLine()
        Catch e As EncoderFallbackException
            Console.Write("Exception: ")
            If e.IsUnknownSurrogate() Then
                Console.WriteLine("Unable to encode surrogate pair 0x{0:X4} 0x{1:X4} at index {2}.",
                                  Convert.ToUInt16(e.CharUnknownHigh),
                                  Convert.ToUInt16(e.CharUnknownLow),
                                  e.Index)
            Else
                Console.WriteLine("Unable to encode 0x{0:X4} at index {1}.",
                                  Convert.ToUInt16(e.CharUnknown),
                                  e.Index)
            End If
            Exit Sub
        End Try
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Try
            Dim str2 As String = enc.GetString(bytes)
            Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
            If Not str1.Equals(str2) Then
                Console.WriteLine(str2)
                For Each ch In str2
                    Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
                Next
                Console.WriteLine()
            End If
        Catch e As DecoderFallbackException
            Console.Write("Unable to decode byte(s) ")
            For Each unknown As Byte In e.BytesUnknown
                Console.Write("0x{0:X2} ", unknown)
            Next
            Console.WriteLine("at index {0}", e.Index)
        End Try
    End Sub
End Module
' The example displays the following output:
'       Ⓢ ⁵ ∞
'       24C8 0020 2075 0020 221E
'       
'       Exception: Unable to encode 0x24C8 at index 0.

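Note

You can also implement a custom exception handler for an encoding operation. For more information, see the Implementing a Custom Fallback Strategy section.

The EncoderFallbackException and DecoderFallbackException objects provide the following information about the condition that caused the exception:

The EncoderFallbackException object includes an IsUnknownSurrogate method, which indicates whether the character or characters that cannot be encoded represent an unknown surrogate pair (in which case the method returns true) or an unknown single character (in which case the method returns false). The characters in the surrogate pair are available from the EncoderFallbackException.CharUnknownHigh and EncoderFallbackException.CharUnknownLow properties. The unknown single character is available from the EncoderFallbackException.CharUnknown property. The EncoderFallbackException.Index property indicates the position in the string at which the first character that could not be encoded was found.
The DecoderFallbackException object includes a BytesUnknown property that returns an array of the bytes that cannot be decoded. The DecoderFallbackException.Index property indicates the starting position of the unknown bytes.

Although the EncoderFallbackException and DecoderFallbackException objects provide adequate diagnostic information about the exception, they do not provide access to the encoding or decoding buffer. They therefore do not allow invalid data to be replaced or corrected within the encoding or decoding method.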
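
Because the exception carries only diagnostics, any correction has to happen in the calling code. The following is a minimal sketch of reading those diagnostics, assuming a strict ASCII encoding; the class name StrictAscii and its method names are hypothetical and serve only as illustration.

```csharp
using System;
using System.Text;

public static class StrictAscii
{
   // ASCII encoding that throws instead of substituting characters or bytes.
   private static readonly Encoding Strict =
      Encoding.GetEncoding("us-ascii",
                           new EncoderExceptionFallback(),
                           new DecoderExceptionFallback());

   // Returns a description of the first character that cannot be encoded,
   // or null if the whole string is encodable.
   public static string DescribeEncodeFailure(string text)
   {
      try {
         Strict.GetBytes(text);
         return null;
      }
      catch (EncoderFallbackException e) {
         if (e.IsUnknownSurrogate())
            return $"surrogate pair U+{(int)e.CharUnknownHigh:X4} U+{(int)e.CharUnknownLow:X4}";
         return $"U+{(int)e.CharUnknown:X4}";
      }
   }

   // Returns the undecodable bytes in hexadecimal, or null if all bytes decode.
   public static string DescribeDecodeFailure(byte[] bytes)
   {
      try {
         Strict.GetString(bytes);
         return null;
      }
      catch (DecoderFallbackException e) {
         return "0x" + BitConverter.ToString(e.BytesUnknown ?? Array.Empty<byte>());
      }
   }
}
```

A caller can use the returned description (or the Index property of the caught exception) to decide how to repair the input before it retries the operation.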

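Implementing a Custom Fallback Strategy

In addition to the best-fit mapping that is implemented internally by code pages, .NET includes the following classes for implementing a fallback strategy:

Use EncoderReplacementFallback and EncoderReplacementFallbackBuffer to replace characters in encoding operations.
Use DecoderReplacementFallback and DecoderReplacementFallbackBuffer to replace characters in decoding operations.
Use EncoderExceptionFallback and EncoderExceptionFallbackBuffer to throw an EncoderFallbackException when a character cannot be encoded.
Use DecoderExceptionFallback and DecoderExceptionFallbackBuffer to throw a DecoderFallbackException when bytes cannot be decoded.

In addition, you can implement a custom solution that uses best-fit fallback, replacement fallback, or exception fallback by following these steps:

Derive a class from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations.
Derive a class from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations.
For exception fallback, if the predefined EncoderFallbackException and DecoderFallbackException classes do not meet your needs, derive a class from an exception type such as Exception or ArgumentException.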

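Deriving from EncoderFallback or DecoderFallback

To implement a custom fallback solution, you must create a class that inherits from EncoderFallback for encoding operations, and from DecoderFallback for decoding operations. Instances of these classes are passed to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method and serve as the intermediary between the encoding class and the fallback implementation.

When you create a custom fallback solution for an encoder or decoder, you must implement the following members:

The EncoderFallback.MaxCharCount or DecoderFallback.MaxCharCount property, which returns the maximum number of characters that the best-fit, replacement, or exception fallback can return to replace a single character. For a custom exception fallback, its value is zero.
The EncoderFallback.CreateFallbackBuffer or DecoderFallback.CreateFallbackBuffer method, which returns your custom EncoderFallbackBuffer or DecoderFallbackBuffer implementation. The method is called by the encoder when it encounters the first character that it cannot encode, or by the decoder when it encounters the first byte that it cannot decode.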
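
As an illustration of the custom exception case, the following sketch shows one possible shape for an encoder fallback that throws its own exception type and reports a MaxCharCount of zero. The names UnmappableCharacterException, ThrowingEncoderFallback, and ThrowingBuffer are hypothetical, and the buffer members it overrides are described in the next section. Treat it as a sketch under these assumptions rather than a definitive implementation.

```csharp
using System;
using System.Text;

// Hypothetical exception type that records the offending character.
public sealed class UnmappableCharacterException : ArgumentException
{
   public char Character { get; }
   public int CharIndex { get; }

   public UnmappableCharacterException(char character, int index)
      : base($"Cannot encode U+{(int)character:X4} at index {index}.")
   {
      Character = character;
      CharIndex = index;
   }
}

public sealed class ThrowingEncoderFallback : EncoderFallback
{
   // An exception fallback never supplies replacement characters.
   public override int MaxCharCount => 0;

   public override EncoderFallbackBuffer CreateFallbackBuffer() => new ThrowingBuffer();

   private sealed class ThrowingBuffer : EncoderFallbackBuffer
   {
      // Single character that cannot be encoded.
      public override bool Fallback(char charUnknown, int index)
         => throw new UnmappableCharacterException(charUnknown, index);

      // Surrogate pair that cannot be encoded; only the high surrogate is recorded here.
      public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
         => throw new UnmappableCharacterException(charUnknownHigh, index);

      // The buffer never holds characters because Fallback always throws.
      public override char GetNextChar() => '\0';
      public override bool MovePrevious() => false;
      public override int Remaining => 0;
   }
}
```

An instance can be passed to Encoding.GetEncoding("us-ascii", new ThrowingEncoderFallback(), new DecoderExceptionFallback()), after which encoding a non-ASCII character raises UnmappableCharacterException instead of EncoderFallbackException.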

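Deriving from EncoderFallbackBuffer or DecoderFallbackBuffer

To implement a custom fallback solution, you must also create a class that inherits from EncoderFallbackBuffer for encoding operations, and from DecoderFallbackBuffer for decoding operations. Instances of these classes are returned by the CreateFallbackBuffer method of the EncoderFallback and DecoderFallback classes. The EncoderFallback.CreateFallbackBuffer method is called by the encoder when it encounters the first character that it cannot encode, and the DecoderFallback.CreateFallbackBuffer method is called by the decoder when it encounters one or more bytes that it cannot decode. The EncoderFallbackBuffer and DecoderFallbackBuffer classes provide the fallback implementation. Each instance represents a buffer that contains the fallback characters that replace the character that cannot be encoded or the byte sequence that cannot be decoded.

When you create a custom fallback solution for an encoder or decoder, you must implement the following members:

The EncoderFallbackBuffer.Fallback or DecoderFallbackBuffer.Fallback method. EncoderFallbackBuffer.Fallback is called by the encoder to give the fallback buffer information about the character that it cannot encode. Because the character to be encoded may be a surrogate pair, this method is overloaded. One overload is passed the character to be encoded and its index in the string. The second overload is passed the high and low surrogates along with their index in the string. The DecoderFallbackBuffer.Fallback method is called by the decoder to give the fallback buffer information about the bytes that it cannot decode. This method is passed an array of the bytes that cannot be decoded, along with the index of the first byte. The fallback method should return true if the fallback buffer can supply one or more best-fit or replacement characters; otherwise, it should return false. For an exception fallback, the fallback method should throw an exception.
The EncoderFallbackBuffer.GetNextChar or DecoderFallbackBuffer.GetNextChar method, which is called repeatedly by the encoder or decoder to get the next character from the fallback buffer. When all fallback characters have been returned, the method should return U+0000.
The EncoderFallbackBuffer.Remaining or DecoderFallbackBuffer.Remaining property, which returns the number of characters remaining in the fallback buffer.
The EncoderFallbackBuffer.MovePrevious or DecoderFallbackBuffer.MovePrevious method, which moves the current position in the fallback buffer to the previous character.
The EncoderFallbackBuffer.Reset or DecoderFallbackBuffer.Reset method, which reinitializes the fallback buffer.

If the fallback implementation is a best-fit fallback or a replacement fallback, the classes derived from EncoderFallbackBuffer and DecoderFallbackBuffer also maintain two private instance fields: the exact number of characters in the buffer, and the index of the next character in the buffer to return.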
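
The members listed above can be sketched for the decoding side as follows. This is an illustrative example only: the class names HexEscapeDecoderFallback and HexEscapeBuffer are hypothetical, and the MaxCharCount value of 4 assumes the fallback is used with an ASCII decoder, which is expected to report one undecodable byte at a time. The buffer keeps the fallback string and the index of the next character to return, a small variation on the two fields described above (the remaining count is derived from them).

```csharp
using System;
using System.Text;

public sealed class HexEscapeDecoderFallback : DecoderFallback
{
   // Assumption: one undecodable byte per call, rendered as four characters such as "[E9]".
   public override int MaxCharCount => 4;

   public override DecoderFallbackBuffer CreateFallbackBuffer() => new HexEscapeBuffer();

   private sealed class HexEscapeBuffer : DecoderFallbackBuffer
   {
      private string chars = string.Empty;  // Fallback characters for the current bytes
      private int next;                     // Index of the next character to return

      public override bool Fallback(byte[] bytesUnknown, int index)
      {
         // Render each undecodable byte as a bracketed two-digit hexadecimal value.
         var sb = new StringBuilder();
         foreach (byte b in bytesUnknown)
            sb.Append('[').Append(b.ToString("X2")).Append(']');

         chars = sb.ToString();
         next = 0;
         return chars.Length > 0;
      }

      // Returns U+0000 once every fallback character has been handed out.
      public override char GetNextChar()
         => next < chars.Length ? chars[next++] : '\0';

      public override bool MovePrevious()
      {
         if (next == 0) return false;
         next--;
         return true;
      }

      public override int Remaining => chars.Length - next;

      public override void Reset()
      {
         chars = string.Empty;
         next = 0;
      }
   }
}
```

With Encoding.GetEncoding("us-ascii", new EncoderExceptionFallback(), new HexEscapeDecoderFallback()), decoding the bytes 0x41 0xE9 0x42 should produce "A[E9]B", which keeps the undecodable value visible instead of collapsing it to a question mark.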

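An EncoderFallback Example

An earlier example used replacement fallback to replace Unicode characters that do not correspond to ASCII characters with an asterisk (*). The following example instead uses a custom best-fit fallback implementation to provide a better mapping of non-ASCII characters.

The following code defines a class named CustomMapper that is derived from EncoderFallback to handle the best-fit mapping of non-ASCII characters. Its CreateFallbackBuffer method returns a CustomMapperFallbackBuffer object, which provides the EncoderFallbackBuffer implementation. The CustomMapper class uses a Dictionary<TKey,TValue> object to store the mappings of unsupported Unicode characters (the key value) and their corresponding 8-bit characters (each of which occupies two consecutive bytes of a 64-bit integer). To make this mapping available to the fallback buffer, the CustomMapper instance is passed as a parameter to the CustomMapperFallbackBuffer class constructor. Because the longest mapping is the string "INF" for the Unicode character U+221E, the MaxCharCount property returns 3.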

public class CustomMapper : EncoderFallback
{
   public string DefaultString;
   internal Dictionary<ushort, ulong> mapping;

   public CustomMapper() : this("*")
   {
   }

   public CustomMapper(string defaultString)
   {
      this.DefaultString = defaultString;

      // Create table of mappings
      mapping = new Dictionary<ushort, ulong>();
      mapping.Add(0x24C8, 0x53);
      mapping.Add(0x2075, 0x35);
      mapping.Add(0x221E, 0x49004E0046);
   }

   public override EncoderFallbackBuffer CreateFallbackBuffer()
   {
      return new CustomMapperFallbackBuffer(this);
   }

   public override int MaxCharCount
   {
      get { return 3; }
   }
}

Public Class CustomMapper : Inherits EncoderFallback
    Public DefaultString As String
    Friend mapping As Dictionary(Of UShort, ULong)

    Public Sub New()
        Me.New("*")
    End Sub

    Public Sub New(ByVal defaultString As String)
        Me.DefaultString = defaultString

        ' Create table of mappings
        mapping = New Dictionary(Of UShort, ULong)
        mapping.Add(&H24C8, &H53)
        mapping.Add(&H2075, &H35)
        mapping.Add(&H221E, &H49004E0046)
    End Sub

    Public Overrides Function CreateFallbackBuffer() As System.Text.EncoderFallbackBuffer
        Return New CustomMapperFallbackBuffer(Me)
    End Function

    Public Overrides ReadOnly Property MaxCharCount As Integer
        Get
            Return 3
        End Get
    End Property
End Class

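The following code defines the CustomMapperFallbackBuffer class, which is derived from EncoderFallbackBuffer. The dictionary that contains the best-fit mappings, and that is defined in the CustomMapper instance, is available through the instance passed to its class constructor. Its Fallback method loads the mapped characters if the Unicode character that the ASCII encoder cannot encode is defined in the mapping dictionary, and loads the default string if it is not; in both cases it returns true. It returns false only if a previous fallback still has characters to return. For each fallback, the private count variable indicates the number of characters that remain to be returned, and the private index variable indicates the position in the string buffer, charsToReturn, of the next character to return.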

public class CustomMapperFallbackBuffer : EncoderFallbackBuffer
{
   int count = -1;                   // Number of characters to return
   int index = -1;                   // Index of character to return
   CustomMapper fb;
   string charsToReturn;

   public CustomMapperFallbackBuffer(CustomMapper fallback)
   {
      this.fb = fallback;
   }

   public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
   {
      // Do not try to map surrogates to ASCII.
      return false;
   }

   public override bool Fallback(char charUnknown, int index)
   {
      // Return false if there are already characters to map.
      if (count >= 1) return false;

      // Determine number of characters to return.
      charsToReturn = String.Empty;

      ushort key = Convert.ToUInt16(charUnknown);
      if (fb.mapping.ContainsKey(key)) {
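         // On little-endian hardware, GetBytes returns the least significant byte first,
         // so "INF" (0x49004E0046) is collected as "FNI". GetNextChar reads
         // charsToReturn from the end, which restores the intended order.
         byte[] bytes = BitConverter.GetBytes(fb.mapping[key]);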
         int ctr = 0;
         foreach (var byt in bytes) {
            if (byt > 0) {
               ctr++;
               charsToReturn += (char) byt;
            }
         }
         count = ctr;
      }
      else {
         // Return default.
         charsToReturn = fb.DefaultString;
         count = 1;
      }
      this.index = charsToReturn.Length - 1;

      return true;
   }

   public override char GetNextChar()
   {
      // We'll return a character if possible, so subtract from the count of chars to return.
      count--;
      // If count is less than zero, we've returned all characters.
      if (count < 0)
         return '\u0000';

      this.index--;
      return charsToReturn[this.index + 1];
   }

   public override bool MovePrevious()
   {
      // Original: if count >= -1 and pos >= 0
      if (count >= -1) {
         count++;
         return true;
      }
      else {
         return false;
      }
   }

   public override int Remaining
   {
      get { return count < 0 ? 0 : count; }
   }

   public override void Reset()
   {
      count = -1;
      index = -1;
   }
}

Public Class CustomMapperFallbackBuffer : Inherits EncoderFallbackBuffer

    Dim count As Integer = -1        ' Number of characters to return
    Dim index As Integer = -1        ' Index of character to return
    Dim fb As CustomMapper
    Dim charsToReturn As String

    Public Sub New(ByVal fallback As CustomMapper)
        MyBase.New()
        Me.fb = fallback
    End Sub

    Public Overloads Overrides Function Fallback(ByVal charUnknownHigh As Char, ByVal charUnknownLow As Char, ByVal index As Integer) As Boolean
        ' Do not try to map surrogates to ASCII.
        Return False
    End Function

    Public Overloads Overrides Function Fallback(ByVal charUnknown As Char, ByVal index As Integer) As Boolean
        ' Return false if there are already characters to map.
        If count >= 1 Then Return False

        ' Determine number of characters to return.
        charsToReturn = String.Empty

        Dim key As UShort = Convert.ToUInt16(charUnknown)
        If fb.mapping.ContainsKey(key) Then
            Dim bytes() As Byte = BitConverter.GetBytes(fb.mapping.Item(key))
            Dim ctr As Integer
            For Each byt In bytes
                If byt > 0 Then
                    ctr += 1
                    charsToReturn += Chr(byt)
                End If
            Next
            count = ctr
        Else
            ' Return default.
            charsToReturn = fb.DefaultString
            count = 1
        End If
        Me.index = charsToReturn.Length - 1

        Return True
    End Function

    Public Overrides Function GetNextChar() As Char
        ' We'll return a character if possible, so subtract from the count of chars to return.
        count -= 1
        ' If count is less than zero, we've returned all characters.
        If count < 0 Then Return ChrW(0)

        Me.index -= 1
        Return charsToReturn(Me.index + 1)
    End Function

    Public Overrides Function MovePrevious() As Boolean
        ' Original: if count >= -1 and pos >= 0
        If count >= -1 Then
            count += 1
            Return True
        Else
            Return False
        End If
    End Function

    Public Overrides ReadOnly Property Remaining As Integer
        Get
            Return If(count < 0, 0, count)
        End Get
    End Property

    Public Overrides Sub Reset()
        count = -1
        index = -1
    End Sub
End Class
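
The packed mapping values can be checked in isolation. The fallback buffer above collects the non-zero bytes with BitConverter.GetBytes and returns them from last to first, which on little-endian hardware yields the characters in the intended order. The following sketch reaches the same result with shifts, so that it does not depend on the machine's byte order; the class name PackedMapping and the method name Unpack are hypothetical and are not part of the example above.

```csharp
using System;
using System.Text;

public static class PackedMapping
{
   // Reads the packed value from its most significant byte down to its least
   // significant byte, skipping zero bytes and appending the rest as characters.
   public static string Unpack(ulong packed)
   {
      var sb = new StringBuilder();
      for (int shift = 56; shift >= 0; shift -= 8) {
         byte b = (byte)((packed >> shift) & 0xFF);
         if (b != 0)
            sb.Append((char)b);
      }
      return sb.ToString();
   }
}
```

For the values in the CustomMapper table, PackedMapping.Unpack(0x49004E0046) returns "INF", and PackedMapping.Unpack(0x53) returns "S".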

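The following code instantiates the CustomMapper object and passes the instance to the Encoding.GetEncoding(String, EncoderFallback, DecoderFallback) method. The output indicates that the best-fit fallback implementation successfully handles the three non-ASCII characters in the original string.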

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
   static void Main()
   {
      Encoding enc = Encoding.GetEncoding("us-ascii", new CustomMapper(), new DecoderExceptionFallback());

      string str1 = "\u24C8 \u2075 \u221E";
      Console.WriteLine(str1);
      for (int ctr = 0; ctr <= str1.Length - 1; ctr++) {
         Console.Write("{0} ", Convert.ToUInt16(str1[ctr]).ToString("X4"));
         if (ctr == str1.Length - 1)
            Console.WriteLine();
      }
      Console.WriteLine();

      // Encode the original string using the ASCII encoder.
      byte[] bytes = enc.GetBytes(str1);
      Console.Write("Encoded bytes: ");
      foreach (var byt in bytes)
         Console.Write("{0:X2} ", byt);

      Console.WriteLine("\n");

      // Decode the ASCII bytes.
      string str2 = enc.GetString(bytes);
      Console.WriteLine($"Round-trip: {str1.Equals(str2)}");
      if (! str1.Equals(str2)) {
         Console.WriteLine(str2);
         foreach (var ch in str2)
            Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"));

         Console.WriteLine();
      }
   }
}

Imports System.Text
Imports System.Collections.Generic

Module Module1

    Sub Main()
        Dim enc As Encoding = Encoding.GetEncoding("us-ascii", New CustomMapper(), New DecoderExceptionFallback())

        Dim str1 As String = String.Format("{0} {1} {2}", ChrW(&H24C8), ChrW(&H2075), ChrW(&H221E))
        Console.WriteLine(str1)
        For ctr As Integer = 0 To str1.Length - 1
            Console.Write("{0} ", Convert.ToUInt16(str1(ctr)).ToString("X4"))
            If ctr = str1.Length - 1 Then Console.WriteLine()
        Next
        Console.WriteLine()

        ' Encode the original string using the ASCII encoder.
        Dim bytes() As Byte = enc.GetBytes(str1)
        Console.Write("Encoded bytes: ")
        For Each byt In bytes
            Console.Write("{0:X2} ", byt)
        Next
        Console.WriteLine()
        Console.WriteLine()

        ' Decode the ASCII bytes.
        Dim str2 As String = enc.GetString(bytes)
        Console.WriteLine("Round-trip: {0}", str1.Equals(str2))
        If Not str1.Equals(str2) Then
            Console.WriteLine(str2)
            For Each ch In str2
                Console.Write("{0} ", Convert.ToUInt16(ch).ToString("X4"))
            Next
            Console.WriteLine()
        End If
    End Sub
End Module

Last updated on 2026-03-26