Last Updated: February 25, 2016

· hjwu

Compose bytes to UTF-8 string

#bitwise operation

First, a quick note on UTF-8.

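For any character beyond the ASCII range, the leading bits of the first byte give the length of the encoded character. A first byte of the form 110xxxxx (three prefix bits) marks a 2-byte Unicode character, 1110xxxx marks a 3-byte character, and so on. Every following byte has the form 10xxxxxx. The x positions are filled with the bits of the character's code point in binary, and the further right an x sits, the less significant it is. Only the shortest multi-byte sequence that can represent a given code point is used. Note that in a multi-byte sequence, the number of leading 1 bits in the first byte equals the total number of bytes in the sequence.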
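To make the bit layout concrete, here is a minimal sketch that packs a code point into UTF-8 bytes by hand. The class and method names (`Utf8Sketch`, `EncodeCodePoint`) are made up for this illustration; in real code `Encoding.UTF8.GetBytes` already does this job.

```csharp
using System;

public static class Utf8Sketch
{
    // Hypothetical helper (not from the original post): encodes one Unicode
    // code point as UTF-8 by filling the "x" positions of each byte pattern.
    public static byte[] EncodeCodePoint(int cp)
    {
        // Reject values outside Unicode and the surrogate range.
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(cp));

        if (cp < 0x80)      // 0xxxxxxx
            return new[] { (byte)cp };

        if (cp < 0x800)     // 110xxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xC0 | (cp >> 6)),
                (byte)(0x80 | (cp & 0x3F))
            };

        if (cp < 0x10000)   // 1110xxxx 10xxxxxx 10xxxxxx
            return new[]
            {
                (byte)(0xE0 | (cp >> 12)),
                (byte)(0x80 | ((cp >> 6) & 0x3F)),
                (byte)(0x80 | (cp & 0x3F))
            };

        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return new[]
        {
            (byte)(0xF0 | (cp >> 18)),
            (byte)(0x80 | ((cp >> 12) & 0x3F)),
            (byte)(0x80 | ((cp >> 6) & 0x3F)),
            (byte)(0x80 | (cp & 0x3F))
        };
    }
}
```

For example, `Utf8Sketch.EncodeCodePoint(0x4E2D)` (the character 中) yields the three bytes E4 B8 AD, and the first byte E4 is 11100100 in binary: three leading 1 bits for a 3-byte sequence.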

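So, to assemble a run of bytes into UTF-8 strings, my current idea is to first find the position of the first 0 bit in each byte I pick up. I followed the post "Get a specific bit from byte" and used the Bitwise Left Shift Operator (<<) together with the Bitwise AND Operator (&) to test each bit.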

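// Returns the 1-based position, counted from the most significant bit,
// of the first 0 bit in b, or 0 if every bit is 1 (0xFF).
private int GetZeroBit(byte b)
{
    for (int i = 7; i >= 0; i--)
    {
        // (1 << i) has only bit i set, so a zero result means bit i of b is 0.
        if ((b & (1 << i)) == 0) return 8 - i;
    }
    return 0;
}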

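The idea is simple: 1 << i has a 1 only at bit i, counting from the right and starting at 0. ANDing it with b gives a nonzero result only when b also has a 1 at that position, so the loop can tell where the first 0 appears. The function returns that position counted from the left and starting at 1: 1 for an ASCII byte (0xxxxxxx), 3 for 110xxxxx, 4 for 1110xxxx.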
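The same mask test can also count the leading 1 bits directly, which maps straight onto the rule that the number of leading 1s equals the sequence length. The following is only a sketch; `BitSketch`, `CountLeadingOnes` and `SequenceLength` are hypothetical names, and returning 0 for bytes that cannot start a sequence is a choice made for this example.

```csharp
public static class BitSketch
{
    // Counts consecutive 1 bits starting from the most significant bit.
    public static int CountLeadingOnes(byte b)
    {
        int count = 0;
        // (1 << i) isolates bit i; stop at the first bit that is 0.
        for (int i = 7; i >= 0 && (b & (1 << i)) != 0; i--)
        {
            count++;
        }
        return count;
    }

    // Maps a lead byte to the length of its UTF-8 sequence.
    // Returns 0 for bytes that cannot start a sequence
    // (a 10xxxxxx continuation byte, or five or more leading 1s).
    public static int SequenceLength(byte lead)
    {
        int ones = CountLeadingOnes(lead);
        if (ones == 0) return 1;                 // 0xxxxxxx: ASCII
        if (ones >= 2 && ones <= 4) return ones; // 110, 1110, 11110
        return 0;
    }
}
```

Treating continuation bytes as a separate case makes it easier to notice input that starts in the middle of a character.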

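// Requires: using System.Collections.Generic; using System.Text;
byte[] myBytes; // the input bytes; must be assigned before the loop runs
List<string> myString = new List<string>();

for (int i = 0; i < myBytes.Length; i++)
{
   // Position of the first 0 bit: 1 for ASCII, 3 for 110xxxxx, 4 for 1110xxxx, ...
   int zeroIndex = GetZeroBit(myBytes[i]);

   if (zeroIndex > 0)
   {
      // For a multi-byte lead byte, the byte count is one less than that position.
      if (zeroIndex > 1) zeroIndex--;
      // Clamp so a truncated trailing sequence cannot read past the end of the array.
      if (zeroIndex > myBytes.Length - i) zeroIndex = myBytes.Length - i;
      myString.Add(Encoding.UTF8.GetString(myBytes, i, zeroIndex));
      // Skip the bytes just consumed; the loop's i++ then moves to the next lead byte.
      i = i + zeroIndex - 1;
   }
}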
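As a self-contained variant of the loop above, the sketch below wraps the same idea in a method and classifies the lead byte with fixed masks instead of a bit loop. `Utf8Splitter` and `SplitCharacters` are hypothetical names, and decoding an unexpected byte on its own is an assumption of this example rather than something the original code specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Utf8Splitter
{
    // Splits a UTF-8 byte array into one string per encoded character.
    public static List<string> SplitCharacters(byte[] bytes)
    {
        var result = new List<string>();
        int i = 0;
        while (i < bytes.Length)
        {
            byte b = bytes[i];
            int length;
            if ((b & 0x80) == 0x00) length = 1;       // 0xxxxxxx
            else if ((b & 0xE0) == 0xC0) length = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) length = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) length = 4;  // 11110xxx
            else length = 1; // stray continuation or invalid lead: decode alone

            // Never read past the end if the last sequence is truncated.
            length = Math.Min(length, bytes.Length - i);

            result.Add(Encoding.UTF8.GetString(bytes, i, length));
            i += length;
        }
        return result;
    }
}
```

Calling `Utf8Splitter.SplitCharacters(Encoding.UTF8.GetBytes("aé中"))` gives a list of three strings, one per character. Malformed input does not throw here, because `Encoding.UTF8` substitutes the replacement character by default.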

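As a side note, on the "Difference between >>> and >>" (in languages such as Java that have both operators): put simply, >> preserves the sign. The leftmost bit keeps its value while the other bits move right. >>> shifts every bit right and fills the leftmost position with 0.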
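The difference can be sketched in C# as well. C# only gained a `>>>` operator in C# 11, so this example emulates the logical shift by casting to `uint` first; `ShiftSketch` and its method names are invented for the illustration.

```csharp
public static class ShiftSketch
{
    // Arithmetic shift: >> on a signed int copies the sign bit into the
    // vacated positions, so a negative value stays negative.
    public static int ArithmeticShiftRight(int value, int count)
    {
        return value >> count;
    }

    // Logical shift: reinterpret the bits as uint so >> fills with zeros,
    // which is what >>> does in Java.
    public static int LogicalShiftRight(int value, int count)
    {
        return unchecked((int)((uint)value >> count));
    }
}
```

With these helpers, `ArithmeticShiftRight(-8, 1)` is -4, while `LogicalShiftRight(-8, 1)` is 2147483644 because the sign bit is replaced by 0.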

Written by 小叮噹文青