霍夫曼樹

樹的帶權路徑長度

設二叉樹具有 \(n\) 個帶權葉結點，從根結點到各葉結點的路徑長度與相應葉節點權值的乘積之和稱為 樹的帶權路徑長度（Weighted Path Length of Tree，WPL）。

設 \(w_i\) 為二叉樹第 \(i\) 個葉結點的權值，\(l_i\) 為從根結點到第 \(i\) 個葉結點的路徑長度，則 WPL 計算公式如下：

\[ WPL=\sum_{i=1}^nw_il_i \]

如上圖所示，其 WPL 計算過程與結果如下：

\[ WPL=2*2+3*2+4*2+7*2=4+6+8+14=32 \]

結構

對於給定一組具有確定權值的葉結點，可以構造出不同的二叉樹，其中，WPL 最小的二叉樹 稱為 霍夫曼樹（Huffman Tree）。

對於霍夫曼樹來説，其葉結點權值越小，離根越遠，葉結點權值越大，離根越近，此外其僅有葉結點的度為 \(0\)，其他結點度均為 \(2\)。

霍夫曼算法

霍夫曼算法用於構造一棵霍夫曼樹，算法步驟如下：

初始化：由給定的 \(n\) 個權值構造 \(n\) 棵只有一個根節點的二叉樹，得到一個二叉樹集合 \(F\)。
選取與合併：從二叉樹集合 \(F\) 中選取根節點權值 最小的兩棵 二叉樹分別作為左右子樹構造一棵新的二叉樹，這棵新二叉樹的根節點的權值為其左、右子樹根結點的權值和。
刪除與加入：從 \(F\) 中刪除作為左、右子樹的兩棵二叉樹，並將新建立的二叉樹加入到 \(F\) 中。
重複 2、3 步，當集合中只剩下一棵二叉樹時，這棵二叉樹就是霍夫曼樹。

霍夫曼編碼

在進行程序設計時，通常給每一個字符標記一個單獨的代碼來表示一組字符，即編碼。

在進行二進制編碼時，假設所有的代碼都等長，那麼表示 \(n\) 個不同的字符需要 \(\left \lceil \log_2 n \right \rceil\) 位，稱為 等長編碼。

如果每個字符的 使用頻率相等，那麼等長編碼無疑是空間效率最高的編碼方法，而如果字符出現的頻率不同，則可以讓頻率高的字符采用盡可能短的編碼，頻率低的字符采用盡可能長的編碼，來構造出一種 不等長編碼，從而獲得更好的空間效率。

在設計不等長編碼時，要考慮解碼的唯一性，如果一組編碼中任一編碼都不是其他任何一個編碼的前綴，那麼稱這組編碼為 前綴編碼，其保證了編碼被解碼時的唯一性。

霍夫曼樹可用於構造 最短的前綴編碼，即 霍夫曼編碼（Huffman Code），其構造步驟如下：

設需要編碼的字符集為：\(d_1,d_2,\dots,d_n\)，他們在字符串中出現的頻率為：\(w_1,w_2,\dots,w_n\)。
以 \(d_1,d_2,\dots,d_n\) 作為葉結點，\(w_1,w_2,\dots,w_n\) 作為葉結點的權值，構造一棵霍夫曼樹。
規定哈夫曼編碼樹的左分支代表 \(0\)，右分支代表 \(1\)，則從根結點到每個葉結點所經過的路徑組成的 \(0\)、\(1\) 序列即為該葉結點對應字符的編碼。

示例代碼

霍夫曼樹的構建

typedef struct HNode {
  int weight;
  HNode *lchild, *rchild;
} * Htree;

Htree createHuffmanTree(int arr[], int n) {
  Htree forest[N];
  Htree root = NULL;
  for (int i = 0; i < n; i++) {  // 將所有點存入森林
    Htree temp;
    temp = (Htree)malloc(sizeof(HNode));
    temp->weight = arr[i];
    temp->lchild = temp->rchild = NULL;
    forest[i] = temp;
  }

  for (int i = 1; i < n; i++) {  // n-1 次循環建哈夫曼樹
    int minn = -1, minnSub;  // minn 為最小值樹根下標，minnsub 為次小值樹根下標
    for (int j = 0; j < n; j++) {
      if (forest[j] != NULL && minn == -1) {
        minn = j;
        continue;
      }
      if (forest[j] != NULL) {
        minnSub = j;
        break;
      }
    }

    for (int j = minnSub; j < n; j++) {  // 根據 minn 與 minnSub 賦值
      if (forest[j] != NULL) {
        if (forest[j]->weight < forest[minn]->weight) {
          minnSub = minn;
          minn = j;
        } else if (forest[j]->weight < forest[minnSub]->weight) {
          minnSub = j;
        }
      }
    }

    // 建新樹
    root = (Htree)malloc(sizeof(HNode));
    root->weight = forest[minn]->weight + forest[minnSub]->weight;
    root->lchild = forest[minn];
    root->rchild = forest[minnSub];

    forest[minn] = root;     // 指向新樹的指針賦給 minn 位置
    forest[minnSub] = NULL;  // minnSub 位置為空
  }
  return root;
}

計算構成霍夫曼樹的 WPL

typedef struct HNode {
  int weight;
  HNode *lchild, *rchild;
} * Htree;

int getWPL(Htree root, int len) {  // 遞歸實現，對於已經建好的霍夫曼樹，求 WPL
  if (root == NULL)
    return 0;
  else {
    if (root->lchild == NULL && root->rchild == NULL)  // 葉節點
      return root->weight * len;
    else {
      int left = getWPL(root->lchild, len + 1);
      int right = getWPL(root->rchild, len + 1);
      return left + right;
    }
  }
}

對於未建好的霍夫曼樹，直接求其 WPL

int getWPL(int arr[], int n) {  // 對於未建好的霍夫曼樹，直接求其 WPL
  priority_queue<int, vector<int>, greater<int>> huffman;  // 小根堆
  for (int i = 0; i < n; i++) huffman.push(arr[i]);

  int res = 0;
  for (int i = 0; i < n - 1; i++) {
    int x = huffman.top();
    huffman.pop();
    int y = huffman.top();
    huffman.pop();
    int temp = x + y;
    res += temp;
    huffman.push(temp);
  }
  return res;
}

對於給定序列，計算霍夫曼編碼

typedef struct HNode {
  int weight;
  HNode *lchild, *rchild;
} * Htree;

void huffmanCoding(Htree root, int len, int arr[]) {  // 計算霍夫曼編碼
  if (root != NULL) {
    if (root->lchild == NULL && root->rchild == NULL) {
      printf("結點為 %d 的字符的編碼為: ", root->weight);
      for (int i = 0; i < len; i++) printf("%d", arr[i]);
      printf("\n");
    } else {
      arr[len] = 0;
      huffmanCoding(root->lchild, len + 1, arr);
      arr[len] = 1;
      huffmanCoding(root->rchild, len + 1, arr);
    }
  }
}

本页面最近更新：，更新历史
发现错误？想一起完善？在 GitHub 上编辑此页！
本页面贡献者：Alex-McAvoy
本页面的全部内容在 CC BY-SA 4.0 和 SATA 协议之条款下提供，附加条款亦可能应用