Data Structures in C++：树和二叉树

文章目录

树
基本术语
二叉树
遍历方式
二叉查找树的代码实现
红黑树
C++中的平衡二叉树 map

树

树是一种区别于数组和链表的基本数据结构。

数组由于数据的存储空间是连续的，所以其最大优势就是可以通过下标对有序数组进行折半查找/二分查找（Binary Search），速度快，时间复杂度 $O (l o g n)$ ，但在插入和删除操作时，需要移动大量元素，时间复杂度 $O (n)$ ；
链表只能进行线性查找/顺序查找（Sequential Search），逐元素比较，时间复杂度 $O (n)$ ，但插入和删除操作速度快，不需要移动大量元素，只改变指针的指向，时间复杂度 $O (n)$ 或 $O (1)$

平时在计算删除单链表的第i个节点的时间复杂度时一般认为是 $O (n)$ ，操作过程如下：
1. 先从头节点开始遍历链表，找到第i-1个节点
2. 将第i-1节点next指向第i个节点的next
可以看到时间主要花在了遍历链表上

如果已经拿到了要删除的第i个节点Node(i)，就不需要进行遍历操作和查找前驱节点了，直接拿Node(i+1)来覆盖Node(i)即可。具体的做法如下：
1.Node(i)->data=Node(i)-next->data;
2.Node(i)-next=Node(i+1)->next;
这样的时间复杂度就是 $O (1)$

树这种数据结构，同时集成了数组和链表的优势，不仅可以用二分查找（如下图），而且插入和删除也只需改变指针指向，每次查找、更新、插入或删除操作都可在 $O (l o g n)$ 时间内完成：

基本术语

根 root：非空树的第一层元素，存在且唯一
结点 node：包含一个数据元素及若干指向子树分支的信息，根据上下关系分为 父结点 Parent 和 子结点 Child
度 degree：一个结点拥有子树的数目称为结点的度，树中所有结点的度的最大值称为树的度
叶子结点 leaf：也称为终端结点，没有子树的结点或者度为零的结点
分支结点：也称为非终端结点，度不为零的结点称为非终端结点
结点的层次 level：从根结点开始为第1层，根结点的孩子结点为第2层，依此类推，如果某一个结点位于第L层，则其孩子结点位于第L+1层
树的深度 depth：也称为树的高度（height），树中所有结点的层次最大值
有序树 ordered binary tree：如果树中各棵子树的次序是有先后次序
无序树：如果树中各棵子树的次序没有先后次序
森林 forest：由m（m≥0）棵互不相交的树构成一片森林。如果把一棵非空的树的根结点删除，则该树就变成了一片森林，森林中的树由原来根结点的各棵子树构成

二叉树

在计算机编程中，一般使用的都是“二叉树” (Binary Tree)，也是 二叉查找树：

每个结点最多两个 子树(subtree)，且分左子树和右子树，即树的度最大为2
左子树和右子树是有序的，一般 左孩子结点 < 父结点 < 右孩子结点
第 $i$ 层上至多有 $2^{i - 1}$ 个节点，其中 $i \geq 1$ ：第1层1个结点，第11层1024个结点（1K），第21层100万个结点(1M)，第31层10亿个结点（1G） 因此，在100万个数据中查找最多只需比较21次即可。
深度为 $h$ 的二叉树中至多含有 $(2^{h} - 1)$ 个节点，其中 $h \geq 1$
对于任意二叉树，若叶子结点个数为 $n_{0}$ ，度为2的结点数为 $n_{2}$ ，则 $n_{0} = n_{2} + 1$

解析：
其他

遍历方式

需要判断空树，可以递归调用：

前序遍历：根节点 -> 前序遍历左子树 -> 前序遍历右子树
中序遍历：中序遍历左子树 -> 根节点 -> 中序遍历右子树
后序遍历：后序遍历左子树 -> 后序遍历右子树 -> 根节点

二叉查找树的代码实现

模板类代码：

#pragma once
#include <iostream>
#include <queue>

class Element  // 数据域
{
public:
    int key;
    char c;
    // 可添加更多的数据

public:
    Element(int _key = 0, char _c = '\0')
    {
        key = _key;
        c = _c;
    }
    bool operator>(const Element& other) const
    {
        return this->key > other.key;
    }
    bool operator<(const Element& other) const
    {
        return this->key < other.key;
    }
    bool operator==(const Element& other) const
    {
        return this->key == other.key;
    }
};


template<typename Type>
struct Node     // 树结点
{
    Element data;
    Node* leftChild = nullptr;
    Node* rightChild = nullptr;
    void show(int i);
};


template<typename Type>
class BinaryTree
{
public:
    BinaryTree(Node<Type>* _root = nullptr) { root = _root; }

    void inorder();                     // 中序遍历
    void inorder(Node<Type>* node);
    void preorder();                    // 前序遍历
    void preorder(Node<Type>* node);
    void postorder();                   // 后序遍历
    void postorder(Node<Type>* node);
    void level_order();                  // 层序遍历

    Node<Type>* find(const Element& x);                         // 递归查找
    Node<Type>* find(Node<Type>* node, const Element& x);   
    Node<Type>* iter_find(const Element& x);                    // 迭代查找

    bool insert(const Element& x);                              // 插入

    void visit(const Node<Type>* node) { std::cout << node->data.c << " "; }       // 辅助函数，用来输出结点
    void display();  // 辅助函数，用来输出树
    
private:
    Node<Type>* root;
};


template<typename Type>
inline void BinaryTree<Type>::inorder()
{
    inorder(root);
}

template<typename Type>
inline void BinaryTree<Type>::inorder(Node<Type>* node)
{
    if (node)
    {
        inorder(node->leftChild);
        visit(node);
        inorder(node->rightChild);
    }
}

template<typename Type>
inline void BinaryTree<Type>::preorder()
{
    preorder(root);
}

template<typename Type>
inline void BinaryTree<Type>::preorder(Node<Type>* node)
{
    if (node)
    {
        visit(node);
        preorder(node->leftChild);
        preorder(node->rightChild);
    }
}

template<typename Type>
inline void BinaryTree<Type>::postorder()
{
    postorder(root);
}

template<typename Type>
inline void BinaryTree<Type>::postorder(Node<Type>* node)
{
    if (node)
    {
        postorder(node->leftChild);
        postorder(node->rightChild);
        visit(node);
    }
}

template<typename Type>
inline void BinaryTree<Type>::level_order()
{
    std::queue<Node<Type>*> q;
    Node<Type>* node = root;
    while (node)
    {
        visit(node);
        if (node->leftChild) q.push(node->leftChild);
        if (node->rightChild) q.push(node->rightChild);
        if (q.empty()) break;
        node = q.front();
        q.pop();
    }
}


template<typename Type>
inline Node<Type>* BinaryTree<Type>::find(const Element& x)
{
    return find(root, x);
}

template<typename Type>
inline Node<Type>* BinaryTree<Type>::find(Node<Type>* node, const Element& x)
{
    if (!node) return nullptr;
    if (node->data == x) return node;
    if (x < node->data) return find(node->leftChild, x);
    else return find(node->rightChild, x);
}

template<typename Type>
inline Node<Type>* BinaryTree<Type>::iter_find(const Element& x)
{
    Node<Type>* node = root;
    while (node)
    {
        if(x == node->data) return node;
        if (x < node->data) node = node->leftChild;
        else node = node->rightChild;
    }
    return nullptr;
}

template<typename Type>
inline bool BinaryTree<Type>::insert(const Element& x)
{
    Node<Type>* p = root;
    Node<Type>* node = nullptr;    // 保存结点
    while (p)
    {
        node = p;
        if (x == p->data) return false;
        if (x < p->data) p = p->leftChild;
        else p = p->rightChild;
    }
    p = new Node<Type>;
    p->data = x;
    if (!root) root = p;
    else if (p->data < node->data) node->leftChild = p;
    else node->rightChild = p;
    return true;
}

template<typename Type>
inline void BinaryTree<Type>::display()
{
    std::cout << "\n";
    if (root) root->show(1);
    else std::cout << "Tree is empty!\n";
}

template<typename Type>
inline void Node<Type>::show(int i)
{
    printf("Position %2d: %c\n", i, this->data.c);
    if (this->leftChild) this->leftChild->show(2 * i);
    if (this->rightChild) this->rightChild->show(2 * i + 1);
}

测试代码：

#include <iostream>
#include "BinaryTree.h"

using namespace std;

int main()
{
    BinaryTree<int> tree;
    tree.insert(Element(5, 'A'));
    tree.insert(Element(4, 'B'));
    tree.insert(Element(8, 'C'));
    tree.insert(Element(2, 'D'));
    tree.insert(Element(6, 'E'));
    tree.insert(Element(9, 'F'));
    tree.insert(Element(1, 'G'));
    tree.insert(Element(3, 'H'));
    tree.insert(Element(7, 'I'));

    cout << "层序遍历："; tree.level_order(); cout << endl;
    cout << "前序遍历："; tree.preorder();   cout << endl;
    cout << "中序遍历："; tree.inorder();    cout << endl;
    cout << "后序遍历："; tree.postorder();  cout << endl;

    tree.display();

    cout << "\n递归查找的结果是：" << tree.find(Element(1))->data.c << endl;
    cout << "迭代查找的结果是：" << tree.iter_find(Element(1))->data.c << endl;

    return 0;
}

输出：

红黑树

由于基本的二叉查找树不能自动平衡，容易退化为链表，从而丧失查找的速度优势。因此需要对树实现自动平衡，如：红黑树(RedBlackTree)、平衡二叉搜索树（Self-balancing binary search tree, AVL树).

C++ STL中的set和map底层用的数据结构：红黑树

红黑树的5个性质：

每个结点要么是红的要么是黑的：enum COLOR {RED, BLACK};
根结点是黑的
每个叶子结点都是黑的
如果一个结点是红的，那么它的两个子结点都是黑的（不存在两个连续的红色节点）
对于任意结点而言，其到叶子结点(简化为树尾端的空指针)的每条路径都包含相同数目的黑结点

如何在不违反5条规则的前提下，平衡地插入或删除元素请参考：最容易懂得红黑树

红黑树的结点结构：

enum COLOR {RED, BLACK};

template<typename Type>
struct RBTreeNode  
{ 
	RBTreeNode  *parent;
    RBTreeNode  *left, *right;   
    Type data;  
    COLOR color;  
};

红黑树与AVL树的比较：

AVL 树的左右子树高度差不超过1，搜索的次数比红黑树少，当搜索的次数远远大于插入和删除时，选择 AVL树
相对于AVL树，红黑树牺牲了部分平衡性以换取插入、删除操作时少量的旋转操作。整体来说，性能要优于AVL树
红黑树的平均统计性能优于AVL树，实际应用中更多选择使用红黑树

因此，在需要大量插入和删除node的场景下，红黑树效率更高；而AVL由于高度平衡，search的效率更高，适用于插入删除node少、查询多的场景。

红黑树与B树的比较：

红黑树操作小数据，全部读进内存，树的高度可以很高，操作速度快
B树操作磁盘中的大文件，按需读入，尽量保证速度

C++中的平衡二叉树 map

map 即“键值对”容器，提供了“一对一”的映射关系. 其中存放的是 pair 对象，不允许有重复键值，构造时需要指定key与value的模板/类型, 容器内部自动按键值排序
multimap 则是“一对多”的多映射容器，这里不做介绍
pair 是具有两个模板参数的模板类，如 pair<string, int> s1("zhangsan", 25); 其具有两个属性 s1.first, s1.second

定义空map：

map<T1, T2> myMap;
map<T1, T2> newMap(myMap);

增加：

operator[]
insert() - make_pair()
insert() - pair()

map<string, int> myMap;
myMap.insert(pair<string, int>("zhangsan", 25));
myMap["lisi"] = 26;
myMap.insert(make_pair("wang",27));
myMap

自动排序：
容器内部自动按键值排序，见如下输出：

索引：

不建议使用 operator[] 进行索引，因为如果键值不存在，则会添加并设value=0，无法返回正确索引结果
查找元素是否存在则使用 count()方法，如果存在就返回1，否则返回0，无其他值
索引键值则使用 find() 方法，返回的是迭代器/pair指针变量，如果没找到就返回迭代器map.end()

auto idx = myMap.find("lee");
auto cnt = myMap.count("lee");
if(idx==myMap.end())
{
    cout << "count = " << cnt << endl;
    cout << "Not Found" << endl;
}
else
{
    cout << "count = " << cnt << endl;
    printf("key:value is (%s,%d)", idx->first.c_str(), idx->second);
}