Question 1 Intro Trie: How to design a dictionary

1. What is the requirement of this dictionary?

search(word)
delete
add

2. Options for data structures

n- # of words

m - the average length of the string/word

HashMap.contain(target)
- hashCode: O(m)
- string compare: O(m)
Binary Search Tree (Treeset)
ArrayList(sorted, unsorted)
Trie

What is Trie

Retrieve, prefixTrie,
a data structure, more precisely, a search tree

Tries are different from binary search tree

a key is associated with a "path from the root" to a node, instead of associated with a node(说白了就是你的字母是在边上，所以单词在path上)
the edges from a node represent distinct character

The path from the root to each node is a prefix of a word in the dictionary,因此也叫前缀树

Trie class

class TrieNode {
    // Map a character to its corresponding child
    // 以前的tree child对应的是一个个点，这次是以character作为1，2，3
    Map<Character, TrieNode> children;
    boolean isWord; //这个点是不是作为结束
    int value; // optional. a set doesn't nee this field
}


class TrieNode {
    // an array of size 26, each index --> character
    TrieNode[] children;
    boolean isWord; //这个点是不是作为结束
    int value; // optional. a set doesn't nee this field
}

TC& SC

Time: worst case guarantee O(m)

Space: worst case O(mn), but usually much better than this. - if the trie is dense, since reusing the common prefix as many as possible, less space is required

Drawbacks

Trie's advanced drawbacks(在硬盘里常有问题)

time - if stored on disk, more random disk accesses(very expensive)

space - every trieNode has extra space consumption--> extra space usage - memory allocation fragmentation, especially when the trie is sparse

Use Case

string or sequence type elements
fast worst-case search/add/delete
grouping common prefixes, both for time/space efficiency
especially when the trie to prefix/matching

Base Operations: 1. search "cat" in the trie

// finding the path from root which is equal to 'cat'
// for reach of the characters in "cat", see if there is an edge associated with it for the next level
public boolean search(String word) {
    TrieNode cur = root;
    for (int i = 0; i < word.length(); i++) {
        TrieNode next = cur.children.get(word.charAt(i));
        if (next == null) return false;
        cur = next;
    }
    return cur.isWord;
}

// Time: O(word.length)

Base Operations: 2. add "carp" into the trie

// finding the path from root which is equal to "carp"
// for each of the character in "carp", see if there is an edge associated with it for the next level
// 1. if there is an edge, go to the corresponding child node
// 2. if there is not, create the child node

public boolean insert(String word) {
    if (search(word)) {
        return false;
    }
    TrieNode cur = root;
    for (int i = 0; i < word.length(); i++) {
        TrieNode next = cur.children.get(word.charAt(i));
        if (next == null) {
            next = new TrieNode();
            cur.children.put(word.charAt(i), next);
        }
        cur = next;
        // cur.count++;
    }
    cur.isWord = true;
    return true;
}
// TC: O(word.length)

Base Operations: 3. delete “apple” from the trie

Solution 1:

Store the path from root to the to-delete node in a stack
Mark the last node as not word
When popping a node form the stack, erase the node if it is not a word and has no children

Solution 2:

所以你再遍历你这个word，然后呢，每次看这接下来有几个单词，如果只剩一个了，剩下的全删了


class TrieNode{
    int count; // how many word are on this subtree(inclusive)
    Map<Character, TrieNode> children;
    boolean isWord; // indicate if the node is the end of the word
}


public boolean delete(String word) {
    if (!search(word)) {
        return false;
    }
    root.count--; //你每一次都需要delete
    TrieNode cur = root;
    for (int i = 0; i < word.length(); i++) {
        TrieNode next = cur.children.get(word.charAt(i));
        next.count--;
        if (next.count == 0) {
            cur.children.remove(word.charAt(i));
            return true;
        }
        cur = next;
    }
    cur.isWord = false; // e.g. app
    return true;
}

// Time O(word.length)

Base Operations: 4. find all the words in a given prefix

for example find all with prefix "ca"

step 1: find the node x associated with prefix "ca"
step 2: do DFS on the subtree rooted at x to find all words

Option 1:偷懒就是在这个点上已经存储了，否则你要存path


// option 1
class TrieNode {
    // Maps a character to its corresponding child
    Map<Character, TrieNode> children;
    boolean isWord;
    int value; // optional, a set doesn't need this field
    string word;
}

public void findAllWordsUnderNode(TrieNode current, List<String> result) {
    if (current.isWord) { // this is when we find a word.
        result.add(current.word);
        // shall we return here? No, children可能还有word
    }
    // find all branches
    for (Entry<Character, TrieNode> child: current.children.EntrySet()) {
        findAllWordsUnderNode(child.getValue(), result);
    }
}

Option 2:你就是要记录这个stringBuilder

public List<String> findAllWordsWithPrefix(TrieNode root, String prefix) {
    TrieNode matchNode = searchNode(root, prefix); // 你要先找到前缀
    if (matchNode == null) {
        return result;
    }
    List<String> result = new ArrayList<>();
    findAllWordsUnderNode(matchNode, new StringBuilder(prefix), result);
    return result;
}

public void findAllWordsUnderNode(TrieNode current, StringBuilder curPath, List<String> result) {
    if (current.isWord) { // this is  when we find a word
        result.add(curPath.toString());// we should not return here, children might have word
    }
    // all branches
    for (Entry<Character, TrieNode> child: current.children.EntrySet()) {
        curPath.append(child.getKey());
        findAllWordsUnderNode(child.getValue(), curPath, result);
        curPath.deleteCharAt(curPath.length() - 1);
    }
}

TC * SC: O(search) + O(# of result words) = O(height) + O(# of result words)

It can be extended to support more operations, what other possible queries can it support?

Wildcard expression match queries
regular expression match queries
words winthin certain edit distance

e.g. Wildcard

"?" means single character. //.
"*" means >= 0 arbitrary characters. //>=0

PreviousIntro NextQuestion 2 World Search I& II

Last updated 1 year ago