Question 1 Intro Trie: How to design a dictionary

1. What is the requirement of this dictionary?

  • search(word)

  • delete

  • add

2. Options for data structures

n- # of words

m - the average length of the string/word

  • HashMap.contain(target)

    • hashCode: O(m)

    • string compare: O(m)

  • Binary Search Tree (Treeset)

  • ArrayList(sorted, unsorted)

  • Trie

What is Trie

  • Retrieve, prefixTrie,

  • a data structure, more precisely, a search tree

Tries are different from binary search tree

  • a key is associated with a "path from the root" to a node, instead of associated with a node(说白了就是你的字母是在边上,所以单词在path上)

  • the edges from a node represent distinct character

The path from the root to each node is a prefix of a word in the dictionary,因此也叫前缀树

Trie class

TC& SC

Time: worst case guarantee O(m)

Space: worst case O(mn), but usually much better than this. - if the trie is dense, since reusing the common prefix as many as possible, less space is required

Drawbacks

Trie's advanced drawbacks(在硬盘里常有问题)

time - if stored on disk, more random disk accesses(very expensive)

space - every trieNode has extra space consumption--> extra space usage - memory allocation fragmentation, especially when the trie is sparse

Use Case

  • string or sequence type elements

  • fast worst-case search/add/delete

  • grouping common prefixes, both for time/space efficiency

  • especially when the trie to prefix/matching

Base Operations: 1. search "cat" in the trie

Base Operations: 2. add "carp" into the trie

Base Operations: 3. delete “apple” from the trie

Solution 1:

  • Store the path from root to the to-delete node in a stack

  • Mark the last node as not word

  • When popping a node form the stack, erase the node if it is not a word and has no children

Solution 2:

  • 所以你再遍历你这个word,然后呢,每次看这接下来有几个单词,如果只剩一个了,剩下的全删了

Base Operations: 4. find all the words in a given prefix

for example find all with prefix "ca"

  • step 1: find the node x associated with prefix "ca"

  • step 2: do DFS on the subtree rooted at x to find all words

Option 1:偷懒就是在这个点上已经存储了,否则你要存path

Option 2:你就是要记录这个stringBuilder

TC * SC: O(search) + O(# of result words) = O(height) + O(# of result words)

It can be extended to support more operations, what other possible queries can it support?

  • Wildcard expression match queries

  • regular expression match queries

  • words winthin certain edit distance

e.g. Wildcard

  • "?" means single character. //.

  • "*" means >= 0 arbitrary characters. //>=0

Last updated