Dictionaries
Dictionaries represent a sophisticated data
structure that allows you to access an element based on a key.
Dictionaries are also known as hash tables or maps. The main
feature of dictionaries is fast lookup based on keys. You can also
add and remove items freely, a bit like a List<T>, but without the performance overhead
of having to shift subsequent items in memory.
Figure 10-5
shows a simplified representation of a dictionary. Here
employee-ids such as B4711 are the keys
added to the dictionary. The key is transformed into a hash. With
the hash a number is created to associate an index with the values.
The index then contains a link to the value. The figure is
simplified because it is possible that a single index entry can be
associated with multiple values, and the index can be stored as a
tree.
The .NET Framework offers several dictionary
classes. The main class you can use is Dictionary<TKey, TValue>. This class offers nearly the same
properties and methods as SortedList<TKey,
TValue> discussed earlier; that’s why they are not
repeated here.
Key Type
A type that is used as a key in the
dictionary must override the method GetHashCode() of the Object class. Whenever a dictionary class needs to
work out where an item should be located, it calls the GetHashCode() method. The int that is returned by GetHashCode() is used by the dictionary to calculate
an index where to place the element. We don’t go into this part of
the algorithm. What you should know is that it involves prime
numbers, so the capacity of a dictionary is a prime number.
The implementation of GetHashCode() must satisfy these requirements:
-
The same object should always return the same
value.
-
Different objects can return the same
value.
-
It should execute as quickly as possible; it
must be inexpensive to compute.
-
It must not throw exceptions.
-
It should use at least one instance
field.
-
The hash code value should be evenly
distributed across the entire range of numbers that an int can store.
-
At best, the hash code should not change
during the lifetime of the object.
|
|
Important |
A good performance of the dictionary is based
on a good implementation of the method GetHashCode().
|
What’s the reason for having hash code values
evenly distributed across the range of integers? If two keys return
hashes that give the same index, the dictionary class needs to
start looking for the nearest available free location to store the
second item - and will have to do some searching in order to
retrieve this item later on. This is obviously going to hurt
performance, and clearly, if lots of your keys are tending to give
the same indexes for where they should be stored, this kind of
clash becomes more likely. However, because of the way that
Microsoft’s part of the algorithm works, this risk is minimized
when the calculated hash values are evenly distributed between
int.MinValue and int.MaxValue.
Besides having an implementation of GetHashCode(), the key type also must implement the
IEquality.Equals() method or override
the Equals() method from the
Object class. Because different key objects may return
the same hash code, the method Equals()
is used by the dictionary comparing keys. The dictionary examines
if two keys A and B are equal; it invokes A.Equals(B). This means that you must ensure that
the following is always true:
|
|
Important |
If A.Equals(B) is
true, then A.GetHashCode() and
B.GetHashCode() must always return the
same hash code.
|
This probably seems a fairly subtle point, but it
is crucial. If you contrived some way of overriding these methods
so that the preceding statement was not always true, a dictionary
that uses instances of this class as its keys would simply not work
properly. Instead, you’d find funny things happening. For example,
you might place an object in the dictionary and then discover that
you could never retrieve it, or you might try to retrieve an entry
and have the wrong entry returned.
|
|
Tip |
For this reason, the C# compiler will display
a compilation warning if you supply an override for Equals() but don’t supply an override for
GetHashCode().
|
For System.Object this
condition is true, because Equals()
simply compares references, and GetHashCode() actually returns a hash that is based
solely on the address of the object. This means that hash tables
based on a key that doesn’t override these methods will work
correctly. However, the problem with this way of doing things is
that keys are regarded as equal only if they are the same object.
That means that when you place an object in the dictionary, you
then have to hang onto the reference to the key. You can’t simply
instantiate another key object later with the same value. If you
don’t override Equals() and GetHashCode(), the type is not very convenient to
use in a dictionary.
Incidentally, System.String implements the interface IEquatable and overloads GetHashCode() appropriately. Equals() provides value comparison, and GetHashCode() returns a hash based on the value of
the string. Strings can be used conveniently as keys in
dictionaries.
Number types such as Int32 also implement the interface IEquatable and overload GetHashCode(). However, the hash code returned by
these types simply maps to the value. If the number you would like
to use as a key is not itself distributed around the possible
values of an integer, using integers as keys doesn’t fulfill the
rule of evenly distributing key values to get the best performance.
Int32 is not meant to be used in a
dictionary.
If you need to use a key type that does not
implement IEquatable and override
GetHashCode according to the key values
you store in the dictionary, you can create a comparer implementing
the interface IEqualityComparer<T>. IEqualityComparer<T> defines the methods
GetHashCode() and Equals() with an argument of the object passed, so
you can offer an implementation different from the object type
itself. An overload of the Dictionary<TKey,
TValue> constructor allows passing an object implementing
IEqualityComparer<T>. If such an
object is assigned to the dictionary, this class is used to
generate the hash codes and compare the keys.
Dictionary Example
The dictionary example is a program that sets
up a dictionary of employees. The dictionary is indexed by
EmployeeId objects, and each item stored
in the dictionary is an Employee object
that stores details of an employee.
The class EmployeeId is
implemented to define a key to be used in a dictionary. The members
of the class are a prefix character and a number for the employee.
Both of these variables are read-only and can only be initialized
in the constructor. A key within the dictionary shouldn’t change,
and this way that is guaranteed. The fields are filled within the
constructor. The ToString() method is
overloaded to get a string representation of the employee ID. As
required for a key type, EmployeeId
implements the interface IEquatable and
overloads the method GetHashCode().
The Equals() method that
is defined by the IEquatable<T>
interface compares the values of two EmployeeId objects and returns true if the both values are the same. Instead of
implementing the Equals() method from
the IEquatable<T> interface, you
can also override the Equals() method
from the Object class:
With the number variable, a value from 1 to around
190000 is expected for the employees. This doesn’t fill the range
of an integer. The algorithm used by GetHashCode() shifts the number 16 bits to the left,
then does a xor with the
original number, and finally multiplies the result by the hex value
15051505. The hash code is fairly distributed across the range of
an integer.
|
|
Tip |
On the Internet you can find a lot of more
complex algorithms that have a better distribution across the
integer range. You can also use the GetHashCode() method of a string to return a
hash.
|
The Employee class is a
simple entity class containing the name, salary, and ID of the
employee. The constructor initializes all values, and the method
ToString() returns a string
representation of an instance. The implementation of ToString() uses the StringBuilder object to create the string
representation for performance reasons.
In the Main() method of
the sample application, a new Dictionary<TKey, TValue> instance is created,
where the key is of type EmployeeId and
the value is of type Employee. The
constructor allocates a capacity of 31 elements. Remember, the
capacity based on prime numbers. However, when you assign a value
that is not a prime number, you don’t need to worry. The
Dictionary<TKey, TValue> class
itself takes the next prime number that follows the integer passed
to the constructor to allocate the capacity. The employee objects
and IDs are created and added to the dictionary with the
Add() method. Instead of using the
Add() method, you can also use the
indexer to add keys and values to the dictionary, as shown with the
employees Jeff and Casey:
After the entries are added to the dictionary,
inside a while loop employees are read
from the dictionary. The user is asked to enter an employee number
to store the number in the variable userInput. The user can exit the application by
entering X. If the key is in the dictionary, it is examined with
the TryGetValue() method of the
Dictionary<TKey, TValue> class.
TryGetValue() returns true if the key is found and false otherwise. If the value is found, the value
associated with the key is stored in the employee variable. This
value is written to the console.
|
|
Tip |
You can also use an indexer of the
Dictionary<TKey, TValue> class
instead of TryGetValue() to access a
value stored in the dictionary. However, if the key is not found,
the indexer throws an exception of type KeyNotFoundException.
|
Running the application produces the following
output:
Other Dictionary Classes
Dictionary<TKey,
TValue> is the major dictionary class from the framework.
There are some more classes, and of course there are also some
nongeneric dictionary classes.
Dictionaries that are based on the Object type and are available since .NET 1.0 are
described in the following table.
Since .NET 2.0 generic dictionaries are preferred
over object-based dictionaries:
SortedDictionary<TKey,
TValue> and SortedList<TKey,
TValue> have similar functionality. But because
SortedList<TKey, TValue> is
implemented as a list and SortedDictionary<TKey, TValue> is implemented as a dictionary, the
classes have different characteristics.
-
SortedList<TKey,
TValue> uses less memory than SortedDictionary<TKey, TValue>.
-
SortedDictionary<TKey,
TValue> has faster insertion and removal of elements.
-
When populating the collection with already
sorted data, SortedList<TKey,
TValue> is faster, if capacity changes are not
needed.
|
|
Important |
SortedList consumes less memory than
SortedDictionary. SortedDictionary is
faster for inserts and the removal of unsorted data.
|
|