An hash array dictonary step by step
#20
This is quite the fun thread Smile I tried my hand at throwing together a hash table implementation using a bucket approach, it mostly just contrasts with Luke's linear probing and tombstone approach. In practice I think the linear-probing approach can actually be better for performance (since it avoids the linked-list traversal on look-ups) but I enjoy the simplicity of using the buckets.

Interestingly your test program showcases the worst-case performance for Luke's code - by continually adding the same key to the hash table Luke's code has to do a very long linear-probe to find the next empty entry. It's an interesting contrast with the bucket approach I did because insertions with the buckets are always constant-time, so it handles this test quite easily.

In practice duplicate keys shouldn't even be allowed, if you modify
InsertLookup
so that it checks if the key exists then Luke's code becomes significantly faster. It's not an entirely fair test though since Luke's hash table is no longer storing as many entries as the others are since it's discarding the duplicates ("HashTable" is my code):

[Image: hashtable.png]

On to my actual code Big Grin  The global
HashTable()
array stores the roots for all the buckets, and then the buckets themselves are a linked-list created via indexes into the
HashEntries()
array. The
HashEntries()
array holds the values associated with each key, HashTableAdd and HashTableLookup just return an index which you use with
HashEntries()
to access/modify the value for that key.

Code: (Select All)
Option _Explicit
$Console:Only

' Required Hash Table Globals

' A power-of-two representing the number of buckets in the hash table
Const HASH_TABLE_SIZE = 8

Type Value
    ' Put stuff in here
    i As Long
End Type

Type HashTableEntry
    nxt As Long
    k As String
    v As Value
End type

ReDim Shared HashEntries(1) As HashTableEntry
Dim Shared HashTableNextEntry As Long

HashTableNextEntry = 0

ReDim Shared HashTable(_Shl(1, HASH_TABLE_SIZE)) As Long



' Test code
'
Dim i As Long, cur As Long

' Add a bunch of entries
For i = 1 to 20000
    cur = HashTableAdd&(Space$(i) + Chr$(i And 255))
    HashEntries(cur).v.i = i
Next

cur = HashTableAdd&("foobar")
HashEntries(cur).v.i = 600

cur = HashTableAdd&("foobar2")
HashEntries(cur).v.i = 601

cur = HashTableAdd&("k")
HashEntries(cur).v.i = 602

cur = HashTableAdd&("j")
HashEntries(cur).v.i = 603

' An example lookup in the hash table
'
' Should print 4, since it looks up the 4 spaces entry created
' by the loop a few lines above this
Print "Lookup result: "; HashTableLookup&("    " + Chr$(4))
Print "V from lookup: "; HashEntries(HashTableLookup&("    " + Chr$(4))).v.i

System

'
' Hash Table Functions
'

Function HashTableHash&(k As String)
    Dim i As Long, hash As Long
    For i = 1 To Len(k)
        hash = hash * 3 + Asc(k, i)
    Next

    ' Chop off the high bits so this indexes the hash table
    HashTableHash& = hash And (_Shl(1, HASH_TABLE_SIZE) - 1)
End Function

Function HashTableAdd&(k As String)
    Dim hash As Long, prev As Long
    hash = HashTableHash&(k)

    prev = HashTable(hash)

    ' Create a new link and place it in the bucket for this hash value
    HashTableNextEntry = HashTableNextEntry + 1
    If HashTableNextEntry > UBOUND(HashEntries) Then ReDim _Preserve HashEntries(UBOUND(HashEntries) * 2) As HashTableEntry

    HashEntries(HashTableNextEntry).nxt = prev
    HashEntries(HashTableNextEntry).k = k

    HashTable(hash) = HashTableNextEntry
    HashTableAdd& = HashTableNextEntry
End Sub

' Returns 0 if k does not exist in the hash table, otherwise returns an index
' into HashEntries() for this key
Function HashTableLookup&(k As String)
    Dim hash As Long, cur As Long
    hash = HashTableHash&(k)

    cur = HashTable(hash)

    While cur <> 0
        If HashEntries(cur).k = k Then
            HashTableLookup& = cur
            Exit Function
        End If

        cur = HashEntries(cur).nxt
    Wend

    HashTableLookup& = 0
End Sub

Sub HashTableDelete(k As String)
    Dim hash As Long, cur As Long, prev As Long
    hash = HashTableHash&(k)

    cur = HashTable(hash)

    While cur <> 0
        If HashEntries(cur).k = k Then Exit While

        prev = cur
        cur = HashEntries(cur).nxt
    Wend

    If cur = 0 Then Exit Sub

    ' Remove link
    If prev = 0 Then
        ' Directly attached to bucket
        HashTable(hash) = HashEntries(cur).nxt
    Else
        HashEntries(prev).nxt = HashEntries(cur).nxt
    End If
End Sub

It's not quite complete, and it's not well tested, but it seems to work well enough. A big issue is that HashTableNextEntry isn't good enough since it can't reuse deleted HashEntries() indexes (from HashTableDelete). I'd fix that by adding a free-list to reclaim unused HashEntries() indexes, but that's more work than I have time for tonight Big Grin

Edit: I'd also mention, you can make use of
_Mem
to remove the need for the global arrays, I wrote an implementation that does that. Unfortunately though it's much more annoying to work with since you have to copy the
Value
objects in and out of the structure (rather than just access
HashEntries
directly). I suppose you could give the
_Mem
back and let the caller access it that way but that still stinks. But that's what we get for not having pointers I suppose Big Grin
Reply


Messages In This Thread
RE: An hash array dictonary step by step - by DSMan195276 - 04-10-2023, 06:39 AM



Users browsing this thread: 19 Guest(s)