04-10-2023, 06:39 AM
(This post was last modified: 04-10-2023, 02:57 PM by DSMan195276.)
This is quite the fun thread I tried my hand at throwing together a hash table implementation using a bucket approach, it mostly just contrasts with Luke's linear probing and tombstone approach. In practice I think the linear-probing approach can actually be better for performance (since it avoids the linked-list traversal on look-ups) but I enjoy the simplicity of using the buckets.
Interestingly your test program showcases the worst-case performance for Luke's code - by continually adding the same key to the hash table Luke's code has to do a very long linear-probe to find the next empty entry. It's an interesting contrast with the bucket approach I did because insertions with the buckets are always constant-time, so it handles this test quite easily.
In practice duplicate keys shouldn't even be allowed, if you modify
On to my actual code The global
It's not quite complete, and it's not well tested, but it seems to work well enough. A big issue is that HashTableNextEntry isn't good enough since it can't reuse deleted HashEntries() indexes (from HashTableDelete). I'd fix that by adding a free-list to reclaim unused HashEntries() indexes, but that's more work than I have time for tonight
Edit: I'd also mention, you can make use of
Interestingly your test program showcases the worst-case performance for Luke's code - by continually adding the same key to the hash table Luke's code has to do a very long linear-probe to find the next empty entry. It's an interesting contrast with the bucket approach I did because insertions with the buckets are always constant-time, so it handles this test quite easily.
In practice duplicate keys shouldn't even be allowed, if you modify
InsertLookupso that it checks if the key exists then Luke's code becomes significantly faster. It's not an entirely fair test though since Luke's hash table is no longer storing as many entries as the others are since it's discarding the duplicates ("HashTable" is my code):
On to my actual code The global
HashTable()array stores the roots for all the buckets, and then the buckets themselves are a linked-list created via indexes into the
HashEntries()array. The
HashEntries()array holds the values associated with each key, HashTableAdd and HashTableLookup just return an index which you use with
HashEntries()to access/modify the value for that key.
Code: (Select All)
Option _Explicit
$Console:Only
' Required Hash Table Globals
' A power-of-two representing the number of buckets in the hash table
Const HASH_TABLE_SIZE = 8
Type Value
' Put stuff in here
i As Long
End Type
Type HashTableEntry
nxt As Long
k As String
v As Value
End type
ReDim Shared HashEntries(1) As HashTableEntry
Dim Shared HashTableNextEntry As Long
HashTableNextEntry = 0
ReDim Shared HashTable(_Shl(1, HASH_TABLE_SIZE)) As Long
' Test code
'
Dim i As Long, cur As Long
' Add a bunch of entries
For i = 1 to 20000
cur = HashTableAdd&(Space$(i) + Chr$(i And 255))
HashEntries(cur).v.i = i
Next
cur = HashTableAdd&("foobar")
HashEntries(cur).v.i = 600
cur = HashTableAdd&("foobar2")
HashEntries(cur).v.i = 601
cur = HashTableAdd&("k")
HashEntries(cur).v.i = 602
cur = HashTableAdd&("j")
HashEntries(cur).v.i = 603
' An example lookup in the hash table
'
' Should print 4, since it looks up the 4 spaces entry created
' by the loop a few lines above this
Print "Lookup result: "; HashTableLookup&(" " + Chr$(4))
Print "V from lookup: "; HashEntries(HashTableLookup&(" " + Chr$(4))).v.i
System
'
' Hash Table Functions
'
Function HashTableHash&(k As String)
Dim i As Long, hash As Long
For i = 1 To Len(k)
hash = hash * 3 + Asc(k, i)
Next
' Chop off the high bits so this indexes the hash table
HashTableHash& = hash And (_Shl(1, HASH_TABLE_SIZE) - 1)
End Function
Function HashTableAdd&(k As String)
Dim hash As Long, prev As Long
hash = HashTableHash&(k)
prev = HashTable(hash)
' Create a new link and place it in the bucket for this hash value
HashTableNextEntry = HashTableNextEntry + 1
If HashTableNextEntry > UBOUND(HashEntries) Then ReDim _Preserve HashEntries(UBOUND(HashEntries) * 2) As HashTableEntry
HashEntries(HashTableNextEntry).nxt = prev
HashEntries(HashTableNextEntry).k = k
HashTable(hash) = HashTableNextEntry
HashTableAdd& = HashTableNextEntry
End Sub
' Returns 0 if k does not exist in the hash table, otherwise returns an index
' into HashEntries() for this key
Function HashTableLookup&(k As String)
Dim hash As Long, cur As Long
hash = HashTableHash&(k)
cur = HashTable(hash)
While cur <> 0
If HashEntries(cur).k = k Then
HashTableLookup& = cur
Exit Function
End If
cur = HashEntries(cur).nxt
Wend
HashTableLookup& = 0
End Sub
Sub HashTableDelete(k As String)
Dim hash As Long, cur As Long, prev As Long
hash = HashTableHash&(k)
cur = HashTable(hash)
While cur <> 0
If HashEntries(cur).k = k Then Exit While
prev = cur
cur = HashEntries(cur).nxt
Wend
If cur = 0 Then Exit Sub
' Remove link
If prev = 0 Then
' Directly attached to bucket
HashTable(hash) = HashEntries(cur).nxt
Else
HashEntries(prev).nxt = HashEntries(cur).nxt
End If
End Sub
It's not quite complete, and it's not well tested, but it seems to work well enough. A big issue is that HashTableNextEntry isn't good enough since it can't reuse deleted HashEntries() indexes (from HashTableDelete). I'd fix that by adding a free-list to reclaim unused HashEntries() indexes, but that's more work than I have time for tonight
Edit: I'd also mention, you can make use of
_Memto remove the need for the global arrays, I wrote an implementation that does that. Unfortunately though it's much more annoying to work with since you have to copy the
Valueobjects in and out of the structure (rather than just access
HashEntriesdirectly). I suppose you could give the
_Memback and let the caller access it that way but that still stinks. But that's what we get for not having pointers I suppose