05-12-2022, 01:34 PM
Starting from the top of the function list and working our way down slowly, let's begin with the BinarySearch routines:
There's 4 of these routines in the library, but I'm just going to demo one of them, and then I'll explain what the rest do, as they're all similar in functionality.
BinaryMemSearch
BinaryMemSearchSome
BinaryStringSearch
BinaryStringSearchSome
What these four functions do is simply implement a quick binary search method for sorted data, such as the wordlist which I'm using here. (Grab it from the common resource files download here, if you need it: Samples and Resources (qb64phoenix.com))
With the above, we load a word list which has 370,099 words in it. We then ask the user to give us a word, and we search the wordlist to see if the word exists in it, or not...
Now, one way we could do this search is to simply start at the top of the list, and then move one entry at a time, comparing to see if our word matches any in the list. A simple FOR i = 0 to 370099: IF search$ = Words(i) Then Exit For: Next would do that job...
BUT... That means we'd end up having to check 370099 words to see if we had a match or not!! Can you imagine doing this for a whole list of say 1000 words, and checking to see if they're in the dictionary, or not? How long would all that take to process for us??
Instead of going such a slow route to find our answer, we instead do a binary search pattern to look for our word's place in the list. We start in the middle of the list... If that word is less than our search word, we then can ignore the whole first half of the list. If that word is greater than our search word, we can then ignore the whole second left of the list. Divide the search area by half every pass!
2 ^ 19 = 524,288... We can search 524,288 items in 19 or fewer passes, if we divide the search area in half with every pass! And that makes a huge difference in how long it takes to find your index for an item, when you're doing the search multiple times. (19,000 max passes to find out if the 1000 words in the list I mentioned above are in our dictionary, rather than 370,099,000 passes if we check for the existence against every word in the list.)
I think everyone should be able to see the performance gain in the differences described above.
The differences for the four routines break down to simply matching our data types. For anything that we can point a _MEM variable at, we can use the BinaryMem searching. Since variable length strings don't work with _MEM, we have the BinaryString search routines.
The SearchSome routines let us specify a range of our data set that we want to search for the result, rather than having to search the whole list. For example, if we know "A" words start at 0 and end at 50,000 entries, we could BinaryStringSearchSome(search$ , Array(), 0, 50000), and not have to worry about the rest of the word list at all.
Binary Searching of just about anything we need to do a search with! Just be certain your data array is sorted properly before using these, or else you're not going to get the result you're looking for.
Code: (Select All)
'$INCLUDE:'SET.BI'
Dim Words(370099) As String
Open "word lists\370099 Word List.txt" For Binary As #1
Print "Attempting to load 370,099 words into memory..."
Do Until EOF(1)
Line Input #1, Words(count)
count = count + 1
Loop
Print "Successfully Loaded"; count; "words into memory."
Print
Do
Input "Give me a word =>", word$
If word$ = "" Then System
search$ = LCase$(word$)
position = BinaryStringSearch(search$, Words())
If position >= LBound(Words) Then
Print "Your word was found in position #"; position
If position > LBound(Words) Then Print position - 1, Words(position - 1)
Print position, Words(position)
If position < UBound(Words) Then Print position + 1, Words(position + 1)
Else
Print "Your word was not found in this data set."; position
Print "If it would have been found, it would have been between:"
Print LastIndex, Words(LastIndex)
Print LastIndex, Words(LastIndex + 1)
End If
Loop
'$INCLUDE:'SET.BM'
There's 4 of these routines in the library, but I'm just going to demo one of them, and then I'll explain what the rest do, as they're all similar in functionality.
BinaryMemSearch
BinaryMemSearchSome
BinaryStringSearch
BinaryStringSearchSome
What these four functions do is simply implement a quick binary search method for sorted data, such as the wordlist which I'm using here. (Grab it from the common resource files download here, if you need it: Samples and Resources (qb64phoenix.com))
With the above, we load a word list which has 370,099 words in it. We then ask the user to give us a word, and we search the wordlist to see if the word exists in it, or not...
Now, one way we could do this search is to simply start at the top of the list, and then move one entry at a time, comparing to see if our word matches any in the list. A simple FOR i = 0 to 370099: IF search$ = Words(i) Then Exit For: Next would do that job...
BUT... That means we'd end up having to check 370099 words to see if we had a match or not!! Can you imagine doing this for a whole list of say 1000 words, and checking to see if they're in the dictionary, or not? How long would all that take to process for us??
Instead of going such a slow route to find our answer, we instead do a binary search pattern to look for our word's place in the list. We start in the middle of the list... If that word is less than our search word, we then can ignore the whole first half of the list. If that word is greater than our search word, we can then ignore the whole second left of the list. Divide the search area by half every pass!
2 ^ 19 = 524,288... We can search 524,288 items in 19 or fewer passes, if we divide the search area in half with every pass! And that makes a huge difference in how long it takes to find your index for an item, when you're doing the search multiple times. (19,000 max passes to find out if the 1000 words in the list I mentioned above are in our dictionary, rather than 370,099,000 passes if we check for the existence against every word in the list.)
I think everyone should be able to see the performance gain in the differences described above.
The differences for the four routines break down to simply matching our data types. For anything that we can point a _MEM variable at, we can use the BinaryMem searching. Since variable length strings don't work with _MEM, we have the BinaryString search routines.
The SearchSome routines let us specify a range of our data set that we want to search for the result, rather than having to search the whole list. For example, if we know "A" words start at 0 and end at 50,000 entries, we could BinaryStringSearchSome(search$ , Array(), 0, 50000), and not have to worry about the rest of the word list at all.
Binary Searching of just about anything we need to do a search with! Just be certain your data array is sorted properly before using these, or else you're not going to get the result you're looking for.