Just ran some timed tests and I'm a little surprised by the results!
MemLineInput times is about 1.5 seconds.
OPEN FOR BINARY and then LINE INPUT is about 2.8 seconds.
OPEN FOR BINARY and then manually PARSE the data is about 0.05 seconds.
It's still a lot faster to load and parse than any other method, but I suppose it kind of makes sense after you think about it. By writing a generic routine, we have to check for multiple things (are the line endings CHR$(10) or CHR$(13) or CHR$(13) + CHR$(10)?? We need to check and account for all of those, along with some basic error checks.) Here, when I parse these, I'm already using 13+10 as the file endings and moving my pointer two spots for each CRLF.
I figured the times would be closer than that, but LOAD then PARSE is still the winner in terms of absolute time it takes to get something done. Still though, I'm pretty happy with the results where we load and read the file from memory with LINE INPUT about twice as fast as we load and read from disk in BINARY mode. We hold true to the syntax that a beginner quickly learns and uses, with a nice boost in speed and performance for them -- and we save our SSDs from repeated read/write calls to them.
Maybe I can tweak things here and close that gap for us with the speed somewhat. Instead of checking constantly for various CRLF endings, I could read for the first one and then assign it as the default from there on out, and skip a whole bunch of the IF type decision checks for us.
I'll dig into that later, after I find something around here to eat decent for lunch.
Edit: Included code in case anyone wants to test on their own:
MemLineInput times is about 1.5 seconds.
OPEN FOR BINARY and then LINE INPUT is about 2.8 seconds.
OPEN FOR BINARY and then manually PARSE the data is about 0.05 seconds.
It's still a lot faster to load and parse than any other method, but I suppose it kind of makes sense after you think about it. By writing a generic routine, we have to check for multiple things (are the line endings CHR$(10) or CHR$(13) or CHR$(13) + CHR$(10)?? We need to check and account for all of those, along with some basic error checks.) Here, when I parse these, I'm already using 13+10 as the file endings and moving my pointer two spots for each CRLF.
I figured the times would be closer than that, but LOAD then PARSE is still the winner in terms of absolute time it takes to get something done. Still though, I'm pretty happy with the results where we load and read the file from memory with LINE INPUT about twice as fast as we load and read from disk in BINARY mode. We hold true to the syntax that a beginner quickly learns and uses, with a nice boost in speed and performance for them -- and we save our SSDs from repeated read/write calls to them.
Maybe I can tweak things here and close that gap for us with the speed somewhat. Instead of checking constantly for various CRLF endings, I could read for the first one and then assign it as the default from there on out, and skip a whole bunch of the IF type decision checks for us.
I'll dig into that later, after I find something around here to eat decent for lunch.
Edit: Included code in case anyone wants to test on their own:
Code: (Select All)
Type Mem_File_Type
inUse As Integer
EOF_Marker As _Offset
Current_Pos As _Offset
Content As _MEM
End Type
Dim Shared MemFile(1 To 100) As Mem_File_Type
'BI HEADER INFO BEFORE THIS
Screen _NewImage(800, 600, 32)
Dim As String wordlist(466544), wordlist2(466544), wordlist3(466544), wordlist4(466545) 'arrays to hold the data
'MEM FILE INPUT
handle = MemFileLoad("466544 Word List.txt", 0) 'load a file directly into memory, and it's not compressed
t## = Timer ' timer to see how long we take loading this data
Do Until MemEOF(handle) ' Hopefully these lines will be intuitive enough.
count = count + 1 ' Especially when compared to the notes above
MemLineInput handle, wordlist(count) ' and the preceeding lines after
Loop
Print count; Using " words loaded into memory from file, in ##.#### seconds."; Timer - t##
MemFileClose handle
'OPEN FILE FOR BINARY
Open "466544 Word List.txt" For Binary As #1
t## = Timer
Do Until EOF(1)
count3 = count3 + 1
Line Input #1, wordlist3(count3)
Loop
Print count3; Using " words loaded from file OPEN FOR BINARY with LINE INPUT, in ##.#### seconds."; Timer - t##
Close 1
'OPEN FILE FOR BINARY AND PARSE
Open "466544 Word List.txt" For Binary As #1
temp$ = Space$(LOF(1))
Get #1, 1, temp$
Close 1
t## = Timer
p = 1: CRLF$ = Chr$(13) + Chr$(10)
Do
count4 = count4 + 1
l = InStr(p, temp$, CRLF$)
wordlist4(count4) = Mid$(temp$, p, l - p)
p = l + 2 'move the pointer by the length and 2 more for windows CRLF data
Loop Until l = 0
Print count4; Using " words loaded from file OPEN FOR BINARY then PARSED, in ##.#### seconds."; Timer - t##
'and let's compare contents to be safe
For i = 1 To count
If wordlist(i) <> wordlist4(i) Then Print "Wordlist does not match Wordlist4": failed = -1
If wordlist(i) <> wordlist3(i) Then Print "Wordlist does not match Wordlist3": failed = -1
Next
If failed Then
Print "Lists do not match"
Else
Print "Lists match each other perfectly"
End If
'BM FOOTER AFTER THIS
Sub MemFileDump (memfile, file$, compressed) 'just one quick call to save to disk and free the memory all at once.
MemFileSave memfile, file$, compressed
MemFileClose memfile
End Sub
Sub MemFileSave (memfile, file$, compressed)
If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
temphandle = FreeFile
Dim As _Offset length
length = MemFile(memfile).EOF_Marker + 1
temp$ = Space$(length)
$Checking:Off
_MemGet MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, temp$
$Checking:On
If compressed Then temp1$ = _Deflate$(temp$) Else temp1$ = temp$
Open file$ For Output As temphandle: Close temphandle 'erase any existing file with the same name
Open file$ For Binary As temphandle
Put #temphandle, 1, temp1$
Close
End Sub
Function MemFileLoad (file$, compressed)
'Error codes for MemFileLoad
'1: No mem files available. (All 100 are in use! Free some to use more!)
For i = 1 To 100
If MemFile(i).inUse = 0 Then Exit For
Next
If i > 100 Then MemFileLoad = 0: Exit Function 'can't open any more memfiles!
If _FileExists(file$) = 0 Then Error 53: Exit Function 'file not found
MemFileLoad = i
temphandle = FreeFile
Open file$ For Binary As #temphandle
temp$ = Space$(LOF(temphandle))
Get temphandle, 1, temp$
Close temphandle
If compressed Then temp$ = _Inflate$(temp$)
length = Len(temp$)
MemFile(i).Content = _MemNew(length)
$Checking:Off
_MemPut MemFile(i).Content, MemFile(i).Content.OFFSET, temp$
$Checking:On
MemFile(i).inUse = -1 'TRUE
MemFile(i).EOF_Marker = length - 1 'the end of the file is the length of the file to begin with
MemFile(i).Current_Pos = 0 'and we're at the start of our nothing in the file
End Function
Sub MemFileClose (memfile)
If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
MemFile(memfile).inUse = 0 'no longer in sue
MemFile(memfile).EOF_Marker = 0 'nothing is written in the file to begin with
MemFile(memfile).Current_Pos = 0 'and we're at the start of our nothing in the file
_MemFree MemFile(memfile).Content 'free the memory we were using
End Sub
Function MemFileOpen
'Error codes for MemFileOpen
'1: No mem files available. (All 100 are in use! Free some to use more!)
For i = 1 To 100
If MemFile(i).inUse = 0 Then Exit For
Next
If i > 100 Then MemFileOpen = 0: Exit Function 'can't open any more memfiles!
MemFileOpen = i
MemFile(i).inUse = -1 'TRUE
MemFile(i).EOF_Marker = 0 'nothing is written in the file to begin with
MemFile(i).Current_Pos = 0 'and we're at the start of our nothing in the file
MemFile(i).Content = _MemNew(1000000) '1mb memfile by default
$Checking:Off
_MemFill MemFile(i).Content, MemFile(i).Content.OFFSET, MemFile(i).Content.SIZE, 0 As _UNSIGNED _BYTE
'make certain to blank the file when opening it for the first time so we don't have unwanted characters in it.
$Checking:On
End Function
Function MemEOF (memfile)
If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
If MemFile(memfile).Current_Pos >= MemFile(memfile).EOF_Marker Then MemEOF = -1
End Function
Sub MemSeek (memfile, position As _Offset)
If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
If position < 0 Then Error 5: Exit Sub 'Invalid Function Call
If position > MemFile(memfile).EOF_Marker Then Error 5: Exit Sub 'Invalid Function Call
MemFile(memfile).Current_Pos = position
End Sub
Sub MemLineInput (memfile, what$)
'only valid line endings here are CHR$(10), chr$(13), and chr$(13) + chr$(10)
If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
Dim As _Offset CP, EP, Size, L
Dim tempM As _MEM, a1 As _Unsigned _Byte
tempM = MemFile(memfile).Content 'it's just much shorter to type!
CP = MemFile(memfile).Current_Pos
EP = MemFile(memfile).EOF_Marker
If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
Size = tempM.SIZE
$Checking:Off
Do
a$ = _MemGet(tempM, tempM.OFFSET + CP, String * 1)
Select Case a$
Case Chr$(13)
_MemGet tempM, tempM.OFFSET + CP + 1, a1
If a1 = 10 Then CP = CP + 1 'move the Current Pointer past the 2nd character in a windows CRLF ending
finished = -1
Case Chr$(10)
finished = -1
Case Else
temp$ = temp$ + a$
End Select
CP = CP + 1
If CP >= EP Then finished = -1
Loop Until finished
$Checking:On
MemFile(memfile).Current_Pos = CP
what$ = temp$
End Sub
Sub MemPrint (memfile, what$, EOL_Type As Integer)
'memfile is the memfile handle to print to
'what$ is what we want to print
'EOL_Type is the type of line ending we want after this print statement
'1: This is a CHR$(10) line ending (Linux style line ending)
'2: This is a CHR$(13) line ending (Old Mac style line ending)
'3: This is a CHR$(13) + CHR$(10) line ending (Old Windows style line ending)
'4: This is a COMMA line ending. Use this if writing continous CSV fields.
' (Think PRINT #1, stuff$, <-- see the comma there at the end of the print statement??)
If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
Dim CRLF As String
Dim As _Offset CP, EP, Size, L
Select Case EOL_Type
Case 1: CRLF = Chr$(10)
Case 2: CRLF = Chr$(13)
Case 3: CRLF = Chr$(13) + Chr$(10)
Case 4: CRLF = ","
End Select
CP = MemFile(memfile).Current_Pos
EP = MemFile(memfile).EOF_Marker
Size = MemFile(memfile).Content.SIZE
L = Len(what$) + Len(CRLF)
If CP + L > Size Then 'we're writing beyond the bounds of our reserved memory!
Dim tempM As _MEM
recheck:
If Size <= 100000000 Then 'resize our memblock (to the limit) to save our data
tempM = _MemNew(Size * 10)
_MemCopy MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, Size To tempM, tempM.OFFSET
_MemFree MemFile(memfile).Content
MemFile(memfile).Content = tempM
Size = Size * 10
GoTo recheck 'just to make certain that our reserved memory is now large enough to hold our data
Else
Error 61 'DISK FULL ERROR MESSAGE
Exit Sub 'I'm coding a hard size limit of 1GB for each memfile opened!
' Anything larger than that, and I'm tossing a Disk Full Error
End If
End If
_MemPut MemFile(memfile).Content, MemFile(memfile).Content.OFFSET + CP, what$ + CRLF
MemFile(memfile).Current_Pos = CP + L
If CP + L > EP Then MemFile(memfile).EOF_Marker = CP + L
End Sub