Here's an update and an overhaul of the MemFile system, to make it both easier to use and more versatile.
The changes first:
You no longer have to specify a line ending for each MemPrint statement. This brings our syntax closer to what one would expect from a PRINT # statement. Compare:
PRINT #1, stuff$
MemPrint handle, stuff$
If you want to change your line endings, call MemFileCRLF with the memfile handle and a code: 0 for no ending at all (think PRINT with a trailing semicolon), 1 for CHR$(10), 2 for CHR$(13), or 3 for CHR$(13) + CHR$(10). The default ending is CHR$(10) (why use an extra byte of memory when it's not necessary?), and MemFileLoad will search for and detect the proper line endings automatically for any file it loads, so you don't have to worry about them. Truly, you should only need MemFileCRLF if you create a blank memfile with MemFileOpen and need it to use something other than the now-standard CHR$(10) endings.
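To make the detection order concrete, here is a small standalone sketch (not part of the library; the array names and sample strings are made up for illustration) that runs the same InStr checks MemFileLoad uses, in the same order, against three sample strings and reports which MemFileCRLF code would select the same ending.

```qb64
$Console:Only
' Standalone sketch (hypothetical names): the line-ending checks MemFileLoad makes,
' in the same priority order, mapped to the codes MemFileCRLF accepts.
Dim samples(1 To 3) As String, codes(1 To 3) As Integer
Dim i As Long
samples(1) = "alpha" + Chr$(13) + Chr$(10) + "beta" + Chr$(13) + Chr$(10) 'CR+LF endings
samples(2) = "alpha" + Chr$(10) + "beta" + Chr$(10) 'LF endings
samples(3) = "alpha" + Chr$(13) + "beta" + Chr$(13) 'CR endings
For i = 1 To 3
    If InStr(samples(i), Chr$(13) + Chr$(10)) Then 'any CR+LF pair wins
        codes(i) = 3
    ElseIf InStr(samples(i), Chr$(10)) Then 'otherwise a lone LF
        codes(i) = 1
    ElseIf InStr(samples(i), Chr$(13)) Then 'otherwise a lone CR
        codes(i) = 2
    Else 'no line ending anywhere in the data
        codes(i) = 0
    End If
    Print "sample"; i; "-> MemFileCRLF code"; codes(i)
Next
```

Because CR+LF is checked first, data that mixes endings is treated as CR+LF as soon as a single pair appears anywhere in it.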
New additions:
MemLOF -- this works just like LOF does for a regular file, except it's for our memfile: it gives us the length, in bytes, of the data currently held in it.
MemInput -- this gives us something QB64 has needed for quite a while: a speedier way to INPUT CSV files! OPEN FOR INPUT is sloooooww.... and we can't use INPUT with OPEN FOR BINARY (only LINE INPUT works there)... so we've either been stuck doing things the slooow way, or else reading the data ourselves and trying to sort it out and parse it properly by hand. MemInput lets us bypass this limitation now!
INPUT has a lot of little quirky behaviors, and I don't think I've ever seen a book that describes them perfectly. Items in quotes are supposed to stay in quotes, but the quotes themselves are sometimes removed from the items, and yet the quotes are always removed, and, and... and who knows if QB64 even perfectly mimics how QB45 did this?! INPUT with files is complex crap with all sorts of little exceptions and nuances. I've done some testing and tried to replicate how QB64 does it, but there are probably a few tweaks I'm just not aware of and so haven't coded any specific exceptions for. If you guys do any testing and find a use case that doesn't behave as it should, post an example and I'll be happy to tweak things. AFAIK, it mimics QB64's behavior, but I've never really used INPUT very much, so I feel like I'm shooting blindly into the dark and hoping to hit close to the target in this case.
The difference in speed, however, makes it well worth building this little routine into your own programs if you have to deal with large CSV data files.
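For anyone who wants to poke at the field rules outside the library, here is a standalone sketch that applies the same rules MemInput's loop follows to an ordinary string: leading spaces are skipped, a quote at the very start of a field protects commas until the next quote, an unprotected comma or a CR/LF ends the field, and afterwards the wrapping quotes are dropped and the result is trimmed. It illustrates MemInput's logic only; it makes no claim about exactly what INPUT # itself does, and all variable names are hypothetical.

```qb64
$Console:Only
' Standalone sketch (hypothetical names): MemInput's field rules applied to a string.
Dim fields(1 To 20) As String
Dim As Long fieldCount, p, L, c, sepLen, inquote, stripQuotes, k
Dim s$, f$
s$ = Chr$(34) + "New York, New York" + Chr$(34) + ", New York , New York" + Chr$(10)
p = 1
Do While p <= Len(s$)
    L = 0: sepLen = 0: inquote = 0: stripQuotes = 0
    Do While p + L <= Len(s$)
        c = Asc(s$, p + L)
        Select Case c
            Case 10 'LF ends the field
                sepLen = 1: Exit Do
            Case 13 'CR ends the field; a following LF is swallowed with it
                sepLen = 1
                If p + L < Len(s$) Then
                    If Asc(s$, p + L + 1) = 10 Then sepLen = 2
                End If
                Exit Do
            Case 32 'leading spaces are skipped
                If L = 0 Then p = p + 1: L = L - 1
            Case 34 'a quote at the start opens a quoted run; any later quote closes it
                If L = 0 Then inquote = -1: stripQuotes = -1 Else inquote = 0
            Case 44 'a comma ends the field unless we're inside quotes
                If inquote = 0 Then sepLen = 1: Exit Do
        End Select
        L = L + 1
    Loop
    f$ = Mid$(s$, p, L)
    If stripQuotes Then 'drop the wrapping quotes
        If Len(f$) >= 2 And Right$(f$, 1) = Chr$(34) Then f$ = Mid$(f$, 2, Len(f$) - 2)
    End If
    fieldCount = fieldCount + 1
    fields(fieldCount) = _Trim$(f$)
    p = p + L + sepLen 'step past the field and its separator
Loop
For k = 1 To fieldCount: Print "["; fields(k); "]": Next
```

Fed the first line that the demo below writes to test.txt, these rules give three fields, with the comma inside the quotes kept as part of the first one.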
(Note: There's still a MemLineInput in here as well, which mimics LINE INPUT behavior. You now just pick whichever of the two functions fits your usage.)
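For comparison, LINE INPUT-style reading treats everything up to the next line ending as one record, commas and quotes included. Here is a standalone sketch of that over a plain string (the names are hypothetical), including the case where the last line has no ending after it.

```qb64
$Console:Only
' Standalone sketch (hypothetical names): LINE INPUT-style records from a string buffer.
Dim records(1 To 10) As String
Dim As Long recordCount, p, q, k
Dim buf$, eol$
eol$ = Chr$(10)
buf$ = Chr$(34) + "New York, New York" + Chr$(34) + ", New York" + eol$ + "last line, no terminator"
p = 1
Do While p <= Len(buf$)
    q = InStr(p, buf$, eol$) 'position of the next line ending, 0 if none is left
    recordCount = recordCount + 1
    If q = 0 Then
        records(recordCount) = Mid$(buf$, p) 'no ending left: the rest of the buffer is the final line
        p = Len(buf$) + 1
    Else
        records(recordCount) = Mid$(buf$, p, q - p) 'the line without its ending
        p = q + Len(eol$)
    End If
Loop
For k = 1 To recordCount: Print records(k): Next
```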
Code:
Type Mem_File_Type
    inUse As Integer
    CRLF As String
    EOF_Marker As _Offset
    Current_Pos As _Offset
    Content As _MEM
End Type
Dim Shared MemFile(1 To 100) As Mem_File_Type
'BI HEADER INFO BEFORE THIS
Screen _NewImage(800, 600, 32)
Dim As String wordlist(466549), wordlist3(466549) 'arrays to hold the data
$Color:32
Color Red, Yellow 'color so we can make certain that we're dealing with spaces properly
Open "test.txt" For Output As #1
Print #1, Chr$(34) + "New York, New York" + Chr$(34) + ", New York , New York" 'distinguish between in and out of quotes
Print #1, "Hello World, My name is " + Chr$(34) + "Steve The Awesome" + Chr$(34) + "!"
Close
Print "********** (testing INPUT)"
Open "test.txt" For Input As #1
Do Until EOF(1)
    Input #1, test$
    Print test$
Loop
Close
Print "********** (testing MemInput)"
'MEM FILE INPUT
handle = MemFileLoad("test.txt", 0) 'load a file directly into memory, and it's not compressed
t## = Timer ' timer to see how long we take loading this data
Do Until MemEOF(handle) ' Hopefully these lines will be intuitive enough.
    count = count + 1 ' Especially when compared to the notes above
    MemInput handle, wordlist(count) ' and to the INPUT loop just before them
    Print wordlist(count)
Loop
Print "**********"
MemFileClose handle
Color White, Black
count = 0
Print
Print "Now testing speed difference in loading a large file"
handle = MemFileLoad("466544 Word List.txt", 0) 'load a file directly into memory, and it's not compressed
t## = Timer ' timer to see how long we take loading this data
Do Until MemEOF(handle) ' Hopefully these lines will be intuitive enough.
    count = count + 1 ' Especially when compared to the notes above
    MemInput handle, wordlist(count) ' and to the INPUT loop before them
Loop
Print count; Using " words loaded into memory from file with MemInput, in ##.#### seconds."; Timer - t##
MemFileClose handle
Print
Print "Now, go grab a soda or use the bathroom. We're going to load the same list as a file FOR INPUT."
Print "Expect this to take several minutes -- we're not locking up your PC! We're just sloooow!!"
Open "466544 Word List.txt" For Input As #1
t## = Timer
Do Until EOF(1)
    count3 = count3 + 1
    Input #1, wordlist3(count3)
Loop
Print count3; Using " words loaded from file OPEN FOR INPUT with INPUT, in ##.#### seconds."; Timer - t##
Close 1
'and let's compare contents to be safe
For i = 1 To count
    If wordlist(i) <> wordlist3(i) Then failed = -1: Print i, wordlist(i), wordlist3(i): Sleep
Next
If failed Then Print "Lists don't match" Else Print "Lists match perfectly"
'BM FOOTER AFTER THIS
Sub MemFileDump (memfile, file$, compressed) 'just one quick call to save to disk and free the memory all at once.
    MemFileSave memfile, file$, compressed
    MemFileClose memfile
End Sub
Sub MemFileSave (memfile, file$, compressed)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    temphandle = FreeFile
    Dim As _Offset length
    length = MemFile(memfile).EOF_Marker 'EOF_Marker holds the number of bytes in use
    temp$ = Space$(length)
    $Checking:Off
    _MemGet MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, temp$
    $Checking:On
    If compressed Then temp1$ = _Deflate$(temp$) Else temp1$ = temp$
    Open file$ For Output As temphandle: Close temphandle 'erase any existing file with the same name
    Open file$ For Binary As temphandle
    Put #temphandle, 1, temp1$
    Close temphandle 'close only our own handle, not every file the caller has open
End Sub
Function MemFileLoad% (file$, compressed)
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then Error 5: Exit Function 'can't open any more memfiles!
    If _FileExists(file$) = 0 Then Error 53: Exit Function 'file not found
    MemFileLoad% = i
    Dim length As _Integer64 'a SINGLE can't hold byte counts above 16MB exactly
    temphandle = FreeFile
    Open file$ For Binary As #temphandle
    temp$ = Space$(LOF(temphandle))
    Get temphandle, 1, temp$
    Close temphandle
    If compressed Then temp$ = _Inflate$(temp$)
    length = Len(temp$)
    MemFile(i).Content = _MemNew(length)
    $Checking:Off
    _MemPut MemFile(i).Content, MemFile(i).Content.OFFSET, temp$
    $Checking:On
    'we want to auto-detect our CRLF endings
    'as we have the file in temp$ at the moment, we'll just search for it via instr
    If InStr(temp$, Chr$(13) + Chr$(10)) Then
        MemFile(i).CRLF = Chr$(13) + Chr$(10)
    ElseIf InStr(temp$, Chr$(10)) Then
        MemFile(i).CRLF = Chr$(10)
    ElseIf InStr(temp$, Chr$(13)) Then
        MemFile(i).CRLF = Chr$(13)
    Else
        MemFile(i).CRLF = Chr$(10) 'a single line with no ending: fall back to the CHR$(10) default
    End If
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = length 'bytes in use, the same meaning MemPrint gives EOF_Marker
    MemFile(i).Current_Pos = 0 'and we start reading from the beginning of the file
End Function
Sub MemFileClose (memfile As Integer)
    If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found: never opened, or already closed
    MemFile(memfile).inUse = 0 'no longer in use
    MemFile(memfile).CRLF = "" 'we have no line ending as we no longer have a file
    MemFile(memfile).EOF_Marker = 0 'nothing is left in the file
    MemFile(memfile).Current_Pos = 0 'and we're back at the start
    _MemFree MemFile(memfile).Content 'free the memory we were using
End Sub
Sub MemFileCRLF (memfile As Integer, CRLF As Integer)
    If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Select Case CRLF
        Case 0: MemFile(memfile).CRLF = "" 'no line ending. Use this when you want text to continue on one line. (Think PRINT with semicolon.)
        Case 1: MemFile(memfile).CRLF = Chr$(10) 'LF, the default
        Case 2: MemFile(memfile).CRLF = Chr$(13) 'CR
        Case 3: MemFile(memfile).CRLF = Chr$(13) + Chr$(10) 'CR+LF
        Case Else: Error 5: Exit Sub
    End Select
End Sub
Function MemFileOpen%
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then Error 5: Exit Function 'can't open any more memfiles!
    MemFileOpen% = i
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = 0 'nothing is written in the file to begin with
    MemFile(i).CRLF = Chr$(10) 'we default to CHR$(10) line endings
    MemFile(i).Current_Pos = 0 'and we're at the start of our nothing in the file
    MemFile(i).Content = _MemNew(1000000) '1mb memfile by default
    $Checking:Off
    _MemFill MemFile(i).Content, MemFile(i).Content.OFFSET, MemFile(i).Content.SIZE, 0 As _UNSIGNED _BYTE
    'make certain to blank the file when opening it for the first time so we don't have unwanted characters in it.
    $Checking:On
End Function
Function MemEOF& (memfile As Integer)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
    If MemFile(memfile).Current_Pos >= MemFile(memfile).EOF_Marker Then MemEOF = -1
End Function
Function MemLOF& (memfile As Integer)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
    MemLOF = Val(Str$(MemFile(memfile).EOF_Marker)) 'round trip through a string to turn the _OFFSET into a LONG
End Function
Sub MemSeek (memfile As Integer, position As _Offset)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    If position < 0 Then Error 5: Exit Sub 'Illegal Function Call
    If position > MemFile(memfile).EOF_Marker Then Error 5: Exit Sub 'Illegal Function Call
    MemFile(memfile).Current_Pos = position
End Sub
Sub MemLineInput (memfile As Integer, what$)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Dim As _Offset CP, EP, L, o
    Dim tempM As _MEM
    tempM = MemFile(memfile).Content 'it's just much shorter to type!
    CP = MemFile(memfile).Current_Pos
    EP = MemFile(memfile).EOF_Marker
    If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
    CRLF$ = MemFile(memfile).CRLF: length = Len(CRLF$)
    a$ = CRLF$: o = tempM.OFFSET + CP: L = 0
    If length = 0 Then 'we have no CRLF to look for!
        what$ = Space$(EP - CP) 'return the rest of the data in memory as the result
        _MemGet tempM, o, what$
        MemFile(memfile).Current_Pos = EP 'and mark it as read
        Exit Sub
    End If
    $Checking:Off
    Do While CP + L + length <= EP 'only compare where a whole CRLF$ still fits inside the data
        _MemGet tempM, o + L, a$
        If a$ = CRLF$ Then found = -1: Exit Do
        L = L + 1
    Loop
    If found = 0 Then L = EP - CP: length = 0 'no ending left, so the rest of the data is the last line
    temp$ = Space$(L)
    _MemGet tempM, o, temp$
    $Checking:On
    CP = CP + L + length
    MemFile(memfile).Current_Pos = CP
    what$ = temp$
End Sub
Sub MemInput (memfile As Integer, what$)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Dim As _Offset CP, EP, L, o
    Dim tempM As _MEM, a As _Unsigned _Byte
    tempM = MemFile(memfile).Content 'it's just much shorter to type!
    CP = MemFile(memfile).Current_Pos
    o = tempM.OFFSET + CP: L = 0
    EP = MemFile(memfile).EOF_Marker
    If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
    $Checking:Off
    Do
        _MemGet tempM, o + L, a
        Select Case a 'valid field separators
            Case 10 'chr$(10)
                length = 1
                Exit Do
            Case 13 'chr$(13), possibly the first half of a CR+LF pair
                length = 1
                If CP + L + 1 < EP Then 'only peek at the next byte while it's still inside the data
                    If _MemGet(tempM, o + L + 1, _Unsigned _Byte) = 10 Then length = 2
                End If
                Exit Do
            Case 32
                If L = 0 Then o = o + 1: L = L - 1: CP = CP + 1 'strip off leading spaces
            Case 34
                If L = 0 Then inquote = -1 Else inquote = 0
                If inquote Then stripQuotes = -1
            Case 44 'comma
                If Not inquote Then length = 1: Exit Do
        End Select
        L = L + 1
    Loop Until CP + L >= EP
    temp$ = Space$(L)
    _MemGet MemFile(memfile).Content, o, temp$
    $Checking:On
    CP = CP + L + length
    MemFile(memfile).Current_Pos = CP
    If stripQuotes Then 'we only count quotes as special when they start and stop a sequence?
        If Len(temp$) >= 2 And Right$(temp$, 1) = Chr$(34) Then temp$ = Mid$(temp$, 2, Len(temp$) - 2)
    End If
    what$ = _Trim$(temp$)
End Sub
Sub MemPrint (memfile As Integer, what$)
    'memfile is the memfile handle to print to
    'what$ is what we want to print
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Dim As _Offset CP, EP, Size, L
    Dim tempM As _MEM
    CP = MemFile(memfile).Current_Pos
    EP = MemFile(memfile).EOF_Marker
    Size = MemFile(memfile).Content.SIZE
    L = Len(what$) + Len(MemFile(memfile).CRLF)
    Do While CP + L > Size 'we're writing beyond the bounds of our reserved memory!
        If Size > 100000000 Then 'growing again would pass the limit
            Error 61 'DISK FULL ERROR MESSAGE
            Exit Sub 'I'm coding a hard size limit of 1GB for each memfile opened!
            ' Anything larger than that, and I'm tossing a Disk Full Error
        End If
        tempM = _MemNew(Size * 10) 'resize our memblock tenfold and copy our data across
        _MemCopy MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, Size To tempM, tempM.OFFSET
        _MemFree MemFile(memfile).Content
        MemFile(memfile).Content = tempM
        Size = Size * 10
    Loop 'recheck until our reserved memory is large enough to hold our data
    _MemPut MemFile(memfile).Content, MemFile(memfile).Content.OFFSET + CP, what$ + MemFile(memfile).CRLF
    MemFile(memfile).Current_Pos = CP + L
    If CP + L > EP Then MemFile(memfile).EOF_Marker = CP + L
End Sub
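The growth rule that MemPrint's comments describe is easy to check on paper. This standalone sketch (hypothetical names and an example size, not library code) starts from the 1MB block MemFileOpen reserves and multiplies by ten until a write fits, stopping at the 1GB limit where MemPrint raises error 61 instead.

```qb64
$Console:Only
' Standalone sketch (hypothetical names): how far a memfile's reserved block grows.
Dim As _Integer64 blockSize, needed
Dim fits As Long
blockSize = 1000000 'MemFileOpen reserves 1MB to start with
needed = 25000000 'example: bytes the memfile must hold after the next MemPrint
fits = -1
Do While needed > blockSize
    If blockSize > 100000000 Then fits = 0: Exit Do 'one more step would pass 1GB: error 61 territory
    blockSize = blockSize * 10 'grow tenfold and check again
Loop
If fits Then Print "block grows to"; blockSize; "bytes" Else Print "too big for one memfile"
```

Growing tenfold keeps the number of copy-and-free steps tiny (at most three from the 1MB default), at the cost of reserving up to ten times more memory than the data needs.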