MemFile System
An update and overhaul of the MemFile system to make it both easier to use and more versatile.

Code:
Type Mem_File_Type
    inUse As Integer
    CRLF As String
    EOF_Marker As _Offset
    Current_Pos As _Offset
    Content As _MEM
End Type

Dim Shared MemFile(1 To 100) As Mem_File_Type
'BI HEADER INFO BEFORE THIS

Screen _NewImage(800, 600, 32)


Dim As String wordlist(466549), wordlist3(466549) 'arrays to hold the data
$Color:32
Color Red, Yellow 'color so we can make certain that we're dealing with spaces properly
Open "test.txt" For Output As #1
Print #1, Chr$(34) + "New York, New York" + Chr$(34) + ",      New York      , New York" 'distinguish between in and out of quotes
Print #1, "Hello World, My name is " + Chr$(34) + "Steve The Awesome" + Chr$(34) + "!"
Close
Print "********** (testing INPUT)"
Open "test.txt" For Input As #1
Do Until EOF(1)
    Input #1, test$
    Print test$
Loop
Close
Print "********** (testing MemInput)"



'MEM FILE INPUT
handle = MemFileLoad("test.txt", 0) 'load the file straight into memory; 0 = not compressed
Do Until MemEOF(handle) '            same loop shape as the EOF / INPUT # loop above
    count = count + 1
    MemInput handle, wordlist(count) 'reads one comma- or line-delimited field, like INPUT #
    Print wordlist(count)
Loop
Print "**********"
MemFileClose handle

Color White, Black
count = 0
Print
Print "Now testing speed difference in loading a large file"

handle = MemFileLoad("466544 Word List.txt", 0) 'load the word list into memory; 0 = not compressed
t## = Timer '                                    started after the load, so this times the parsing of the in-memory data
Do Until MemEOF(handle)
    count = count + 1
    MemInput handle, wordlist(count)
Loop
Print count; Using " words loaded into memory from file with MemInput, in ##.#### seconds."; Timer - t##
MemFileClose handle
Print
Print "Now, go grab a soda or use the bathroom.  We're going to load the same list as a file FOR INPUT."
Print "Expect this to take several minutes -- we're not locking up your PC!  We're just sloooow!!"




Open "466544 Word List.txt" For Input As #1
t## = Timer
Do Until EOF(1)
    count3 = count3 + 1
    Input #1, wordlist3(count3)
Loop
Print count3; Using " words loaded from file OPEN FOR INPUT with INPUT, in ##.#### seconds."; Timer - t##
Close 1

'and let's compare contents to be safe
For i = 1 To count
    If wordlist(i) <> wordlist3(i) Then failed = -1: Print i, wordlist(i), wordlist3(i): Sleep
Next

If failed Then Print "Lists don't match" Else Print "Lists match perfectly"




'BM FOOTER AFTER THIS
Sub MemFileDump (memfile, file$, compressed) 'just one quick call to save to disk and free the memory all at once.
    MemFileSave memfile, file$, compressed
    MemFileClose memfile
End Sub

Sub MemFileSave (memfile, file$, compressed)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    temphandle = FreeFile
    Dim As _Offset length

    length = MemFile(memfile).EOF_Marker + 1
    temp$ = Space$(length)
    $Checking:Off
    _MemGet MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, temp$
    $Checking:On
    If compressed Then temp1$ = _Deflate$(temp$) Else temp1$ = temp$
    Open file$ For Output As temphandle: Close temphandle 'erase any existing file with the same name
    Open file$ For Binary As temphandle
    Put #temphandle, 1, temp1$
    Close temphandle 'close only our own handle; a bare CLOSE would close every open file
End Sub

Function MemFileLoad% (file$, compressed)
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then Error 5: Exit Function 'can't open any more memfiles!
    If _FileExists(file$) = 0 Then Error 53: Exit Function 'file not found

    MemFileLoad% = i
    temphandle = FreeFile
    Open file$ For Binary As #temphandle
    temp$ = Space$(LOF(temphandle))
    Get temphandle, 1, temp$
    Close temphandle
    If compressed Then temp$ = _Inflate$(temp$)
    length = Len(temp$)
    MemFile(i).Content = _MemNew(length)
    $Checking:Off
    _MemPut MemFile(i).Content, MemFile(i).Content.OFFSET, temp$
    $Checking:On

    'we want to auto-detect our CRLF endings
    'as we have the file in temp$ at the moment, we'll just search for it via instr
    If InStr(temp$, Chr$(13) + Chr$(10)) Then
        MemFile(i).CRLF = Chr$(13) + Chr$(10)
    ElseIf InStr(temp$, Chr$(10)) Then
        MemFile(i).CRLF = Chr$(10)
    ElseIf InStr(temp$, Chr$(13)) Then
        MemFile(i).CRLF = Chr$(13)
    Else
        _MemFree MemFile(i).Content 'no line ending found: release the block before raising the error
        Error 5: Exit Function
    End If
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = length - 1 'offset of the last byte in the file
    MemFile(i).Current_Pos = 0 'and we start reading from the beginning of the file
End Function


Sub MemFileClose (memfile As Integer)
    If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
    MemFile(memfile).inUse = 0 'no longer in use
    MemFile(memfile).CRLF = "" 'we have no file ending as we no longer have a file
    MemFile(memfile).EOF_Marker = 0 'nothing is written in the file to begin with
    MemFile(memfile).Current_Pos = 0 'and we're at the start of our nothing in the file
    _MemFree MemFile(memfile).Content 'free the memory we were using
End Sub


Sub MemFileCRLF (memfile As Integer, CRLF As Integer)
    If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Select Case CRLF
        Case 0: MemFile(memfile).CRLF = "" 'no file ending. Use this when you want text to continue on one line.  (Think PRINT with semicolon.)
        Case 1: MemFile(memfile).CRLF = Chr$(10) 'we default to CHR$(10) line endings
        Case 2: MemFile(memfile).CRLF = Chr$(13)
        Case 3: MemFile(memfile).CRLF = Chr$(13) + Chr$(10)
        Case Else: Error 5: Exit Sub
    End Select
End Sub


Function MemFileOpen%
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then Error 5: Exit Function 'can't open any more memfiles!
    MemFileOpen% = i
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = 0 'nothing is written in the file to begin with
    MemFile(i).CRLF = Chr$(10) 'we default to CHR$(10) line endings
    MemFile(i).Current_Pos = 0 'and we're at the start of our nothing in the file
    MemFile(i).Content = _MemNew(1000000) '1mb memfile by default
    $Checking:Off
    _MemFill MemFile(i).Content, MemFile(i).Content.OFFSET, MemFile(i).Content.SIZE, 0 As _UNSIGNED _BYTE
    'make certain to blank the file when opening it for the first time so we don't have unwanted characters in it.
    $Checking:On
End Function

Function MemEOF& (memfile As Integer)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
    If MemFile(memfile).Current_Pos >= MemFile(memfile).EOF_Marker Then MemEOF = -1
End Function

Function MemLOF& (memfile As Integer)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
    MemLOF = Val(Str$(MemFile(memfile).EOF_Marker))
End Function



Sub MemSeek (memfile As Integer, position As _Offset)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    If position < 0 Then Error 5: Exit Sub 'Invalid Function Call
    If position > MemFile(memfile).EOF_Marker Then Error 5: Exit Sub 'Invalid Function Call
    MemFile(memfile).Current_Pos = position
End Sub

Sub MemLineInput (memfile As Integer, what$)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Dim As _Offset CP, EP, L, o
    Dim tempM As _MEM
    tempM = MemFile(memfile).Content 'it's just much shorter to type!
    CP = MemFile(memfile).Current_Pos
    CRLF$ = MemFile(memfile).CRLF: length = Len(CRLF$)
    a$ = CRLF$: o = tempM.OFFSET + CP: L = 0
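    If length = 0 Then 'we have no CRLF to look for!
        what$ = Space$(MemFile(memfile).EOF_Marker - CP) 'return the whole string in memory as the result
        _MemGet MemFile(memfile).Content, o, what$
        MemFile(memfile).Current_Pos = MemFile(memfile).EOF_Marker 'everything has been read, so MemEOF now reports true
        Exit Sub
    End If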
    EP = MemFile(memfile).EOF_Marker - length
    If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
    $Checking:Off
    Do
        _MemGet tempM, o + L, a$
        If a$ = CRLF$ Then Exit Do
        L = L + 1
    Loop Until CP + L > EP
    temp$ = Space$(L)
    _MemGet MemFile(memfile).Content, o, temp$
    $Checking:On
    CP = CP + L + length
    MemFile(memfile).Current_Pos = CP
    what$ = temp$
End Sub


Sub MemInput (memfile As Integer, what$)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    Dim As _Offset CP, EP, L, o
    Dim tempM As _MEM, a As _Unsigned _Byte
    tempM = MemFile(memfile).Content 'it's just much shorter to type!
    CP = MemFile(memfile).Current_Pos
    CRLF$ = MemFile(memfile).CRLF
    o = tempM.OFFSET + CP: L = 0
    EP = MemFile(memfile).EOF_Marker
    If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
    $Checking:Off
    Do
        _MemGet tempM, o + L, a
        Select Case a 'valid line separators
            Case 10 'chr$(10)
                length = 1
                Exit Do
            Case 13 'chr$(13)
                If _MemGet(tempM, o + L + 1, _Unsigned _Byte) = 10 Then length = 2 Else length = 1
                Exit Do
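            Case 32 'space
                If L = 0 Then o = o + 1: L = L - 1: CP = CP + 1 'strip off leading spaces
            Case 34 'double quote
                If L = 0 Then inquote = -1 Else inquote = 0 'a quote opens a quoted field only at the very start; any later quote closes it
                If inquote Then stripQuotes = -1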
            Case 44 'comma
                If Not inquote Then length = 1: Exit Do
        End Select
        L = L + 1
    Loop Until CP + L >= EP
    temp$ = Space$(L)
    _MemGet MemFile(memfile).Content, o, temp$
    $Checking:On
    CP = CP + L + length
    MemFile(memfile).Current_Pos = CP
    If stripQuotes Then 'we only count quotes as special when they start and stop a sequence?
        If Right$(temp$, 1) = Chr$(34) Then temp$ = Mid$(temp$, 2, Len(temp$) - 2)
    End If
    what$ = _Trim$(temp$)
End Sub





Sub MemPrint (memfile As Integer, what$)
    'memfile is the memfile handle to print to
    'what$ is what we want to print
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message

    Dim As _Offset CP, EP, Size, L
    CP = MemFile(memfile).Current_Pos
    EP = MemFile(memfile).EOF_Marker
    Size = MemFile(memfile).Content.SIZE
    L = Len(what$) + Len(MemFile(memfile).CRLF)
    If CP + L > Size Then 'we're writing beyond the bounds of our reserved memory!
        Dim tempM As _MEM
        recheck:
        If Size <= 100000000 Then 'resize our memblock (to the limit) to save our data
            tempM = _MemNew(Size * 10)
            _MemCopy MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, Size To tempM, tempM.OFFSET
            _MemFree MemFile(memfile).Content
            MemFile(memfile).Content = tempM
            Size = Size * 10
            GoTo recheck 'just to make certain that our reserved memory is now large enough to hold our data
        Else
            Error 61 'DISK FULL ERROR MESSAGE
            Exit Sub 'I'm coding a hard size limit of 1GB for each memfile opened!
            '        Anything larger than that, and I'm tossing a Disk Full Error
        End If
    End If
    _MemPut MemFile(memfile).Content, MemFile(memfile).Content.OFFSET + CP, what$ + MemFile(memfile).CRLF
    MemFile(memfile).Current_Pos = CP + L
    If CP + L > EP Then MemFile(memfile).EOF_Marker = CP + L
End Sub

The changes first:
You no longer have to specify a line ending for each MemPrint statement.  This brings the syntax closer to what one would expect with a PRINT # statement.  Compare:

PRINT #1, stuff$
MemPrint handle, stuff$

If you want to change your line endings, you can do so with a call to MemFileCRLF.  (It takes 0 for no ending, 1 for CHR$(10), 2 for CHR$(13), or 3 for CHR$(13) + CHR$(10).)  Default endings are CHR$(10) -- why use an extra byte of memory when it's not necessary? -- and MemFileLoad will detect the proper line endings automatically for any file it loads for you, so you don't have to worry about it.  Truly, you should only need to call MemFileCRLF if you create a new memfile with MemFileOpen and need it to have something other than the now standard CHR$(10) endings.
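
For the write side, here's a minimal sketch.  It assumes the Type/Dim header from the listing above sits before this code and the Sub/Function definitions after it; the file name "memfile_demo.txt" and the sample text are made up purely for illustration.

```basic
'Sketch only: relies on the MemFile routines from the listing above.
'"memfile_demo.txt" is a hypothetical output file name.
handle = MemFileOpen '                     new, empty memfile; endings default to CHR$(10)
MemPrint handle, "first line" '            no line ending argument needed
MemPrint handle, "second line"
MemFileDump handle, "memfile_demo.txt", 0 'save uncompressed and free the memory in one call
```

MemFileDump is just MemFileSave followed by MemFileClose, so the handle shouldn't be used again afterwards.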

New additions:
MemLOF -- this is basically the same as LOF for a file, except it's for our mem file.  Everyone should know more or less what it'll do for us.
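
A quick way to see what MemLOF reports, again assuming the routines above are in the same program and that the "test.txt" file written by the demo exists.  As the listing is written, MemFileLoad stores EOF_Marker as the file length minus one, so the two numbers may differ by one; treat that as something to check on your own machine rather than a guaranteed result.

```basic
'Sketch only: compares LOF on the disk file with MemLOF on the loaded memfile.
Open "test.txt" For Binary As #1
Print "LOF    :"; LOF(1)
Close #1

handle = MemFileLoad("test.txt", 0) '0 = not compressed
Print "MemLOF :"; MemLOF(handle)
MemFileClose handle
```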

MemInput -- this allows us to do something which QB64 has been needing to do for quite a while -- have a speedier way to INPUT CSV files!  OPEN FOR INPUT is sloooooww....  and we can't OPEN FOR BINARY with INPUT (it only works with LINE INPUT)...  so we've either been stuck with doing things the slooow way, or else having to read and attempt to sort out and parse our data properly manually.  MemInput allows us to bypass this limitation now!
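
Reading a multi-column CSV might look something like the sketch below.  "people.csv" and its three-column layout (name, city, age) are hypothetical, the routines from the listing above are assumed to be in the same program, and the file is assumed to end with a line break.

```basic
'Sketch only: "people.csv" is a made-up file with rows like  "Doe, Jane",Perth,42
handle = MemFileLoad("people.csv", 0) '0 = not compressed
Do Until MemEOF(handle)
    MemInput handle, person$ 'MemInput hands back strings only...
    MemInput handle, city$
    MemInput handle, age$
    Print person$; " | "; city$; " | "; Val(age$) '...so convert numeric fields with VAL
Loop
MemFileClose handle
```

Unlike INPUT #, which can read straight into numeric variables, MemInput takes a string parameter, which is why the age column goes through VAL here.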

   


INPUT has a lot of quirky little behaviors, and I don't think I've ever seen a book that describes them perfectly.  Items in quotes are supposed to stay in quotes, but the quotes themselves are sometimes removed from the items, and yet the quotes are always removed, and, and...  and who knows if QB64 is even perfectly mimicking how QB45 did this?!  INPUT with files is complex crap with all sorts of little exceptions and nuances.  I've done some testing and tried to replicate how QB64 does it, but there are probably a few tweaks that I'm just not aware of and so haven't coded any specific exceptions for.  If you guys do any testing and find a use case that doesn't behave as it should, post an example for me and I'll be happy to tweak things.  AFAIK, it's mimicking QB64 behavior, but I've never really used INPUT very much and thus feel like I'm just shooting blindly into the dark and hoping to hit close to the target in this case.

The difference in speed, however, means it's certainly worth implementing this little routine in your own programs if you have to deal with large CSV data files.


(Note:  There's still a MemLineInput in here as well, which mimics LINE INPUT behavior.  You now have the choice of whichever of the two routines suits your usage.)