MemFile System
#6
Just ran some timed tests and I'm a little surprised by the results!




[Image: image.png]

MemLineInput takes about 1.5 seconds.
OPEN FOR BINARY and then LINE INPUT takes about 2.8 seconds.
OPEN FOR BINARY and then manually PARSE the data takes about 0.05 seconds.

Loading the file and parsing it by hand is still a lot faster than any other method, but I suppose that makes sense once you think about it. A generic routine has to check for multiple things: are the line endings CHR$(10), CHR$(13), or CHR$(13) + CHR$(10)? It has to detect and handle all of those, along with some basic error checks. When I parse the data manually here, I already know the file uses CHR$(13) + CHR$(10) line endings, so I just move my pointer two spots past each CRLF.
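To make that difference concrete, here is a rough sketch in Python (not QB64, and not a port of the actual MemLineInput code) of the two approaches. The generic reader has to test every single character for CR, LF, or CR+LF. The fixed-CRLF parser jumps straight from one CRLF to the next with a substring search, like the INSTR loop in the test program. The function names are hypothetical.

```python
def generic_line_input(data: str, pos: int) -> tuple[str, int]:
    """Read one line starting at pos, accepting LF, CR, or CR+LF endings.

    Every character is examined individually; that per-character
    decision making is where the extra time goes.
    """
    chars = []
    end = len(data)
    while pos < end:
        ch = data[pos]
        pos += 1
        if ch == "\r":
            # Peek at the next character and swallow it if this is a CRLF pair.
            if pos < end and data[pos] == "\n":
                pos += 1
            break
        if ch == "\n":
            break
        chars.append(ch)
    return "".join(chars), pos


def read_all_generic(data: str) -> list[str]:
    """Call generic_line_input repeatedly until the data runs out."""
    lines, pos = [], 0
    while pos < len(data):
        line, pos = generic_line_input(data, pos)
        lines.append(line)
    return lines


def parse_crlf(data: str) -> list[str]:
    """Split on CRLF only, jumping from one ending to the next with str.find."""
    lines, p = [], 0
    while True:
        l = data.find("\r\n", p)
        if l == -1:
            break
        lines.append(data[p:l])
        p = l + 2  # skip past the 2-character CRLF ending
    if p < len(data):  # keep a final line that has no trailing CRLF
        lines.append(data[p:])
    return lines


sample = "apple\r\nbanana\r\ncherry\r\n"
print(read_all_generic(sample))  # ['apple', 'banana', 'cherry']
print(parse_crlf(sample))        # ['apple', 'banana', 'cherry']
```

Both produce the same list for CRLF data. Only the generic version also copes with files that use bare LF or bare CR.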

I figured the times would be closer than that, but LOAD then PARSE is still the winner in terms of the absolute time it takes to get something done. Still, I'm pretty happy with the results: reading the file from memory with MemLineInput is nearly twice as fast (about 1.5 seconds versus 2.8) as reading it from disk with LINE INPUT in BINARY mode. We stay true to the syntax that a beginner quickly learns and uses, give them a nice boost in speed and performance, and save our SSDs from repeated read/write calls.
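The basic idea behind a memfile is one bulk read from disk, after which every "line input" is served from RAM. Here is a minimal Python sketch of that idea. `MemFileReader` and its methods are hypothetical names, and for brevity this sketch assumes LF or CRLF line endings only (no bare CR). As in a memfile, the end-of-file marker is simply the number of bytes loaded.

```python
import os
import tempfile


class MemFileReader:
    """Hypothetical in-memory reader: one disk read, then lines come from RAM."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            self.content = f.read()          # the only disk access
        self.eof_marker = len(self.content)  # number of bytes of data
        self.current_pos = 0

    def eof(self) -> bool:
        return self.current_pos >= self.eof_marker

    def line_input(self) -> str:
        if self.eof():
            raise EOFError("input past end of file")
        end = self.content.find(b"\n", self.current_pos)
        if end == -1:  # last line has no trailing line ending
            end = self.eof_marker
        line = self.content[self.current_pos:end]
        self.current_pos = end + 1  # step past the LF
        # Drop the CR of a CRLF pair; latin-1 maps each byte to one character.
        return line.rstrip(b"\r").decode("latin-1")


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"alpha\r\nbeta\r\ngamma")
    reader = MemFileReader(path)
    words = []
    while not reader.eof():
        words.append(reader.line_input())
    os.remove(path)
    print(words)  # ['alpha', 'beta', 'gamma']
```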

Maybe I can tweak things here and close that speed gap somewhat. Instead of constantly checking for the various line endings, I could look for the first one, make it the default from there on out, and skip a whole bunch of the IF-style decision checks.
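Here is a rough Python sketch of that "detect once, then reuse" idea. Both function names are hypothetical. So is the fallback to CRLF when the data contains no line ending at all.

```python
def detect_line_ending(data: str) -> str:
    """Return the first line ending found in data: CRLF, CR, or LF.

    Falls back to CRLF if there is no line ending at all (an assumption).
    """
    for i, ch in enumerate(data):
        if ch == "\n":
            return "\n"
        if ch == "\r":
            return "\r\n" if data[i + 1:i + 2] == "\n" else "\r"
    return "\r\n"


def split_with_detected_ending(data: str) -> list[str]:
    """Detect the ending once, then split without per-character checks."""
    eol = detect_line_ending(data)
    lines, p = [], 0
    while True:
        l = data.find(eol, p)
        if l == -1:
            break
        lines.append(data[p:l])
        p = l + len(eol)  # skip past the ending, whatever its length
    if p < len(data):  # keep a final line without a trailing ending
        lines.append(data[p:])
    return lines


print(split_with_detected_ending("one\ntwo\nthree\n"))  # ['one', 'two', 'three']
```

The trade-off is that a file mixing different line endings would no longer be split correctly, because only the first ending found is honored.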

I'll dig into that later, after I find something decent around here to eat for lunch. ;)


Edit: Included the code in case anyone wants to test it on their own:

Type Mem_File_Type
    inUse As Integer
    EOF_Marker As _Offset
    Current_Pos As _Offset
    Content As _MEM
End Type

Dim Shared MemFile(1 To 100) As Mem_File_Type
'BI HEADER INFO BEFORE THIS

Screen _NewImage(800, 600, 32)


Dim As String wordlist(466544), wordlist2(466544), wordlist3(466544), wordlist4(466545) 'arrays to hold the data

'MEM FILE INPUT
handle = MemFileLoad("466544 Word List.txt", 0) 'load a file directly into memory, and it's not compressed
t## = Timer '                                                        timer to see how long reading the lines takes (the file is already in memory)
Do Until MemEOF(handle) '                        Hopefully these lines will be intuitive enough,
    count = count + 1 '                          especially when compared with the OPEN FOR BINARY
    MemLineInput handle, wordlist(count) '        and LINE INPUT loop that follows.
Loop
Print count; Using " words loaded into memory from file, in ##.#### seconds."; Timer - t##
MemFileClose handle

'OPEN FILE FOR BINARY
Open "466544 Word List.txt" For Binary As #1
t## = Timer
Do Until EOF(1)
    count3 = count3 + 1
    Line Input #1, wordlist3(count3)
Loop
Print count3; Using " words loaded from file OPEN FOR BINARY with LINE INPUT, in ##.#### seconds."; Timer - t##
Close 1

'OPEN FILE FOR BINARY AND PARSE
Open "466544 Word List.txt" For Binary As #1
temp$ = Space$(LOF(1))
Get #1, 1, temp$
Close 1
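t## = Timer 'the file is already in temp$, so only the parsing is timed
p = 1: CRLF$ = Chr$(13) + Chr$(10)
Do
    l = InStr(p, temp$, CRLF$)
    If l = 0 Then Exit Do 'no CRLF left; stop before counting a phantom extra word
    count4 = count4 + 1
    wordlist4(count4) = Mid$(temp$, p, l - p) 'the word runs from the pointer up to the CRLF
    p = l + 2 'move the pointer by the length and 2 more for windows CRLF data
Loop
If p <= Len(temp$) Then count4 = count4 + 1: wordlist4(count4) = Mid$(temp$, p) 'last word had no trailing CRLF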
Print count4; Using " words loaded from file OPEN FOR BINARY then PARSED, in ##.#### seconds."; Timer - t##




'and let's compare contents to be safe
For i = 1 To count
    If wordlist(i) <> wordlist4(i) Then Print "Wordlist does not match Wordlist4": failed = -1
    If wordlist(i) <> wordlist3(i) Then Print "Wordlist does not match Wordlist3": failed = -1
Next

If failed Then
    Print "Lists do not match"
Else
    Print "Lists match each other perfectly"
End If





'BM FOOTER AFTER THIS
Sub MemFileDump (memfile, file$, compressed) 'just one quick call to save to disk and free the memory all at once.
    MemFileSave memfile, file$, compressed
    MemFileClose memfile
End Sub

Sub MemFileSave (memfile, file$, compressed)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    temphandle = FreeFile
    Dim As _Offset length

    length = MemFile(memfile).EOF_Marker 'EOF_Marker holds the number of bytes of data in the memfile
    temp$ = Space$(length)
    $Checking:Off
    _MemGet MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, temp$
    $Checking:On
    If compressed Then temp1$ = _Deflate$(temp$) Else temp1$ = temp$
    Open file$ For Output As temphandle: Close temphandle 'erase any existing file with the same name
    Open file$ For Binary As temphandle
    Put #temphandle, 1, temp1$
    Close temphandle 'close only our own handle, not every file the program has open
End Sub

Function MemFileLoad (file$, compressed)
    'Error codes for MemFileLoad
    '0: No mem files available.  (All 100 are in use!  Free some to use more!)
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then MemFileLoad = 0: Exit Function 'can't open any more memfiles!
    If _FileExists(file$) = 0 Then Error 53: Exit Function 'file not found

    Dim As _Offset length 'an _Offset, so large files don't lose precision in a SINGLE
    MemFileLoad = i
    temphandle = FreeFile
    Open file$ For Binary As #temphandle
    temp$ = Space$(LOF(temphandle))
    Get temphandle, 1, temp$
    Close temphandle
    If compressed Then temp$ = _Inflate$(temp$)
    length = Len(temp$)
    MemFile(i).Content = _MemNew(length)
    $Checking:Off
    _MemPut MemFile(i).Content, MemFile(i).Content.OFFSET, temp$
    $Checking:On
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = length 'the file holds exactly as many bytes as we just loaded (same meaning MemPrint uses)
    MemFile(i).Current_Pos = 0 'and we start reading at the very first byte
End Function


Sub MemFileClose (memfile)
    If memfile < 1 Or memfile > 100 Then Error 5: Exit Sub 'ILLEGAL FUNCTION CALL
    If MemFile(memfile).inUse = 0 Then Exit Sub 'nothing open here, so there's no memory to free
    MemFile(memfile).inUse = 0 'no longer in use
    MemFile(memfile).EOF_Marker = 0 'reset the end-of-file marker
    MemFile(memfile).Current_Pos = 0 'and reset the read/write position
    _MemFree MemFile(memfile).Content 'free the memory we were using
End Sub


Function MemFileOpen
    'Error codes for MemFileOpen
    '0: No mem files available.  (All 100 are in use!  Free some to use more!)
    For i = 1 To 100
        If MemFile(i).inUse = 0 Then Exit For
    Next
    If i > 100 Then MemFileOpen = 0: Exit Function 'can't open any more memfiles!
    MemFileOpen = i
    MemFile(i).inUse = -1 'TRUE
    MemFile(i).EOF_Marker = 0 'nothing is written in the file to begin with
    MemFile(i).Current_Pos = 0 'and we're at the start of our nothing in the file
    MemFile(i).Content = _MemNew(1000000) '1mb memfile by default
    $Checking:Off
    _MemFill MemFile(i).Content, MemFile(i).Content.OFFSET, MemFile(i).Content.SIZE, 0 As _UNSIGNED _BYTE
    'make certain to blank the file when opening it for the first time so we don't have unwanted characters in it.
    $Checking:On
End Function

Function MemEOF (memfile)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Function 'File Not Found Error message
    If MemFile(memfile).Current_Pos >= MemFile(memfile).EOF_Marker Then MemEOF = -1
End Function


Sub MemSeek (memfile, position As _Offset)
    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message
    If position < 0 Then Error 5: Exit Sub 'Illegal Function Call
    If position > MemFile(memfile).EOF_Marker Then Error 5: Exit Sub 'Illegal Function Call
    MemFile(memfile).Current_Pos = position
End Sub

Sub MemLineInput (memfile, what$)
    'only valid line endings here are CHR$(10), CHR$(13), and CHR$(13) + CHR$(10)

    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message

    Dim As _Offset CP, EP
    Dim tempM As _MEM, a1 As _Unsigned _Byte
    tempM = MemFile(memfile).Content 'it's just much shorter to type!

    CP = MemFile(memfile).Current_Pos
    EP = MemFile(memfile).EOF_Marker 'number of bytes of data; valid positions run from 0 to EP - 1
    If CP >= EP Then Error 62: Exit Sub 'INPUT PAST END OF FILE error
    $Checking:Off
    Do
        a$ = _MemGet(tempM, tempM.OFFSET + CP, String * 1)
        Select Case a$
            Case Chr$(13)
                If CP + 1 < EP Then 'only peek at the next byte if it's actually part of the file
                    _MemGet tempM, tempM.OFFSET + CP + 1, a1
                    If a1 = 10 Then CP = CP + 1 'move the Current Pointer past the 2nd character in a windows CRLF ending
                End If
                finished = -1
            Case Chr$(10)
                finished = -1
            Case Else
                temp$ = temp$ + a$
        End Select
        CP = CP + 1
        If CP >= EP Then finished = -1
    Loop Until finished
    $Checking:On
    MemFile(memfile).Current_Pos = CP
    what$ = temp$
End Sub



Sub MemPrint (memfile, what$, EOL_Type As Integer)
    'memfile is the memfile handle to print to
    'what$ is what we want to print
    'EOL_Type is the type of line ending we want after this print statement
    '1: This is a CHR$(10) line ending                  (Linux style line ending)
    '2: This is a CHR$(13) line ending                  (Old Mac style line ending)
    '3: This is a CHR$(13) + CHR$(10) line ending      (Windows style line ending)
    '4: This is a COMMA line ending.  Use this if writing continuous CSV fields.
    '      (Think PRINT #1, stuff$, <-- see the comma there at the end of the print statement??)


    If MemFile(memfile).inUse = 0 Then Error 53: Exit Sub 'File Not Found Error message

    Dim CRLF As String
    Dim As _Offset CP, EP, Size, L

    Select Case EOL_Type
        Case 1: CRLF = Chr$(10)
        Case 2: CRLF = Chr$(13)
        Case 3: CRLF = Chr$(13) + Chr$(10)
        Case 4: CRLF = ","
    End Select
    CP = MemFile(memfile).Current_Pos
    EP = MemFile(memfile).EOF_Marker
    Size = MemFile(memfile).Content.SIZE
    L = Len(what$) + Len(CRLF)
    Dim tempM As _MEM
    Do While CP + L > Size 'we're writing beyond the bounds of our reserved memory!
        If Size <= 100000000 Then 'resize our memblock (to the limit) to save our data
            tempM = _MemNew(Size * 10)
            _MemCopy MemFile(memfile).Content, MemFile(memfile).Content.OFFSET, Size To tempM, tempM.OFFSET
            _MemFree MemFile(memfile).Content
            MemFile(memfile).Content = tempM
            Size = Size * 10
            'the DO WHILE rechecks that our reserved memory is now large enough, and only grows it again if it isn't
        Else
            Error 61 'DISK FULL ERROR MESSAGE
            Exit Sub 'I'm coding a hard size limit of 1GB for each memfile opened!
            '        Anything larger than that, and I'm tossing a Disk Full Error
        End If
    Loop
    _MemPut MemFile(memfile).Content, MemFile(memfile).Content.OFFSET + CP, what$ + CRLF
    MemFile(memfile).Current_Pos = CP + L
    If CP + L > EP Then MemFile(memfile).EOF_Marker = CP + L
End Sub
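Here is a Python sketch of the growth rule MemPrint relies on: multiply the reserved block by 10 until the write fits, with a hard cap of 1 GB. It is an illustration only. `grow_capacity`, `MemWriter`, and raising `MemoryError` in place of QB64's error 61 are all hypothetical choices, and a `bytearray` stands in for the `_MEM` block. The size check is repeated before every growth step, so the block only grows as far as it needs to.

```python
MAX_SIZE = 1_000_000_000  # mirrors the 1 GB hard limit described in MemPrint


def grow_capacity(size: int, needed: int) -> int:
    """Multiply size by 10 until it can hold `needed` bytes.

    Raises MemoryError (standing in for QB64's DISK FULL error 61)
    if the next growth step would go past MAX_SIZE.
    """
    while needed > size:
        if size * 10 > MAX_SIZE:
            raise MemoryError("memfile would exceed the 1 GB limit")
        size *= 10
    return size


class MemWriter:
    """Hypothetical Python stand-in for a memfile opened with MemFileOpen."""

    def __init__(self, size: int = 1_000_000):
        self.buf = bytearray(size)  # zero-filled, like the _MemFill in MemFileOpen
        self.pos = 0                # plays the role of Current_Pos
        self.eof = 0                # plays the role of EOF_Marker (bytes of data)

    def mem_print(self, text: str, eol: str = "\r\n") -> None:
        data = (text + eol).encode("latin-1")  # one byte per character
        needed = self.pos + len(data)
        if needed > len(self.buf):
            new_size = grow_capacity(len(self.buf), needed)
            self.buf.extend(bytes(new_size - len(self.buf)))
        self.buf[self.pos:needed] = data
        self.pos = needed
        self.eof = max(self.eof, needed)


w = MemWriter(size=10)
w.mem_print("hello world")  # 13 bytes with CRLF, so the 10-byte block grows to 100
print(len(w.buf), w.eof)    # 100 13
```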


Messages In This Thread
MemFile System - by SMcNeill - 08-22-2022, 03:08 PM
RE: MemFile System - by SpriggsySpriggs - 08-22-2022, 03:16 PM
RE: MemFile System - by SMcNeill - 08-22-2022, 03:20 PM
RE: MemFile System - by SpriggsySpriggs - 08-22-2022, 03:29 PM
RE: MemFile System - by SMcNeill - 08-22-2022, 03:34 PM
RE: MemFile System - by SMcNeill - 08-22-2022, 04:07 PM
RE: MemFile System - by SMcNeill - 08-23-2022, 01:16 PM
RE: MemFile System INPUT output - by JRace - 08-23-2022, 10:56 PM
RE: MemFile System - by PhilOfPerth - 08-27-2022, 05:25 AM
RE: MemFile System - by PhilOfPerth - 08-27-2022, 06:22 AM
RE: MemFile System - by SMcNeill - 08-27-2022, 06:37 AM
RE: MemFile System - by PhilOfPerth - 08-27-2022, 08:02 AM
RE: MemFile System - by SpriggsySpriggs - 08-31-2022, 02:14 PM
RE: MemFile System - by SMcNeill - 08-31-2022, 03:01 PM
RE: MemFile System - by RhoSigma - 08-31-2022, 05:20 PM


