MemFile System - Printable Version

RE: MemFile System - SMcNeill - 08-27-2022

I've got much better word lists and dictionaries, if you need something like that. The reason I tend to use the one I chose here is simply its sheer size and number of entries. It makes a good baseline for timed tests to see how long it takes to load and process something. For just pure *words*, I'd suggest downloading and using the Official Scrabble Dictionary.txt.

RE: MemFile System - PhilOfPerth - 08-27-2022

(08-27-2022, 06:37 AM)SMcNeill Wrote: I've got much better word lists and dictionaries, if you need something like that. The reason I tend to just use the one I chose here is simply due to its sheer size and number of entries. It makes a good baseline for timed tests to see how long it takes to load and process something. For just pure *words*, I'd suggest to just download and use the Official Scrabble Dictionary.txt.

I have, and I am. This one just caught my attention as the Scrabble one is about 280000 words. I sub-divided the Scrabble one into 26 files so I can call the appropriate file when checking words, to save search time.

RE: MemFile System - SpriggsySpriggs - 08-31-2022

Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.
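
A rough sketch of the split-on-CHR$(10) idea, assuming the whole file is already sitting in a string. SplitLines& and its parameter names are made up for illustration; this is not the strtok-based tokenize function and not part of MemFile.

```basic
' Sketch only: SplitLines& is a hypothetical helper, not part of MemFile
' and not the strtok-based tokenize function.
ReDim lineList(0) As String
Dim text As String, total As Long, i As Long

' One Windows-style line, one UNIX-style line, and a last line with no ending.
text = "alpha" + Chr$(13) + Chr$(10) + "beta" + Chr$(10) + "gamma"

total = SplitLines&(text, lineList())
For i = 1 To total
    Print i; ": ["; lineList(i); "]"
Next
End

Function SplitLines& (source As String, result() As String)
    Dim start As Long, found As Long, n As Long
    Dim piece As String

    start = 1
    Do While start <= Len(source)
        ' Split on CHR$(10) only.
        found = InStr(start, source, Chr$(10))
        If found = 0 Then found = Len(source) + 1 ' last line has no ending

        piece = Mid$(source, start, found - start)

        ' A CHR$(13) left at the end means the line ended in CR+LF, so drop it.
        If Right$(piece, 1) = Chr$(13) Then piece = Left$(piece, Len(piece) - 1)

        n = n + 1
        ReDim _Preserve result(n) As String
        result(n) = piece
        start = found + 1
    Loop

    SplitLines& = n
End Function
```

Growing the array one element at a time keeps the sketch short; for a big word list it would likely be faster to count the CHR$(10) characters first and size the array once.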

RE: MemFile System - SMcNeill - 08-31-2022

(08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.

Code:
    'we want to auto-detect our CRLF endings
    'as we have the file in temp$ at the moment, we'll just search for it via instr
    If InStr(temp$, Chr$(13) + Chr$(10)) Then
        MemFile(i).CRLF = Chr$(13) + Chr$(10)
    ElseIf InStr(temp$, Chr$(10)) Then
        MemFile(i).CRLF = Chr$(10)
    ElseIf InStr(temp$, Chr$(13)) Then
        MemFile(i).CRLF = Chr$(13)
    Else
        Error 5: Exit Function
    End If

It searches your file to see what type of line endings you have in it. Unless you have mixed endings (some lines ending with CHR$(10) and others with CHR$(13), for example), it'll work automagically for you. If you have mixed endings, you'll probably need to write a routine to normalize to one format or the other before making use of these functions. I didn't want to tie up the INPUT times by having them do a series of IF checks to see if you have a 10, 13, or 13+10 set of endings on each line. I was going more for speed and efficiency, which should work for 99.9% of files, than for the flexibility to read every mixed-ending file out there.
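
For the mixed-ending case, a normalizing routine could look something like the sketch below. NormalizeEOL$ is a hypothetical name, not something that ships with MemFile, and it assumes the file contents are already in a string and that CHR$(10) is the format you want to end up with.

```basic
' Sketch only: NormalizeEOL$ is a hypothetical helper, not part of MemFile.
Dim mixed As String, clean As String

' CR+LF, lone CR, and lone LF endings all in one string.
mixed = "one" + Chr$(13) + Chr$(10) + "two" + Chr$(13) + "three" + Chr$(10) + "four"
clean = NormalizeEOL$(mixed)

Print Len(mixed); "bytes in,"; Len(clean); "bytes out"
End

Function NormalizeEOL$ (source As String)
    Dim start As Long, found As Long
    Dim result As String

    start = 1
    Do
        ' Jump straight to the next CHR$(13); lone CHR$(10) endings are already fine.
        found = InStr(start, source, Chr$(13))
        If found = 0 Then Exit Do

        ' Copy everything before the CR, then write a single LF in its place.
        result = result + Mid$(source, start, found - start) + Chr$(10)
        start = found + 1

        ' If the CR was half of a CR+LF pair, skip the LF so it isn't doubled.
        If Mid$(source, start, 1) = Chr$(10) Then start = start + 1
    Loop

    ' Whatever is left after the last CR is copied as-is.
    NormalizeEOL$ = result + Mid$(source, start)
End Function
```

Run over the loaded text before the auto-detect check, this should leave only CHR$(10) endings, so the detection would settle on that branch. It does cost an extra pass over the data, which is the trade-off against the speed goal described above.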

RE: MemFile System - RhoSigma - 08-31-2022

(08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.

If you are in need of a more comprehensive system, then take this one:
https://staging.qb64phoenix.com/showthread.php?tid=486

It's basically the same thing but built on a string array rather than _MEM.