MemFile System - Printable Version +- QB64 Phoenix Edition (https://staging.qb64phoenix.com) +-- Forum: QB64 Rising (https://staging.qb64phoenix.com/forumdisplay.php?fid=1) +--- Forum: Prolific Programmers (https://staging.qb64phoenix.com/forumdisplay.php?fid=26) +---- Forum: SMcNeill (https://staging.qb64phoenix.com/forumdisplay.php?fid=29) +---- Thread: MemFile System (/showthread.php?tid=797) Pages:
1
2
|
RE: MemFile System - SMcNeill - 08-27-2022 I've got much better word lists and dictionaries, if you need something like that. The reason I tend to just use the one I chose here is simply due to its sheer size and number of entries. It makes a good baseline for timed tests to see how long it takes to load and process something. For just pure *words*, I'd suggest to just download and use the Official Scrabble Dictionary.txt. RE: MemFile System - PhilOfPerth - 08-27-2022 (08-27-2022, 06:37 AM)SMcNeill Wrote: I've got much better word lists and dictionaries, if you need something like that. The reason I tend to just use the one I chose here is simply due to its sheer size and number of entries. It makes a good baseline for timed tests to see how long it takes to load and process something. For just pure *words*, I'd suggest to just download and use the Official Scrabble Dictionary.txt. I have, and I am. This one just caught my attention as the Scrabble one is about 280000 words. I sub-divided the Scrabble one into 26 files so I can call the appropriate file when checking words, to save search time. RE: MemFile System - SpriggsySpriggs - 08-31-2022 Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings. RE: MemFile System - SMcNeill - 08-31-2022 (08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings. Code: (Select All) 'we want to auto-detect our CRLF endings It searches your file to see what type of line endings you have in it. Unless you have mixed endings, (like some end with CHR$(10) and others end with CHR$(13), it'll work automagically for you. If you have mixed endings, you'll probably need to write a routine to normalize to one format or the other, before making use of these functions. I didn't want to tie up the INPUT times by having them do a series of IF checks to see if you have a 10, 13, or 1310 set of endings on each line. I was going a little more for speed and efficiency, which should work for 99.9% of most files, than flexibility to make certain we can read every mixed-ending file out there. RE: MemFile System - RhoSigma - 08-31-2022 (08-31-2022, 02:14 PM)Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings, which I think is just CHR$(10) ("\n"). You might want to split on just CHR$(10) and then check for CHR$(13) existing after the split. If it does, you can just delete those. A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings. If you are in need for a more comprehensive system, then take this one: https://staging.qb64phoenix.com/showthread.php?tid=486 It's basically the same thing but build on a string array rather then _MEM. |