**SMcNeill** · 08-27-2022, 06:37 AM

I've got much better word lists and dictionaries, if you need something like that. I tend to use the one I chose here simply because of its sheer size and number of entries: it makes a good baseline for timed tests of how long it takes to load and process something. For just pure *words*, I'd suggest downloading and using the Official Scrabble Dictionary.txt. ;)
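
Here is a minimal QB64 sketch of that kind of timed load test. It assumes the whole file fits comfortably in one string. The file name `sample_words.txt` is a hypothetical stand-in, and the three-word sample file exists only so the snippet runs on its own. Point `fileName$` at the big list for a real measurement.

```qb64
' Hypothetical file name; replace with the large word list you want to benchmark.
fileName$ = "sample_words.txt"

' Write a tiny sample file so the sketch runs on its own (drop this for real use).
OPEN fileName$ FOR OUTPUT AS #1
PRINT #1, "apple"
PRINT #1, "banana"
PRINT #1, "cherry"
CLOSE #1

t0# = TIMER(.001)                ' start time, rounded to the millisecond
OPEN fileName$ FOR BINARY AS #1
words$ = SPACE$(LOF(1))          ' size the buffer to the whole file
GET #1, 1, words$                ' read the entire file in one call
CLOSE #1
elapsed# = TIMER(.001) - t0#     ' TIMER resets at midnight; fine for short tests

PRINT "Loaded"; LEN(words$); "bytes in"; elapsed#; "seconds"
```

To get a separate figure for the processing step (splitting, sorting, searching), wrap it in the same pair of TIMER calls.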

PhilOfPerth · 08-27-2022, 08:02 AM

(08-27-2022, 06:37 AM) SMcNeill Wrote: [...] For just pure *words*, I'd suggest to just download and use the Official Scrabble Dictionary.txt.

I have, and I am. This one just caught my attention because of its size; by comparison, the Scrabble one is about 280,000 words. I sub-divided the Scrabble list into 26 files so I can open just the appropriate file when checking a word, which saves search time. :)
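
One way to do a 26-file split is sketched below in QB64. It assumes one file per starting letter and one word per line. The names `sample_list.txt` and `words_A.txt` through `words_Z.txt` are hypothetical, and the four-word sample list is only there so the snippet runs standalone. Words are upper-cased so lookups are case-insensitive.

```qb64
' Tiny stand-in for the full word list (hypothetical file name).
OPEN "sample_list.txt" FOR OUTPUT AS #1
PRINT #1, "aardvark": PRINT #1, "Apple": PRINT #1, "banana": PRINT #1, "zebra"
CLOSE #1

' Step 1: open the source list, plus one output file per starting letter, A to Z.
DIM h(1 TO 26) AS LONG
OPEN "sample_list.txt" FOR INPUT AS #1
FOR i = 1 TO 26
    f = FREEFILE
    OPEN "words_" + CHR$(64 + i) + ".txt" FOR OUTPUT AS #f
    h(i) = f
NEXT

' Step 2: route each word to the file for its first letter.
DO UNTIL EOF(1)
    LINE INPUT #1, w$
    IF RIGHT$(w$, 1) = CHR$(13) THEN w$ = LEFT$(w$, LEN(w$) - 1)  ' tolerate CRLF files
    w$ = UCASE$(LTRIM$(RTRIM$(w$)))
    IF LEN(w$) THEN
        c = ASC(w$) - 64                      ' "A" -> 1 ... "Z" -> 26
        IF c >= 1 AND c <= 26 THEN f = h(c): PRINT #f, w$
    END IF
LOOP
CLOSE                                         ' closes the source and all 26 bucket files

' Step 3: check a word by scanning only the file for its first letter.
target$ = "APPLE"
found = 0
OPEN "words_" + LEFT$(target$, 1) + ".txt" FOR INPUT AS #1
DO UNTIL EOF(1) OR found
    LINE INPUT #1, w$
    IF w$ = target$ THEN found = -1
LOOP
CLOSE #1
IF found THEN PRINT target$; " is a word" ELSE PRINT target$; " was not found"
```

If the master list is already sorted, each bucket file keeps that order. A lookup could then stop as soon as it passes the target alphabetically.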

SpriggsySpriggs · 08-31-2022, 02:14 PM (last modified 08-31-2022, 02:18 PM)

Something I just noticed with this: if you are assuming that line endings are CHR$(13) + CHR$(10) ("\r\n"), that might not work with a file that has UNIX line endings, which are just CHR$(10) ("\n"). You might want to split on CHR$(10) alone and then check each piece for a trailing CHR$(13); if one is there, just delete it. A foolproof way that I split a file up is with my tokenize function, which uses strtok. It takes a list of characters to split on, so it works regardless of whether the file has UNIX or Windows line endings.
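
The split-on-CHR$(10) approach could look like the QB64 sketch below. It assumes the whole file is already in `temp$`, which is filled here with a made-up three-line sample that mixes a CRLF and a bare LF. The LF count is taken first so the array is sized once rather than grown line by line.

```qb64
DIM n AS LONG, p AS LONG, s AS LONG, i AS LONG

' Made-up sample: one CRLF line, one bare-LF line, and a last line with no ending.
temp$ = "alpha" + CHR$(13) + CHR$(10) + "beta" + CHR$(10) + "gamma"

' Count LFs so the array can be dimensioned once.
p = INSTR(temp$, CHR$(10))
DO WHILE p
    n = n + 1
    p = INSTR(p + 1, temp$, CHR$(10))
LOOP
IF RIGHT$(temp$, 1) <> CHR$(10) THEN n = n + 1   ' final line lacks a trailing LF

REDIM parts$(1 TO n)
s = 1
FOR i = 1 TO n
    p = INSTR(s, temp$, CHR$(10))
    IF p = 0 THEN p = LEN(temp$) + 1             ' last line runs to the end of the string
    ln$ = MID$(temp$, s, p - s)
    IF RIGHT$(ln$, 1) = CHR$(13) THEN ln$ = LEFT$(ln$, LEN(ln$) - 1)  ' drop the CR of a CRLF
    parts$(i) = ln$
    s = p + 1
NEXT

FOR i = 1 TO n: PRINT "["; parts$(i); "]": NEXT
```

A strtok-based tokenizer treats a run of delimiters as a single separator, so it silently skips blank lines. This version keeps them as empty strings.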

**SMcNeill** · 08-31-2022, 03:01 PM

(08-31-2022, 02:14 PM) Spriggsy Wrote: Something I just noticed with this is that if you are assuming that the line endings are CHR$(13) + CHR$(10) ("\r\n") then that might not work with a file that has UNIX line endings [...]

Code:

    ' Auto-detect the file's line endings.
    ' The whole file is in temp$ at this point, so a simple InStr search is enough.
    ' Check CR+LF first: a Windows file also contains bare LFs, so testing
    ' Chr$(10) first would misreport it as a UNIX file.
    If InStr(temp$, Chr$(13) + Chr$(10)) Then
        MemFile(i).CRLF = Chr$(13) + Chr$(10)
    ElseIf InStr(temp$, Chr$(10)) Then
        MemFile(i).CRLF = Chr$(10)
    ElseIf InStr(temp$, Chr$(13)) Then
        MemFile(i).CRLF = Chr$(13)
    Else
        ' No line break at all: raise error 5 (Illegal function call) and bail out.
        Error 5: Exit Function
    End If

It searches your file to see what type of line endings it has. Unless you have mixed endings (say, some lines end with CHR$(10) and others with CHR$(13)), it'll work automagically for you. If you do have mixed endings, you'll probably need to write a routine that normalizes them to one format or the other before using these functions. I didn't want to slow down the INPUT times by having them run a series of IF checks on every line to see whether it ends in CHR$(10), CHR$(13), or CHR$(13) + CHR$(10). I was going more for speed and efficiency, which should work for 99.9% of files, than for the flexibility to read every mixed-ending file out there. ;)
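
A normalization routine of that kind could be a single pass, as in this QB64 sketch. It rewrites both CR+LF pairs and lone CRs as LF, so the whole file ends up with UNIX-style endings. Choosing LF as the target is an assumption; CR+LF would work just as well. The sample string in `temp$` is made up.

```qb64
DIM i AS LONG, k AS LONG

' Made-up sample with all three ending styles mixed together.
temp$ = "one" + CHR$(13) + CHR$(10) + "two" + CHR$(13) + "three" + CHR$(10) + "four"

norm$ = SPACE$(LEN(temp$))      ' output can only shrink, so this buffer is big enough
i = 1
DO WHILE i <= LEN(temp$)
    c$ = MID$(temp$, i, 1)
    k = k + 1
    IF c$ = CHR$(13) THEN
        MID$(norm$, k, 1) = CHR$(10)                         ' CR (alone or in CRLF) becomes LF
        IF MID$(temp$, i + 1, 1) = CHR$(10) THEN i = i + 1   ' skip the LF of a CRLF pair
    ELSE
        MID$(norm$, k, 1) = c$
    END IF
    i = i + 1
LOOP
temp$ = LEFT$(norm$, k)

PRINT LEN(temp$); "bytes after normalizing"
```

After this pass, the auto-detect code above finds no CHR$(13) + CHR$(10) pair and settles on CHR$(10), as long as the file contains at least one line break.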

RhoSigma · 08-31-2022, 05:20 PM

(08-31-2022, 02:14 PM) Spriggsy Wrote: [...] A foolproof way that I split a file up is by using my tokenize function, which uses strtok. It takes a list of characters to split on and it works just fine regardless of the file having UNIX or Windows line endings.

If you need a more comprehensive system, then take this one:
https://staging.qb64phoenix.com/showthread.php?tid=486

It's basically the same thing, but built on a string array rather than _MEM.
