String Tokenizer
#5
(05-25-2023, 11:04 PM)Kernelpanic Wrote: I know the "StringTokenizer" class from Java. Recreating it might not be easy. It would probably make more sense to be able to call a corresponding Java program from QB64 and pass it the text, just as can be done with C.

In Java:
Code:
/* StringTokenizer example - May 26, 2023 */

import java.util.*;

public class BeispielToken
{
   public static void main(String[] args)
   {
     String s = "Dies ist nur ein Test";
     StringTokenizer st = new StringTokenizer(s);
     while (st.hasMoreTokens())
     {
       System.out.println(st.nextToken());
     }
   }
}

[Image: String-Tokenizer2023-05-26.jpg]

The Java StringTokenizer is exactly what this design is based on. After looking at RhoSigma's code I took some inspiration and got carried away. lol.
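
For reference, java.util.StringTokenizer also has a three-argument constructor that takes a delimiter list and a returnDelims flag, which lines up with the delims and returnDelims parameters of the QB64 routine. A minimal sketch (the class name ReturnDelimsDemo and the helper tokenize are made up for this illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original thread.
public class ReturnDelimsDemo {
    // Collects every token produced by StringTokenizer into a list.
    static List<String> tokenize(String text, String delims, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "MyArg1 As Integer = 5";
        // Delimiters skipped: MyArg1, As, Integer, 5
        System.out.println(tokenize(s, "= ", false).size() + " tokens without delimiters");
        // Each delimiter character comes back as its own one-character token
        System.out.println(tokenize(s, "= ", true).size() + " tokens with delimiters");
    }
}
```

Note that the Java class has no notion of quote characters, so a quoted string such as 'Dolores Abernathy' would be split at the space. That is the part the QB64 version adds on top.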

Code:
$CONSOLE:ONLY
OPTION _EXPLICIT

REDIM mytokens(-2 TO -2) AS STRING

DIM s AS STRING: s = "Function MyFunc(MyStr As String, Optional MyArg1 As Integer = 5, Optional MyArg2 = 'Dolores Abernathy')"

DIM n AS LONG: n = TokenizeString(s, "(),= ", 0, "''", mytokens())
PRINT n; " tokens parsed"

DIM i AS LONG
FOR i = LBOUND(mytokens) TO UBOUND(mytokens)
    PRINT i; "="; mytokens(i)
    SLEEP 1
NEXT

END

' Tokenizes a string into a dynamic string array
' text - the input string
' delims - a list of delimiter characters (multiple delimiters can be specified)
' returnDelims - if true, the delimiters are also returned, in the correct position, in the tokens array
' quoteChars - the opening and closing "quote" characters; should be exactly 2 characters
' tokens() - the dynamic array that receives the tokens, filled starting at its lower bound
' Returns: the number of tokens parsed (delimiters are counted too when returnDelims is true)
FUNCTION TokenizeString& (text AS STRING, delims AS STRING, returnDelims AS _BYTE, quoteChars AS STRING, tokens() AS STRING)
    DIM strLen AS LONG: strLen = LEN(text)

    IF strLen = 0 THEN EXIT FUNCTION ' nothing to be done

    DIM arrIdx AS LONG: arrIdx = LBOUND(tokens) ' we'll always start from the array lower bound - whatever it is
    DIM insideQuote AS _BYTE ' flag to track if currently inside a quote

    DIM token AS STRING ' holds a token until it is ready to be added to the array
    DIM char AS STRING * 1 ' this is a single char from text we are iterating through
    DIM AS LONG i, count

    ' Iterate through the characters in the text string
    FOR i = 1 TO strLen
        char = CHR$(ASC(text, i))
        IF insideQuote THEN
            IF char = RIGHT$(quoteChars, 1) THEN
                ' Closing quote char encountered, resume delimiting
                insideQuote = 0
                GOSUB add_token ' add the token to the array
                IF returnDelims THEN GOSUB add_delim ' add the closing quote char as delimiter if required
            ELSE
                token = token + char ' add the character to the current token
            END IF
        ELSE
            IF char = LEFT$(quoteChars, 1) THEN
                ' Opening quote char encountered, temporarily stop delimiting
                insideQuote = -1
                GOSUB add_token ' add the token to the array
                IF returnDelims THEN GOSUB add_delim ' add the opening quote char as delimiter if required
            ELSEIF INSTR(delims, char) = 0 THEN
                token = token + char ' add the character to the current token
            ELSE
                GOSUB add_token ' found a delimiter, add the token to the array
                IF returnDelims THEN GOSUB add_delim ' found a delimiter, add it to the array if required
            END IF
        END IF
    NEXT

    GOSUB add_token ' add the final token if there is any

    IF count > 0 THEN REDIM _PRESERVE tokens(LBOUND(tokens) TO arrIdx - 1) AS STRING ' resize the array to the exact size

    TokenizeString = count

    EXIT FUNCTION

    ' Add the token to the array if there is any
    add_token:
    IF LEN(token) > 0 THEN
        tokens(arrIdx) = token ' add the token to the token array
        token = "" ' clear the current token
        GOSUB increment_counters_and_resize_array
    END IF
    RETURN

    ' Add delimiter to array if required
    add_delim:
    tokens(arrIdx) = char ' add delimiter to array
    GOSUB increment_counters_and_resize_array
    RETURN

    ' Increment the count and array index and resize the array if needed
    increment_counters_and_resize_array:
    count = count + 1 ' increment the token count
    arrIdx = arrIdx + 1 ' move to next position
    IF arrIdx > UBOUND(tokens) THEN REDIM _PRESERVE tokens(LBOUND(tokens) TO UBOUND(tokens) + 512) AS STRING ' resize in 512 chunks
    RETURN
END FUNCTION
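
To cross-check the quote handling above on the Java side, the same character-by-character loop can be sketched roughly as follows. This is a hypothetical port for illustration only, not part of the QB64 code: the class QuoteAwareTokenizer and the helper flush are invented names, it returns a List instead of filling a dynamic array, and it assumes that an empty quoteChars simply disables quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java sketch of the QB64 TokenizeString logic.
public class QuoteAwareTokenizer {
    static List<String> tokenize(String text, String delims, boolean returnDelims, String quoteChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean insideQuote = false;
        // Assumption: first char opens a quote, last char closes it; empty means no quoting.
        boolean hasQuotes = !quoteChars.isEmpty();
        char open = hasQuotes ? quoteChars.charAt(0) : '\0';
        char close = hasQuotes ? quoteChars.charAt(quoteChars.length() - 1) : '\0';

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (insideQuote) {
                if (c == close) {
                    insideQuote = false;              // closing quote: resume delimiting
                    flush(token, tokens);
                    if (returnDelims) tokens.add(String.valueOf(c));
                } else {
                    token.append(c);                  // delimiters are literal inside quotes
                }
            } else if (hasQuotes && c == open) {
                insideQuote = true;                   // opening quote: stop delimiting
                flush(token, tokens);
                if (returnDelims) tokens.add(String.valueOf(c));
            } else if (delims.indexOf(c) < 0) {
                token.append(c);                      // ordinary character
            } else {
                flush(token, tokens);                 // delimiter ends the current token
                if (returnDelims) tokens.add(String.valueOf(c));
            }
        }
        flush(token, tokens);                         // final token, if any
        return tokens;
    }

    // Adds the pending token to the list if it is non-empty, then clears it.
    private static void flush(StringBuilder token, List<String> tokens) {
        if (token.length() > 0) {
            tokens.add(token.toString());
            token.setLength(0);
        }
    }

    public static void main(String[] args) {
        String s = "Function MyFunc(MyStr As String, Optional MyArg1 As Integer = 5, Optional MyArg2 = 'Dolores Abernathy')";
        List<String> t = tokenize(s, "(),= ", false, "''");
        for (int i = 0; i < t.size(); i++) {
            System.out.println(i + " = " + t.get(i));
        }
    }
}
```

With the sample string from the QB64 test program, this sketch yields 13 tokens, the last one being Dolores Abernathy with the embedded space intact. As in the QB64 version, empty tokens (for example between a comma and the following space) are dropped.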


I'll update the main post.


Messages In This Thread
String Tokenizer - by a740g - 05-25-2023, 03:33 PM
RE: String Tokenizer - by RhoSigma - 05-25-2023, 09:33 PM
RE: String Tokenizer - by a740g - 05-25-2023, 10:45 PM
RE: String Tokenizer - by Kernelpanic - 05-25-2023, 11:04 PM
RE: String Tokenizer - by a740g - 05-26-2023, 12:21 AM
RE: String Tokenizer - by Ultraman - 05-26-2023, 03:56 PM
RE: String Tokenizer - by Kernelpanic - 05-26-2023, 08:44 PM
RE: String Tokenizer - by Kernelpanic - 05-30-2023, 09:15 PM
RE: String Tokenizer - by Ultraman - 06-29-2023, 11:42 AM