Text Parser
#1
UPDATE: I made changes to the code with great suggestions and code examples from Pete. The routine is much more stable now. Thanks Pete!
By the way - writing text parsing routines is much harder than it looks. So much to consider in their design.

While writing the library for lesson 20 of the tutorial I wrote a text parser that's fairly efficient. I needed a parsing routine that reported the number of lines the input text string would be parsed into and could deliver one line of text at a time on demand. This is what I came up with:

Code:
FUNCTION ParseText$ (TextIn AS STRING, MaxWidth AS INTEGER, Action AS INTEGER) STATIC

    '-> Modifications added suggested by Pete 10/03/22. Corrects issue of crashing and mishandling of text
    '   in certain situations. (additions remarked in code below)
    '   - Function exit if text sent in is null
    '   - Handles non-breaking strings larger than space allocated
    '   - Handles end of text that has no trailing space
    '   - Clear text for next parsing event

    '-> Parses the string passed in into multiple lines of the maximum width desired
    '   The first time the function is called, regardless of action, the TextIn string is fully parsed
    '   Subsequent calls to the function with the same text will not parse the TextIn string again

    '-> INPUT PARAMETERS:
    '   TextIn   - the text string sent into the function
    '   MaxWidth - maximum width of text on a line
    '   Action   - 1 reports number of lines created, 0 returns lines of parsed text ("" = finished)

    '-> EXAMPLE:
    '   t$ = "The rain in Spain falls mainly on the plain. The weather in Spain seems pretty good!"
    '   Lines = VAL(ParseText$(t$, 40, 1)) '          report number of lines the text was parsed into
    '   DO '                                          get all parsed lines of text
    '       TextLine$ = ParseText$(t$, 40, 0) '       return the next line of parsed text
    '       IF TextLine$ <> "" THEN PRINT TextLine$ ' could easily be saved to an array as well
    '   LOOP UNTIL TextLine$ = "" '                   ParseText$ returns null when all lines returned

    '-> There is no need to have ParseText$ report the number of lines needed. A simple counter could be
    '   placed within the DO...LOOP as well. The report for the number of lines the text was parsed into will
    '   be returned as a string and needs to be converted to a value with VAL(), as seen in the 2nd line of the example.

    DIM PText AS STRING '    previous text that was sent in
    DIM Index AS INTEGER '   array index counter
    DIM Plen AS INTEGER '    parse string length
    DIM Char AS STRING * 1 ' character analyzer
    DIM Parse AS STRING '    parsed string
    DIM WText AS STRING '    working text string
    DIM Done AS INTEGER '    flag to indicate parsing finished

    IF TextIn = "" OR MaxWidth <= 0 THEN ParseText$ = "": EXIT FUNCTION ' (Pete) leave if null text or invalid width sent in
    IF PText <> TextIn THEN '                                            was a new text string sent in?
        PText = TextIn '                                                 yes, remember text that was sent in
        WText = TextIn '                                                 get text sent in to work with
        Index = 0 '                                                      reset index counter
        Done = 0 '                                                       reset finished flag
        REDIM Text(0) AS STRING '                                        reset text array
    END IF
    IF NOT Done THEN '                                                   has parsing already been performed?
        DO '                                                             no, begin array loop
            Index = Index + 1 '                                          increment index counter
            REDIM _PRESERVE Text(Index) AS STRING '                      increase size of array

            ' (Pete) Non-breaking string larger than space allotted checked below.

            IF LEN(WText) > MaxWidth AND INSTR(MID$(WText, 1, MaxWidth + 1), " ") = 0 THEN ' (Pete)
                Plen = MaxWidth '                                 (Pete)  set length to maximum size
                Parse = MID$(WText, 1, Plen) '                    (Pete)  get the maximum size string allowed
            ELSE
                IF MID$(WText, MaxWidth + 1, 1) <> " " THEN '     (Pete) text with no trailing space?
                    Plen = MaxWidth '                             (Pete) yes, set length to remaining text
                ELSE '                                                   no, there is a trailing space
                    Plen = MaxWidth + 1 '       (+1 for trailing spaces) set length to include space
                END IF
                DO '                                                     begin parse loop
                    IF LEN(WText) <= Plen THEN '                         remaining text all that is left?
                        Parse = MID$(WText, 1, Plen) '                   yes, get remaining text
                        Done = -1 '                                      parsing is done
                    ELSE '                                               no, text still longer than max width
                        IF INSTR(MID$(WText, 1, Plen), " ") = 0 THEN '   space found in text? (Pete)
                            Plen = Plen + 1 '                     (Pete) no, increment length
                        ELSE
                            DO '                                         begin space search loop
                                Char = MID$(WText, Plen, 1) '            get last character
                                IF Char <> " " THEN Plen = Plen - 1 '    if not a space then move back one
                            LOOP UNTIL Char = " " '                      leave when space found
                            Parse = LEFT$(WText, Plen - 1) '             get parsed string without space at end
                        END IF
                    END IF
                LOOP UNTIL Char = " " OR Done '                          leave when space found or parsing done
            END IF
            Text(Index) = Parse '                                        save the parsed text
            IF NOT Done THEN WText = MID$(WText, Plen + 1, LEN(WText)) ' remove parsed text from string
        LOOP UNTIL Done '                                                leave when parsing done
        Index = 0 '                                                      reset index counter for reporting
    END IF
    IF Action = 1 THEN '                                                 report number of lines?
        ParseText$ = STR$(UBOUND(Text)) '                                yes, return number of lines as a string
    ELSE '                                                               no, report parsed lines found
        Index = Index + 1 '                                              increment index counter
        IF Index > UBOUND(Text) THEN '                                   have all lines been reported?
            ParseText$ = "" '                                            yes, report nothing remaining
            PText = "" '                                          (Pete) clear for next parsing event
        ELSE '                                                           no, parsed text remains
            ParseText$ = Text(Index) '                                   report next line of text
        END IF
    END IF

END FUNCTION

Drop the function into your own code and parse away.
#2
Thank you for this. However, "Action" is not necessary and in fact produces confusion. To actually get lines you must set it to zero, so why even need this parameter? For more serious work (e.g. inclusion in a simple word processor), check for hyphenated words as well as spaces to split lines.
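
As a rough illustration of the hyphen idea, here is a minimal sketch. BreakPos% is a hypothetical helper, not part of ParseText$, and it assumes a line may end at a space or directly after a hyphen. It returns how many characters to consume from the front of the text, or 0 when no break point fits in the width.

```basic
' Demo: break a sample string at spaces or after hyphens, 12 characters per line.
DIM t AS STRING, n AS INTEGER
t = "A well-known self-contained example of line breaking"
DO WHILE LEN(t) > 0
    n = BreakPos%(t, 12) '                  characters to consume for this line
    IF n = 0 THEN n = 12 '                  no break point: hard split at the width
    PRINT RTRIM$(LEFT$(t, n)) '             RTRIM$ drops the space the line broke on
    t = MID$(t, n + 1) '                    keep the remainder for the next pass
LOOP
END

' Hypothetical helper (an assumption for illustration, not from the original routine).
FUNCTION BreakPos% (Txt AS STRING, MaxWidth AS INTEGER)
    DIM p AS INTEGER
    DIM c AS STRING
    BreakPos% = 0 '                                                    default: no break point found
    IF LEN(Txt) <= MaxWidth THEN BreakPos% = LEN(Txt): EXIT FUNCTION ' everything fits on one line
    FOR p = MaxWidth + 1 TO 1 STEP -1 '                                search backward from just past the limit
        c = MID$(Txt, p, 1)
        IF c = " " THEN BreakPos% = p: EXIT FUNCTION '                 break on the space
        IF c = "-" AND p <= MaxWidth THEN BreakPos% = p: EXIT FUNCTION ' hyphen must fit on this line
    NEXT p
END FUNCTION
```

With a width of 12 the sample should come out as "A well-known", "self-", "contained", "example of", "line" and "breaking". A real word processor would also want to handle soft hyphens and runs of multiple spaces, which this sketch ignores.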
#3
(10-03-2022, 12:49 AM)mnrvovrfc Wrote: Thank you for this. However, "Action" is not necessary and in fact produces confusion. To actually get lines you must set it to zero, so why even need this parameter? For more serious work (e.g. inclusion in a simple word processor), check for hyphenated words as well as spaces to split lines.

Oh sure, the action parameter was added for my library needs. Strip the action code out if all you need is a simple parser with the ability to report one line at a time during subsequent calls.

The library I'm writing needs the number of lines to be reported. Pretty handy.

Here's an example use:

t$ = "big long crazy text string here. blah blah blah blah ... and blah"
FOR x = 1 TO VAL(ParseText$(t$, 40, 1))
    Array(x).text = ParseText$(t$, 40, 0)
NEXT x

Or in my library's case I need to know the number of lines the text will be placed in to size ASCII text box windows appropriately.

Box(Index).height = VAL(ParseText$(t$, 40, 1)) + 4 ' top border + header text + header line delimiter + border bottom + lines of text
' Then, as the box is drawn, lines of parsed text can be brought in one at a time as needed

"as well as spaces to split lines"

It does do that.
#4
@TerryRitchie,

Did you have a look at this parsing function? Given the right parameters, it can do the task. It is not limited to line parsing, and it returns the number of parsed components.
#5
I have one of these WP routines that gets rid of unwanted text characters in SCREEN 0.

I call it, Parsing My ASCII Off.bas

Pete

- There's always a party somewhere, and I went to way too many of them.
#6
This crashes the program if there are semicolons in the input string on my windows machine.
#7
(10-03-2022, 11:35 AM)James D Jarvis Wrote: This crashes the program if there are semicolons in the input string on my windows machine.

You'll have to be more specific about what crashes: Terry's routine, my routine, or Pete's? I could well imagine Pete's routine is supposed to crash; I'd bet my ass a semicolon is an "unwanted" char for him on SCREEN 0 :D
#8
(10-03-2022, 11:35 AM)James D Jarvis Wrote: This crashes the program if there are semicolons in the input string on my windows machine.

Yep, creating a text parser is hard, a lot harder than I originally thought it would be. Pete got hold of me last night to point out some improvements that can be made. I just got up. I'll make those changes and upload the results in a bit.
#9
(10-03-2022, 11:35 AM)James D Jarvis Wrote: This crashes the program if there are semicolons in the input string on my windows machine.
You'll have to post your example. I've tried something with at least two semicolons on Terry's function and it works just fine. I assume you mean Terry's function because it's why this thread was started. I scanned the source code in the first post and there doesn't seem to be a special check for any character other than CHR$(32), the space.
#10
Ok, I studied Pete's suggestions and added his code to the routine. Much, much better. Thanks Pete!

The code in the original post above has been updated.



