Testing for extended unicode?
#8
(03-24-2023, 04:43 PM)RhoSigma Wrote: If your goal is just to know whether it's English, or better, regular 7-bit ASCII (0-127), then you just need to identify the UTF-8 markers. In valid UTF-8 text, every byte that is not part of a multi-byte UTF-8 sequence is automatically pure ASCII.

I use such a check in the code which renders the Wiki help text in the IDE; it's basically as follows:
Code:
'UTF-8 handling
text$ = "whatever you get from your input"
FOR currPos% = 1 TO LEN(text$)
    seq$ = MID$(text$, currPos%, 4) 'get next 4 chars (fewer than 4 near the end of text$)
    seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'pad with spaces so ASC() never reads past the end
    IF ((ASC(seq$, 1) AND &HE0~%%) = 192) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) THEN
        '2-byte UTF-8 (110xxxxx 10xxxxxx)
        currPos% = currPos% + 1 'skip the continuation byte
    ELSEIF ((ASC(seq$, 1) AND &HF0~%%) = 224) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) THEN
        '3-byte UTF-8 (1110xxxx 10xxxxxx 10xxxxxx)
        currPos% = currPos% + 2 'skip the two continuation bytes
    ELSEIF ((ASC(seq$, 1) AND &HF8~%%) = 240) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) AND ((ASC(seq$, 4) AND &HC0~%%) = 128) THEN
        '4-byte UTF-8 (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
        currPos% = currPos% + 3 'skip the three continuation bytes
    ELSE
        '1st char of seq$ = regular ASCII if its code is below 128,
        'otherwise a stray byte that is not valid UTF-8
    END IF
NEXT
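
For readers who have not worked with hex masks before, here is a minimal sketch of what the individual tests above do. The variable name lead and the sample bytes are only illustrative assumptions, not part of the original code. The bytes &HC3 and &HA9 together are the UTF-8 encoding of "é".

```basic
'Sketch only: "lead" and the sample byte values are illustrative assumptions
DIM lead AS _UNSIGNED _BYTE
lead = &HC3 'first byte of the 2-byte UTF-8 sequence C3 A9

'&HE0 = binary 11100000: AND keeps the top three bits and clears the rest
'192  = binary 11000000: the pattern 110xxxxx marks the lead byte of a 2-byte sequence
IF (lead AND &HE0) = 192 THEN PRINT "lead byte of a 2-byte sequence"

'&HC0 = binary 11000000, 128 = binary 10000000: 10xxxxxx marks a continuation byte
IF (&HA9 AND &HC0) = 128 THEN PRINT "continuation byte"

'plain 7-bit ASCII always has the top bit (&H80 = 128) clear
IF (ASC("A") AND &H80) = 0 THEN PRINT "plain ASCII"
```

Each test masks off the variable low bits and compares what is left against the fixed marker bits. The 3-byte and 4-byte checks work the same way, just with longer markers (1110xxxx and 11110xxx).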

PERFECT!!! Thank you very much!!! That was exactly what I was looking for. I knew there had to be a way. I don't fully understand the coding example yet; I've never needed to use hex or multi-byte text before. But I understand them in principle, and kind of understand what you did. I'm sure with a little reading I will understand it completely. This will certainly get me there. Thank you again, and I can't wait to dig into this and get it working.

Maybe one of these days I'll dig into libraries...


Messages In This Thread
Testing for extended unicode? - by tothebin - 03-23-2023, 06:01 AM
RE: Testing for extended unicode? - by RhoSigma - 03-23-2023, 08:29 AM
RE: Testing for extended unicode? - by tothebin - 03-23-2023, 09:48 PM
RE: Testing for extended unicode? - by mnrvovrfc - 03-24-2023, 01:13 AM
RE: Testing for extended unicode? - by tothebin - 03-24-2023, 02:15 PM
RE: Testing for extended unicode? - by RhoSigma - 03-24-2023, 04:43 PM
RE: Testing for extended unicode? - by tothebin - 03-24-2023, 10:01 PM
RE: Testing for extended unicode? - by mnrvovrfc - 03-24-2023, 10:26 PM
RE: Testing for extended unicode? - by tothebin - 03-24-2023, 10:43 PM


