03-24-2023, 10:01 PM
(03-24-2023, 04:43 PM)RhoSigma Wrote: If your goal is just to know whether it's English, or rather plain 7-bit ASCII (0-127), then you just need to identify the UTF-8 markers. Everything that is not part of a UTF-8 sequence is automatically pure ASCII, provided the input is valid UTF-8 to begin with.
I use such a check in the code which renders the Wiki help text in the IDE; it's basically as follows:
Code:
'UTF-8 handling
text$ = "whatever you get from your input"
FOR currPos% = 1 TO LEN(text$)
    seq$ = MID$(text$, currPos%, 4) 'get next 4 chars (becomes less than 4 at the end of text$)
    seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'fill missing chars with space (safety for ASC())
    IF ((ASC(seq$, 1) AND &HE0~%%) = 192) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) THEN
        '2-byte UTF-8
        currPos% = currPos% + 1 'skip the continuation byte
    ELSEIF ((ASC(seq$, 1) AND &HF0~%%) = 224) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) THEN
        '3-byte UTF-8
        currPos% = currPos% + 2 'skip the 2 continuation bytes
    ELSEIF ((ASC(seq$, 1) AND &HF8~%%) = 240) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) AND ((ASC(seq$, 4) AND &HC0~%%) = 128) THEN
        '4-byte UTF-8
        currPos% = currPos% + 3 'skip the 3 continuation bytes
    ELSE
        '1st char of seq$ = single byte (regular ASCII if below 128)
    END IF
NEXT
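For anyone new to the hex masks, here is a minimal sketch of how the skeleton above could be wrapped into a reusable function. The name CountUtf8Sequences& and its local variables are hypothetical and not taken from the IDE source. It only tests the lead and continuation bit patterns, so overlong or otherwise invalid encodings are not rejected.

```basic
'Main code has to come before any SUB/FUNCTION in QB64.
PRINT CountUtf8Sequences&("plain text") 'prints 0
PRINT CountUtf8Sequences&("caf" + CHR$(195) + CHR$(169)) 'prints 1, bytes 195 and 169 encode one accented e

'Hypothetical helper: counts the UTF-8 multi-byte sequences found in text$.
FUNCTION CountUtf8Sequences& (text$)
    count& = 0
    currPos& = 1
    DO WHILE currPos& <= LEN(text$)
        seq$ = MID$(text$, currPos&, 4) 'next 4 bytes (fewer at the end of text$)
        seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'pad with spaces so ASC() is always safe
        b1% = ASC(seq$, 1): b2% = ASC(seq$, 2)
        b3% = ASC(seq$, 3): b4% = ASC(seq$, 4)
        seqLen% = 1 'assume a single byte until a pattern matches
        'The mask keeps only the marker bits, which are then compared to the expected pattern:
        '  &HE0 = 11100000, 192 = 11000000 -> lead byte 110xxxxx (2-byte sequence)
        '  &HF0 = 11110000, 224 = 11100000 -> lead byte 1110xxxx (3-byte sequence)
        '  &HF8 = 11111000, 240 = 11110000 -> lead byte 11110xxx (4-byte sequence)
        '  &HC0 = 11000000, 128 = 10000000 -> continuation byte 10xxxxxx
        IF ((b1% AND &HE0) = 192) AND ((b2% AND &HC0) = 128) THEN
            seqLen% = 2
        ELSEIF ((b1% AND &HF0) = 224) AND ((b2% AND &HC0) = 128) AND ((b3% AND &HC0) = 128) THEN
            seqLen% = 3
        ELSEIF ((b1% AND &HF8) = 240) AND ((b2% AND &HC0) = 128) AND ((b3% AND &HC0) = 128) AND ((b4% AND &HC0) = 128) THEN
            seqLen% = 4
        END IF
        IF seqLen% > 1 THEN count& = count& + 1
        currPos& = currPos& + seqLen% 'jump past the whole sequence
    LOOP
    CountUtf8Sequences& = count&
END FUNCTION
```

A DO loop is used here instead of FOR so that the position can be advanced by the full sequence length in one place. A result of 0 means no multi-byte sequence was found; to be sure the text is plain 7-bit ASCII you would additionally check that no single byte is above 127.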
PERFECT!!!!!!!!!!!!!!!!! Thank you very much!!! That was exactly what I was looking for. I knew there had to be a way. I don't fully understand the coding example yet, I've never needed to use Hex before, or multi-byte text. But I understand them in principle, and kind of understand what you did. I'm sure with a little reading I will understand it completely. This will certainly get me there. Thank you again and I can't wait to dig into this and get it working.
Maybe one of these days I'll dig into libraries...