
tothebin · 03-24-2023, 10:01 PM

(03-24-2023, 04:43 PM)RhoSigma Wrote: If your goal is just to know whether it's English, or more precisely regular 7-bit ASCII (0-127), then you just need to identify the UTF-8 markers. Everything that is not part of a UTF-8 sequence is automatically pure ASCII.

I use such a check in the code that renders the Wiki help text in the IDE; it's basically as follows:

Code:
'UTF-8 handling
text$ = "whatever you get from your input"
FOR currPos% = 1 TO LEN(text$)
    seq$ = MID$(text$, currPos%, 4) 'get next 4 chars (becomes less than 4 at the end of text$)
    seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'fill missing chars with space (safety for ASC())
    IF ((ASC(seq$, 1) AND &HE0~%%) = 192) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) THEN
        '2-byte UTF-8
        currPos% = currPos% + 1 'skip the continuation byte
    ELSEIF ((ASC(seq$, 1) AND &HF0~%%) = 224) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) THEN
        '3-byte UTF-8
        currPos% = currPos% + 2 'skip both continuation bytes
    ELSEIF ((ASC(seq$, 1) AND &HF8~%%) = 240) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) AND ((ASC(seq$, 4) AND &HC0~%%) = 128) THEN
        '4-byte UTF-8
        currPos% = currPos% + 3 'skip all three continuation bytes
    ELSE
        '1st char of seq$ = regular ASCII
    END IF
NEXT
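For anyone new to hex masks, here is a small worked example of what the AND comparisons in the checks above are doing. It is only a sketch: the variable names lead% and cont% are made up for illustration, and the two byte values are the UTF-8 encoding of "é" (&HC3 followed by &HA9), used here as sample input.

```basic
'Worked example of the bit masks (lead% and cont% are illustrative names)
lead% = &HC3 'first byte of a 2-byte sequence, binary 11000011
cont% = &HA9 'continuation byte, binary 10101001

'&HE0 = 11100000 keeps the top three bits; a 2-byte lead byte is 110xxxxx
PRINT lead% AND &HE0 'prints 192 (&HC0)

'&HC0 = 11000000 keeps the top two bits; a continuation byte is 10xxxxxx
PRINT cont% AND &HC0 'prints 128 (&H80)

'a regular ASCII byte such as "A" (65) has its top bit clear
PRINT ASC("A") AND &H80 'prints 0
```

The 3-byte and 4-byte checks follow the same pattern: &HF0 keeps the top four bits and compares them with 224 (1110xxxx), while &HF8 keeps the top five bits and compares them with 240 (11110xxx).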

PERFECT!!! Thank you very much! That was exactly what I was looking for. I knew there had to be a way. I don't fully understand the code example yet; I've never needed to use hex or multi-byte text before. But I understand them in principle, and I kind of understand what you did. I'm sure with a little reading I will understand it completely. This will certainly get me there. Thank you again, I can't wait to dig into this and get it working.

Maybe one of these days I'll dig into libraries...
