03-24-2023, 04:43 PM
If your goal is just to find out whether the text is English, or more precisely plain 7-bit ASCII (0-127), then you only need to identify the UTF-8 markers. Assuming the input is either ASCII or UTF-8, everything that is not part of a UTF-8 sequence is plain ASCII.
I use such a check in the code that renders the Wiki help text in the IDE; it basically works as follows:
'UTF-8 handling
text$ = "whatever you get from your input"
currPos& = 1
DO WHILE currPos& <= LEN(text$)
    seq$ = MID$(text$, currPos&, 4) 'get next 4 chars (fewer than 4 near the end of text$)
    seq$ = seq$ + SPACE$(4 - LEN(seq$)) 'pad missing chars with spaces (safety for ASC())
    IF ((ASC(seq$, 1) AND &HE0~%%) = 192) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) THEN
        '2-byte UTF-8 (110xxxxx 10xxxxxx)
        seqLen% = 2
    ELSEIF ((ASC(seq$, 1) AND &HF0~%%) = 224) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) THEN
        '3-byte UTF-8 (1110xxxx 10xxxxxx 10xxxxxx)
        seqLen% = 3
    ELSEIF ((ASC(seq$, 1) AND &HF8~%%) = 240) AND ((ASC(seq$, 2) AND &HC0~%%) = 128) AND ((ASC(seq$, 3) AND &HC0~%%) = 128) AND ((ASC(seq$, 4) AND &HC0~%%) = 128) THEN
        '4-byte UTF-8 (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
        seqLen% = 4
    ELSE
        'single byte: regular ASCII if ASC(seq$, 1) < 128, otherwise not part of a valid UTF-8 sequence
        seqLen% = 1
    END IF
    currPos& = currPos& + seqLen% 'skip the whole sequence, so its continuation bytes are not tested again
LOOP
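If all you need is a yes/no answer, a simpler check may be enough: every byte of a UTF-8 multi-byte sequence (lead and continuation bytes alike) has its high bit set, so the text is pure 7-bit ASCII exactly when no byte is above 127. A minimal sketch in QB64 (the names isASCII% and i& are just for illustration); note that it cannot tell UTF-8 apart from other 8-bit encodings, it only reports whether any non-ASCII byte is present:

```qb64
text$ = "whatever you get from your input"
isASCII% = -1 'assume pure 7-bit ASCII until a byte > 127 is found
FOR i& = 1 TO LEN(text$)
    IF ASC(text$, i&) > 127 THEN isASCII% = 0: EXIT FOR 'high bit set: UTF-8 or another 8-bit encoding
NEXT
IF isASCII% THEN PRINT "pure 7-bit ASCII" ELSE PRINT "contains bytes > 127"
```

Use the full marker check above instead when you also need to know where the UTF-8 sequences are, e.g. to convert or render them.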
GuiTools, Blankers & other Projects:
https://staging.qb64phoenix.com/forumdisplay.php?fid=32
Libraries & useful Functions:
https://staging.qb64phoenix.com/forumdisplay.php?fid=23