Posts: 2,700
Threads: 124
Joined: Apr 2022
Reputation:
134
(08-26-2022, 06:05 PM)SMcNeill Wrote:
(08-26-2022, 05:59 PM)bplus Wrote: Wow BF makes that much a difference? Seems like something is wrong about Line?
BF is highly optimized and is *much* faster than line, just as DO is faster than FOR...
A FOR loop has to track Start, Stop, Step, counting up or down... A DO loop is just DO.. LOOP and the user has to deal with the exit conditions. There's a lot less code to process for a DO-LOOP than there is a FOR-Next...
It's the same with LINE vs. LINE, BF. A line has slope: you calculate rise/run, do a loop, plot the necessary pixels, increment to the next pixel...
BF is just:
FOR y = start to stop
memfill x, x.start, x.stop, kolor
NEXT
Care to guess which is going to be faster, once you think about the basic premise behind them?
BF is always quite a bit faster than without it.
Oh yes, got it! Thanks for explaining.
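For anyone who wants to check the LINE vs. LINE, BF difference on their own machine, here is a minimal timing sketch. The repeat count, coordinates and variable names are made up for illustration and are not taken from the test earlier in the thread; no particular result is implied, since timings vary from run to run.

```basic
' Hypothetical micro-benchmark: the same horizontal span drawn with
' plain LINE and with LINE ..., BF (a one-pixel-tall filled box).
CONST REPS = 200000 ' arbitrary repeat count, adjust for your machine
DIM AS LONG i, img
DIM AS DOUBLE t0, t1, t2

img = _NEWIMAGE(800, 500, 32)
SCREEN img

t0 = TIMER(.001)
FOR i = 1 TO REPS
    LINE (100, 250)-(700, 250), _RGB32(255, 0, 0) ' plain line
NEXT
t1 = TIMER(.001)
FOR i = 1 TO REPS
    LINE (100, 251)-(700, 251), _RGB32(0, 255, 0), BF ' filled box
NEXT
t2 = TIMER(.001)

PRINT "LINE    :"; t1 - t0; "sec"
PRINT "LINE, BF:"; t2 - t1; "sec"
```

Both loops cover the same number of pixels, so any gap between the two times comes from how each form of the statement does its work.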
b = b + ...
Posts: 545
Threads: 116
Joined: Apr 2022
Reputation:
39
Excellent!
Here's what I got on my machine. It varies from run to run. I'm assuming that's because of how offscreen writes are dealt with.
[screenshot link]
Posts: 1,510
Threads: 53
Joined: Jul 2022
Reputation:
47
08-26-2022, 09:44 PM
(08-26-2022, 06:33 PM)James D Jarvis Wrote: Excellent!
Here's what I got on my machine. It varies from run to run. I'm assuming that's because of how offscreen writes are dealt with.
[screenshot link]
Thank you for proposing the image-sharing site to me! I had to make this post to discover where to click to see the full image.
Posts: 70
Threads: 8
Joined: Apr 2022
Reputation:
6
I was curious whether drawing with _MEM commands would improve performance, so I used your test as inspiration for my own.
I'm using the Bresenham circle-drawing algorithm, just as you did, but I get very different results from your implementation: mostly slower, except when using 1 byte per pixel.
- Optimization flag is set.
- I tried using _SHL in place of multiplying by 2, 4, 8 and it slowed things down, which is surprising, but I guess the compiler is doing a better job of multiplying.
- I did not implement clipping in the MEM routines, assuming it would just slow down the test.
- I've noticed that the LINE command seems to be faster at drawing a horizontal line than _MEMFILL with an unsigned long. (Surprising!)
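The _SHL swap mentioned in the list above is a drop-in change for the offset math. Here is a small sketch, using a loop range I picked to cover the radius of 30 used in the test, that only checks both forms give the same values. It says nothing about speed. QB64 translates to C++, and C++ compilers commonly turn a multiply by a constant power of two into a shift on their own, which may be why _SHL has nothing left to gain; that part is my assumption, not something measured here.

```basic
' Sanity check: _SHL(n, k) and n * 2 ^ k should give the same offsets.
DIM AS LONG x
FOR x = 0 TO 30
    IF _SHL(x, 1) <> x * 2 OR _SHL(x, 2) <> x * 4 OR _SHL(x, 3) <> x * 8 THEN
        PRINT "Mismatch at x ="; x
        END
    END IF
NEXT
PRINT "_SHL and multiplication agree for x = 0 to 30"
```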
Overall my times are pretty sad. I do have a slow computer.
- Bresenham normal - 12.68 sec
- Bresenham MEM 1bpp - 0.99 sec
- Bresenham MEM 4bpp - 16.09 sec
Can one of you guys point out my inefficiency? Perhaps I'm doing something unnecessary or dumb?
Code:
_TITLE "Fast Circle Test"
DIM AS LONG scrn
DIM AS LONG count
DIM AS SINGLE t0, t1
DIM AS STRING en
TYPE tRESULTS
AS SINGLE time
AS STRING test
END TYPE
DIM AS tRESULTS res(10)
scrn = _NEWIMAGE(800, 500, 256)
SCREEN scrn
CONST iterations = 640000
'____________________________________________________________________________________________________________________________________
res(0).test = "Bresenham Normal Test"
LOCATE 20, 1
PRINT res(0).test
INPUT "Press Enter to start ..."; en
count = 0
t0 = TIMER
DO
CircleBresenham INT(RND * 800), INT(RND * 500), 30, INT(RND * 255)
count = count + 1
LOOP WHILE count < iterations
t1 = TIMER
res(0).time = t1 - t0
'____________________________________________________________________________________________________________________________________
res(1).test = "Bresenham MEM 1bpp (no clip)"
LOCATE 20, 1
PRINT res(1).test
INPUT "Press Enter to start ..."; en
count = 0
t0 = TIMER
DIM AS _MEM scr
scr = _MEMIMAGE(scrn)
DO
CircleBresenham1bpp scr, 30 + INT(RND * 740), 30 + INT(RND * 440), 30, INT(RND * 255)
count = count + 1
LOOP WHILE count < iterations
_MEMFREE scr
t1 = TIMER
res(1).time = t1 - t0
'____________________________________________________________________________________________________________________________________
res(2).test = "Bresenham MEM 4bpp (no clip)"
LOCATE 20, 1
PRINT res(2).test
INPUT "Press Enter to start ..."; en
_TITLE "Fast Circle Test"
scrn = _NEWIMAGE(800, 500, 32)
SCREEN scrn
LOCATE 20, 1
count = 0
t0 = TIMER
scr = _MEMIMAGE(scrn)
DO
CircleBresenham4bpp scr, 30 + INT(RND * 740), 30 + INT(RND * 440), 30, _RGB32(INT(RND * 255), INT(RND * 255), INT(RND * 255))
count = count + 1
LOOP WHILE count < iterations
_MEMFREE scr
t1 = TIMER
res(2).time = t1 - t0
'____________________________________________________________________________________________________________________________________
PRINT "Circle count:"; iterations
FOR count = 0 TO 2
PRINT res(count).test; " Time:"; res(count).time
NEXT
'____________________________________________________________________________________________________________________________________
SUB CircleBresenham (xc AS LONG, yc AS LONG, r AS LONG, c AS LONG)
DIM AS LONG e, x, y, w
DIM AS LONG l0, l1
w = _WIDTH(0) * 4
x = r
y = 0
e = 0
$CHECKING:OFF
DO
l0 = x * 2
l1 = y * 2
LINE (xc - x, yc - y)-(xc - x + l0, yc - y), c
LINE (xc - x, yc + y)-(xc - x + l0, yc + y), c
LINE (xc - y, yc - x)-(xc - y + l1, yc - x), c
LINE (xc - y, yc + x)-(xc - y + l1, yc + x), c
IF x <= y THEN EXIT DO
e = e + y * 2 + 1
y = y + 1
IF e > x THEN
e = e + 1 - x * 2
x = x - 1
END IF
LOOP
$CHECKING:ON
END SUB
SUB CircleBresenham1bpp (scr AS _MEM, xc AS LONG, yc AS LONG, r AS LONG, c AS _UNSIGNED _BYTE)
DIM AS LONG e, x, y, w
DIM AS LONG xof0, xof1, xof2, xof3, l0, l1
DIM AS LONG yof0, yof1, yof2, yof3
DIM AS LONG xq0, yq0, xq1, yq1, xq2, yq2, xq3, yq3
w = _WIDTH(0)
x = r
y = 0
e = 0
$CHECKING:OFF
DO
l0 = x * 2
l1 = y * 2
xq0 = xc - x
yq0 = yc - y
xof0 = xq0
yof0 = yq0 * w
_MEMFILL scr, scr.OFFSET + xof0 + yof0, l0, c AS _UNSIGNED _BYTE
xq1 = xc - x
yq1 = yc + y
xof1 = xq1
yof1 = yq1 * w
_MEMFILL scr, scr.OFFSET + xof1 + yof1, l0, c AS _UNSIGNED _BYTE
xq2 = xc - y
yq2 = yc - x
xof2 = xq2
yof2 = yq2 * w
_MEMFILL scr, scr.OFFSET + xof2 + yof2, l1, c AS _UNSIGNED _BYTE
xq3 = xc - y
yq3 = yc + x
xof3 = xq3
yof3 = yq3 * w
_MEMFILL scr, scr.OFFSET + xof3 + yof3, l1, c AS _UNSIGNED _BYTE
IF x <= y THEN EXIT DO
e = e + y * 2 + 1
y = y + 1
IF e > x THEN
e = e + 1 - x * 2
x = x - 1
END IF
LOOP
$CHECKING:ON
END SUB
SUB CircleBresenham4bpp (scr AS _MEM, xc AS LONG, yc AS LONG, r AS LONG, c AS LONG)
DIM AS LONG e, x, y, w
DIM AS LONG xof0, xof1, xof2, xof3, l0, l1
DIM AS LONG yof0, yof1, yof2, yof3
DIM AS LONG xq0, yq0, xq1, yq1, xq2, yq2, xq3, yq3
w = _WIDTH(0) * 4
x = r
y = 0
e = 0
$CHECKING:OFF
DO
l0 = x * 8
l1 = y * 8
xq0 = xc - x
yq0 = yc - y
xof0 = xq0 * 4
yof0 = yq0 * w
_MEMFILL scr, scr.OFFSET + xof0 + yof0, l0, c AS _UNSIGNED LONG
xq1 = xc - x
yq1 = yc + y
xof1 = xq1 * 4
yof1 = yq1 * w
_MEMFILL scr, scr.OFFSET + xof1 + yof1, l0, c AS _UNSIGNED LONG
xq2 = xc - y
yq2 = yc - x
xof2 = xq2 * 4
yof2 = yq2 * w
_MEMFILL scr, scr.OFFSET + xof2 + yof2, l1, c AS _UNSIGNED LONG
xq3 = xc - y
yq3 = yc + x
xof3 = xq3 * 4
yof3 = yq3 * w
_MEMFILL scr, scr.OFFSET + xof3 + yof3, l1, c AS _UNSIGNED LONG
IF x <= y THEN EXIT DO
e = e + y * 2 + 1
y = y + 1
IF e > x THEN
e = e + 1 - x * 2
x = x - 1
END IF
LOOP
$CHECKING:ON
END SUB
Posts: 1,507
Threads: 160
Joined: Apr 2022
Reputation:
116
For starters, why calculate the same values multiple times in your loop?
xof1 = xof0
xof3 = xof2
That removes several math operations from the loop, completely and easily.
Posts: 70
Threads: 8
Joined: Apr 2022
Reputation:
6
(08-28-2022, 03:32 AM)SMcNeill Wrote: For starters, why calculate the same values multiple times in your loop?
xof1 = xof0
xof3 = xof2
That removes several math operations from the loop, completely and easily.
Yeah, that was an artifact of the code used for the 4-byte-per-pixel version. It's actually the fastest version of the circle routine, so I didn't go back and optimize it.
Posts: 1,507
Threads: 160
Joined: Apr 2022
Reputation:
116
One big efficiency gain is to swap _MEMFILL for _MEMPUT. It's hard to beat a simple _MEMPUT when it comes to shoving values into memory.
Code:
_Title "Fast Circle Test"
Dim As Long scrn
Dim As Long count
Dim As Single t0, t1
Dim As String en
Type tRESULTS
As Single time
As String test
End Type
Dim As tRESULTS res(10)
scrn = _NewImage(800, 500, 256)
Screen scrn
Const iterations = 640000
'____________________________________________________________________________________________________________________________________
res(0).test = "Bresenham Normal Test"
Locate 20, 1
Print res(0).test
Input "Press Enter to start ..."; en
count = 0
t0 = Timer
Do
' CircleBresenham Int(Rnd * 800), Int(Rnd * 500), 30, Int(Rnd * 255)
count = count + 1
Loop While count < iterations
t1 = Timer
res(0).time = t1 - t0
'____________________________________________________________________________________________________________________________________
res(1).test = "Bresenham MEM 1bpp (no clip)"
Locate 20, 1
Print res(1).test
Input "Press Enter to start ..."; en
count = 0
t0 = Timer
Dim As _MEM scr
scr = _MemImage(scrn)
Do
CircleBresenham1bpp scr, 30 + Int(Rnd * 740), 30 + Int(Rnd * 440), 30, Int(Rnd * 255)
count = count + 1
Loop While count < iterations
_MemFree scr
t1 = Timer
res(1).time = t1 - t0
'____________________________________________________________________________________________________________________________________
res(2).test = "Bresenham MEM 4bpp (no clip)"
Locate 20, 1
Print res(2).test
Input "Press Enter to start ..."; en
_Title "Fast Circle Test"
scrn = _NewImage(800, 500, 32)
Screen scrn
Locate 20, 1
count = 0
t0 = Timer
scr = _MemImage(scrn)
Do
CircleBresenham4bpp scr, 30 + Int(Rnd * 740), 30 + Int(Rnd * 440), 30, _RGB32(Int(Rnd * 255), Int(Rnd * 255), Int(Rnd * 255))
count = count + 1
Loop While count < iterations
_MemFree scr
t1 = Timer
res(2).time = t1 - t0
'____________________________________________________________________________________________________________________________________
Print "Circle count:"; iterations
For count = 0 To 2
Print res(count).test; " Time:"; res(count).time
Next
'____________________________________________________________________________________________________________________________________
Sub CircleBresenham (xc As Long, yc As Long, r As Long, c As Long)
Dim As Long e, x, y, w
Dim As Long l0, l1
w = _Width(0) * 4
x = r
y = 0
e = 0
$Checking:Off
Do
l0 = x * 2
l1 = y * 2
Line (xc - x, yc - y)-(xc - x + l0, yc - y), c
Line (xc - x, yc + y)-(xc - x + l0, yc + y), c
Line (xc - y, yc - x)-(xc - y + l1, yc - x), c
Line (xc - y, yc + x)-(xc - y + l1, yc + x), c
If x <= y Then Exit Do
e = e + y * 2 + 1
y = y + 1
If e > x Then
e = e + 1 - x * 2
x = x - 1
End If
Loop
$Checking:On
End Sub
Sub CircleBresenham1bpp (scr As _MEM, xc As Long, yc As Long, r As Long, c As _Unsigned _Byte)
Dim As Long e, x, y, w
Dim As Long xof0, xof1, xof2, xof3, l0, l1
Dim As Long yof0, yof1, yof2, yof3
Dim As Long xq0, yq0, xq1, yq1, xq2, yq2, xq3, yq3
w = _Width(0)
x = r
y = 0
e = 0
$Checking:Off
Do
l0 = x * 2
l1 = y * 2
xq0 = xc - x
yq0 = yc - y
xof0 = xq0
yof0 = yq0 * w
_MemFill scr, scr.OFFSET + xof0 + yof0, l0, c As _UNSIGNED _BYTE
xq1 = xc - x
yq1 = yc + y
xof1 = xq1
yof1 = yq1 * w
_MemFill scr, scr.OFFSET + xof1 + yof1, l0, c As _UNSIGNED _BYTE
xq2 = xc - y
yq2 = yc - x
xof2 = xq2
yof2 = yq2 * w
_MemFill scr, scr.OFFSET + xof2 + yof2, l1, c As _UNSIGNED _BYTE
xq3 = xc - y
yq3 = yc + x
xof3 = xq3
yof3 = yq3 * w
_MemFill scr, scr.OFFSET + xof3 + yof3, l1, c As _UNSIGNED _BYTE
If x <= y Then Exit Do
e = e + y * 2 + 1
y = y + 1
If e > x Then
e = e + 1 - x * 2
x = x - 1
End If
Loop
$Checking:On
End Sub
Sub CircleBresenham4bpp (scr As _MEM, xc As Long, yc As Long, r As Long, c As _Unsigned Long)
Dim As _Offset e, x, y, w
Dim As _Offset xof0, xof1, xof2, xof3, l0, l1
Dim As _Offset yof0, yof1, yof2, yof3
Dim As _Offset xq0, yq0, xq1, yq1, xq2, yq2, xq3, yq3
w = _Width(0) * 4
x = r
y = 0
e = 0
$Checking:Off
'start time of 7.03 seconds
'by swapping to memput, the time is now 1.9 seconds.
Dim As _Offset start, finish
Do
l0 = x * 8
l1 = y * 8
xq0 = scr.OFFSET + (xc - x) * 4
yq0 = (yc - y) * w
' _MemFill scr, xq0 + yq0, l0, c
start = xq0 + yq0
finish = start + l0
Do
_MemPut scr, start, c
start = start + 4
Loop Until start > finish
yq1 = (yc + y) * w
'_MemFill scr, xq0 + yq1, l0, c
start = xq0 + yq1
finish = start + l0
Do
_MemPut scr, start, c
start = start + 4
Loop Until start > finish
xq2 = scr.OFFSET + (xc - y) * 4
yq2 = (yc - x) * w
'_MemFill scr, xq2 + yq2, l1, c
start = xq2 + yq2
finish = start + l1
Do
_MemPut scr, start, c
start = start + 4
Loop Until start > finish
yq3 = (yc + x) * w
'_MemFill scr, xq2 + yq3, l1, c
start = xq2 + yq3
finish = start + l1
Do
_MemPut scr, start, c
start = start + 4
Loop Until start > finish
If x <= y Then Exit Do
e = e + y + y + 1
y = y + 1
If e > x Then
e = e + 1 - x - x
x = x - 1
End If
Loop
$Checking:On
End Sub
This went from 7 seconds down to less than 2 seconds on my PC. All you'd need to do is comment out the _MEMPUT and uncomment the _MEMFILL statements, and you can see the difference at play.
I think you'd really need to optimize the math itself to reduce times much more. For example, instead of counting x = x + 1, count x = x + 4 (move by 4 bytes instead of 1 pixel coordinate). Same with y: instead of y = y + 1, use y = y + w (move by a row of 4-byte pixels instead of by a coordinate). Then you can get rid of the * 4 and * w operators, reducing the number of operations your loop has to perform on every pass.
But your biggest change is going to be _MEMPUT over _MEMFILL (a 300%+ speed improvement!).
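To make the "count in bytes and rows" idea concrete, here is a rough sketch of how the 4bpp routine might look with running byte offsets. FilledCircleStep and the names center, xb, yb, xrow and yrow are hypothetical, made up for this example. It has not been timed, so treat it as an illustration of the bookkeeping rather than a measured improvement. Like the test above, it does no clipping.

```basic
' Sketch only: FilledCircleStep and its variable names are hypothetical.
DIM AS LONG img
DIM AS _MEM m
img = _NEWIMAGE(800, 500, 32)
SCREEN img
m = _MEMIMAGE(img)
FilledCircleStep m, 400, 250, 30, _RGB32(255, 255, 0)
_MEMFREE m
SLEEP

SUB FilledCircleStep (m AS _MEM, xc AS LONG, yc AS LONG, r AS LONG, c AS _UNSIGNED LONG)
    ' No clipping: the whole circle must lie inside the image.
    DIM AS LONG e, x, y
    DIM AS _OFFSET w, center, xb, yb, xrow, yrow, start, finish
    w = _WIDTH(0) * 4 ' bytes per row of 4-byte pixels
    x = r: y = 0: e = 0
    xb = x * 4: yb = 0 ' x and y measured in bytes along a row
    xrow = x * w: yrow = 0 ' x and y measured in whole rows of bytes
    center = m.OFFSET + yc * w + xc * 4 ' address of the centre pixel
    DO
        ' Rows yc - y and yc + y, from xc - x to xc + x inclusive.
        start = center - yrow - xb: finish = start + xb + xb
        DO
            _MEMPUT m, start, c
            start = start + 4
        LOOP UNTIL start > finish
        start = center + yrow - xb: finish = start + xb + xb
        DO
            _MEMPUT m, start, c
            start = start + 4
        LOOP UNTIL start > finish
        ' Rows yc - x and yc + x, from xc - y to xc + y inclusive.
        start = center - xrow - yb: finish = start + yb + yb
        DO
            _MEMPUT m, start, c
            start = start + 4
        LOOP UNTIL start > finish
        start = center + xrow - yb: finish = start + yb + yb
        DO
            _MEMPUT m, start, c
            start = start + 4
        LOOP UNTIL start > finish
        IF x <= y THEN EXIT DO
        e = e + y + y + 1
        y = y + 1: yb = yb + 4: yrow = yrow + w ' step y in all three units
        IF e > x THEN
            e = e + 1 - x - x
            x = x - 1: xb = xb - 4: xrow = xrow - w ' step x in all three units
        END IF
    LOOP
END SUB
```

The pixel counters x and y are still kept for the Bresenham decision, but every address is now built from additions and subtractions only; the multiplications happen once, before the loop.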
Posts: 70
Threads: 8
Joined: Apr 2022
Reputation:
6
Thank you!
That is very interesting and puzzling. I guess I assumed that _MEMFILL would be the more efficient way to fill an area of memory. It seems counterintuitive that it would be slower than a hand-rolled routine using _MEMPUT.
You are correct, my math needed cleaning up. It seems there were a lot of simple optimizations that could have been done to make it faster. I would have thought bit shifting would have sped up my multiplying by powers of 2, but it appears to be slower than just multiplying.
Again thanks for your help!