A faster way to scale images?
#1
Recently, I added a level editor to my game project. In this level editor, I included a "playhead" tool, similar to what you would see in a video or audio editing program: when you hit "play level" from the editor with the playhead set, the game invisibly simulates the level up to that point as fast as possible, and then you start playing from the playhead. This is very useful in my level design process, since it means I don't have to watch minutes of the level go by while I'm tweaking a later part.

However, I noticed the simulation was taking almost as long as actually playing the level up to where the playhead was. I ran a visible frame counter during the simulation and discovered it was only about 30% faster than normal play.

At the time I chalked this up to poor design on my part, maybe some inefficiencies in my code. I looked into it a little but didn't find anything conclusive. To boost the simulation speed a bit, I copied the main gameplay loop and stripped it down to the bare essentials, only what the simulation would need. To my surprise, it now ran EVEN SLOWER than before.

So I started commenting out pieces of the simulation loop to see how that affected the time. Nothing made a difference: not the position updates, the level script check, collision detection, or background scrolling. Very mysterious. So I took the loop with everything but the actual frame counter commented out, and still there was no difference; it ran just as slowly. Good news: those chunks of code are actually super fast! But how could the frame counter cause so much slowdown that the loop was running at under 60 iterations per second? It's just a PRINT statement with a variable plugged in.

Well, when you've eliminated every other possibility, whatever remains, however unlikely, must be true.  Below the PRINT statement was a SUB call: display_screen.  Here it is:


Code:
sub display_screen
putimage(0, 0)-((screenw * option_window_size) - 1, (screenh * option_window_size) - 1), full_screen, scaled_screen(option_window_size), (0, 0)-(screenw - 1, screenh - 1)
display
end sub


I had recently added a screen scaling option to the option menu.  Players can choose to run the game at x1, x2, or x3; since it's a pixelart game, the window size options are doubling or tripling the pixels.  The x1 resolution is 640x360 (16:9 ratio), so x2 is 1280x720 and x3 is 1920x1080.

So it's that PUTIMAGE statement; more specifically, the scaling.  Apparently using PUTIMAGE, in any situation where the output size and input size are different, is a massive resource hog.  I was running the playhead 1643 frames into the level in each case... at x2 scaling, this took 37.14 seconds, which is about 44 frames per second.  Running x1 scaling took 4.56 seconds, or 360 fps.  Removing the multiplier from the PUTIMAGE statement, curiously, took 7.8 seconds, or 210 fps.  And finally, commenting out the display_screen call entirely (including the DISPLAY statement) caused it to take 1.42 seconds, which is 1157 fps.

I have some workarounds in mind, such as pre-processing all source images into three size versions, and toggling between them based on the player's chosen size option.  But that's a lot of work, so first I have to know, is there a way to speed this up without such a huge overhaul to the code?  I've seen games do this exact kind of window size option before, although they weren't made in QB64.
#2
The first thing you need to determine is whether scaling is the bottleneck in your performance, or whether it's simply the larger image sizes. Keep in mind that an image scaled 2x in each dimension has 4x the pixels, so it uses 4x the memory and processing power. An image scaled 3x has 9x the pixels, and uses 9x the memory and processing power.

For example, let's take a simple 10x10 pixel image and scale it by a factor of 3.  That's now a 30x30 pixel image, for a total of 900 pixels to process, compared to the 100 the original picture had.

If you're certain the scaling is the issue and that's what's bottlenecking your performance, then simply prescale your images.

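INPUT "How much do you want to scale your image? "; scale%
' The new image is scale% times as wide and as tall as the original
scaledImage = _NEWIMAGE(_WIDTH(originalImage) * scale%, _HEIGHT(originalImage) * scale%, 32)
' With no coordinates given, _PUTIMAGE stretches the source to fill the destination
_PUTIMAGE , originalImage, scaledImage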

Now you have the image prescaled, and you can just _PUTIMAGE its contents to the screen without having to resize it on the fly every time.
#3
Thanks for your input.  I think we might be getting somewhere... maybe... I got some more ideas.

To be clear, I haven't been constantly resizing every little thing on the fly, I've been drawing all elements at source size, then once the screen is fully assembled, resizing that once to produce the final result, so one resizing per game state update.  Prescaling doesn't just mean resizing some source image ahead of time, it means updating how every single piece is drawn, from the ground up, and adjusting the source and destination coordinates everywhere flexibly depending on the player's window size option.

I had already drawn up the changes I would need to make to prescale everything. I'll provide the checklist here so you can get an idea of the work involved, and why I wanted to scout for a simpler solution first:

Quote:
- Replace every PUTIMAGE with a new subroutine call that automatically scales the coordinates provided. I can't just scale up the base images to x2 and x3 versions; I have to be able to scale up the coordinates involved too.

- Split the sprite sheet image handle references into three branches, and update every usage of them in the code.  Object specs have attached sprite references.

- The sprite sheet parser routine needs to be passed three image handles instead of one, and scale up the first one to produce the other two.  This is the one easy part.

- Manual draws need to be branched to several versions, to account for the scaling option.

- Slanted resource meter slicing needs to be branched as well, so the slicing will be accurate at all scalings.

- Font routine library may need to be imported and branched to account for scaling: on font load, scale up the source files as done with the sprite sheets, and update every call accordingly. This one may be the biggest headache. I don't use the font support QB64 provides, because the "fonts" I'm using are really pixel-art-drawn words, with shading, coloring, etc.
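To give a rough idea of what the first few checklist items amount to, here is a minimal QB64 sketch. It assumes one sprite sheet kept as three copies in a hypothetical array sheet&(1 TO 3) (x1, x2, x3) and a hypothetical wrapper SUB put_scaled that accepts 1x coordinates; neither name comes from the actual game, and the 64x64 sheet is a stand-in for one loaded from disk.

```qb64
' Hypothetical sketch: sheet&() and put_scaled are illustrative names only.
CONST MAX_SCALE = 3
DIM SHARED sheet&(1 TO MAX_SCALE)
DIM SHARED option_window_size AS INTEGER
DIM s AS INTEGER

option_window_size = 2
SCREEN _NEWIMAGE(640 * option_window_size, 360 * option_window_size, 32)

' Stand-in for the 1x sprite sheet: a 16x16 red sprite at (0, 0)
sheet&(1) = _NEWIMAGE(64, 64, 32)
_DEST sheet&(1)
LINE (0, 0)-(15, 15), _RGB32(255, 0, 0), BF
_DEST 0

' Scale the sheet up once, at load time (no _SMOOTH, so pixels stay square)
FOR s = 2 TO MAX_SCALE
    sheet&(s) = _NEWIMAGE(_WIDTH(sheet&(1)) * s, _HEIGHT(sheet&(1)) * s, 32)
    _PUTIMAGE , sheet&(1), sheet&(s)
NEXT

' Callers keep passing 1x coordinates; the wrapper multiplies them out
put_scaled 10, 10, 0, 0, 15, 15
_DISPLAY

SUB put_scaled (x AS INTEGER, y AS INTEGER, sx1 AS INTEGER, sy1 AS INTEGER, sx2 AS INTEGER, sy2 AS INTEGER)
    DIM k AS INTEGER
    k = option_window_size
    ' Both the destination corner and the source rectangle are scaled by k.
    ' The source rectangle is inclusive, hence the +1 and -1 on its far corner.
    _PUTIMAGE (x * k, y * k), sheet&(k), , (sx1 * k, sy1 * k)-((sx2 + 1) * k - 1, (sy2 + 1) * k - 1)
END SUB
```

The design choice here is that callers never see the scale factor: each existing PUTIMAGE call would pass the same 1x coordinates it passes today, and only the wrapper knows which copy of the sheet to read from.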


I ran ten tests based on your suggestion.  The first five tests, I took a 640x360 image and drew it 100 times to the screen, scaled up to 1280x720.  In the latter five tests, I took a prescaled 1280x720 image, and drew it 100 times to the screen without any scaling.  Here are the results in seconds:

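Scaled (640x360 image drawn at 1280x720, 100 draws per test):
1.92578125
1.8671875
1.8125
1.8671875
1.8125

Prescaled (1280x720 image drawn 1:1, 100 draws per test):
0.60546875
0.55078125
0.546875
0.55078125
0.60546875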

So yes, scaling increases the cost of the operation by about a factor of 3 (averages of roughly 1.86 seconds scaled versus 0.57 seconds prescaled).
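The test above can be sketched roughly as follows. This is an illustrative reconstruction rather than the actual test code: the image contents are arbitrary, and TIMER's limited resolution and its midnight rollover are simply ignored.

```qb64
' Sketch of the timing test: 100 stretched draws vs 100 prescaled draws.
SCREEN _NEWIMAGE(1280, 720, 32)
DIM small AS LONG, big AS LONG, i AS INTEGER
DIM t0 AS DOUBLE, scaledTime AS DOUBLE, prescaledTime AS DOUBLE

' Arbitrary 640x360 source image
small = _NEWIMAGE(640, 360, 32)
_DEST small
LINE (0, 0)-(639, 359), _RGB32(0, 128, 255), BF
_DEST 0

' Prescale once, outside the timed loops
big = _NEWIMAGE(1280, 720, 32)
_PUTIMAGE , small, big

t0 = TIMER(.001)
FOR i = 1 TO 100
    _PUTIMAGE (0, 0)-(1279, 719), small ' stretched on every draw
NEXT
scaledTime = TIMER(.001) - t0

t0 = TIMER(.001)
FOR i = 1 TO 100
    _PUTIMAGE (0, 0), big ' same size as the destination area, no stretch
NEXT
prescaledTime = TIMER(.001) - t0

PRINT "Scaled:   "; scaledTime
PRINT "Prescaled:"; prescaledTime
```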

I wonder, is there a command that will let me resize the program window, like a user would with $RESIZE turned on, but to a new size I determine? Allowing the user to resize freely would not be a good idea here, as anyone who has worked with pixel art can confirm: it makes things look very jagged and distorted, and switching to $RESIZE:SMOOTH does not help. So the window size option in my game requires strict singling, doubling, or tripling of pixels. With a smart choice of base resolution, the resulting options will yield at least one favorable size for anyone's hardware. Most people would probably choose the x2 option, or 1280x720, but 640x360 will work if someone's on a small notebook, and x3 would fit people with enormous monitors. Adding an x4 would take very little extra work, too.

So I ran another set of ten tests, this time all of them drawing 640x360 without any scaling.  I added $RESIZE:STRETCH to the header, and after five of the tests, I used my mouse to resize the window to about double width and height.  There was no significant difference, as you can see:

Before resizing the window:
0.21875
0.1640625
0.109375
0.1640625
0.11328125

After resizing the window to about double width and height:
0.11328125
0.109375
0.1640625
0.109375
0.109375

So far, I've been unable to find commands to set the window stretch size in QB64, but since $RESIZE already lets the user do it, surely such a command could exist, even if it doesn't yet? If it does, it's a silver bullet. The whole scaling thing boils down to a single line of code, and rather than adding more data management to make it faster, I can remove the stuff I already put in, and it gets even faster for free.
#4
The resizing that you're speaking about is all tied in with Windows DPI awareness. Back in September, before my heart surgery, I was looking into fixing it to do what you're suggesting: add a _WindowScale percentage and then have the window scale to that amount. What I had back then worked, kinda, but it wasn't stable and couldn't be called inside your program without risking it blowing up on you. (The issue then was the same as what some people reported with _screenmove _middle and several other commands: our QB64 programs tend to run on a couple of different threads, with one handling a lot of the GLUT stuff and the other handling the rest of our stuff. The problem we kept running into was basically a race condition that depended on when a GLUT command was called, the state of the other thread, and which one finished first.)

Since then, Matt has overhauled our GLUT system, making it a lot more independent and streamlined and stopping those race conditions from popping up, but nobody has gotten around to working on the DPI awareness issue and being able to toggle between automatic and manual scaling (which is what you want to do, and which can't currently be done since we're letting the OS handle scaling automatically). It's an issue we're aware of, and we have a reasonably good idea of which pieces are necessary to make scaling work properly (for Windows, at least; Mac and Linux will take a little more digging). At this point, it's just a matter of someone finding the time and motivation to go in and change the language to add a toggleable automatic/manual scaling system, and I just can't say when that might happen.

a740g is working on fixing _LOADIMAGE to work with images in memory and not just from your drive/filesystem. (This means something like imagehandle = _LOADIMAGE("www.mywebsite.png",32) will become possible in the near future.) Matt is working on all sorts of stuff (tons of bug fixes whenever they're reported, the HTTPS support, correcting our keyboard input routines, and...). As for me, I've mainly just been hibernating. As anyone who's been around the project for a while knows, I've got arthritis and the cold hurts my knuckles. I don't do much coding, writing, or much of anything except hibernate in the wintertime...

...So manual scaling is on the ToDo list, but nobody is currently working on it due to other stuff that they've just prioritized first.  Honestly, I'd imagine it to be in the 6+ month timeline before it gets all sorted and added to the language, so you'll probably want to look for a workaround rather than wait for it to be included in a package from us.

Question: Can you convert the images to hardware images instead of software images? They render a TON faster, and your 3x delay probably wouldn't be of any concern with them. See my demo of the speed differences we're talking about here: https://staging.qb64phoenix.com/showthread.php?tid=1399. They might be the simplest solution to the problem you're facing.
#5
Well, it was a very stressful week, but I finally found the time and mental resources to finish
tackling hardware images and implement them in my code. I had been hesitant, since it seemed like I
would have to overhaul a bunch of things to make it work, but after some more study and testing,
I ended up only needing to add five lines of code and change one existing line:

- At start of program, "displayorder hardware"
- At start of level editor, which does not use the same display method, "displayorder software hardware"
- At start of play level routine, which can be run from level editor, "displayorder hardware"
- In display_screen routine, "hardware_image = copyimage(full_screen, 33)"
- In display_screen's putimage statement, replace full_screen with hardware_image
- At end of display_screen routine, "freeimage hardware_image"
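Put together, the revised display_screen might look something like the minimal sketch below. It assumes the 640x360 base resolution and the full_screen back buffer named in this thread, fixes option_window_size at 2, leaves out the scaled_screen() array, and uses the underscore-prefixed keywords instead of the $NOPREFIX forms. The _PUTIMAGE call has no destination handle, since (as described further down) a hardware source rejects one.

```qb64
' Sketch only: base resolution and option_window_size = 2 are assumptions.
CONST screenw = 640, screenh = 360
DIM SHARED full_screen AS LONG, option_window_size AS INTEGER
option_window_size = 2

SCREEN _NEWIMAGE(screenw * option_window_size, screenh * option_window_size, 32)
_DISPLAYORDER _HARDWARE ' show only the hardware layer

' All game drawing happens at 1x into a software buffer
full_screen = _NEWIMAGE(screenw, screenh, 32)
_DEST full_screen
LINE (10, 10)-(100, 60), _RGB32(255, 200, 0), BF
display_screen
_DELAY 2 ' keep the window open briefly

SUB display_screen
    DIM hardware_image AS LONG
    ' Mode 33 makes a hardware (GPU) copy of the finished 1x frame
    hardware_image = _COPYIMAGE(full_screen, 33)
    ' Stretch it to the window; no destination handle for a hardware source
    _PUTIMAGE (0, 0)-((screenw * option_window_size) - 1, (screenh * option_window_size) - 1), hardware_image
    _DISPLAY
    _FREEIMAGE hardware_image
END SUB
```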

Rather than completely rework every image to be processed as hardware, for now it's enough to
simply offload the final resize operation to a hardware draw, since it's by far the most expensive step.

I ran tests using the frame advance playhead feature of my level editor, putting the playhead
at the end of the stage, about 12,000 frames in, which is about three and a half minutes of level.
With the stretching done in software, it took 260.9336 seconds, which is LONGER than the
level takes to play normally. Using hardware just for the stretch, as described above,
it took 19.78125 seconds, a drop of 92.42%.
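The percentage follows directly from the two timings; the same numbers also give the speedup factor:

```latex
\frac{260.9336 - 19.78125}{260.9336} = \frac{241.15235}{260.9336} \approx 0.9242 \quad (92.42\%\ \text{less time}),
\qquad
\frac{260.9336}{19.78125} \approx 13.2 \quad (\text{about } 13\times\ \text{faster})
```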

(If I didn't do any drawing at all during the simulation process, only at the end to check the overall
time, it took 8.628906 seconds, so while not really relevant to the hardware image question, it is
what I'll want to do when actually using the level editor. I wonder if I can get that even faster;
9 seconds is a while to wait every time I want to check subtle changes I make to the script!)

I did run into some confusion: for a while I was hunting down an "invalid handle" error.
This was very confusing, since all my investigation confirmed my image handles were valid
everywhere. It turns out that when using a hardware image as the source handle, PUTIMAGE really does
not want there to be a destination handle at all.  I think this should be returning a "syntax error"
or something, not "invalid handle," since the latter makes it seem like one of the handles does not
refer correctly to an image surface.  It also introduces a limitation, since it means the source
coordinates can't be used, meaning hardware images can only be drawn in entirety, not pieces of them
at a time.  That makes them either extremely awkward or totally useless for sprite sheets.

Under some circumstances, at the end of the simulation, the screen would go black at a SLEEP statement.
I'm still not sure what caused this. It's something for me to keep in mind going forward in case it causes
any issues, though I don't normally use SLEEP unless I'm running a test on something.

With the massive performance drain of a screen-wide software image stretch on every frame cleared up,
it became possible for me to see perceptible differences in the cost of other elements of the
simulation loop, which tells me where performance can be improved. There are
things I haven't gotten around to yet, such as storing unit vector information in order to reduce
trigonometric operations in favor of much less expensive linear interpolation.
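One way to read the unit-vector idea is sketched below, assuming an object that moves at a fixed angle and speed: the per-frame step is computed once, when the direction is set, so each frame's position update is two additions instead of a COS and a SIN. The Mover type and its fields are hypothetical, not the game's actual data layout.

```qb64
' Hypothetical sketch: pay for trigonometry once per direction change.
TYPE Mover
    x AS SINGLE
    y AS SINGLE
    dx AS SINGLE ' per-frame step along x = COS(angle) * speed
    dy AS SINGLE ' per-frame step along y = SIN(angle) * speed
END TYPE

DIM m AS Mover, angle AS SINGLE, speed AS SINGLE, frame AS INTEGER
angle = _PI / 4 ' 45 degrees, in radians
speed = 2

' Trigonometry happens here, only when the direction is set...
m.dx = COS(angle) * speed
m.dy = SIN(angle) * speed

' ...so the per-frame update is plain addition.
FOR frame = 1 TO 60
    m.x = m.x + m.dx
    m.y = m.y + m.dy
NEXT
PRINT m.x, m.y
```

Objects that change direction every frame would still need the trigonometry on those frames; the savings come from objects whose heading stays fixed for a while.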


Code:
Performance test results of simulation loop:
- Position updates 17.25% (where most trigonometry happens)
- Script execution 21.68%
- Collision handling 31.87% (pairwise, including a sort and exclusion based on object properties)
- Object termination check 17.88% (this seems much higher than it should be)


This doesn't add up to 100%, because these elements interact with each other, so commenting out one
influences how much work the others have to do. The simulation does not include handling player input
or player-specific collision interactions. But these are good starting ballpark figures.
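Because commenting a section out changes how much work the others do, an alternative is to time each section in place while everything runs. The sketch below assumes two hypothetical sections, update_positions and run_script, with dummy workloads standing in for the real code. TIMER's resolution can be coarse, so individual per-frame readings are rough and only totals over many frames are worth reading.

```qb64
' Hypothetical sketch: accumulate time per section instead of commenting out.
' TIMER also wraps at midnight; that case is ignored here.
DIM SHARED dummy AS DOUBLE
DIM t AS DOUBLE, tPos AS DOUBLE, tScript AS DOUBLE, total AS DOUBLE
DIM frame AS LONG

FOR frame = 1 TO 5000
    t = TIMER(.001)
    update_positions
    tPos = tPos + (TIMER(.001) - t)

    t = TIMER(.001)
    run_script
    tScript = tScript + (TIMER(.001) - t)
NEXT

total = tPos + tScript
IF total > 0 THEN
    PRINT "Positions:"; 100 * tPos / total; "%"
    PRINT "Script:   "; 100 * tScript / total; "%"
END IF

SUB update_positions ' placeholder workload
    DIM i AS INTEGER
    FOR i = 1 TO 500
        dummy = dummy + SIN(i)
    NEXT
END SUB

SUB run_script ' placeholder workload
    DIM i AS INTEGER
    FOR i = 1 TO 200
        dummy = dummy + SQR(i)
    NEXT
END SUB
```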



Thanks very much for your help in pointing me toward hardware images! While they're not a catch-all
solution for other things, they did help tremendously in this case.
#6
(02-13-2023, 12:10 PM)johannhowitzer Wrote: It also introduces a limitation, since it means the source coordinates can't be used, meaning hardware images can only be drawn in entirety, not pieces of them
at a time.

You can draw pieces of a hardware image onto the display.  Let me give a very simple demo:

Code:
SCREEN _NEWIMAGE(800, 600, 32)
RANDOMIZE TIMER
' Fill the software screen with 100 random boxes
FOR i = 1 TO 100
    LINE (RND * _WIDTH, RND * _HEIGHT)-(RND * _WIDTH, RND * _HEIGHT), _RGB32(RND * 256, RND * 256, RND * 256), BF
NEXT

' Copy the screen into a hardware image, then clear the software screen
hw = _COPYIMAGE(0, 33)
CLS , 0

x = RND * _WIDTH
y = RND * _HEIGHT
xMove = RND * 3 - 2
yMove = RND * 3 - 2

xsize = 400
ysize = 300
sizechange = 1
' Show only the hardware layer
_DISPLAYORDER _HARDWARE
DO
    ' Draw the source rectangle (x, y)-(xsize, ysize) of hw at the same spot on screen.
    ' The destination handle is left empty because hw is a hardware image.
    _PUTIMAGE (x, y)-(xsize, ysize), hw, , (x, y)-(xsize, ysize)
    _DISPLAY
    _LIMIT 120
    x = x + xMove
    y = y + yMove
    xsize = xsize + sizechange
    ysize = ysize + sizechange
    IF x < 0 OR x > _WIDTH THEN xMove = -xMove
    IF y < 0 OR y > _HEIGHT THEN yMove = -yMove
    IF ysize < 0 OR ysize > _WIDTH THEN sizechange = -sizechange
LOOP UNTIL _KEYHIT
#7
Oh, so you can omit the destination handle and still put in the source coordinates! Thanks, I had no idea.
#8
Okay, something is greatly confusing me here: very strange behavior since I've started using
hardware images. I'm getting black screens in a lot of places where it seems like I shouldn't, and PUTIMAGE
places the hardware image visibly on the surface set by SCREEN, not on the one set by DEST
afterward.

After the header, I have this:

Code:
screen scaled_screen(option_window_size)
dest full_screen

if debug_mode = true then
  set_font f_kharon, left_align, full_screen
  glass_fonts "Any key to start level editor, N to start game normally", 50, 50
  display_screen
  do
      limit 60
      k$ = inkey$
  loop while k$ = ""
  if lcase$(k$) <> "n" then edit_stage
end if


So that's a SCREEN statement, DEST after, then in the conditional, a display_screen call.
Now here's display_screen:

Code:
sub display_screen

preserve& = dest
dest scaled_screen(option_window_size)
print preserve&, dest: display: sleep

hardware_image = copyimage(full_screen, 33)
putimage(0, 0)-((screenw * option_window_size) - 1, (screenh * option_window_size) - 1), hardware_image
display
freeimage hardware_image

dest preserve&

end sub


That PRINT statement is me trying to see the values of those two handles.  So before the call, SCREEN
is set to scaled_screen(option_window_size), and then in the routine, DEST is set to the same.
Then a PRINT statement is used, DISPLAY, then SLEEP.

I get a black screen.

To make things even more mysterious, if I change display_screen to this, eliminating four lines:

Code:
sub display_screen

hardware_image = copyimage(full_screen, 33)
putimage(0, 0)-((screenw * option_window_size) - 1, (screenh * option_window_size) - 1), hardware_image
display
freeimage hardware_image

end sub


...Then PUTIMAGE *still* draws visibly, which means it's drawing to scaled_screen(option_window_size),
despite DEST having been set to full_screen! This version of display_screen *should* result in a black
screen, but it doesn't. It seems like PUTIMAGE is using the current graphics surface set by SCREEN, rather
than paying any attention to DEST. Meanwhile, PRINT also seems to be ignoring DEST, yet it isn't drawing
to the SCREEN surface either. I have no idea where that's drawing. SLEEP isn't the culprit; when I replace it
with a DO: DISPLAY: LOOP, I get the same result.

What's going on here?
#9
What's your _DISPLAYORDER?  If you don't have a _SOFTWARE in the order somewhere, you'll get black screens where nothing gets processed.
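A minimal sketch of what this means in practice: with _SOFTWARE listed before _HARDWARE, the software screen (where PRINT, LINE and CLS draw) is shown, with hardware images composited on top of it. The box size, colors and frame count are arbitrary choices for illustration.

```qb64
' Minimal _DISPLAYORDER sketch: software layer first, hardware on top.
SCREEN _NEWIMAGE(640, 360, 32)
DIM sw AS LONG, hw AS LONG, frame AS INTEGER

' Build a green box in software, then copy it to a hardware image (mode 33)
sw = _NEWIMAGE(200, 100, 32)
_DEST sw
LINE (0, 0)-(199, 99), _RGB32(0, 160, 0), BF
_DEST 0
hw = _COPYIMAGE(sw, 33)
_FREEIMAGE sw

' With "_DISPLAYORDER _HARDWARE" alone, the PRINT text below would not be
' shown; only the green box would reach the window.
_DISPLAYORDER _SOFTWARE, _HARDWARE

FOR frame = 1 TO 180 ' about three seconds at 60 fps
    CLS
    PRINT "Software layer text"
    _PUTIMAGE (220, 130), hw ' hardware source: no destination handle
    _DISPLAY
    _LIMIT 60
NEXT
_FREEIMAGE hw
```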
#10
I tried some more tests, commenting various things out, moving stuff around, adding in extra calls,
and the mystery only gets weirder.

- When I added SOFTWARE to the DISPLAYORDER at the start and commented out the others (or just
commented out every DISPLAYORDER and went with the default), I got everything to show up, but it was
twice as slow.

- When I commented out the running frame counter display during the simulation, the ending result
display afterward flickered once and went black.

- When I added a DISPLAY statement before the simulation, the ending result stayed visible.
I also noticed that without it, the window size did not change after leaving the editor (which
operates at 1:1) until the simulation was done. It seems SCREEN does not resize the window
by itself if you are not in AUTODISPLAY; you have to use a DISPLAY statement to get it to realize
the window size should change. I understand how this would update the window size, but I still have
no idea why it prevents the flicker, or rather, why leaving it out causes the flicker in the
first place!


And still no clue why PUTIMAGE is sending to the SCREEN handle and ignoring the subsequent DEST.
Do hardware images just ignore DEST and place straight to whatever's SCREEN at the time?


Since adding hardware images, I had not run through other parts of the program, just the level
editor and the frame counter on simulation.  Upon trying to start the game from the title screen,
there were several other sticking points where I ran into black screens when it seemed like I
shouldn't.  Putting a DISPLAY in a few spots did seem to smooth them over, but now I'm going to
have to go through everything and comb for black screens and figure out how to prevent them.
Hardware images apparently behave very strangely with regard to DISPLAY, SCREEN, and DEST.
It will take me some time to test things, and figure out how the behavior works in detail, enough
to use hardware images confidently.  There are some really unusual quirks.