Comparison QB64 compiled with gcc optimizations and without
#1
I have started to test the version of qb64 compiled with the option -Ofast and the original version. resultas :

simple code using pset.

2.3x seconds : program compiled with qb64 -Ofast
3.5x seconds : program compiled with original qb64

Interesting results. Quite an important gain.

Code: (Select All)
x% = 1280: y% = 768
Screen _NewImage(x%, y%, 32)
_Delay 0.2
_ScreenMove _Middle
start = Timer(.001)
For b% = 1 To 100
    For i% = 1 To x%
        For n% = 1 To y%
            PSet (i%, n%), _RGB(Rnd * 255, 0, 0)
        Next n%
    Next i%
Next b%
Print Timer(.001) - start; "seconds"

the _ScreenMove _Middle command is supposed to center the window but if I don't put _Delay 0.2 before, sometimes it doesn't work. strange.
Reply
#2
Fractal Tree : I got this code from the site rosettacode.

4.1x seconds : program compiled with qb64 -Ofast
6.1x seconds : program compiled with original qb64

it seems that there is an interesting gain when graphic commands like pset or line are used.

Code: (Select All)
_Title "Fractal Tree"
Const sw% = 640
Const sh% = 480

Screen _NewImage(sw, sh, 8)
_Delay 0.2
_ScreenMove _Middle

start = Timer(.001)
For i% = 1 To 1000
    Color (Rnd * 15)
    Call tree(sw \ 2, sh - 10, _Pi * 1.5, _Pi / 180 * 29, 112, 15)
Next i%
Print Timer(.001) - start; "seconds"
Sleep
System

Sub tree (x As Integer, y As Integer, initAngle As Double, theta As Double, length As Double, depth As Integer)
    Dim As Integer iL, newX, newY, iX, iY, iD
    iL = length: iX = x: iY = y: iD = depth
    newX = Cos(initAngle) * length + iX
    newY = Sin(initAngle) * length + iY
    Line (iX, iY)-(newX, newY)
    iL = length * .78
    iD = iD - 1
    If iD > 0 Then
        Call tree(newX, newY, initAngle - theta, theta, iL, iD)
        Call tree(newX, newY, initAngle + theta, theta, iL, iD)
    End If
End Sub
Reply
#3
Bucketsort : I got this code from the site rosettacode.

3.4x seconds : program compiled with qb64 -Ofast
3.4x seconds : program compiled with original qb64

strangely, disappointing. almost no gain.

Code: (Select All)
'* Complexity Class: O(N^2)
Type MINMaxRec
    min As Long
    max As Long
End Type

ReDim a(0 To 1048575) As Double
For FillArray& = LBound(a) To UBound(a)
    a(FillArray&) = Rnd
Next
DoRecurse% = -1
DemoOrder& = 1 '* -1 = descending
Print "start...": Print
start = Timer(.001)
BucketSort a(), LBound(a), UBound(a), DemoOrder&, DoRecurse% '* without the recursive initial call, executiom time is FAR slower.
Print Timer(.001) - start; "seconds"

Sub BucketSort (Array() As Double, start As Long, finish As Long, order&, recurse%)
    Dim BS_Local_NBuckets As Integer
    Dim BS_Local_ArrayRange As Double
    Dim BS_Local_N As Long
    Dim BS_Local_S As Long
    Dim BS_Local_Z As Long
    Dim BS_Local_Remainder As Integer
    Dim BS_Local_Index As Integer
    Dim BS_Local_Last_Insert_Index As Long
    Dim BS_Local_Current_Insert_Index As Long
    Dim BS_Local_BucketIndex As Integer
    ReDim BSMMrec As MINMaxRec
    GetMinMaxArray Array(), start, finish, BSMMrec
    BS_Local_ArrayRange = Array(BSMMrec.max) - Array(BSMMrec.min)
    If BS_Local_ArrayRange > 0 Then
        BS_Local_NBuckets = 2 * Int(Log(finish - start + 1) / Log(2)) + 1
        BS_Local_N = (finish - start)
        BS_Local_Remainder = BS_Local_N Mod BS_Local_NBuckets
        BS_Local_NBuckets = BS_Local_NBuckets - 1
        ReDim BS_Buckets_Array(BS_Local_NBuckets, 0 To (BS_Local_NBuckets * (1 + (BS_Local_N - BS_Local_Remainder) / BS_Local_NBuckets))) As Double
        ReDim BS_Count_Array(0 To BS_Local_NBuckets) As Long
        For BS_Local_S = start To finish
            BS_Local_BucketIndex = BS_Local_NBuckets * ((Array(BS_Local_S) - Array(BSMMrec.min)) / BS_Local_ArrayRange)
            BS_Buckets_Array(BS_Local_BucketIndex, BS_Count_Array(BS_Local_BucketIndex)) = Array(BS_Local_S)
            BS_Count_Array(BS_Local_BucketIndex) = BS_Count_Array(BS_Local_BucketIndex) + 1
        Next
        BS_Local_Last_Insert_Index = start
        BS_Local_Current_Insert_Index = start
        For BS_Local_S = 0 To BS_Local_NBuckets
            If BS_Count_Array(BS_Local_S) > 0 Then
                BS_Local_Last_Insert_Index = BS_Local_Current_Insert_Index
                For BS_Local_Z = 0 To BS_Count_Array(BS_Local_S) - 1
                    Array(BS_Local_Current_Insert_Index) = BS_Buckets_Array(BS_Local_S, BS_Local_Z)
                    BS_Local_Current_Insert_Index = BS_Local_Current_Insert_Index + 1
                Next
                If recurse% Then
                    '* Without this, Bucketort() will be much slower
                    BucketSort Array(), BS_Local_Last_Insert_Index, BS_Local_Current_Insert_Index - 1, order&, 0
                Else
                    '* using MergeSort will speed this significantly, however, this will be left as an exercise
                    '* MergeSort will keep this sorting algorithm quite competitive.
                    InsertionSort Array(), BS_Local_Last_Insert_Index, BS_Local_Current_Insert_Index - 1, order&
                End If
            End If
        Next
        Erase BS_Buckets_Array, BS_Count_Array
    End If
End Sub

Sub GetMinMaxArray (array() As Double, Start&, finish&, GetMinMaxArray_minmax As MINMaxRec)
    n& = finish& - Start&
    t% = n& - 10000 * (n& \ 10000)
    If (t% Mod 2) Then
        GetMinMaxArray_minmax.min = Start&
        GetMinMaxArray_minmax.max = Start&
        GetGetMinMaxArray_minmaxArray_i = Start& + 1
    Else
        If array(Start&) > array(finish&) Then
            GetMinMaxArray_minmax.max = Start&
            GetMinMaxArray_minmax.min = finish&
        Else
            GetMinMaxArray_minmax.min = finish&
            GetMinMaxArray_minmax.max = Start&
        End If
        GetGetMinMaxArray_minmaxArray_i = Start& + 2
    End If

    While GetGetMinMaxArray_minmaxArray_i < finish&
        If array(GetGetMinMaxArray_minmaxArray_i) > array(GetGetMinMaxArray_minmaxArray_i + 1) Then
            If array(GetGetMinMaxArray_minmaxArray_i) > array(GetMinMaxArray_minmax.max) Then
                GetMinMaxArray_minmax.max = GetGetMinMaxArray_minmaxArray_i
            End If
            If array(GetGetMinMaxArray_minmaxArray_i + 1) < array(GetMinMaxArray_minmax.min) Then
                GetMinMaxArray_minmax.min = GetGetMinMaxArray_minmaxArray_i + 1
            End If
        Else
            If array(GetGetMinMaxArray_minmaxArray_i + 1) > array(GetMinMaxArray_minmax.max) Then
                GetMinMaxArray_minmax.max = GetGetMinMaxArray_minmaxArray_i + 1
            End If
            If array(GetGetMinMaxArray_minmaxArray_i) < array(GetMinMaxArray_minmax.min) Then
                GetMinMaxArray_minmax.min = GetGetMinMaxArray_minmaxArray_i
            End If
        End If
        GetGetMinMaxArray_minmaxArray_i = GetGetMinMaxArray_minmaxArray_i + 2
    Wend
End Sub

Sub InsertionSort (array() As Double, start As Long, finish As Long, order&)
    Dim InSort_L_ArrayTemp As Double
    Dim InSort_L_i As Long
    Dim InSort_L_j As Long
    Select Case order&
        Case 1
            For InSort_L_i = start + 1 To finish
                InSort_L_ArrayTemp = array(InSort_L_i)
                InSort_L_j = InSort_L_i - 1
                Do Until InSort_L_j < start
                    If (InSort_L_ArrayTemp < array(InSort_L_j)) Then
                        array(InSort_L_j + 1) = array(InSort_L_j)
                        InSort_L_j = InSort_L_j - 1
                    Else
                        Exit Do
                    End If
                Loop
                array(InSort_L_j + 1) = InSort_L_ArrayTemp
            Next
        Case Else
            For InSort_L_i = start + 1 To finish
                InSort_L_ArrayTemp = array(InSort_L_i)
                InSort_L_j = InSort_L_i - 1
                Do Until InSort_L_j < start
                    If (InSort_L_ArrayTemp > array(InSort_L_j)) Then
                        array(InSort_L_j + 1) = array(InSort_L_j)
                        InSort_L_j = InSort_L_j - 1
                    Else
                        Exit Do
                    End If
                Loop
                array(InSort_L_j + 1) = InSort_L_ArrayTemp
            Next
    End Select
End Sub
Reply
#4
Most interesting. The file (for Windows) is:
.\internal\c\makeline_win.txt

And the optimization parameter may be inserted as follows:
c_compiler\bin\g++ -s -Ofast -Wfatal-errors -w -Wall qbx.cpp -lws2_32 -lwinspool parts\core\os\win\src.a -lopengl32 -lglu32 -lwinmm -lgdi32 -mwindows -static-libgcc -static-libstdc++ -D GLEW_STATIC -D FREEGLUT_STATIC -lksguid -lole32 -lwinmm -ldxguid -o ..\..\

That an "O", not a zero.

My chess program went from about 1 to 1.8 million moves per second.

Does anyone know a downside to doing this?
Reply
#5
This document:
https://gcc.gnu.org/onlinedocs/gcc/Optim...tions.html

"Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program."

Gosh, so it adds a few seconds to the compile time? Small price to pay!

What is "debugging"? I don't sugar my code, so ants have no interest.
Reply
#6
Got another performance boost by adding the parameter -march=native, which is only
suitable for code to be executed on the compiling machine, or one with an identical processor.
Reply
#7
*** My chess program went from about 1 to 1.8 million moves per second.

this is an impressive result. for the moment i haven't done any arithmetic test or disk access comparison...

the gain is sometimes very important. but it is not recommended to use -march=native because the program will only be able to run on the same type of processor. for debugging, I intend to keep the original version for that and as soon as the program is debugged compile it with the optimized version of qb64. I can't say anything for windows. I don't use it at all anymore. but i think you have to modify the setup_win.bat file to recompile qb64 with the Ofast parameter. i still advise to keep the original version.
Reply
#8
when using gcc/g++ optimazation options I rarely use anything above O2, using Ofast may actually produce wrong or inaccurate result, I rarely use O3 but only to see if it would be faster than O2 and frequently O2 is faster than O3
Reply
#9
Chess is up to 2.4 million moves evaluated per second, and I see no problem(s) with using -Ofast.
This level of optimization appears very worthwhile for any program that does a lot of number crunching.

Reading up on all the compiler options gives me a headache.  Very obtuse stuff!
Reply
#10
-ffast-math

    Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.

    This option causes the preprocessor macro __FAST_MATH__ to be defined.

    This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.

https://gcc.gnu.org/onlinedocs/gcc/Optim...tions.html
Reply




Users browsing this thread: 2 Guest(s)