Comparison QB64 compiled with gcc optimizations and without - Coolman - 05-07-2022
I have started to test the version of qb64 compiled with the option -Ofast and the original version. resultas :
simple code using pset.
2.3x seconds : program compiled with qb64 -Ofast
3.5x seconds : program compiled with original qb64
Interesting results. Quite an important gain.
Code: (Select All) x% = 1280: y% = 768
Screen _NewImage(x%, y%, 32)
_Delay 0.2
_ScreenMove _Middle
start = Timer(.001)
For b% = 1 To 100
For i% = 1 To x%
For n% = 1 To y%
PSet (i%, n%), _RGB(Rnd * 255, 0, 0)
Next n%
Next i%
Next b%
Print Timer(.001) - start; "seconds"
the _ScreenMove _Middle command is supposed to center the window but if I don't put _Delay 0.2 before, sometimes it doesn't work. strange.
RE: Comparison QB64 compiled with Ofast and without - Coolman - 05-07-2022
Fractal Tree : I got this code from the site rosettacode.
4.1x seconds : program compiled with qb64 -Ofast
6.1x seconds : program compiled with original qb64
it seems that there is an interesting gain when graphic commands like pset or line are used.
Code: (Select All) _Title "Fractal Tree"
Const sw% = 640
Const sh% = 480
Screen _NewImage(sw, sh, 8)
_Delay 0.2
_ScreenMove _Middle
start = Timer(.001)
For i% = 1 To 1000
Color (Rnd * 15)
Call tree(sw \ 2, sh - 10, _Pi * 1.5, _Pi / 180 * 29, 112, 15)
Next i%
Print Timer(.001) - start; "seconds"
Sleep
System
Sub tree (x As Integer, y As Integer, initAngle As Double, theta As Double, length As Double, depth As Integer)
Dim As Integer iL, newX, newY, iX, iY, iD
iL = length: iX = x: iY = y: iD = depth
newX = Cos(initAngle) * length + iX
newY = Sin(initAngle) * length + iY
Line (iX, iY)-(newX, newY)
iL = length * .78
iD = iD - 1
If iD > 0 Then
Call tree(newX, newY, initAngle - theta, theta, iL, iD)
Call tree(newX, newY, initAngle + theta, theta, iL, iD)
End If
End Sub
RE: Comparison QB64 compiled with Ofast and without - Coolman - 05-07-2022
Bucketsort : I got this code from the site rosettacode.
3.4x seconds : program compiled with qb64 -Ofast
3.4x seconds : program compiled with original qb64
strangely, disappointing. almost no gain.
Code: (Select All) '* Complexity Class: O(N^2)
Type MINMaxRec
min As Long
max As Long
End Type
ReDim a(0 To 1048575) As Double
For FillArray& = LBound(a) To UBound(a)
a(FillArray&) = Rnd
Next
DoRecurse% = -1
DemoOrder& = 1 '* -1 = descending
Print "start...": Print
start = Timer(.001)
BucketSort a(), LBound(a), UBound(a), DemoOrder&, DoRecurse% '* without the recursive initial call, executiom time is FAR slower.
Print Timer(.001) - start; "seconds"
Sub BucketSort (Array() As Double, start As Long, finish As Long, order&, recurse%)
Dim BS_Local_NBuckets As Integer
Dim BS_Local_ArrayRange As Double
Dim BS_Local_N As Long
Dim BS_Local_S As Long
Dim BS_Local_Z As Long
Dim BS_Local_Remainder As Integer
Dim BS_Local_Index As Integer
Dim BS_Local_Last_Insert_Index As Long
Dim BS_Local_Current_Insert_Index As Long
Dim BS_Local_BucketIndex As Integer
ReDim BSMMrec As MINMaxRec
GetMinMaxArray Array(), start, finish, BSMMrec
BS_Local_ArrayRange = Array(BSMMrec.max) - Array(BSMMrec.min)
If BS_Local_ArrayRange > 0 Then
BS_Local_NBuckets = 2 * Int(Log(finish - start + 1) / Log(2)) + 1
BS_Local_N = (finish - start)
BS_Local_Remainder = BS_Local_N Mod BS_Local_NBuckets
BS_Local_NBuckets = BS_Local_NBuckets - 1
ReDim BS_Buckets_Array(BS_Local_NBuckets, 0 To (BS_Local_NBuckets * (1 + (BS_Local_N - BS_Local_Remainder) / BS_Local_NBuckets))) As Double
ReDim BS_Count_Array(0 To BS_Local_NBuckets) As Long
For BS_Local_S = start To finish
BS_Local_BucketIndex = BS_Local_NBuckets * ((Array(BS_Local_S) - Array(BSMMrec.min)) / BS_Local_ArrayRange)
BS_Buckets_Array(BS_Local_BucketIndex, BS_Count_Array(BS_Local_BucketIndex)) = Array(BS_Local_S)
BS_Count_Array(BS_Local_BucketIndex) = BS_Count_Array(BS_Local_BucketIndex) + 1
Next
BS_Local_Last_Insert_Index = start
BS_Local_Current_Insert_Index = start
For BS_Local_S = 0 To BS_Local_NBuckets
If BS_Count_Array(BS_Local_S) > 0 Then
BS_Local_Last_Insert_Index = BS_Local_Current_Insert_Index
For BS_Local_Z = 0 To BS_Count_Array(BS_Local_S) - 1
Array(BS_Local_Current_Insert_Index) = BS_Buckets_Array(BS_Local_S, BS_Local_Z)
BS_Local_Current_Insert_Index = BS_Local_Current_Insert_Index + 1
Next
If recurse% Then
'* Without this, Bucketort() will be much slower
BucketSort Array(), BS_Local_Last_Insert_Index, BS_Local_Current_Insert_Index - 1, order&, 0
Else
'* using MergeSort will speed this significantly, however, this will be left as an exercise
'* MergeSort will keep this sorting algorithm quite competitive.
InsertionSort Array(), BS_Local_Last_Insert_Index, BS_Local_Current_Insert_Index - 1, order&
End If
End If
Next
Erase BS_Buckets_Array, BS_Count_Array
End If
End Sub
Sub GetMinMaxArray (array() As Double, Start&, finish&, GetMinMaxArray_minmax As MINMaxRec)
n& = finish& - Start&
t% = n& - 10000 * (n& \ 10000)
If (t% Mod 2) Then
GetMinMaxArray_minmax.min = Start&
GetMinMaxArray_minmax.max = Start&
GetGetMinMaxArray_minmaxArray_i = Start& + 1
Else
If array(Start&) > array(finish&) Then
GetMinMaxArray_minmax.max = Start&
GetMinMaxArray_minmax.min = finish&
Else
GetMinMaxArray_minmax.min = finish&
GetMinMaxArray_minmax.max = Start&
End If
GetGetMinMaxArray_minmaxArray_i = Start& + 2
End If
While GetGetMinMaxArray_minmaxArray_i < finish&
If array(GetGetMinMaxArray_minmaxArray_i) > array(GetGetMinMaxArray_minmaxArray_i + 1) Then
If array(GetGetMinMaxArray_minmaxArray_i) > array(GetMinMaxArray_minmax.max) Then
GetMinMaxArray_minmax.max = GetGetMinMaxArray_minmaxArray_i
End If
If array(GetGetMinMaxArray_minmaxArray_i + 1) < array(GetMinMaxArray_minmax.min) Then
GetMinMaxArray_minmax.min = GetGetMinMaxArray_minmaxArray_i + 1
End If
Else
If array(GetGetMinMaxArray_minmaxArray_i + 1) > array(GetMinMaxArray_minmax.max) Then
GetMinMaxArray_minmax.max = GetGetMinMaxArray_minmaxArray_i + 1
End If
If array(GetGetMinMaxArray_minmaxArray_i) < array(GetMinMaxArray_minmax.min) Then
GetMinMaxArray_minmax.min = GetGetMinMaxArray_minmaxArray_i
End If
End If
GetGetMinMaxArray_minmaxArray_i = GetGetMinMaxArray_minmaxArray_i + 2
Wend
End Sub
Sub InsertionSort (array() As Double, start As Long, finish As Long, order&)
Dim InSort_L_ArrayTemp As Double
Dim InSort_L_i As Long
Dim InSort_L_j As Long
Select Case order&
Case 1
For InSort_L_i = start + 1 To finish
InSort_L_ArrayTemp = array(InSort_L_i)
InSort_L_j = InSort_L_i - 1
Do Until InSort_L_j < start
If (InSort_L_ArrayTemp < array(InSort_L_j)) Then
array(InSort_L_j + 1) = array(InSort_L_j)
InSort_L_j = InSort_L_j - 1
Else
Exit Do
End If
Loop
array(InSort_L_j + 1) = InSort_L_ArrayTemp
Next
Case Else
For InSort_L_i = start + 1 To finish
InSort_L_ArrayTemp = array(InSort_L_i)
InSort_L_j = InSort_L_i - 1
Do Until InSort_L_j < start
If (InSort_L_ArrayTemp > array(InSort_L_j)) Then
array(InSort_L_j + 1) = array(InSort_L_j)
InSort_L_j = InSort_L_j - 1
Else
Exit Do
End If
Loop
array(InSort_L_j + 1) = InSort_L_ArrayTemp
Next
End Select
End Sub
RE: Comparison QB64 compiled with Ofast and without - ChiaPet - 05-07-2022
Most interesting. The file (for Windows) is:
.\internal\c\makeline_win.txt
And the optimization parameter may be inserted as follows:
c_compiler\bin\g++ -s -Ofast -Wfatal-errors -w -Wall qbx.cpp -lws2_32 -lwinspool parts\core\os\win\src.a -lopengl32 -lglu32 -lwinmm -lgdi32 -mwindows -static-libgcc -static-libstdc++ -D GLEW_STATIC -D FREEGLUT_STATIC -lksguid -lole32 -lwinmm -ldxguid -o ..\..\
That an "O", not a zero.
My chess program went from about 1 to 1.8 million moves per second.
Does anyone know a downside to doing this?
RE: Comparison QB64 compiled with Ofast and without - ChiaPet - 05-07-2022
This document:
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
"Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program."
Gosh, so it adds a few seconds to the compile time? Small price to pay!
What is "debugging"? I don't sugar my code, so ants have no interest.
RE: Comparison QB64 compiled with Ofast and without - ChiaPet - 05-07-2022
Got another performance boost by adding the parameter -march=native, which is only
suitable for code to be executed on the compiling machine, or one with an identical processor.
RE: Comparison QB64 compiled with Ofast and without - Coolman - 05-07-2022
*** My chess program went from about 1 to 1.8 million moves per second.
this is an impressive result. for the moment i haven't done any arithmetic test or disk access comparison...
the gain is sometimes very important. but it is not recommended to use -march=native because the program will only be able to run on the same type of processor. for debugging, I intend to keep the original version for that and as soon as the program is debugged compile it with the optimized version of qb64. I can't say anything for windows. I don't use it at all anymore. but i think you have to modify the setup_win.bat file to recompile qb64 with the Ofast parameter. i still advise to keep the original version.
RE: Comparison QB64 compiled with Ofast and without - Jack - 05-07-2022
when using gcc/g++ optimazation options I rarely use anything above O2, using Ofast may actually produce wrong or inaccurate result, I rarely use O3 but only to see if it would be faster than O2 and frequently O2 is faster than O3
RE: Comparison QB64 compiled with Ofast and without - ChiaPet - 05-07-2022
Chess is up to 2.4 million moves evaluated per second, and I see no problem(s) with using -Ofast.
This level of optimization appears very worthwhile for any program that does a lot of number crunching.
Reading up on all the compiler options gives me a headache. Very obtuse stuff!
RE: Comparison QB64 compiled with Ofast and without - Jack - 05-07-2022
-ffast-math
Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast.
This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option is not turned on by any -O option besides -Ofast since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
|