05-23-2022, 08:45 PM
Sometimes, it's easiest to just take an idea and toss it out the door and start over completely from scratch -- and that's what I've decided to do here!
The code I shared originally was basically ripped directly from the QB64 source, then stitched together, and then operated on and altered like Frankenstein, until it could cough and sputter and produce a semi-reasonable result...
But it's long. And messy. And almost impossible to follow and sort out what's doing what and where it's doing it and why it's doing it...
So, I've decided to back up and reboot my approach to handling this type of issue. What I have now is this much simpler code:
Code:
$Console:Only
Const HomePage$ = "https://qb64phoenix.com"
NumberOfPages = DownloadPageLists 'fetch every page of the Special:AllPages index and count the files saved
Function DownloadPageLists
    'Walks the wiki's Special:AllPages index, saving each page of the list
    'as "Page List(n).txt" and following the "next page" link until there
    'isn't one. Returns how many list files were downloaded.
    FileLeft$ = "Page List("
    FileRight$ = ").txt"
    FileCount = 1
    CurrentFile$ = ""
    url$ = "/qb64wiki/index.php/Special:AllPages" 'the first file that we download
    Do
        file$ = FileLeft$ + _Trim$(Str$(FileCount)) + FileRight$
        Download url$, file$
        url2$ = GetNextPage$(file$)
        p = InStr(url2$, "from=")
        If p = 0 Then Exit Do 'no "from=" parameter means there's no next page to fetch
        If Mid$(url2$, p + 5) > CurrentFile$ Then 'only follow links that move forward in the list
            CurrentFile$ = Mid$(url2$, p + 5)
            FileCount = FileCount + 1
            url$ = url2$
        Else
            Exit Do
        End If
    Loop
    DownloadPageLists = FileCount
End Function
Function CleanHTML$ (OriginalText$)
    text$ = OriginalText$ 'don't corrupt incoming text
    Type ReplaceList
        original As String
        replacement As String
    End Type
    'Expandable HTML replacement system: swap encoded sequences for plain characters
    Dim HTML(1) As ReplaceList
    HTML(0).original = "&amp;": HTML(0).replacement = "&"
    HTML(1).original = "%24": HTML(1).replacement = "$"
    For i = 0 To UBound(HTML)
        Do 'replace every occurrence of this sequence
            p = InStr(text$, HTML(i).original)
            If p = 0 Then Exit Do
            text$ = Left$(text$, p - 1) + HTML(i).replacement + Mid$(text$, p + Len(HTML(i).original))
        Loop
    Next
    CleanHTML$ = text$
End Function
Sub Download (url$, outputFile$)
    'Clean the HTML-encoded URL, then shell out to curl to save the page to disk.
    'Note that this relies on curl being installed and on the system path.
    url2$ = CleanHTML(url$)
    'Print "https://qb64phoenix.com/qb64wiki/index.php?title=Special:AllPages&from=KEY+n"
    'Print HomePage$ + url2$
    Shell "curl -o " + Chr$(34) + outputFile$ + Chr$(34) + " " + Chr$(34) + HomePage$ + url2$ + Chr$(34)
End Sub
Function GetNextPage$ (currentPage$)
    'Scans a downloaded page-list file for the navigation block and returns the
    'link to the next page of Special:AllPages, or "" if none is found.
    SpecialPageDivClass$ = "<div class=" + Chr$(34) + "mw-allpages-nav" + Chr$(34) + ">"
    SpecialPageLink$ = "<a href="
    SpecialPageEndLink$ = Chr$(34) + " title"
    Open currentPage$ For Binary As #1 'read the whole file into one string
    l = LOF(1)
    t$ = Space$(l)
    Get #1, 1, t$
    Close
    sp = InStr(t$, SpecialPageDivClass$)
    If sp Then
        lp = InStr(sp, t$, SpecialPageLink$)
        If lp Then
            lp = lp + 9 'skip past <a href=" (9 characters) to the start of the link itself
            lp2 = InStr(lp, t$, SpecialPageEndLink$)
            link$ = Mid$(t$, lp, lp2 - lp)
            GetNextPage$ = CleanHTML(link$)
        End If
    End If
End Function
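As a quick sanity check of what CleanHTML is doing to those next-page links, here's a tiny test that could be dropped into the main program. The raw href is my guess at how the link appears in the wiki's HTML; the point is just that &amp; gets decoded back to a plain & so curl receives a valid URL:

Code:
'hypothetical raw href, roughly as it might appear in the AllPages HTML
raw$ = "/qb64wiki/index.php?title=Special:AllPages&amp;from=KEY+n"
Print CleanHTML(raw$) 'expected: /qb64wiki/index.php?title=Special:AllPages&from=KEY+n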
Only about 80 lines, but this already connects to the wiki and downloads us 2 pages of vitally important data -- the lists of all the pages inside our wiki!!
Just by parsing these, I should now be able to build a simple list of all the page names as they exist in our wiki, then grab and download them one after another and save them wherever I want (a rough sketch of that step is below).
I don't have a whole wiki downloader yet, but I've got the wiki page-list downloader in less than 80 lines of code. It shouldn't be very hard to go from this to the finished form, and the whole program should come in at a few hundred lines in total.
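For that parsing step, here's a rough, untested sketch of what I have in mind, meant to be bolted onto the program above (it reuses CleanHTML, and the Download sub can fetch whatever it finds). The markers it hunts for -- the "mw-allpages-chunk" list and the href/title attributes on each link -- are assumptions about how MediaWiki lays out the AllPages HTML, so they may need adjusting against the real downloaded files:

Code:
'Hypothetical: reads one "Page List(n).txt" file and fills Names() and Links()
'with every page name and its matching href. Returns how many links it found.
Function GetPageNames (listFile$, Names() As String, Links() As String)
    ChunkMarker$ = "mw-allpages-chunk" 'ASSUMPTION: the page links sit inside this <ul> block
    LinkStart$ = "<a href=" + Chr$(34)
    TitleStart$ = "title=" + Chr$(34)
    Open listFile$ For Binary As #1 'read the whole file into one string
    t$ = Space$(LOF(1))
    Get #1, 1, t$
    Close #1
    p = InStr(t$, ChunkMarker$)
    If p = 0 Then Exit Function 'no page list found in this file
    listEnd = InStr(p, t$, "</ul>") 'only read links inside the list itself, not the nav
    count = 0
    Do
        p = InStr(p + 1, t$, LinkStart$)
        If p = 0 Or p > listEnd Then Exit Do
        p = p + Len(LinkStart$)
        p2 = InStr(p, t$, Chr$(34))
        count = count + 1
        Links(count) = CleanHTML(Mid$(t$, p, p2 - p)) 'the URL path to hand to Download
        p = InStr(p2, t$, TitleStart$) + Len(TitleStart$)
        p2 = InStr(p, t$, Chr$(34))
        Names(count) = Mid$(t$, p, p2 - p) 'the human-readable page name
    Loop
    GetPageNames = count
End Function

From the main program, after the page lists are downloaded, something like Dim Names(10000) As String, Links(10000) As String followed by total = GetPageNames("Page List(1).txt", Names(), Links()) and a loop calling Download Links(i), Names(i) + ".txt" should pull every page down one at a time. Page names with odd characters would need a little cleanup before being used as filenames, but that's a detail for the finished version.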