Wiki TEXT pages downloader
Sometimes, it's easiest to just take an idea and toss it out the door and start over completely from scratch -- and that's what I've decided to do here!

The code I shared originally was basically ripped directly from the QB64 source, stitched together, and then operated on and altered like Frankenstein's monster until it could cough and sputter and produce a semi-reasonable result...

But it's long.  And messy.  And almost impossible to follow and sort out what's doing what, where it's doing it, and why it's doing it...

So, I've decided to back up and reboot my approach to handling this type of issue.  What I have now is this much simpler code:

Code:
$Console:Only
Const HomePage$ = "https://qb64phoenix.com"

NumberOfPages = DownloadPageLists


Function DownloadPageLists
    FileLeft$ = "Page List("
    FileRight$ = ").txt"
    FileCount = 1
    CurrentFile$ = ""
    url$ = "/qb64wiki/index.php/Special:AllPages" 'the first file that we download
    Do
        file$ = FileLeft$ + _Trim$(Str$(FileCount)) + FileRight$
        Download url$, file$
        url2$ = GetNextPage$(file$)
        p = InStr(url2$, "from=")
        If p = 0 Then Exit Do
        If Mid$(url2$, p + 5) > CurrentFile$ Then
            CurrentFile$ = Mid$(url2$, p + 5)
            FileCount = FileCount + 1
            url$ = url2$
        Else
            Exit Do
        End If
    Loop
    DownloadPageLists = FileCount
End Function

Function CleanHTML$ (OriginalText$)
    text$ = OriginalText$ 'don't corrupt incoming text
    Type ReplaceList
        original As String
        replacement As String
    End Type

    'Expandable HTML replacement system
    Dim HTML(1) As ReplaceList
    HTML(0).original = "&amp;": HTML(0).replacement = "&"
    HTML(1).original = "%24": HTML(1).replacement = "$"

    For i = 0 To UBound(HTML)
        Do
            p = InStr(text$, HTML(i).original)
            If p = 0 Then Exit Do
            text$ = Left$(text$, p - 1) + HTML(i).replacement + Mid$(text$, p + Len(HTML(i).original))
        Loop
    Next
    CleanHTML$ = text$
End Function

Sub Download (url$, outputFile$)
    url2$ = CleanHTML$(url$) 'decode any HTML-escaped characters in the link
    'Print "https://qb64phoenix.com/qb64wiki/index.php?title=Special:AllPages&from=KEY+n"
    'Print HomePage$ + url2$
    Shell "curl -o " + Chr$(34) + outputFile$ + Chr$(34) + " " + Chr$(34) + HomePage$ + url2$ + Chr$(34)
End Sub

Function GetNextPage$ (currentPage$)
    SpecialPageDivClass$ = "<div class=" + Chr$(34) + "mw-allpages-nav" + Chr$(34) + ">"
    SpecialPageLink$ = "<a href="
    SpecialPageEndLink$ = Chr$(34) + " title"
    Open currentPage$ For Binary As #1
    l = LOF(1)
    t$ = Space$(l)
    Get #1, 1, t$
    Close
    sp = InStr(t$, SpecialPageDivClass$)
    If sp Then
        lp = InStr(sp, t$, SpecialPageLink$)
        If lp Then
            lp = lp + 9 'skip past the <a href= text plus the opening quote to reach the start of the link
            lp2 = InStr(lp, t$, SpecialPageEndLink$)
            link$ = Mid$(t$, lp, lp2 - lp)
            GetNextPage$ = CleanHTML$(link$)
        End If
    End If
End Function


Only about 80 lines, but this already connects to the wiki and downloads two pages of vitally important data for us -- the lists of all the pages inside our wiki!

Just by parsing these, I should now be able to build a simple list of all the page names as they exist in our wiki, then grab and download them one after another and save them wherever I want.
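
Something along these lines is what I have in mind for that parsing step.  It's just a rough, untested sketch at this point: the DownloadAllPages name is made up, it assumes every page in the list files shows up as a standard MediaWiki link that starts with <a href="/qb64wiki/index.php/PageName, and it leans on the Download sub and CleanHTML$ function from the listing above.

Code:
'Hypothetical sketch only, not part of the program above (yet).
Sub DownloadAllPages (NumberOfPages)
    LinkMarker$ = "<a href=" + Chr$(34) + "/qb64wiki/index.php/"
    For i = 1 To NumberOfPages
        listFile$ = "Page List(" + _Trim$(Str$(i)) + ").txt" 'the files DownloadPageLists saved
        Open listFile$ For Binary As #1
        t$ = Space$(LOF(1))
        Get #1, 1, t$
        Close #1
        p = InStr(t$, LinkMarker$)
        Do While p
            linkStart = p + 9 'skip past <a href=" so we keep the /qb64wiki/... portion
            linkEnd = InStr(linkStart, t$, Chr$(34)) 'closing quote of the href
            link$ = Mid$(t$, linkStart, linkEnd - linkStart)
            pageName$ = Mid$(link$, Len("/qb64wiki/index.php/") + 1)
            'Note: page names aren't sanitized here, so characters that aren't
            'valid in filenames would still need handling.
            Download link$, CleanHTML$(pageName$) + ".txt" 'one text file per wiki page
            p = InStr(linkEnd, t$, LinkMarker$)
        Loop
    Next
End Sub

If that holds up, the whole main program would just be the DownloadPageLists call followed by something like DownloadAllPages NumberOfPages.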

I don't have a whole wiki downloader yet, but I've got the wiki page-list downloader now in less than 80 lines of code.  It shouldn't be very hard to go from this to the finished form, and the whole program should come in at less than a few hundred lines in total.