Archive-dot-org simple helper
#1
This program could make it a bit easier to navigate "archive-dot-org" if you are only looking to download music or video.

N.B. Configuring the program as desired takes a bit of research. As it stands, it only works for audio (FLAC, MP3, OGG, WAV, etc.). The research is to obtain the "subjects", which are tags that have to be written precisely into a web address. On "archive-dot-org" some categories are written out in plain English, as capitalized short phrases with spaces, which cannot go into a web address as they are. The site has a subject chooser whose output can be unpredictable. (Sometimes it picks "multiple categories" that are the same word or words in different upper- and lower-case combinations.) Therefore the user must tinker a little to obtain a subject tag for use with this program. It's a vain attempt to make this program more flexible.

This program requires one text file, and it's recommended to provide another. The required file has one line, which is the full path to the web browser's executable. Because I programmed this on Linux, I'm not familiar with a way to launch the web browser from a user's QB64 program on macOS or on Windows. I also programmed it to launch an AppImage, which might appear clumsy to some of you. This file is not provided; you will have to create it. It is called "helparchorg-browser.txt" and it must reside in the same directory as the executable. The program only uses the first non-blank line of this file, so make sure it has a correct entry.

It's also recommended to have "helparchorg-category.txt", which may instead be called "helparchorg-subject.txt". Put one subject per line in it for the media you are looking for. If you want two categories at a time, join the two tags with a plus sign. At the moment no more than two categories can be joined.
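
A minimal sketch of how one line from the category file could be split at the plus sign and made safe for a web address. "UrlEncode$" is a hypothetical helper and not part of the program in this post. It percent-encodes everything except letters, digits and the characters - _ . ~, which is an assumption about what is safe rather than something "archive-dot-org" documents.

```basic
OPTION _EXPLICIT

DIM tag AS STRING
DIM plu AS INTEGER

tag = "Popular Music+Jazz" ' sample line from the category file

plu = INSTR(tag, "+")
IF plu > 0 THEN
    ' two subjects joined by a plus sign: encode each half on its own
    PRINT UrlEncode$(LEFT$(tag, plu - 1))
    PRINT UrlEncode$(MID$(tag, plu + 1))
ELSE
    PRINT UrlEncode$(tag)
END IF
END

' Hypothetical helper: keeps letters, digits and - _ . ~ as they are,
' turns every other character into %XX (two hexadecimal digits).
FUNCTION UrlEncode$ (s AS STRING)
    DIM i AS LONG
    DIM ch AS STRING, h AS STRING, result AS STRING
    FOR i = 1 TO LEN(s)
        ch = MID$(s, i, 1)
        SELECT CASE ch
            CASE "A" TO "Z", "a" TO "z", "0" TO "9", "-", "_", ".", "~"
                result = result + ch
            CASE ELSE
                h = HEX$(ASC(ch))
                IF LEN(h) < 2 THEN h = "0" + h
                result = result + "%" + h
        END SELECT
    NEXT
    UrlEncode$ = result
END FUNCTION
```

With the sample line this prints Popular%20Music and then Jazz, so a tag with a space no longer splits the address when it is passed without quotes.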

The program reads the text files, tells the user that it found the web browser, and then shows a menu with the categories. If no category file is found, the only category is "electronic" for now, but this could be changed in the source code. With a single category the program uses it without asking. Otherwise the user types in the number of the subject or subjects desired and presses [ENTER]. Pressing [ENTER] with no entry quits the program.

After that, the user is asked for the year of creation or release of the media sought, from 2013 to 2023. Again, this could be modified in the source code. Type in the menu number for the year, not the year itself LOL, and press [ENTER]. Press [ENTER] without an entry at this point to leave the program.

This program then launches the web browser with the address fabricated from the data it was given.

Code: (Select All)

'by mnrvovrfc 6-July-2023
OPTION _EXPLICIT

DIM AS INTEGER c, lsubj, j, plu
DIM prefx$, afile$, launchprog$, comd$, asubj$, ayear$, entry$
DIM fe AS LONG

prefx$ = "helparchorg-"
afile$ = prefx$ + "browser.txt"
IF NOT _FILEEXISTS(afile$) THEN
PRINT "The file " + afile$ + " wasn't found! Aborting."
END
END IF

fe = FREEFILE
OPEN afile$ FOR INPUT AS fe
DO UNTIL EOF(fe)
LINE INPUT #fe, entry$
entry$ = _TRIM$(entry$)
IF entry$ <> "" AND launchprog$ = "" THEN
launchprog$ = entry$
EXIT DO
END IF
LOOP
CLOSE fe

IF NOT _FILEEXISTS(launchprog$) THEN
PRINT "The web browser wasn't found! Aborting."
END
END IF

PRINT "Discovered web browser executable called:"
PRINT launchprog$

afile$ = prefx$ + "subject.txt"
IF NOT _FILEEXISTS(afile$) THEN
afile$ = prefx$ + "category.txt"
END IF
IF _FILEEXISTS(afile$) THEN
fe = FREEFILE
OPEN afile$ FOR INPUT AS fe
DO UNTIL EOF(fe)
LINE INPUT #fe, entry$
entry$ = _TRIM$(entry$)
IF entry$ <> "" THEN lsubj = lsubj + 1
LOOP
CLOSE fe
IF lsubj < 1 THEN
PRINT "At least one entry required from input file!"
END
END IF
REDIM subj(1 TO lsubj) AS STRING
c = 0
fe = FREEFILE
OPEN afile$ FOR INPUT AS fe
DO UNTIL EOF(fe)
LINE INPUT #fe, entry$
entry$ = _TRIM$(entry$)
IF entry$ <> "" THEN
c = c + 1
subj(c) = entry$
END IF
LOOP
CLOSE fe
ELSE
lsubj = 1
REDIM subj(1 TO lsubj) AS STRING
subj(lsubj) = "electronic"
END IF

PRINT "*** archive-dot-org helper ***"
IF lsubj = 1 THEN
PRINT: PRINT "There's only one category available: "; subj(1)
asubj$ = subj(1)
ELSE
PRINT: PRINT "Please choose your category."
FOR j = 1 TO lsubj
PRINT USING "(##)"; j;
PRINT " "; subj(j)
NEXT
LINE INPUT entry$
entry$ = _TRIM$(entry$)
IF entry$ = "" THEN SYSTEM
c = VAL(entry$)
IF c > 0 AND c <= lsubj THEN
asubj$ = subj(c)
ELSE
PRINT "Incorrect input given! Aborting."
END
END IF
END IF

PRINT: PRINT "Please choose the year of release."
FOR j = 2013 TO 2023
PRINT USING "(####)"; j - 2012;
PRINT " "; j
NEXT
LINE INPUT entry$
entry$ = _TRIM$(entry$)
IF entry$ = "" THEN SYSTEM
c = VAL(entry$)
IF c > 0 AND c < 12 THEN
ayear$ = _TRIM$(STR$(c + 2012))
ELSE
PRINT "Incorrect input given! Aborting."
END
END IF

comd$ = launchprog$ + " 'https://archive.org/details/audio?and[]=year%3A%22" + ayear$ + _
"%22&and[]=mediatype%3A%22audio%22&and[]=subject%3A%22"
plu = INSTR(asubj$, "+")
IF plu > 0 THEN
comd$ = comd$ + LEFT$(asubj$, plu - 1) + "%22&and[]=subject%3A%22" + MID$(asubj$, plu + 1) + "%22'"
ELSE
comd$ = comd$ + asubj$ + "%22'"
END IF
SHELL _HIDE _DONTWAIT comd$
SYSTEM

For this program as it stands, try this as "helparchorg-category.txt":
Code: (Select All)
electronic
podcast
Popular Music+Jazz
#2
This is cool, @mnrvovrfc

I had to make some changes to get it to work on Windows (the browser path contains a space, and Firefox didn't like the URL being surrounded by single quotes).

Code: (Select All)

comd$ = CHR$(34) + launchprog$ + CHR$(34) + " https://archive.org/details/audio?and[]=year%3A%22" + ayear$ + _
"%22&and[]=mediatype%3A%22audio%22&and[]=subject%3A%22"
plu = INSTR(asubj$, "+")
IF plu > 0 THEN
comd$ = comd$ + LEFT$(asubj$, plu - 1) + "%22&and[]=subject%3A%22" + MID$(asubj$, plu + 1) + "%22"
ELSE
comd$ = comd$ + asubj$ + "%22"
END IF
PRINT launchprog$
PRINT comd$
SLEEP
SHELL _HIDE _DONTWAIT comd$

Here is my helparchorg-browser.txt file:
Code: (Select All)
C:\Program Files\Mozilla Firefox\firefox.exe

I tried putting the double quotes in this file, but then the existence check at the top of the program didn't find the browser. Without the double quotes in the file, it found it.

For the benefit of others here are screenshots of it working:

[Image: archiveorghelp-example.png]
grymmjack (gj!)
GitHub | YouTube | Soundcloud | 16colo.rs
#3
If you wanted to you could detect the OS and then act accordingly I guess?

I know Linux/Mac needs the single-quoted string because of shell expansion, but that made it try to open the string as a "file://" address on Windows.
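
A rough sketch of what acting on the OS could look like, using the "$IF WIN" precompiler check so the choice is made at compile time. The browser path and the shortened URL are placeholders for illustration only. The quoting follows what was reported in this thread: double quotes around the program on Windows, single quotes around the URL elsewhere.

```basic
OPTION _EXPLICIT

DIM launchprog AS STRING, url AS STRING, comd AS STRING

launchprog = "/path/to/browser" ' placeholder, normally read from helparchorg-browser.txt
url = "https://archive.org/details/audio?and[]=year%3A%222020%22" ' shortened example address

$IF WIN THEN
    ' Windows: quote the program path (it may contain spaces), leave the URL bare
    comd = CHR$(34) + launchprog + CHR$(34) + " " + url
$ELSE
    ' Linux/macOS: single quotes stop the shell from expanding ? [ ] and &
    comd = launchprog + " '" + url + "'"
$END IF

PRINT comd ' show the command instead of running it
```

Replacing the PRINT with SHELL _HIDE _DONTWAIT comd$ would launch the browser the same way the original program does.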

Really neat program, thank you for sharing.
grymmjack (gj!)
#4
Last, I checked for a real API and it exists (sort of):

https://archive.org/advancedsearch.php

Check that out, it might be helpful - or you may have even already used it to build out what you have created.
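
For what it's worth, here is a small sketch of building an address for that advanced search page instead of the "details" page. The query fields (subject, year, mediatype) and the parameters (q, fl[], rows, page, output) are the ones I understand the advanced search form to offer, so treat them as assumptions and compare against the form itself. The subject and year values are only samples.

```basic
OPTION _EXPLICIT

DIM asubj AS STRING, ayear AS STRING, q AS STRING, url AS STRING

asubj = "electronic" ' sample subject tag
ayear = "2020" ' sample year

' Percent-encoded form of: subject:"electronic" AND year:2020 AND mediatype:audio
q = "subject%3A%22" + asubj + "%22%20AND%20year%3A" + ayear + "%20AND%20mediatype%3Aaudio"

' Ask for the identifier and title fields, 20 rows, first page, as JSON
url = "https://archive.org/advancedsearch.php?q=" + q + _
      "&fl[]=identifier&fl[]=title&rows=20&page=1&output=json"

PRINT url
```

The printed address can be pasted into a browser to see the raw result before wiring it into a program.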

https://blog.archive.org/developers/

I've done similar things before, but by dissecting and hacking at it to learn how it works: looking at the form fields, messing with the HTTP posts, etc.

Take it easy
grymmjack (gj!)
#5
(07-09-2023, 04:16 PM)grymmjack Wrote: If you wanted to you could detect the OS and then act accordingly I guess?

I know Linux/Mac needs the single-quoted string because of shell expansion, but that made it try to open the string as a "file://" address on Windows.

Really neat program, thank you for sharing.

You have done much more than me after I posted this program. Thank you also!
#6
Hmmm..... An API? Could be good fun. I am quite fond of all things API.
Schuwatch!
Yes, it's me. Now shut up.
#7
This simple little program will just let you see if a URL is available on the Wayback Machine and automatically open the browser to the archived page if it is.

Code: (Select All)

Option Explicit
$NoPrefix
$Unstable:Http

$Console:Only

Title "Wayback Machine API"
ConsoleTitle "Enter Link"

Dim As String link
Do
    Cls
    Line Input "Link: ", link
Loop Until link <> ""

ConsoleTitle Title$ + " - " + link

link = "https://archive.org/wayback/available?url=" + link

Dim As Long hConn: hConn = OpenClient("HTTP:" + link)
If hConn < 0 And StatusCode(hConn) = 200 Then
    Dim As String buf, json
    Dim As Long length: length = LOF(hConn)
    While Not EOF(hConn)
        Get hConn, , buf
        json = json + buf
    Wend
    Dim As String available: available = Mid$(json, InStr(json, Chr$(34) + "available" + Chr$(34) + ": ") + Len(Chr$(34) + "available" + Chr$(34) + ": "))
    available = Mid$(available, 1, InStr(available, ",") - 1)
    Close hConn ' the whole reply has been read, so close the connection before branching
    If available = "true" Then
        Dim As String url
        url = Mid$(json, InStrRev(json, Chr$(34) + "url" + Chr$(34) + ": ") + 8)
        url = Mid$(url, 1, InStr(url, Chr$(34)) - 1)
        Shell Hide DontWait "start " + url
        Print "Opening in default browser..."
    Else
        Print "URL not available on the Wayback Machine"
    End If
End If

exiting:
Dim As Long l: l = CsrLin
Dim As _Byte i
For i = 3 To 0 Step -1
    Locate l, 0
    Print "Exiting in"; i; "seconds..."
    Sleep 1
Next
System
#8
I'd add to the above so the user only has to type in the "main part" of the web address, then decide from a menu whether the suffix is "dot-com", "dot-org" or something else. Maybe combine that suffix menu with a choice of secure protocol or not. Some "main parts" of web addresses are bad enough to type; otherwise one has to copy an address into the clipboard from somewhere.

The presented program is for Windows only, because it uses the "start" command. Nevertheless this is a good trick.
#9
Yeah, I probably ain't changing any of the code. It was just for a quick one-off to show using the Wayback Machine API on that one particular call.
#10
I modified this program because I was bored today. Now it can work on Linux; however, it assumes a web browser was "traditionally" installed and set via "Default Applications", not run from an AppImage and maybe not from Flatpak or Snap.

It still works on Windows, but the way the link is entered has changed.

Now in the first prompt it is only necessary to put down "qb64phoenix" (without double-quotation marks) instead of the whole shebang. Actually write "s.qb64phoenix" but I will tell you why a bit later.

There's a second prompt that asks which suffix to use. If you already put down a suffix at the first prompt then choose "none". If you don't see the one you want among the ones listed then choose option #5 and enter it; the leading dot isn't necessary. For this example, for this site, choose option #1.

This should give you "https://qb64phoenix.com" if you did include the "s." at the front of the first prompt of this modified program.

In other words:

Code: (Select All)
First prompt:               Second prompt:              Result:
s.qb64phoenix               com                         https://qb64phoenix.com
qb64phoenix                 com                         http://qb64phoenix.com
s.freebasic                 net                         https://freebasic.net
s.freesound                 org                         https://freesound.org
s.dol                       other->gov                  https://dol.gov (U.S.A. Department of Labor)
s.kx77free.free.fr          none                        https://kx77free.free.fr (Claudia Kalensky talented musician and music plug-in builder)
petesqbsite                 com                         http://petesqbsite.com

The plain "http" option is there in case the site you're looking for is rather old, or for some other reason cannot be upgraded by the web browser to the "secure" protocol.

On Linux the screen might act a bit funny while it counts down and then launches the web browser. Also be ready for the web browser to complain that "archive-dot-org" itself isn't on the "secure" protocol LOL. That happened to me while I was testing these modifications by looking for a different site from the 2000s.

Code: (Select All)
'by Spriggsy
Option Explicit
$NoPrefix
$Unstable:Http

$Console:Only

Title "Wayback Machine API"
ConsoleTitle "Enter Link"

Dim As String link, ess, entry, dotwhat
'by mnrvovrfc: changed here so it's only necessary to put down the "fat part", and hint if the user prefers "secure" protocol
Cls
Line Input "Link: ", link
if link = "" then system
if lcase$(left$(link, 2)) = "s." then
ess = "s"
link = mid$(link, 3)
end if
print
print "What is the suffix?"
line input "(1) com; (2) org; (3) net; (4) none; (5) other:", entry
if entry = "" then system
if entry = "5" then
line input "Please enter the suffix:", dotwhat
if dotwhat = "" then system
if left$(dotwhat, 1) = "." then dotwhat = mid$(dotwhat, 2)
elseif entry = "1" then
dotwhat = "com"
elseif entry = "2" then
dotwhat = "org"
elseif entry = "3" then
dotwhat = "net"
end if
link = "http" + ess + "://" + link
if entry <> "4" then link = link + "." + dotwhat

ConsoleTitle Title$ + " - " + link

link = "https://archive.org/wayback/available?url=" + link

Dim As Long hConn: hConn = OpenClient("HTTP:" + link)
If hConn < 0 And StatusCode(hConn) = 200 Then
Dim As String buf, json
Dim As Long length: length = LOF(hConn)
While Not EOF(hConn)
Get hConn, , buf
json = json + buf
Wend
Dim As String available: available = Mid$(json, InStr(json, Chr$(34) + "available" + Chr$(34) + ": ") + Len(Chr$(34) + "available" + Chr$(34) + ": "))
available = Mid$(available, 1, InStr(available, ",") - 1)
Close hConn ' the whole reply has been read, so close the connection before branching
If available = "true" Then
Dim As String url
url = Mid$(json, InStrRev(json, Chr$(34) + "url" + Chr$(34) + ": ") + 8)
url = Mid$(url, 1, InStr(url, Chr$(34)) - 1)
$IF WIN THEN
Shell Hide DontWait "start " + url
$ELSE
Shell Hide DontWait "xdg-open " + url
$END IF
Print "Opening in default browser..."
Else
Print "URL not available on the Wayback Machine"
End If
End If

exiting:
Dim As Long l: l = CsrLin
Dim As _Byte i
For i = 3 To 0 Step -1
Locate l, 0
Print "Exiting in"; i; "seconds..."
Sleep 1
Next
System