# Gemfeeds downloader

2024-02-28T14:15:32Z

So, I wrote a gemfeed parser and news downloader.

I read a lot of Atom/RSS feeds with rss2email or sfeed, especially sfeed:

=> https://github.com/rss2email/rss2email
=> https://codemadness.org/sfeed.html

However, neither supports gemfeeds. I wasn't happy having to open a gemini client just to check for new entries. It didn't feel natural to me, and even though lagrange and amfora are great for these tasks, in the end I never opened them. That was disappointing: there are capsules I miss reading.

So, here I am with this piece of awk: gemfeeds. awk is great at parsing text, and gemtext is line-oriented, so it's a perfect fit.

There is one part I don't really like: to download over the gemini protocol, I use openssl. I miss gemini support in curl, and feel too lazy to install a dedicated client. I should install "gg":

=> https://gmid.omarpolo.com/

For now, I use gemfeeds this way:

```
gemfeeds file-with-list-of-gemfeeds.txt
```

In file-with-list-of-gemfeeds.txt, there is:

```
gemini://ploum.net/
gemini://adele.pollux.casa/gemlog/
...
```

When a new entry is found, it is downloaded into the current directory, and the item URL is recorded in "~/.gemfeeds-items.urls".

Feel free to check the code below and suggest improvements ;)

```gemfeeds.awk
#!/usr/bin/awk -f
# gemfeeds : download new items from gemfeeds
#
# 1. read a gemfeed url as input
# 2. download new items
# 3. keep track of old items in ~/.gemfeeds-items.urls
#
# ex: gemfeeds list-of-gemfeeds-urls.txt
# requires: openssl

BEGIN {
    # set defaults
    if (oldurls == "") {
        oldurls = ENVIRON["HOME"] "/.gemfeeds-items.urls"
    }
}

function fetch_gemini_cmd(url,    a, host, cmd) {
    # return a shell command that prints the gemini content of url
    # (extra parameters are the usual awk idiom for local variables)
    split(url, a, "/")
    # add the default gemini port if none is given
    if (a[3] !~ /:[[:digit:]]+$/) {
        a[3] = sprintf("%s:1965", a[3])
    }
    host = a[3]
    # FIXME: check the response code instead of blindly
    # dropping the status line with sed
    cmd = sprintf("printf \"%s\\n\" |\
        openssl s_client -crlf -quiet -connect \"%s\" 2> /dev/null |\
        sed '1d'", url, host)
    return cmd
}

function isnew(url,    o, ret) {
    # quite slow...
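    # (isnew re-opens and scans the whole history file for every
    # candidate URL; an untested, faster sketch would read the file
    # once, e.g. in BEGIN:
    #     while ((getline u < oldurls) == 1) seen[u] = 1
    #     close(oldurls)
    # and turn the lookup below into: return (url in seen) ? 0 : 1,
    # with download_item also setting seen[url] = 1 when it records
    # a new link)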
    ret = 1
    while ((getline o < oldurls) == 1) {
        if (o == url) {
            ret = 0
            break
        }
    }
    close(oldurls)
    return ret
}

function download_item(url,    a, n, filename, fetch_cmd, getitem_cmd) {
    if (isnew(url) == 1) {
        printf "new item: %s\n", url
        print url >> oldurls
        # flush, so the same link seen twice in one run stays unique
        close(oldurls)
        # local filename: host-page
        n = split(url, a, "/")
        filename = sprintf("%s-%s", a[3], a[n])
        fetch_cmd = fetch_gemini_cmd(url)
        getitem_cmd = sprintf("%s > %s", fetch_cmd, filename)
        system(getitem_cmd)
    }
}

function gemfeed(url,    a, link, fetch_cmd) {
    # fetch_cmd must be local here: download_item builds its own
    # command, and a global would clobber the getline pipe below
    fetch_cmd = fetch_gemini_cmd(url)
    while ((fetch_cmd | getline) == 1) {
        link = ""
        # skip non-links
        if ($1 != "=>") {
            continue
        }
        # skip if the label doesn't start with a YYYY-mm-dd date, see
        # gemini://geminiprotocol.net/docs/companion/subscription.gmi
        if ($3 !~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}/) {
            continue
        }
        # now build an absolute link
        if ($2 ~ /^gemini:\/\//) {
            link = $2
        } else if ($2 ~ /^\//) {
            # starts with /: prepend scheme and domain
            split(url, a, "/")
            link = sprintf("%s//%s%s", a[1], a[3], $2)
        } else {
            # link relative to the current url:
            # strip the page name from the base url, if any
            if (url ~ /\.gmi$/) {
                sub(/\/[^\/]*\.gmi$/, "/", url)
            }
            link = sprintf("%s%s", url, $2)
        }
        download_item(link)
    }
    close(fetch_cmd)
}

/^gemini:\/\// {
    gemfeed($0)
    next
}

{
    printf "unhandled protocol, sorry: %s\n", $0
}
```

(I also added a phlog parser somewhere else, but curl supports the gopher protocol, and sfeed can parse atom feeds over gopher :))

---

Comments? Using email (anonymous)
=> mailto:bla@bla.si3t.ch?subject=gemfeeds-downloader

Instructions
=> https://si3t.ch/log/_commentaires_.txt
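Aside: the openssl invocation that fetch_gemini_cmd builds can be sketched as plain sh, for anyone who wants to poke at the protocol by hand. The URL below is just an example value, and like the awk version this drops the gemini status line without checking it.

```shell
#!/bin/sh
# Sketch of the pipeline fetch_gemini_cmd builds (example URL,
# no status-code checking, same as the awk version).
url="gemini://ploum.net/"

# host:port is the authority part of the URL; gemini defaults to 1965
host=${url#gemini://}
host=${host%%/*}
case $host in
    *:*) ;;                    # explicit port, keep it
    *)   host="$host:1965" ;;  # add the default gemini port
esac

# a gemini request is the URL followed by CRLF (-crlf converts the
# printf newline); the first reply line is the status, dropped by sed
fetch="printf '%s\n' \"$url\" | openssl s_client -crlf -quiet -connect \"$host\" 2> /dev/null | sed '1d'"
echo "$fetch"
```

Piping that string to sh runs the actual fetch; checking that the status line starts with "20" before stripping it would address the FIXME in the awk script.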