# Gemfeeds downloader

2024-02-28T14:15:32Z

So, I wrote a gemfeed parser and news downloader.

I read a lot of Atom/RSS feeds with rss2email or sfeed, especially sfeed:

=> https://github.com/rss2email/rss2email
=> https://codemadness.org/sfeed.html

However, neither supports gemfeeds. I wasn't happy having to open a gemini client just to check for new entries. It didn't feel natural to me, and even though lagrange and amfora are great for these tasks, in the end I never opened them. That was disappointing: there are capsules I miss reading.

So, here I am with this piece of awk: gemfeeds. awk is great at parsing text, and gemtext is line-oriented, so it's a perfect fit.

There is one part I don't really like: to download over the gemini protocol, I use openssl. I miss gemini support in curl, and feel too lazy to install a dedicated client. I should install "gg":

=> https://gmid.omarpolo.com/

For now, I use gemfeeds this way:

```
gemfeeds file-with-list-of-gemfeeds.txt
```

In file-with-list-of-gemfeeds.txt, there is:

```
gemini://ploum.net/
gemini://adele.pollux.casa/gemlog/
...
```

When a new entry is found, it is downloaded into the current directory, and the item URL is recorded in "~/.gemfeeds-items.urls".

Feel free to check the code below and suggest improvements ;)

```gemfeeds.awk
#!/usr/bin/awk -f
# gemfeeds : download new items from gemfeeds
#
# 1. read a gemfeed url as input
# 2. download new items
# 3. keep track of old items in ~/.gemfeeds-items.urls
#
# ex: gemfeeds list-of-gemfeeds-urls.txt
# requires: openssl

BEGIN {
    # set defaults
    if (oldurls == "") {
        oldurls = ENVIRON["HOME"] "/.gemfeeds-items.urls"
    }
}

function fetch_gemini_cmd(url,    a, host, cmd) {
    # return a shell command that prints the gemini content of url
    # (extra parameters are the usual awk idiom for local variables)
    split(url, a, "/")
    # add the default gemini port if none is given
    if (a[3] !~ /:[[:digit:]]+$/) {
        a[3] = sprintf("%s:1965", a[3])
    }
    host = a[3]
    # FIXME: check the response code instead of blindly
    # dropping the status line with sed
    cmd = sprintf("printf \"%s\\n\" |\
        openssl s_client -crlf -quiet -connect \"%s\" 2> /dev/null |\
        sed '1d'", url, host)
    return cmd
}

function isnew(url,    o, ret) {
    # quite slow...
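    # (isnew re-opens and scans the whole history file for every
    # candidate URL; an untested, faster sketch would read the file
    # once, e.g. in BEGIN:
    #     while ((getline u < oldurls) == 1) seen[u] = 1
    #     close(oldurls)
    # and turn the lookup below into: return (url in seen) ? 0 : 1,
    # with download_item also setting seen[url] = 1 when it records
    # a new link)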
    ret = 1
    while ((getline o < oldurls) == 1) {
        if (o == url) {
            ret = 0
            break
        }
    }
    close(oldurls)
    return ret
}

function download_item(url,    a, n, filename, fetch_cmd, getitem_cmd) {
    if (isnew(url) == 1) {
        printf "new item: %s\n", url
        print url >> oldurls
        # flush, so the same link seen twice in one run stays unique
        close(oldurls)
        # local filename: host-page
        n = split(url, a, "/")
        filename = sprintf("%s-%s", a[3], a[n])
        fetch_cmd = fetch_gemini_cmd(url)
        getitem_cmd = sprintf("%s > %s", fetch_cmd, filename)
        system(getitem_cmd)
    }
}

function gemfeed(url,    a, link, fetch_cmd) {
    # fetch_cmd must be local here: download_item builds its own
    # command, and a global would clobber the getline pipe below
    fetch_cmd = fetch_gemini_cmd(url)
    while ((fetch_cmd | getline) == 1) {
        link = ""
        # skip non-links
        if ($1 != "=>") {
            continue
        }
        # skip if the label doesn't start with a YYYY-mm-dd date, see
        # gemini://geminiprotocol.net/docs/companion/subscription.gmi
        if ($3 !~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}/) {
            continue
        }
        # now build an absolute link
        if ($2 ~ /^gemini:\/\//) {
            link = $2
        } else if ($2 ~ /^\//) {
            # starts with /: prepend scheme and domain
            split(url, a, "/")
            link = sprintf("%s//%s%s", a[1], a[3], $2)
        } else {
            # link relative to the current url:
            # strip the page name from the base url, if any
            if (url ~ /\.gmi$/) {
                sub(/\/[^\/]*\.gmi$/, "/", url)
            }
            link = sprintf("%s%s", url, $2)
        }
        download_item(link)
    }
    close(fetch_cmd)
}

/^gemini:\/\// {
    gemfeed($0)
    next
}

{
    printf "unhandled protocol, sorry: %s\n", $0
}
```

(I also added a phlog parser somewhere else, but curl supports the gopher protocol, and sfeed can parse atom feeds over gopher :))

---

Comments? Using email (anonymous)
=> mailto:bla@bla.si3t.ch?subject=gemfeeds-downloader

Instructions
=> https://si3t.ch/log/_commentaires_.txt
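Aside: the openssl invocation that fetch_gemini_cmd builds can be sketched as plain sh, for anyone who wants to poke at the protocol by hand. The URL below is just an example value, and like the awk version this drops the gemini status line without checking it.

```shell
#!/bin/sh
# Sketch of the pipeline fetch_gemini_cmd builds (example URL,
# no status-code checking, same as the awk version).
url="gemini://ploum.net/"

# host:port is the authority part of the URL; gemini defaults to 1965
host=${url#gemini://}
host=${host%%/*}
case $host in
    *:*) ;;                    # explicit port, keep it
    *)   host="$host:1965" ;;  # add the default gemini port
esac

# a gemini request is the URL followed by CRLF (-crlf converts the
# printf newline); the first reply line is the status, dropped by sed
fetch="printf '%s\n' \"$url\" | openssl s_client -crlf -quiet -connect \"$host\" 2> /dev/null | sed '1d'"
echo "$fetch"
```

Piping that string to sh runs the actual fetch; checking that the status line starts with "20" before stripping it would address the FIXME in the awk script.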