Building your own personal Google

The Wisplight logomark and wordmark with the text "Magic powers for your browser history."
Wispy the wisp, because every good product needs a cute mascot!

I've never been good at keeping notes. I have a Notion file with 36 links in it that was last updated in 2019. I also have a Pocket account that I use to read things on the tube. My usage varies: I don't always save things there, and when I do it's usually longer articles that I know I won't read if they're stuck in a browser tab.

I've also tried Google Keep, Evernote, Feedly, and posting things in my chat room in the hope they will come up in conversation later. I'm just not good at keeping notes.

The problem with all of these is manual work. When I read something, I'm not considering the conversation I'll have 6 months later when I may want to refer back to it. I read it, absorb the information as best I can, close the tab and move on with my day.

My issue with browser history

A screenshot of a Firefox window with the omnibox filled with "css pixels angle unit" with no results in the dropdown below.
I often find myself desperately typing keywords into Firefox's omnibar to try and find articles I've read. Sometimes it works, sometimes it doesn't.

"css pixels angular measurement". I swear I remember reading a really nice article about how "px" in CSS does not map to screen pixels. I read it when I was working my first job in London, so around 2019. I don't remember the author's name, I don't remember the blog URL, I don't even remember the title. All I remember are the concepts the article covered and some keywords.

I've been using Firefox for many years, long before 2019, so let's check my browser history...

A screenshot of the Firefox "Library" window which shows history, tags and bookmarks.
lookin like an irc app over here

First of all, what decade is this UI from... it feels like this "Library" window hasn't received an update for many years. Why can't this be part of the main window?

Second of all, what's up with these time frames? I get "Within the last week" but "Older than 6 months" is a little broad. I can't seem to customise these preset time frames either.

There are "tags" which, as far as I can tell, require opening up this dusty "Library" window and manually typing in tags separated by commas.

A screenshot of the menu below an item in the Firefox history window showing a tags field with the placeholder "Separate tags with commas".
Who actually uses this feature? Seriously, tweet me if you do...

Chrome isn't much better, though it does have this neat "Journeys" feature which groups items by domain name.

Do you ever go into your history and click these suggested searches?

The core problem with both browsers is that they only index the title of the webpage and the URL. Nothing else. I couldn't remember the title but I could remember a bunch of topics and concepts the article talked about.

Now, I could use Google to search for this article. Google is already doing lots of clever natural language processing and concept mapping. And sure enough, I can find my article easily using the keywords I typed in:

It's a great article despite lacking any actual CSS in the page...

The title was actually "CSS px is an Angular Measurement"

Google found this because "pixels" is often shortened to "px", "angle" is the root word of "angular" and "unit" is somewhat related to "measurement" (depending on how you... measure it)

But the problem with Google is that it's an index of (almost) the entire internet. The internet I've browsed in my life is infinitesimal in comparison, but it's also far more relevant to me. Google will show results from every website it has indexed but sometimes I don't want that, I want to search through only the pages that I have visited and read.

Another problem I've noticed a lot with both Firefox and Chrome is that the history is simply flaky. It really is, there's a decent chunk of sites I know I've visited but they're just not present in my history. I want my history to be a full log of everything I've visited.

What I needed was a browser history as intelligent as Google Search.

Architecting a solution

Browser extensions are pretty easy! Right?

I've built browser extensions before, the most complex being Lokalyze, and it's not crazy hard. There are a few criteria for this solution:

  • It must index the page content and metadata, not just the title.
  • Text analysis must either be accurate or skipped, no half measures.
  • Data must be stored locally, not in the browser itself (there are size limits)

The first two are fairly easy, I've done a bunch of NLP before. TF-IDF is the standard in text search and most of the work is cleaning up the data.
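For a rough idea of what TF-IDF does, here is a minimal sketch in Go. This isn't Wisplight's actual code: the `tokenize` and `rank` helpers, the `Result` type and the sample documents are all made up for illustration, and it uses the plain textbook weighting (term frequency multiplied by the log of inverse document frequency) with no smoothing.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on anything that is not a letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Result pairs a document index with its score for a query (hypothetical type).
type Result struct {
	Doc   int
	Score float64
}

// rank scores every document against the query using TF-IDF:
// tf  = occurrences of the term / number of tokens in the document
// idf = ln(total documents / documents containing the term)
func rank(docs []string, query string) []Result {
	tokens := make([][]string, len(docs))
	docFreq := map[string]int{}
	for i, d := range docs {
		tokens[i] = tokenize(d)
		seen := map[string]bool{}
		for _, t := range tokens[i] {
			if !seen[t] {
				seen[t] = true
				docFreq[t]++
			}
		}
	}

	results := make([]Result, 0, len(docs))
	for i, toks := range tokens {
		counts := map[string]int{}
		for _, t := range toks {
			counts[t]++
		}
		score := 0.0
		for _, q := range tokenize(query) {
			if counts[q] == 0 {
				continue // term not in this document, contributes nothing
			}
			tf := float64(counts[q]) / float64(len(toks))
			idf := math.Log(float64(len(docs)) / float64(docFreq[q]))
			score += tf * idf
		}
		results = append(results, Result{Doc: i, Score: score})
	}
	// Highest score first; stable so ties keep their original order.
	sort.SliceStable(results, func(a, b int) bool { return results[a].Score > results[b].Score })
	return results
}

func main() {
	docs := []string{
		"css px is an angular measurement",
		"css grid layout guide",
		"sourdough bread recipe",
	}
	for _, r := range rank(docs, "css px angular") {
		fmt.Printf("%.3f  %s\n", r.Score, docs[r.Doc])
	}
}
```

The useful property is that a word like "css", which appears in several documents, counts for less than rarer words like "px" or "angular". Note that an exact-match scheme like this would still miss "angle" versus "angular", which is where the data cleaning comes in.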

The last one is where it gets tricky, I've never shipped a desktop app that isn't a dev tool before. This may require a user interface, notarisation, distribution, signing binaries, etc. I could run a cloud service but that's going to require maintenance and raises privacy/security concerns if I release this to the public.

A screenshot of a system tray menu on macOS with just one menu option: Quit
The first version of Wisplight. Yes it's literally just a menu.

Well, that was easy. No user interface, no Electron, no Qt, no Swift, no UWP or cross-platform worries. It's literally a Golang application using a library to create system tray icons and menus. It took me a few minutes to build this.

The next bit is search. I've worked with TF-IDF before but why waste time implementing it myself? Bleve Search is an open source, embedded full text search database written in Golang. I also need a key-value store for the actual metadata that doesn't need to be indexed for searching. For that, I can simply use BoltDB, or rather bbolt, its maintained fork.

And the extension can just communicate with the desktop application via a simple HTTP API. The desktop app runs a small HTTP server on a high port.
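To make that concrete, here is a minimal sketch of what such an API could look like using only Go's standard library. Everything specific in it is an assumption rather than Wisplight's real interface: the `/visits` path, the `Visit` payload shape and the 5 MiB body limit are invented for the example. The demo in `main` plays the role of the extension and posts one page to the server.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"strings"
)

// Visit is a hypothetical payload the extension might send for each page.
type Visit struct {
	URL   string `json:"url"`
	Title string `json:"title"`
	Text  string `json:"text"`
}

// parseVisit decodes a JSON request body and checks the one required field.
func parseVisit(body []byte) (Visit, error) {
	var v Visit
	if err := json.Unmarshal(body, &v); err != nil {
		return Visit{}, err
	}
	if v.URL == "" {
		return Visit{}, errors.New("url is required")
	}
	return v, nil
}

// handleVisit accepts POSTed visits; a real app would pass them to the indexer.
func handleVisit(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body size so a huge page cannot exhaust memory (limit is arbitrary).
	body, err := io.ReadAll(io.LimitReader(r.Body, 5<<20))
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	v, err := parseVisit(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	log.Printf("received %s (%d bytes of text)", v.URL, len(v.Text))
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/visits", handleVisit)

	// Bind to loopback only so nothing else on the network can reach it.
	// Port 0 asks the OS for any free port for this demo; a real app would
	// use a fixed high port that the extension knows about.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	go http.Serve(ln, mux)

	// Stand in for the extension: post one page to the local server.
	payload := `{"url":"https://example.com/post","title":"A post","text":"hello world"}`
	resp, err := http.Post("http://"+ln.Addr().String()+"/visits", "application/json", strings.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```

Listening on 127.0.0.1 rather than all interfaces keeps the history data on the machine, which matters for a tool whose whole pitch is local storage.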

Now all I need is a UI. But why bother building a UI for the desktop when there's already one bolted on to the extension: the browser! Browser extensions can provide "New tab" pages which just means you can supply an HTML file to show when you open a new tab instead of the browser's default.

A screenshot of a new tab in Google Chrome showing the Wisplight Search new tab page with an empty search box and no search results.
The tables have turned, Electron.

Searching is fairly basic at the moment, there's still some work to do around de-duplication and being more intelligent with results. There are lots of things to take into account, like the publish and update dates of articles (sometimes you visit the same URL twice but the content changes due to story developments or new information.)

There are also improvements to do around surfacing the context of a result and indicating why that result was ordered first or second in the list. Analysis picks out keywords and performs basic NLP such as stopword removal and entity recognition. There are quite a few clever things done at indexing time that aren't utilised when searching but it's early days.
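As an illustration of the simplest of those steps, here is a minimal Go sketch of stopword removal followed by picking the most frequent remaining words as keywords. It is an assumption about the general shape of the technique, not Wisplight's analysis code: the `keywords` helper and the tiny stopword list are invented for the example, and entity recognition is left out entirely.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// stopwords is a deliberately tiny sample list; real lists have hundreds of entries.
var stopwords = map[string]bool{
	"a": true, "an": true, "and": true, "in": true, "is": true,
	"it": true, "not": true, "of": true, "the": true, "to": true,
}

// keywords returns up to n of the most frequent non-stopword terms in text.
// Ties are broken alphabetically so the output is deterministic.
func keywords(text string, n int) []string {
	counts := map[string]int{}
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		if !stopwords[w] {
			counts[w]++
		}
	}

	terms := make([]string, 0, len(counts))
	for w := range counts {
		terms = append(terms, w)
	}
	sort.Slice(terms, func(i, j int) bool {
		if counts[terms[i]] != counts[terms[j]] {
			return counts[terms[i]] > counts[terms[j]]
		}
		return terms[i] < terms[j]
	})
	if n >= 0 && len(terms) > n {
		terms = terms[:n]
	}
	return terms
}

func main() {
	text := "The px unit in CSS is an angle. The px is not a pixel."
	fmt.Println(keywords(text, 3))
}
```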

A screenshot of a new tab in Google Chrome showing the Wisplight Search new tab page with the query "next seo" typed and several results of GitHub pages with the words "next" and "seo" highlighted in context.
Visiting the same site a few times artificially inflates its presence in results. This can be solved at index time or search time.
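A minimal sketch of the search-time option, assuming results arrive as a flat list of scored hits. The `Hit` type and `dedupe` function are hypothetical, and the sketch keys on the exact URL; a real version would probably need to normalise URLs first (tracking parameters, trailing slashes) or group by host.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result: one indexed visit and its score.
type Hit struct {
	URL   string
	Score float64
}

// dedupe keeps only the best-scoring hit per URL and returns the survivors
// sorted by score, highest first (ties broken by URL for stable output).
func dedupe(hits []Hit) []Hit {
	best := map[string]Hit{}
	for _, h := range hits {
		if cur, ok := best[h.URL]; !ok || h.Score > cur.Score {
			best[h.URL] = h
		}
	}

	out := make([]Hit, 0, len(best))
	for _, h := range best {
		out = append(out, h)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].URL < out[j].URL
	})
	return out
}

func main() {
	hits := []Hit{
		{URL: "https://github.com/example/next-seo", Score: 0.8},
		{URL: "https://github.com/example/next-seo", Score: 0.9},
		{URL: "https://example.com/seo-guide", Score: 0.5},
	}
	for _, h := range dedupe(hits) {
		fmt.Printf("%.1f  %s\n", h.Score, h.URL)
	}
}
```

Doing this at search time keeps every visit in the index, so the visit count is still available later as a ranking signal. Doing it at index time would save space instead.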

Shipping

And yes, I flatly refuse to use the ship, rocket or fire emojis in this section.

Shipping software is still awful. It's just awful in different ways. Each operating system is uniquely awful but they're all equal in their awfulness.

Windows

Which of the 17 app directories do I copy the binary to?

Windows is my main workstation OS, it's the most popular platform in the world and in my experience does a good job of window management, app stability, games, media work and software dev. Software installs, however, are and always have been a mess.

There are various installer builders for Windows, each with varying levels of complexity. NSIS, WiX, Inno and of course Windows has its own SDK. These solutions are great for large software packages which need to set registry entries, set up services, write files to various locations, startup entries, start menu entries, etc. But all I wanted to do was copy a .exe file to AppData.

There's also the problem of automation. I don't want to be Parsec'ing into my workstation every time I want to release a copy, I want to run it all on GitHub's CI when I tag a release commit.

NIH Syndrome

Of course, like any typical engineer, I built my own solution.

You know I had to do it

So I built an installer in Golang. It's 60 lines of code and literally copies an embedded file to %APPDATA%. It also has an embedded app manifest (standard on Windows to provide application metadata such as the icon and permission requests).

Simple. And it actually works really nicely!

Though I do need to figure out how to make it run on login still... that's a problem for another day, and I'm pretty sure it's as simple as just invoking mklink with the .exe and the user's Startup directory.

Mac

I paid £79 for disappointment. Thanks, Tim.

As you'd expect from Apple, this is an utter nightmare if you're not 100% in their ecosystem, in which case it's only a small nightmare.

Fortunately, shipping a single static binary as a .app file is fairly easy. The .app file isn't a file, it's actually a directory. Seriously! Try running cd /Applications/some.app and you can list the files inside. Finder just treats .app directories specially when you double-click on them.

Of course, none of this is documented, you have to just figure stuff out by looking at other apps. Some of the plist format is documented (which is the equivalent of the app manifest on Windows: permissions, icons, metadata, etc.)

The annoying bit is actually distributing the application. I had to line the pockets of Mr Tim Apple to the tune of $99 (£79) just to get a signing certificate so I can "notarise" my app. This basically does some cryptographic mumbo jumbo to the .app contents so when people open it, it doesn't show a super scary message telling them I'm going to steal their firstborn. Instead, it just shows a slightly scary message telling them they downloaded it from the internet (in case they forgot.)

I guess this also raises the barrier to entry for malware on Mac. There was a fun article I read years ago (which of course I can't find since I didn't build Wisplight years ago) that was poking fun at how some Mac malware has a beautifully designed landing page. Apple products truly are for the elite I suppose.

Linux

I don't know why I even tried, to be honest.

Yeah, no.

My Linux friends just told me to ship the raw binary and not bother trying to mess around with the four thousand and twenty-one different app stores. I don't expect Linux people to even care about my app since it's not open source tbh.

Safari on Mac and iOS

yes it's actually possible to ship browser extensions for Safari! and yes it's a pain!

Safari, being the special snowflake of the browser world, can't simply load a manifest or .crx file. You have to use Xcode to build a fully fledged Swift app which embeds your extension as a resource and then loads this into Safari at runtime. Which means more stuff to notarise, more moving parts and less stuff you can automate on CI.

But the result is amazing, having my browser history on iOS get saved to my custom full text search database and then new tabs on iOS Safari opening my custom search page is genuinely awesome.

iOS will have to wait

There's a minor detail here though which means iOS support for Wisplight won't be happening any time soon. Because Wisplight is, initially at least, primarily a privacy-focused desktop application, there's no (easy) way to securely transfer your history from your iPhone to your computer.

I have built a very basic API which I spun up on Fly.io and set my iOS version of the extension to talk to - and I will likely use this as my own personal tool.

But this won't be available for the general public to use for one reason: privacy. I have to be very careful balancing the value proposition here because for a lot of people, the idea of their entire browser history (full URLs) being uploaded to a server someone else controls is an immediate turn-off.

I've discussed this with a few friends, both privacy minded and not, and the general consensus is they're willing to sacrifice some feeling of privacy if the user experience is great, the product and monetisation model feel "genuine" and the problem is solved sufficiently (aka: it helps you find old websites you visited.)

There are a few solutions to this that I've thought of, ranging from communicating the privacy impacts to users so they know exactly what's happening, to using encryption somehow (not sure how yet...)

But this just isn't something I want to spend time on until I know for sure there's a market for a cloud offering. For now, I am focusing on building a really good desktop experience with a narrow target audience and maybe exploring monetisation in that region first before expanding towards subscription SaaS.

That sounds great, where do I sign up?

This is the part where I build an email list and spam you with my course*

*i don't have a course to spam you with, but it would be neat to stay in touch if you like what I do!

Closed Beta

I'm going to run a closed beta of the app first to work out all the weird issues that are inevitable when shipping to the desktop. So please fill in the form below to signal your intent!

Note for early bird users: if I ever do decide to monetise this as a product (I'm undecided), early bird users will never have to pay for the desktop app!