Korny’s Blog

Playing with Rust and Copilot

2023-10-31T00:00:00+00:00

The world is full of AI hype right now - and often it’s blown out of proportion, both on the “everything is fine” side and the “machines are going to take over” side. I don’t have time to write about my opinions on this properly!

I just thought I’d post my brief experience of tinkering with GitHub Copilot and a bit of simple Rust code.

I have no interest in letting AIs generate huge amounts of code - that seems like a path to bloat and technical debt in my mind. But I can see value in using AIs like a slightly smarter autocomplete - and I thought I’d give it a go on some Rust code. This interested me partly as I like Rust - but also as it’s a fairly new language so Copilot can’t just slurp down vast amounts of other people’s code.

My problem is simple - I want to parse some markdown with frontmatter yaml data - stuff like:

---
title: 'my amazing file'
foo: bar
---
everything is _awesome_

I had some old code that wrote files like this, but nothing to read them, so I thought I’d take my existing structs as a starting point:

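use std::collections::HashMap;
use std::path::PathBuf;

use serde::{Deserialize, Serialize};
// Assumption: `Value` is serde_yaml's; the original listing does not show its import.
use serde_yaml::Value;

// Clone is needed here because MarkdownFile below derives Clone.
#[derive(Serialize, Deserialize, Debug, Clone)]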
pub struct Metadata {
    pub title: String,
    #[serde(flatten)]
    pub extra_fields: HashMap<String, Value>,
}

#[derive(Debug, Clone)]
pub struct MarkdownFile {
    pub path: PathBuf,
    pub front_matter: Metadata,
    pub body: Vec<String>,
}

impl MarkdownFile {
}

Inside the implementation I typed:

pub fn read(

and it magically suggested a function:

pub fn read(path: PathBuf) -> Result<Self> {
    let contents = std::fs::read_to_string(&path)?;
    let (front_matter, body) = Self::split_front_matter(&contents)?;
    let front_matter: Metadata = serde_yaml::from_str(&front_matter)?;
    Ok(Self {
        path,
        front_matter,
        body,
    })
}

This doesn’t work immediately, but that’s fine - it’s good enough:

The function signature was OK (probably should be &Path though)
It reminded me how to slurp a file as a string
It suggested how to split the file into two parts (the split_front_matter function doesn’t exist - it’s just a suggestion)
It worked out that I wanted to get the front matter as YAML. I didn’t specify that - it must have inferred that from other markdown-parsing code.

I could have written all that myself, but it saved me some boilerplate typing, and guessing at API signatures.

However - then I asked it to generate split_front_matter:

pub fn split_front_matter(contents: &str) -> Result<(String, Vec<String>)> {
    let mut lines = contents.lines();
    let mut front_matter = String::new();
    let mut body = Vec::new();
    let mut in_front_matter = false;
    while let Some(line) = lines.next() {
        if line == "---" {
            if in_front_matter {
                break;
            } else {
                in_front_matter = true;
            }
        } else if in_front_matter {
            front_matter.push_str(line);
            front_matter.push('\n');
        } else {
            body.push(line.to_string());
        }
    }
    Ok((front_matter, body))
}

This was less good. It doesn’t work! (I wrote a unit test to prove this - I wonder if Copilot would have done better if I’d TDD’d it?). Like a lot of AI generated stuff, it looks OK - confidently OK - but the break is wrong, it should be setting in_front_matter to false. Also it doesn’t handle several edge cases like --- inside the markdown body.
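For illustration, here is a rough sketch of the smallest change to that loop that makes it behave: instead of breaking at the closing delimiter, it clears in_front_matter and remembers that the front matter is finished, so later --- lines land in the body. The name split_front_matter_fixed and the front_matter_done flag are made up for this sketch, and it returns a plain tuple rather than a Result to stay dependency-free.

```rust
/// Hypothetical repaired version of the suggested loop (not Copilot's output).
/// Returns (front matter, body lines).
pub fn split_front_matter_fixed(contents: &str) -> (String, Vec<String>) {
    let mut front_matter = String::new();
    let mut body = Vec::new();
    let mut in_front_matter = false;
    // Once the closing "---" has been seen, later "---" lines belong to the body.
    let mut front_matter_done = false;
    for line in contents.lines() {
        if line == "---" && !front_matter_done {
            if in_front_matter {
                // Closing delimiter: keep reading instead of breaking out.
                in_front_matter = false;
                front_matter_done = true;
            } else {
                // Opening delimiter.
                in_front_matter = true;
            }
        } else if in_front_matter {
            front_matter.push_str(line);
            front_matter.push('\n');
        } else {
            body.push(line.to_string());
        }
    }
    (front_matter, body)
}
```

This still has gaps: text before the opening --- silently becomes part of the body, and an unclosed front matter block swallows the rest of the file.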

Also it’s pretty ugly C-style procedural code. You can do this much more nicely with some iterators and splitting:¹

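use anyhow::{anyhow, Result};
use itertools::Itertools; // provides collect_tuple

pub fn split_front_matter(contents: impl AsRef<str>) -> Result<(String, Vec<String>)> {
    // Split on at most two "---\n" delimiters: text before the front matter,
    // the front matter itself, and the body (which may contain further "---" lines).
    let Some((prefix, frontmatter, body)) =
        contents.as_ref().splitn(3, "---\n").collect_tuple()
    else {
        return Err(anyhow!("No front matter"));
    };
    // The file must start with the opening delimiter.
    if !prefix.is_empty() {
        return Err(anyhow!("text before front matter!"));
    }
    Ok((
        frontmatter.to_string(),
        body.lines().map(|s| s.to_string()).collect(),
    ))
}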
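The collect_tuple call is the only part of this that needs itertools. As a rough sketch, the same split can be done with the standard library alone by pulling three items off the splitn iterator by hand. The name split_front_matter_std and the String error type are placeholders chosen for this sketch so that it has no dependencies.

```rust
/// Std-only sketch of the same split (hypothetical variant, String errors).
pub fn split_front_matter_std(contents: &str) -> Result<(String, Vec<String>), String> {
    // At most three pieces: prefix, front matter, and everything after it.
    let mut parts = contents.splitn(3, "---\n");
    let (Some(prefix), Some(front_matter), Some(body)) =
        (parts.next(), parts.next(), parts.next())
    else {
        return Err("No front matter".to_string());
    };
    // Anything before the opening delimiter is treated as an error.
    if !prefix.is_empty() {
        return Err("text before front matter!".to_string());
    }
    Ok((
        front_matter.to_string(),
        body.lines().map(str::to_string).collect(),
    ))
}
```

Because splitn stops after the third piece, any later --- lines stay in the body untouched.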

My conclusion so far from this tiny sample - Copilot is handy as a minor IDE boost for simple boilerplate code, but definitely not to be trusted for anything longer; at least not in Rust.

¹ I also added error checking, a more flexible contents parameter, and a minor cheat - I used collect_tuple from the itertools crate rather than doing more messy iterator-to-variable processing.

New job, new blog!

2023-10-14T00:00:00+01:00

I quit my job! Yes, after 12 amazing years at Thoughtworks, I decided it was time to move on.

I’m starting a new job in November - more on that later - but in the meantime I was a bit embarrassed at the styling here - I’d had fun hacking something together using Tufte CSS but it was never quite right, with issues here and there with styling and layouts… and it used the venerable Ruby middleman blogging engine, which I’m sure is fine - but Jekyll seems to have taken over as a common default.

So I’m rebuilding everything in Jekyll - it’ll be fairly vanilla using Minimal Mistakes for now, but over time I will fiddle with the defaults. But in the spirit of incremental iterative working, I’ll try to do just enough to get it out with nothing broken, then improve over time.

Interesting folks to follow on Mastodon

2022-12-01T09:27:00+00:00

My highly-idiosyncratic list of interesting people on Mastodon

I’ve been really enjoying moving from the crumbling mess of Twitter to the chaotic federated world of Mastodon. Plenty of other people have talked about the reasons, so I’m not going over them all here.

However, one complaint I’ve heard is “I can’t find anyone to follow” which I find quite surprising, as I’ve found too many people to follow! So I thought I’d post a short¹, opinionated list, to help lure people in.

This is mostly tech folks - as that’s my interest area, and Mastodon is mostly attracting tech folks and activists right now. But there are quite a few random non-tech folks too - sorry, it’s a bit of an idiosyncratic list!

Aside - how to follow people

I thought I’d call this out separately, as it’s sometimes a bit fiddly, especially for new users.

This assumes you are using a browser. Mobile users can’t easily cut and paste URLs, so following someone not on your instance depends a bit on the app you use.

For a user @fergee@one.mega.cities their instance is one.mega.cities and that is where most of their information lives. You can browse their details with the url https://one.mega.cities/@fergee - and there is a Follow button there, but it’s not so easy to use if you are not also a one.mega.cities user. (the behaviour seems to differ by instance - or maybe it’s mobile vs browser? Some say ‘paste this url into search’ and some let you re-log in to your home instance)

The easy way to manage this is to find the user’s page on your instance. This is usually a URL like https://my.instance.name/@fergee@one.mega.cities though some have slightly different paths. You can manually make this URL in a browser address bar by just concatenating the https://my.instance.name/ part with a mastodon handle (starting with a @). Or you can build them using a tool like a spreadsheet, if you want.

Or, you can just paste the user’s handle @fergee@one.mega.cities into the search box on your main mastodon page, and search ‘accounts’, and you should end at the same user page on your own instance.
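If you would rather generate these URLs than paste them together by hand, the concatenation is easy to script. The sketch below is only an illustration of the URL shapes described above: the function names are made up, and it assumes the common https://instance/@user@other.instance path, which some instances vary.

```rust
/// Splits "@user@instance" into ("user", "instance"); None if malformed.
fn split_handle(handle: &str) -> Option<(&str, &str)> {
    let (user, instance) = handle.strip_prefix('@')?.split_once('@')?;
    if user.is_empty() || instance.is_empty() {
        None
    } else {
        Some((user, instance))
    }
}

/// The user's page on their own instance, e.g. https://one.mega.cities/@fergee
pub fn remote_profile_url(handle: &str) -> Option<String> {
    let (user, instance) = split_handle(handle)?;
    Some(format!("https://{instance}/@{user}"))
}

/// The same user's page as seen from your home instance (assumed path layout).
pub fn local_profile_url(home_instance: &str, handle: &str) -> Option<String> {
    let (user, instance) = split_handle(handle)?;
    Some(format!("https://{home_instance}/@{user}@{instance}"))
}
```

Calling local_profile_url("my.instance.name", "@fergee@one.mega.cities") gives https://my.instance.name/@fergee@one.mega.cities, the page where the follow button works for you.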

Once you get to the person’s page on your instance, you can just click ‘follow’ and a follow request is sent. Some people auto-accept follows, some manually check you aren’t a bot or a nasty person or whatever first.

Once you are following them, you can also add and remove people from custom user lists from the same page. But that’s a different subject.

My list

Some of these people are quite active, some are just lurking for now. Some of these people are also dual posting to Twitter, which seems fair enough if you have a big follower count.

I’ve sorted by number of updates, which will be biased towards really verbose people and those who’ve been on Mastodon a long time. But it’s easier than other categorisations - what makes someone ‘interesting’?? (The count includes posts and ‘boosts’ which are like re-tweets, so chronic over-sharers will have inflated update values)

I note that there are 5 signatories of the Agile Manifesto in this list!

Name	Handle	URL	Followers	Updates	Who
Cory Doctorow	@doctorow@mamot.fr	https://mamot.fr/@doctorow	5988	83968	Author, journalist, blogger
Eugen Rochko	@Gargron@mastodon.social	https://mastodon.social/@Gargron	262050	72812	Mastodon lead developer
Christine Lemmer-Webber	@cwebber@octodon.social	https://octodon.social/@cwebber	8668	33216	Developer, CTO, co-author of ActivityPub which underpins all of this
Cory Doctorow’s linkblog	@pluralistic@mamot.fr	https://mamot.fr/@pluralistic	27872	31622	This is his link-dump user, it is very very verbose!
Terence Eden	@Edent@mastodon.social	https://mastodon.social/@Edent	3304	4192	Unicode, W3C, open standards, cybersecurity
Lesley Carhart	@hacks4pancakes@infosec.exchange	https://infosec.exchange/@hacks4pancakes	21838	2446	Cyber security expert
David Gerard	@davidgerard@circumstances.run	https://circumstances.run/@davidgerard	1737	1634	Blockchain skeptic, writer, journalist
Leo Laporte	@leo@twit.social	https://twit.social/@leo	16163	979	Tech podcaster
Lisa Crispin	@lisacrispin@mastodon.social	https://mastodon.social/@lisacrispin	845	634	Agile tester, consultant, author
Timnit Gebru	@timnitGebru@dair-community.social	https://dair-community.social/@timnitGebru	14164	631	Computer scientist, diversity advocate, AI research
Brian Marick	@marick@mstdn.social	https://mstdn.social/@marick	723	600	Developer, podcaster, Ruby, Testing, thinking
Scott Hanselman	@shanselman@hachyderm.io	https://hachyderm.io/@shanselman	22020	599	Tech geek at Microsoft, speaker
Ian Betteridge	@ianbetteridge@mastodon.me.uk	https://mastodon.me.uk/@ianbetteridge	2896	568	Tech writer, the one named in “Betteridge’s Law”
Tim Bray	@timbray@mastodon.cloud	https://mastodon.cloud/@timbray	8347	533	Dev, activist, XML co-author
Adrian Cockcroft	@adrianco@mastodon.social	https://mastodon.social/@adrianco	1290	471	Analyst / Architect / DevOps, ex Netflix and many others
Charlie Stross	@cstross@wandering.shop	https://wandering.shop/@cstross	9789	349	Sci-fi/Fantasy author
James Gleick	@JamesGleick@sciencemastodon.com	https://sciencemastodon.com/@JamesGleick	8121	339	Author ‘Chaos’ and other books
Dr Sarah Hendrica Bickerton	@sarahhbickerton@mastodon.nz	https://mastodon.nz/@sarahhbickerton	1227	305	Sociology and Public Policy lecturer
Taylor Lorenz	@taylorlorenz@mastodon.social	https://mastodon.social/@taylorlorenz	72915	273	Tech columnist at the Washington Post
Matthew Skelton	@matthewskelton@mastodon.social	https://mastodon.social/@matthewskelton	1019	270	Team Topologies co-author
Jamie Zawinski	@jwz@mastodon.social	https://mastodon.social/@jwz	4836	264	Programmer, blogger, Netscape and Mozilla dev
Mary Robinette Kowal	@maryrobinette@wandering.shop	https://wandering.shop/@maryrobinette	2881	216	Sci-fi author
Julia Evans	@b0rk@mastodon.social	https://mastodon.social/@b0rk	14195	193	Programmer, speaker, tech zine person
Elisabeth Hendrickson	@testobsessed@ruby.social	https://ruby.social/@testobsessed	1018	187	Tester, author, change maker
J. B. Rainsberger	@jbrains@mastodon.social	https://mastodon.social/@jbrains	528	178	TDD person and awesome dev advocate
Molly White	@molly0xfff@hachyderm.io	https://hachyderm.io/@molly0xfff	31191	171	Wikipedia author, crypto skeptic
Brendan Eich	@BrendanEich@mastodon.social	https://mastodon.social/@BrendanEich	1788	163	Creator of Javascript
Neil Gaiman	@neilhimself@mastodon.social	https://mastodon.social/@neilhimself	160905	156	Author - Sandman, American Gods and many more
Chad Loder	@chadloder@kolektiva.social	https://kolektiva.social/@chadloder	18549	135	Activist, cybersecurity expert
Martin Fowler	@mfowler@toot.thoughtworks.com	https://toot.thoughtworks.com/@mfowler	9578	119	Tech loudmouth at Thoughtworks
Kelsey Hightower	@kelseyhightower@mastodon.social	https://mastodon.social/@kelseyhightower	8987	114	Google k8s dev, advocate, speaker
Josh Graham	@delitescere@mas.to	https://mas.to/@delitescere	101	96	Semi-retired CTO, speaker
Emily Webber	@ewebber@mastodon.social	https://mastodon.social/@ewebber	508	92	ex-GDS agile, Communities of Practice author
George Takei	@georgetakei@universeodon.com	https://universeodon.com/@georgetakei	226312	89	Star Trek Actor, Activist
Kevlin Henney	@kevlin@mastodon.social	https://mastodon.social/@kevlin	978	89	Speaker, author, thinker
Pamela Fox	@pamelafox@fosstodon.org	https://fosstodon.org/@pamelafox	1220	86	Python / Cloud advocate and teacher
Charles Oliver Nutter	@headius@mastodon.social	https://mastodon.social/@headius	1133	83	JRuby / JVM dev
Tom Lyon	@aka_pugs@mastodon.social	https://mastodon.social/@aka_pugs	1110	78	Old-school Unix coder, computer historian
Stefan Tilkov	@stilkov@innoq.social	https://innoq.social/@stilkov	1293	74	CEO / Principal Consultant at INNOQ
Brianna Wu	@briannawu@mstdn.social	https://mstdn.social/@briannawu	10746	72	Game writer, activist
Karen James	@kejames@mastodon.online	https://mastodon.online/@kejames	5579	72	Environmental geneticist, social justice advocate
Eric Meyer	@Meyerweb@mastodon.social	https://mastodon.social/@Meyerweb	3765	66	CSS standards advocate
Jessica Kerr	@jessitron@hachyderm.io	https://hachyderm.io/@jessitron	2637	58	Software developer, speaker, symmathecist
William Gibson	@GreatDismal@mastodon.social	https://mastodon.social/@GreatDismal	20201	50	Cyberpunk author
Joanne Harris	@joannechocolat@mastodon.online	https://mastodon.online/@joannechocolat	5865	45	Author of Chocolat, chair of Society of Authors
Paul Irish	@paul_irish@toot.cafe	https://toot.cafe/@paul_irish	3928	42	Chrome, Javascript, CSS developer and advocate
Paul Krugman	@pkrugman@mastodon.online	https://mastodon.online/@pkrugman	25238	41	Economist
Pragmatic Andy	@PragmaticAndy@mastodon.social	https://mastodon.social/@PragmaticAndy	748	40	Author and publisher
Ron Jeffries	@RonJeffries@mastodon.social	https://mastodon.social/@RonJeffries	1666	38	XP author and inventor
Dave Snowden	@snowded@mas.to	https://mas.to/@snowded	566	33	Cynefin author and thinker
Amanda Palmer	@amandapalmer@home.social	https://home.social/@amandapalmer	2275	31	Musician, writer
Trisha Gee	@trishagee@mastodon.social	https://mastodon.social/@trishagee	1312	28	Developer, author, Java advocate
Katie Mack	@AstroKatie@mastodon.social	https://mastodon.social/@AstroKatie	23469	23	Astrophysicist
Greta Thunberg	@gretathunberg@mastodon.nu	https://mastodon.nu/@gretathunberg	71468	20	Climate activist
Martin Kleppmann	@martin@nondeterministic.computer	https://nondeterministic.computer/@martin	1464	19	Author “Designing data-intensive applications”
Kent Beck	@kentbeck@hachyderm.io	https://hachyderm.io/@kentbeck	2195	17	Extreme Programming author
Michael Brunton-Spall	@Bruntonspall@octodon.social	https://octodon.social/@Bruntonspall	212	16	Civil servant, ex-GDS thinker, infosec
Antirez	@antirez@mastodon.social	https://mastodon.social/@antirez	1707	12	Redis creator
Dave Farley	@davefarley77@techhub.social	https://techhub.social/@davefarley77	46	11	Continuous Delivery and Software Engineering author
Felienne Hermans	@Felienne@mastodon.social	https://mastodon.social/@Felienne	1281	5	Scientist, researcher, SE Radio podcast host
Esther Derby	@estherderby@mstdn.social	https://mstdn.social/@estherderby	228	4	Agile author, thinker, change maker
Charity Majors	@mipsytipsy@hachyderm.io	https://hachyderm.io/@mipsytipsy	63	2	Software engineer and CTO of Honeycomb
Daniel Terhorst-North	@tastapod@mastodon.social	https://mastodon.social/@tastapod	1553	1	Agile guy, speaker, BDD, CUPID
Rebecca Parsons	@rjparson@toot.thoughtworks.com	https://toot.thoughtworks.com/@rjparson	111	1	Thoughtworks CTO, dev, speaker
Robert Virding	@rvirding@fosstodon.org	https://fosstodon.org/@rvirding	63	0	Erlang author

¹ It’s not that short, I’ve just dumped a couple of my lists and cleaned up the names - making a really short list would be more work!

Buying minecoins on a child’s Android Minecraft account

2022-11-10T11:01:00+00:00

Sharing here as this was a world of pain, and maybe I can save someone else this pain.

Information is as of November 2022 - earlier advice on the web is wrong, and this may well be wrong in the future too.

So, a friend gave us some cash as a present for our son and said “let him spend it in Minecraft” - sounds simple enough, right? He is a huge Minecraft fan, plays it a lot on his tablet.

The official way is to open up Minecraft, click “add coins” and pay with the device’s payment method. We have a Google Family setup, which means in theory he can make payments but only with parental approval, and then payments can come out of the family account.

However - at least for our particular set of accounts, this just doesn’t work. Payment doesn’t work - even if I totally disable parental controls and say “he can make any payments”, once you navigate through to payment you get a message “to complete this transaction secure your account” - with no indication of how to do this. 2FA? Something else? Nobody else seems to have an answer to this out there either.

So, I did some googling and found people saying “Add money to the child’s Microsoft account and then buy the minecoins online” - that makes perfect sense, right? Nope. Maybe this worked once? I put £10 in his account fine - but there is no way, as of this date, to actually buy minecoins from this money. There is helpful advice from various help forums, and none of it works. I dug far enough to find a single “buy 500 minecoins here” page, which looked promising - but even there, if you click the “buy” button, you get an error “please contact Microsoft support”. Yay.

(I wonder if there is any way to get that £10 back? I guess one day when he’s old enough for a computer, he can spend it on something else)

Anyway, this used up almost an hour of fiddling, with a complaining grumpy child saying “does it work yet?” (sigh - tactful polite behaviour isn’t a thing at 5 3/4), and I gave up.

This morning I had the realisation - there isn’t a link between the Android user and the Minecraft/Microsoft user. So I can bypass the whole parental control stuff by logging in to Android as me, and Minecraft as my son!

I tried it, and there is one wrinkle - Minecraft won’t let you switch user easily! If I open Minecraft, choose “log out” then “log in” - it opens a browser window which remembers my previous login, and has no way to change it!

I did get past this - open a phone browser to https://xboxlive.com, log in, then choose “log out” and it deletes whichever cookie they use to track me.

Then I could log in to Minecraft as my son, choose “add coins”, and payment came out of my Google Pay account. Finally. And when I open Minecraft on my son’s tablet, I can see his precious minecoins there and spendable. I bet he spends them on something he immediately regrets…

So, the TL;DR is - if you want minecoins for your child:

Log in to an Android device as an adult
Log in to Minecraft as the child (see above for clearing your old login)
Buy coins!

Is Mastodon a Twitter replacement?

2022-11-05T21:10:00+00:00

It depends which bits of Twitter you want

I’ve been back on Mastodon in the past week, given the Twitter mess, and I’ve come to the realisation that I have two main patterns of Twitter usage - and Mastodon is a great option for one of them.

My Twitter usage is a mixture of:

Connection with enthusiasts - all kinds of enthusiasts. Tech people, political wonks, academics, climate activists, parents. Sometimes I’m just following people, but I often actively participate - these feel like peer relationships. Sure, sometimes the person I respond to is an expert in the field and I’m just a dabbler! But the key part is, it’s a conversation and active.
Following the zeitgeist. News updates, what famous or influential people say, huge trending events and movements, but generally chosen by “the algorithm” not me. Generally I’m a passive consumer not a participant - sometimes I’ll quote-tweet one from this category, which might move it into category 1 above if someone I know responds

Mastodon looks great for category 1 - the enthusiasts. I can talk to people, I can tweak my feed to follow topics, I can connect with other likeminded people. However it involves an investment of some effort - I need to curate who I follow, build searches and lists (more on that below). But already I’m seeing a ton of great interesting stuff, especially with the recent influx of new people - even if they are just dual-posting, I prefer to see their content on Mastodon where I have more control. (Also I note it tends to be a lot more positive and creative than Twitter - maybe as “the algorithm” emphasises negative content to drive conflict?)

Mastodon is unlikely to work for category 2 - the Zeitgeist. For one thing, there is no “algorithm” - nothing selects and filters content for me. I can follow a lot of celebrities and look at trending hashtags, but again, this is extra effort that a lot of people might not bother with. Mastodon is not going to say “Hey, Stephen King said this funny thing”. (I can get a bit of the zeitgeist when a person I follow boosts something - if one of my friends boosts Stephen King I’ll see it, but it won’t then decide “Korny likes Stephen King” unless I follow him)

But also - I don’t think it will appeal to typical non-enthusiast users. Already I’m seeing “this is too hard, I need to choose an instance??” posts. Which is utterly fair - if you want a global zeitgeist feed, you don’t want to have to spend hours fiddling with configuration! Centralised for-profit sites like Twitter or Reddit or others work well here - and they have slick user experience and no strange network effects or local server forks or any other background stuff you need to learn.

Mastodon is also a bit flakey. It is free and volunteer run. It depends on admins getting funding from donations, and those admins can burn out or not respond as fast as users want or have moderation views that don’t match yours. There are mitigations - it’s easy to migrate instances (and all your followers are auto-redirected!) and I suspect we will see more corporate or organisation servers over time too (like the wonderful toot.thoughtworks.com or the new EU instances).

But because of the above, I suspect it won’t work for a huge proportion of Twitter users - people who just want an easy way to see what is happening, and of course famous people who want an audience of followers with nice simple reliable commercial backing. Without the “non-enthusiast” users, Mastodon has less appeal for celebrities and it’s just going to be a different kind of place.

Thankfully, there are millions and millions of enthusiasts out there - already there are plenty on Mastodon to make it worthwhile, at least in my (admittedly geeky) interest areas. For me, it’s not “will Mastodon replace twitter?” - it already is replacing this part, and it turns out it’s the part I value most.

Mastodon still has plenty of hurdles - for a start it could do with some extra ways to curate the firehose of information. If I want to follow @foo_bar@geek.social who posts some cool tech stuff but also hourly cat pictures, and also follow Martin Fowler who posts something interesting every day or two, Martin’s posts can get easily swamped. (My current fix for this is to use lists to categorise people I want to see more often, but it’s a bit clunky)

Maybe something else will come along that does it better; maybe Mastodon will flounder under the weight of its own temporary success and users will move on. But at the moment it’s looking pretty good, at least for my needs.

And I’ll keep Twitter for where it’s useful (if it stays up!) - though I’ll also keep my eyes out for less commercial, less distorting, less dominated-by-horrible-people zeitgeist sources.

New polyglot code tools releases

2022-10-13T10:13:00+01:00

My sabbatical is winding up, and naturally I got far less coding done than I expected! Our lovely daughter has had a big sleep regression, so a lot of my focus has been on just getting through life rather than perfecting my code.

Still, I have actually achieved quite a lot, now that I go back and look at it - so I thought it was time for an updated blog post.

See My initial announcement or The main Polyglot Code Tools site if you want more background.

Major changes:

Viewing activity by teams
Saving and loading settings
Moving to TypeScript for the explorer
Quite a few refactorings

Viewing by Teams

I put this at the top as it is the area that is most useful for users!

I’m a huge fan of the Team Topologies book - the Team should be the core unit of delivery in a well functioning agile organisation.

So, when investigating codebases, I wanted to be able to tell which teams were operating in which areas, and how they overlap. The end result is a view like this:

This does need a caveat though - you need to create team information yourself! Git doesn’t tell me which user is in which team. (In fact you also need to do a fair bit of work merging users, as git doesn’t tell me that foo@bar.com is actually the same user as Fulvio_Barrington@gmail.com …)
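To make the two manual steps concrete, here is a minimal sketch of the kind of lookup involved: first fold several git identities into one user, then map that user to a team. This is not the Polyglot Code Tools data model; TeamDirectory, its methods, and the example user and team names are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical lookup tables for illustration only.
#[derive(Default)]
pub struct TeamDirectory {
    /// Lower-cased git identity (e.g. an email) -> canonical user name.
    aliases: HashMap<String, String>,
    /// Canonical user name -> team name.
    teams: HashMap<String, String>,
}

impl TeamDirectory {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records that `identity` is really the user `user`.
    pub fn add_alias(&mut self, identity: &str, user: &str) {
        self.aliases.insert(identity.to_lowercase(), user.to_string());
    }

    /// Puts a canonical user into a team.
    pub fn assign_team(&mut self, user: &str, team: &str) {
        self.teams.insert(user.to_string(), team.to_string());
    }

    /// Resolves a raw git identity to a team, if both mappings exist.
    pub fn team_for(&self, identity: &str) -> Option<&str> {
        let user = self.aliases.get(&identity.to_lowercase())?;
        self.teams.get(user).map(String::as_str)
    }
}
```

With foo@bar.com and Fulvio_Barrington@gmail.com both aliased to one made-up user, say "fulvio", and that user assigned to a team, team_for returns the same team for either address and None for identities nobody has mapped yet.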

There is also a view that tries to show where multiple teams overlap, using SVG patterns - this is a bit experimental, but might be useful:

And you can focus on a single team (or a single user!) to see just their contribution compared to everyone else:

Here blue is the selected team, red is other users, and colours in between show overlap. Also brighter colours show more change, darker show less.

Saving and loading settings

Creating teams is a fair bit of manual work, and the Explorer, prior to version 0.6.0, was entirely stateless - there was no way to save that work!

Now, you can save the user and team settings, as well as all the other explorer settings, to JSON files or to browser local storage. See the docs for more.

Moving to TypeScript

The explorer was originally written in pretty hacky JavaScript, with quite a bit of sloppy code - this is the side project of a busy parent after all! However, I felt the need to clean things up, and also to learn TypeScript after all the good things I’d heard about it - so did the painful job of rebuilding with types.

And it was pretty painful in places. I do love TypeScript now - it’s a brilliant way to apply flexible types to a pretty terrible language. But some things needed a quite different approach - and some areas, such as D3 visualisations, had almost no documentation at all. D3 does have types - but very very few examples use them, and I had to do a lot of reading source code and relying on VSCode’s excellent TS support to get it all working.

This does however mean that the code is a lot cleaner - I even have a few tests now! So future changes will be less painful and less risky.

Other refactorings

I won’t go into all the details here, but I also took the chance to clean up a bunch of code.

On the rust side, I got rid of a lot of somewhat dubious generic logic I’d written using JSON Value types. I’d foolishly tried to make the code too generic - I don’t know why after 30+ years of coding I still make the same mistakes - I need to have “YAGNI” as a tattoo, just to remind me to keep things simple.

I also enabled all the linters and checks I could, both in rust and TypeScript. Honestly this is one of the biggest coding improvements I’ve seen in the past decade or so - automated tooling has gotten so good at spotting errors and non-idiomatic code, it is wonderful, especially for learning languages.

Looking to the future, and for feedback

I’m going back to work in a couple of weeks - yay! (actually I do miss it - especially interacting with people outside my family). But I will keep making changes, as I can.

I have a long-term plan to rework the whole Voronoi layout tool - that’s probably the next thing on my list. But I’d love feedback if people are using this - what is good? What sucks? What would you like to see?

This blog has Disqus comments, but honestly I don’t read them much - probably better to chat to me on twitter or mastodon or face to face! Or you can raise issues on github for specific bugs.

A geeky kind of sabbatical

2022-09-12T15:31:00+01:00


I am on sabbatical! After 10 years at Thoughtworks, I’m getting a nice long break. (I’ve actually been on sabbatical for a few weeks - school holidays plus usual procrastination delayed this post…)

I had to decide, a while ago - what would I do on my sabbatical? Some people use them to travel, to see the world, to expand their horizons - but I have two small kids, so that really didn’t sound much like it fit this stage of my life!

Instead, I wanted to think about things I could do while largely staying at or near home. On thinking further, my main life goals at the moment are:

Mental health
Physical health
Learning and self-improvement
Family and family maintenance

Several of these are overlapping - I started on a venn diagram but it got a bit messy!

However, I’m also a big geek - one thing that satisfies my “Mental health” and “Learning” areas is to actually write some code. To build something of value, and share it with others. This is especially the case as I haven’t been on a proper software delivery project since 2020 - I’ve done some fascinating work, but I have felt a bit disconnected from the art of writing code.

So - a major goal of my sabbatical is to try to make some big improvements to my pet project, the Polyglot Code Tools. It might not sound much like a holiday to some! But I love it.

I’m also doing plenty for the other categories - I’m back doing weekly yoga (which is awesome), I had a great summer holiday with the kids and my mum, and I’ve done lots of life-admin tasks that I won’t bore people with here. I’m also going to try for a few long bike rides - but right now I’m scratching my “I want to code” itch!

I have a few big epics planned around my tools - some of them are already done or nearing done! For example, you can now assign users to teams, and visualise which team has changed which areas of code most in a particular timespan:

This started as code I hacked together at a client; I’ve now rewritten it cleanly in TypeScript, and it is mostly working.

Work in progress

Moving to TypeScript - done! Some areas were quite tricky - I should blog about TypeScript…
Saving explorer settings to the browser or to file - done! This is essential as the UI grows - otherwise every time you reload the browser, all config like user teams would be lost.
Creating and visualizing teams - mostly done! Most of what remains is UI tweaks - letting users see teams in more contexts, for example.

Planned epics

Filtering the UI by folder and/or programming language - this can really help speed on a large codebase. Sadly the layout can’t change on filtering, but that doesn’t matter for a lot of use cases.
Making it work without git - ideally adding support for other SCMs, but “not crashing the UI if there is no git data” would be a good start. And skipping git scanning would be a good way for users to get faster feedback.
Rewriting the Voronoi layout in rust / webassembly - this is a big one. I actually started a while ago, but it’s tricky, especially as the JavaScript code I use currently is (a) quite prone to crashing, and (b) very much not suited to a rust-style language - lots of random state fiddling all over the place. However, I suspect the result would be drastically faster - not just because rust, but also as I could ditch lots of time-consuming error-handling retries.
If it is fast enough, embedding layout in the Explorer so you can change layout at run-time. This would make the system much more usable and the feedback loop for things like changing file ignore patterns much tighter.
Reading the research for more ideas! I have a number of academic papers I picked up over time, and a number of great Data Visualisation books - I want to mine those for ideas of value.
Much better documentation - I’d like an introduction video, for instance, for people who learn from videos better than words. And a matching step-by-step written guide.

Possible epics

Using a different lines-of-code tool. I’m using a fork of tokei which is nice and fast - but I had to fork it because I wanted to be able to strip comments from the code for complexity measures. And maintaining the fork is annoying. There are many other multi-language parsers out there, such as tree-sitter - these might also let me do some more complex metrics like class/method length, while still supporting a lot of languages.
Using a data server rather than JSON files. This would take away some of the unix-style simplicity of the tool - and make it harder to run in locked down environments. But it also might add a lot of power - some calculations could be offloaded to the server, allowing for things like actually observing code as it existed at a particular point in time. And the layout engine could run as compiled multithreaded rust, not as webassembly in a browser.
Using other layouts than Voronoi trees.
Other tools! I have a git log visualisation thing I built ages ago - it’d be great to rebuild something similar with new tools.

Important tasks but not really epics

Fixing publishing binaries (the tools I used to use have died!) - probably using github actions
Adding unit tests to the Explorer - yes, I’ve been lax here. The rust code is tested and mostly TDD, but the UI involved a lot of UI tweaking that just wasn’t worth testing. Nowadays there is plenty of quite testable logic as well - but having started with no tests, it’s hard to course correct.

Please send me your ideas

If anyone out there has used the polyglot tools, or might be interested but can’t for some reason, I’d love to hear your thoughts. What might you be interested in seeing in a code visualisation tool?

Also, I’d love suggestions of open-source code I can look at as examples. I can’t publish examples based on client code, so I’d like more real-world projects that are a bit like business code:

Multiple interrelated repositories - it’s nice to try out how the temporal coupling features work - so far they haven’t been awfully useful, but I’m hoping to find places where they are of value
Lots of teams of developers. This is a big issue with open-source samples - so much is done by individual contributors. In day-to-day work we like to work with teams as the unit of software delivery, so I’d love places where visualising teams makes sense.
Lots of languages. This isn’t hard really, almost everyone has a mix of languages these days. But it’d be nice to have some languages that lack existing tools, like SQL…
and of course, lots of code and years of git history. (but not too much code - I tried running against the linux kernel, and it works, but takes a long long time)

If you have any ideas, please contact me on mastodon or twitter (you can comment on this blog too, but it doesn’t get checked all that often!)

The inevitable update

Quick update 28th September - I really didn’t anticipate just how much sleep deprivation from our lovely daughter was going to impact my plans. I’ve done quite a bit (more blog updates to come) but I also need to lower my expectations a bit - I am focusing more on mental health and getting the family through a difficult year, than getting a vast amount done.

So I’m going to release some cool stuff, update the docs, and all that. But some of the later epics might have to wait a few more months / years!

Hiatus

2022-01-26T20:31:00+00:00

Long time no blog

I don’t often post personal things here, but I thought after a 15 month gap I should put something up.

We are all well and happy here, but since late 2020 I was incredibly busy at work - working on a “Red team” review of a major UK public sector project which was in real trouble, and then working with the organisation to help plan and build a complete reset of the project. It was challenging and interesting, and a great learning experience - but it also used up all my energy and spare time not already dedicated to family, so not much side-project work or blog ideas made it out.

Then in November 2021 our other side-project happened - we adopted a child! I’m not going to share details in a public medium, but it’s been a wonderful exhausting journey. I’m half way through 6 months of Adoption Leave, and things are sort-of almost calming down a bit - enough that I can start thinking about the year ahead, and maybe update this blog!

So - I might turn some draft posts into real posts at some stage; and I’m trying to get back into technical reading, when I have the time and brainpower.

Currently I’m reading Information Visualization: Perception for Design by Colin Ware - which is fascinating, an in-depth academic text on human visual perception and cognition - it’s basically covering the low-level stuff that our bodies and brains do to process visual info, with an eye on how this impacts the world of visualization.

There is so much here, backed by so much research - it’s not all immediately applicable to people building D3 interfaces, necessarily - you can get that from lots of other books. But if you want to know a bit more about why you might want to prefer one kind of colour scheme to another, or why visual programming languages might not ever match the hype, or how to support colour-blind users (and why some small proportion of the population can see colours that nobody else can!), or a million other fascinating facts about brains and eyes - it’s a really interesting read. If very dense to read through blurry, sleep-deprived eyes.

Anyway, enough blathering - just thought I’d post a “still alive” notice. There may be more posts soon, or it may be more months of silence here!

Introducing the Polyglot Code Explorer

2020-09-06T19:56:00+01:00

If you want a quick look at the explorer, you can see a simple demo here or a more complex one here. There is also a documentation site at https://polyglot.korny.info (currently a work-in-progress).

Welcome to the Polyglot Code Explorer

The Polyglot Code Explorer is an open-source tool for visualising complex codebases written in multiple programming languages.

In this article I am going to explain its purpose, how you can run it yourself, and what it does.

What is it for?

Fundamentally, I wanted to answer the question:

How can we visualise large codebases without needing complex language-specific parsers and logic?

Partly I wanted to easily spot toxic code - my colleague Erik Dörnenburg wrote some great articles on Toxic code visualisation and I wanted a way to spot some of these problem areas myself.

But also, I just wanted to be able to explore the code quickly. I’m a visual thinker, so my main focus is on visualisation - especially when trying to spot patterns in millions of lines of code.

It is far quicker for me to look at a diagram and see some unusual colouring in one area, than to see the same information in a table of numbers.

Why polyglot?

Polyglot means “speaking multiple languages” - in this case, it means these tools should work, to some degree, for any text-based programming language.

I’ve worked in many programming languages over the years, and a lot of them don’t have good or easy code quality tools - either they are too new for a community to have built them, or they are from ancient projects where even if such tools exist, getting them up and running is a headache. And each tool probably produces different metrics in different formats - it’s hard to get any sort of big-picture view.

Also many real world systems don’t use a single language - often it is better to use specialist languages for different tasks, rather than one general-purpose one. For example one project might have a UI built in JavaScript and HTML, a microservice built in Kotlin and a platform automation tool built in Rust.

Also I was inspired by reading Adam Tornhill’s book “Your code as a crime scene” - he talks about all the things you can learn from really simple metrics like lines of code, and indentation, and change history. None of these need a complex language parser - and complex language parsers tend to be touchy and flaky. Most of my code uses no language parser at all, or just a very simple one which can distinguish code from comments.

And finally - supporting all the various languages out there is a lot of work! Quite a few of the other tools I found linked from Erik’s articles, and elsewhere, seem to have parsers for a number of languages - but progress is slow, and often they don’t keep up with new languages or language changes. Staying largely language-agnostic makes it much easier for me to maintain my code, and not have to worry about it stagnating.

How to run the Explorer

The explorer is actually the front end component of three tightly coupled applications:

The Polyglot Code Scanner is a rust application, which scans the source code and produces a JSON data file
The Polyglot Code Offline Layout tool is a node.js script which adds layout information to the JSON data file
The Polyglot Code Explorer is a react/D3 web app which provides the user interface for exploring the code

The code is open source, you can find it on GitHub:

I should add a disclaimer - I am not a rust guru, and I am definitely not a react guru! This is side project code, not commercial-quality - it may well have bugs, mistakes, ugliness, and it has far less testing than I’d usually expect :)

You may prefer to run these tools from source code - not all the executables have been tested on all platforms! There are some more detailed how-to guides on the docs site if you want to build them yourself, or need more details than the brief instructions below.

Getting the executable files

Each of the tools is packaged up as an executable file - the Scanner is written in rust, so it’s easy to just compile a binary. The Layout app is a node.js script, I’ve used pkg to build a bundled executable. And the Explorer can be run as a static website, so the packages are a zipped up bundle of all files needed to build the website, which you can run yourself.

Scanner executables can be downloaded from https://github.com/kornysietsma/polyglot-code-scanner/releases
Layout executables can be downloaded from https://github.com/kornysietsma/polyglot-code-offline-layout/releases
Explorer bundles can be downloaded from https://github.com/kornysietsma/polyglot-code-explorer/releases

If you are on a Mac you will need to strip Apple’s quarantine attributes from the binary files to avoid the “app is from an unknown developer” error:

tar zxf polyglot-code-scanner-vwhatever-x86_64-apple-darwin.tar.gz
cd polyglot-code-scanner-vwhatever-x86_64-apple-darwin
xattr -d com.apple.quarantine polyglot_code_scanner

unzip polyglot-code-offline-layout-macos.zip
xattr -d com.apple.quarantine polyglot-code-offline-layout

The Explorer is not an executable file - it’s a zip file containing the HTML, CSS and JavaScript files needed to run the site. You can run them locally by running a tiny web server yourself using Python - there are more detailed instructions here or there’s a big list of similar servers in other languages here - I’ll use Python 3 below.

Running them

A short sample of running these together might help:

$ cd ~/work
$ polyglot_code_scanner --coupling --years 3 -o my_project_1.json ~/src/my_project
# this can be slow for big projects, or if you scan back through many years of history
# coupling is optional, remove --coupling to speed it up if you don't want it
# Check there are no errors and the my_project_1.json file is there

$ polyglot-code-offline-layout -i my_project_1.json -o my_project_2.json
# this can be slow for big files
# Check there are no errors and the my_project_2.json file is there

# the first time, you need to unzip the explorer files
$ unzip ~/downloads/polyglot-code-explorer.zip
Archive:  polyglot-code-explorer.zip
   creating: polyglot-code-explorer/
$ cp my_project_2.json polyglot-code-explorer/data/default.json
$ cd polyglot-code-explorer
$ python3 -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/)

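Then open a browser to http://localhost:8000 to start exploring! (Python prints 0.0.0.0, which means “listening on all interfaces” - some browsers and platforms won’t open that address directly.)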

Using the UI

The Explorer front end looks somewhat like this:

There is more about how to use the UI on the docs site

The centre of the display shows the files in your project - I’m using a Weighted Voronoi Diagram which has the big advantage of showing files roughly in proportion to their size. And by size I mean lines of code, which is generally much more useful than bytes - especially as research tends to show that a high line count is correlated with complexity and defects - so just looking for files with many lines of code is a good starting point for finding problems.

Viewing by programming language

This view is very simple - it just colours each file by programming language, showing the 10 most common languages. Mostly useful for getting an overview of what goes where - it’s usually easy to spot the front-end vs back-end code by the colours used. (only 10 languages are shown because beyond that, it’s hard to visually see different colours)

Lines of code

This view is simple enough - it uses a scale from blue for tiny files, through to yellow for giant files.

Note that this is not a linear scale - a lot of these use what I call a “Good/Bad/Ugly” scale - blue (0) is good, red (1000) is bad, and yellow (10000 and above) is just ugly. If I used a linear scale, it’d be harder to distinguish the good/bad files from each other. (yes, I could use a log scale, but that has its own problems)
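A minimal sketch of a piecewise scale like this, in plain JavaScript, might look like the following. The exact RGB values, the names STOPS and goodBadUgly, and the linear interpolation between stops are all assumptions for illustration - this is not necessarily how the Explorer builds its colours.

```javascript
// Colour stops for a hypothetical "Good/Bad/Ugly" scale (RGB values are illustrative).
const STOPS = [
  { value: 0, rgb: [0, 0, 255] },       // good: blue
  { value: 1000, rgb: [255, 0, 0] },    // bad: red
  { value: 10000, rgb: [255, 255, 0] }, // ugly: yellow
];

// Map a metric value to an [r, g, b] colour, interpolating linearly
// inside each segment and clamping below the first / above the last stop.
function goodBadUgly(value) {
  const first = STOPS[0];
  const last = STOPS[STOPS.length - 1];
  if (value <= first.value) return first.rgb;
  if (value >= last.value) return last.rgb;
  for (let i = 1; i < STOPS.length; i++) {
    const lo = STOPS[i - 1];
    const hi = STOPS[i];
    if (value <= hi.value) {
      const t = (value - lo.value) / (hi.value - lo.value);
      return lo.rgb.map((c, j) => Math.round(c + t * (hi.rgb[j] - c)));
    }
  }
  return last.rgb;
}
```

The point of the two segments is that the whole blue-to-red range is spent on files of 0 to 1000 lines, so small and medium files stay distinguishable, while everything from 1000 to 10000 fades from red to yellow.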

Indentation

This metric is an interesting one. In Hindle, Abram, Michael W. Godfrey, and Richard C. Holt. 2008. ‘Reading Beside the Lines: Indentation as a Proxy for Complexity Metric’ they found that indentation is often useful as a way of looking for complexity - which makes common sense; files with a lot of indentation are often files with deeply nested “if” and “case” statements. You can choose a few sub-visualisations using the drop-down near the top-left - the default shows the standard deviation of indentation, which is often the most useful metric; you can also see the worst indentation in each file, and the “total area” which is useful for showing files which are both large and deeply indented.
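To give a feel for these three numbers, here is a rough sketch of computing them for a single file in plain JavaScript. It makes several assumptions that may not match the scanner: blank lines are skipped, a tab counts as 4 spaces, “total area” is taken to be the sum of indentation over all lines, and the standard deviation is the population one. The function name indentationStats is made up.

```javascript
// Compute simple indentation metrics for one source file's text.
// Assumptions: blank lines are ignored, and a tab counts as `tabWidth` spaces.
function indentationStats(source, tabWidth = 4) {
  const indents = source
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      let width = 0;
      for (const ch of line) {
        if (ch === " ") width += 1;
        else if (ch === "\t") width += tabWidth;
        else break; // first non-whitespace character ends the indent
      }
      return width;
    });
  if (indents.length === 0) return { worst: 0, total: 0, stddev: 0 };

  const total = indents.reduce((a, b) => a + b, 0);
  const mean = total / indents.length;
  const variance =
    indents.reduce((acc, w) => acc + (w - mean) ** 2, 0) / indents.length;
  return {
    worst: indents.reduce((a, b) => Math.max(a, b), 0), // deepest line
    total,                                              // "total area"
    stddev: Math.sqrt(variance),                        // population std dev
  };
}
```

A flat file with one deeply nested function will score a high worst value but a modest total, while a large file that is nested everywhere scores high on all three.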

Of course this metric can have false positives - heavy indentation might be due to a particular formatting style for long lines, or an actually valid data structure, or other valid reasons. But it is often surprisingly useful.

Age since last change

This view shows how long it is since each file was changed (from git history) - blue files are recently changed, red files haven’t changed in a year, yellow files haven’t changed in 4 years. Note that this is affected by the date selector down the bottom of the page:

Files that haven’t changed at all in the selected date range will show in grey. You need to select the whole project (drag the left side of the selector to the left of the screen) to see change information across the whole scanned date range.

This is a good/bad/ugly scale again, largely because generally files that haven’t changed for a long time are, in my experience, parts of the system that nobody understands or feels safe to touch.

However this is a bit contentious - it depends a lot on the culture of the organisation, and the kind of code - a lot of research in this field shows the flip-side of this, that files that haven’t changed for ages are stable. If they had bugs, people would have touched them - so these files might be “safe”. Personally, coming from an agile world where shared code ownership is important, and rapid change is the norm, I see old untouched files as something that might show stagnation and maintenance nightmares - I think a lot of what is “good” here depends on what you are looking for.

Creation date

This doesn’t use a good/bad scale - it’s not really about quality, but sometimes it’s useful to know which files are new, and which are old. This is especially handy when you are using the date selector, to give you a feel for how the code has changed over time.

However, there is a problem here that requires a bit of a digression

The problem with the date selector

The scanner starts with the files currently on your filesystem - and then it works backwards in time through the git logs. It doesn’t really keep track of the actual state of your system over time, ~or file renames,~ or deleted files. If you create a file foo.c, and do a pile of work on it, and then delete it, the scanner will not show it - there’s not really anywhere in the JSON data file to store that data! ~Similarly renames are not handled well - it sees a file rename, but isn’t great at tracking what happens to the file before the rename. (This is something I plan to fix, when I can! But it’s non-trivial - you can’t just track file renames by time, you need to track them by branch…)~

Update: as of scanner version 0.2.0, it does now follow renames and respect deletes. You still can’t see any files that are not in the current HEAD revision when you scan! But if you rename foo.c to bar.c it will show all changes to foo.c when you look at bar.c. This is most important if you move directories around - I’ve had to rename src/hierachy to src/hierarchy in the past!

So moving the date selector is handy for limiting some kinds of information, and getting some views of the past - but it’s not actually a window into the past state of the project.

Unique changers

This shows how many different people touched a file, in the selected date range. Again this is a bit of an “it depends” metric - some studies show that few changers are good, as they tend to be just experts and not new inexperienced people. But again, too few changers can be a sign that only one person knows a piece of code, so you don’t have any collective code ownership, and if that one person leaves, you might have some unknown code. (There’s some really interesting research in this area, which I’d love to look into in the future - such as looking at how new/old each changer is to the organisation, how long they’ve been touching this area of the code, and the like).

This has a custom colour scheme because it’s not as simple as good/bad. Basically:

No changers is bad, so it’s highlighted in cyan. This probably means that no-one currently understands the code at all.
One changer might be OK, though I’d see it as an ownership risk. This is shown in dark red.
Two to Eight coders is, in my view, generally OK. This is a “two-pizza team” - it’s fine for the whole team to be changing a file.
Eight to 30 coders is definitely risky - maybe the file is tightly coupled with several areas of code, or full of bugs so people keep needing to fix it. High numbers are in brighter colours.
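As a sketch, the bucketing above could be written as a small function. The category names and the function name changerRisk are hypothetical, and the list leaves two things open that I have had to guess: a count of exactly eight is treated here as still being a team, and anything above eight (including more than 30) falls in the risky bucket.

```javascript
// Bucket a file by how many distinct people changed it in the selected range.
// Category names are made up; boundaries at 8 and above 30 are assumptions.
function changerRisk(uniqueChangers) {
  if (uniqueChangers === 0) return "none";   // nobody currently knows this code
  if (uniqueChangers === 1) return "single"; // ownership risk
  if (uniqueChangers <= 8) return "team";    // a "two-pizza team" - generally OK
  return "many";                             // risky - coupling or bug magnet?
}
```

Each bucket would then map to its own colour, rather than to a position on a single good/bad gradient.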

Note there is one current limitation here - the system treats unique user names / emails as unique individuals. So if you change email or git account, you will look like two people. I plan to add some way to flag duplicate names - possibly using the pretty obscure git .mailmap file format. But this is a fair way down my to-do list.

Churn

Churn shows the rate of change - how often a file has changed in the selected date window. This again isn’t necessarily good or bad - it depends a lot on what date range you are using. If a file changes every work day over several years, that’s probably bad! But if it changes every day over the course of a short project, that might be fine.

There are three sub-visualisations here:

Days containing a change - this is in proportion to the number of days selected. So “0.5” means the file has changed every second day, on average. This doesn’t care how often in the day a file changed, so 10 commits on one day looks the same as 1 commit.
Commits per day - this is the sum of commits, divided by the number of days. So “0.5” means on average one commit every two days - but this might mean 150 commits on one day, and none the rest of the year.
Lines per day - this is the sum of the number of lines changed (both adds and deletes) divided by the number of days. So tiny tweaks to files won’t show up nearly as brightly as large numbers of lines added or deleted. Good for seeing where more work is being done.
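The three calculations can be sketched in a few lines of JavaScript. The shape of a commit record here ({ day, linesAdded, linesDeleted }) and the function name churn are assumptions for the example, not the scanner’s real data format.

```javascript
// Compute the three churn metrics for one file.
// `commits` is a hypothetical list of { day: "YYYY-MM-DD", linesAdded, linesDeleted };
// `daysInRange` is the number of days in the selected date window.
function churn(commits, daysInRange) {
  const daysWithChange = new Set(commits.map((c) => c.day)).size;
  const linesChanged = commits.reduce(
    (sum, c) => sum + c.linesAdded + c.linesDeleted,
    0
  );
  return {
    daysContainingChange: daysWithChange / daysInRange, // many commits on one day count once
    commitsPerDay: commits.length / daysInRange,
    linesPerDay: linesChanged / daysInRange,            // adds and deletes both count
  };
}
```

For example, three commits spread over two days of a four-day window give 0.5 for days containing a change but 0.75 commits per day - which is exactly the difference between the first two views.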

Temporal Coupling

This is based on ideas from Adam Tornhill’s books, plus some research - it tries to work out when files might be tightly coupled to each other, based on when the files change in git. Adam calls this “Temporal Coupling”.

The curved lines show which files seem to be temporally coupled to which other files.

Note each line is unidirectional - file A may be coupled to file B, but file B may not be coupled to file A.

For example, in the screenshot above, the file testprocessinggui.cpp had commits on 22 days in the date range selected.

The file qgisapp.cpp was also changed on 20 of the same days.

According to the current coupling algorithm, this means it has a ratio of 20/22 = 0.909 - roughly 90% of the days with commits to testprocessinggui.cpp also had commits to qgisapp.cpp.

The converse might not be true - qgisapp.cpp might have changed on another 20 unrelated days, so it might not have a coupling connection back to testprocessinggui.cpp.
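The asymmetry is easy to see in a small sketch. This follows the description above (days in common, divided by the days the source file changed) rather than the scanner’s actual code, and the function name couplingRatio and the numbered days are made up for the example.

```javascript
// Fraction of the days on which `sourceDays` had a change where
// `otherDays` also had a change. Days can be any comparable values.
function couplingRatio(sourceDays, otherDays) {
  const other = new Set(otherDays);
  const source = [...new Set(sourceDays)];
  if (source.length === 0) return 0;
  const shared = source.filter((day) => other.has(day)).length;
  return shared / source.length;
}

// Helper to build a run of day numbers, inclusive at both ends.
const range = (from, to) =>
  Array.from({ length: to - from + 1 }, (_, i) => from + i);

const guiDays = range(1, 22); // 22 days with changes
const appDays = range(3, 42); // 40 days with changes, 20 of them shared

console.log(couplingRatio(guiDays, appDays).toFixed(3)); // prints 0.909
console.log(couplingRatio(appDays, guiDays).toFixed(3)); // prints 0.500
```

With a threshold somewhere between those two values, a line would be drawn from the first file to the second but not back again.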

Obviously this logic can produce a lot of false positives, if files change a lot coincidentally.

At the moment, this either produces far too many links, or far too few. I think this needs a lot of work - at minimum, it should use a much smaller time window than a day! I am probably going to try making it look at changes within an hour, and see if that helps.

Most of the research in this area tracks changes within a single commit - but this doesn’t work so well for projects with lots of repositories, such as microservices projects. A huge benefit of this sort of coupling display, if it works, is to find those hidden dependencies between projects - knowing that every time you change the Foo service, you also need to change a file in the Bar service, could be very useful.

Next steps

I’m keen to keep tinkering with this - I have a pile of possible enhancements, and a long list of research to read! And a lovely 3 year old child, and limited spare time :)

A few things are of fairly high priority - I’d like to handle git history renames better, as projects with a lot of refactoring will have poorer quality metrics at the moment.

I’d also love to get feedback to help me prioritise - feel free to add comments on the Disqus form below, or contact me on Twitter or other social media - or for bugs / improvements you can raise issues on the linked GitHub projects.

Better D3 sites with react

2020-07-19T19:41:00+01:00

Disclaimers

I’m not a React nor a D3 expert. I’m too much of a generalist these days to consider myself an expert in anything really! I am happy to be told how to correct or improve any of these examples, and of course don’t just copy me - take what is useful from my stuff, and build your own, better things!

Also note I built my sample code using create-react-app - and I haven’t cleaned out all the files that creates, so there might be some junk hanging around.

TL;DR: my sample code is at https://github.com/kornysietsma/d3-react-demo

The ancient past - tinkering

I’ve been playing with D3 for quite a while now - I tinkered with D3 on a clojure server in 2013 and in 2018 I shared an approach that mostly worked for me - using modern JavaScript and CSS, ditching JQuery or other frameworks, and going serverless, because in most cases having a purely static site worked for me, and made it much easier to host and share visualisations.

However it was always painful to build the non-SVG parts of my visualisations. Forms, inputs, sliders, and the like, are a hassle to build yourself once you get any complexity at all.

What I needed was to integrate with a more modern JavaScript framework - in 2019 I finally found time to learn some React, and I decided it’d be good to combine the two.

The recent past - adding React

Unfortunately, it’s not that straightforward to do so. Basically React likes to control the DOM - tracking state changes, diffing a virtual DOM with the real DOM, and the like. D3 also likes to control the DOM - and you need to work out how to stop them fighting.

There are several approaches that can be used here - there’s a nice overview in “Bringing Together React, D3, And Their Ecosystem” by Marcos Iglesias - basically there’s a spectrum from letting React and D3 largely own their own parts of the DOM, through to letting React look after all the DOM and just using D3 to do D3 special bits. I was more keen on letting them be largely isolated - D3 is very good at what it does, and the less react-y it is, the more you can reuse some of the millions of great D3 examples that are out there.

I also found this great article: “React + D3 - the Macaroni and Cheese of the Data Visualization World” by Leigh Steiner which was extremely helpful, and the basis of most of my approach.

However, it didn’t go into all that much detail - and also, despite mentioning the newer React functional style and hooks, most of it was based on old componentDidUpdate logic. And state handling seemed tricky.

Also, another big thing for me is that it didn’t explain how to work with the D3 join model (D3 examples often don’t, sadly). The idea is, done properly, D3 rendering can detect changes in a diagram’s underlying data, and cleanly handle adding new elements, updating changed elements, and deleting removed elements - with transitions if you want. D3 recently added a cool join function which makes this even easier.

So I started tinkering with making this work my way…

The present - React + D3 with hooks

My current approach is at https://github.com/kornysietsma/d3-react-demo - to be precise, this article is based on code at this commit in case the repo has moved on by the time you read this.

The D3 parts

D3 only exists in the Viz.js file - everything else is React. The Viz component creates a single svg element:

    <aside className="Viz">
      <svg className="chart" ref={d3Container} />
    </aside>

That ref={d3Container} means React creates a reference to this DOM element for manipulation by the Viz component - see Refs and the DOM in the react docs for more.

The heart of the Viz component uses useEffect() as mentioned in the Macaroni and Cheese article, to trigger changes to the D3 component as a side-effect - if and only if the data being referenced has changed. The core of the Viz update logic is this code:

const Viz = (props) => {
  const d3Container = useRef(null);
  const { dataRef, state, dispatch } = props;

  const prevState = usePrevious(state);

  useEffect(() => {
    // d3 update logic hidden
  }, [dataRef, state, dispatch, prevState]);

  return (
    <aside className="Viz">
      <svg className="chart" ref={d3Container} />
    </aside>
  );
};

Here useEffect is passed a dependency array with four values - the effect will only re-run if one of them has changed:

dataRef is another ref - in this case to the raw data to be visualised. More on that later. As it’s a reference (think pointer) it doesn’t actually change, it’s included here to avoid React complaining
state is where I put all the visualisation state - what to show, what colours to use, interactions etc. Generally it’s the only thing that might change
dispatch is a global dispatch function that D3 can use to make changes to the state - more on that later. Again, it shouldn’t change, so it’s just here to keep React happy.
prevState is the previous state - this is a trick I got from this Stack Overflow question - it stores the value of state from last time Viz was shown, allowing me to detect what has really changed.

Initial setup, cheap changes, and expensive changes

One thing I wanted to handle was to separate out different kinds of visualisation updates. For simple things this is complete overkill - but I often find that my UI changes fall into two categories:

Cheap changes that really just need to update some colours or highlights, really quickly
Expensive changes that need more serious processing, possibly with some delay

For example, dragging a colour slider to change colours might be so cheap you want it to happen on every mouse drag. But changing a date selector might mean re-processing the underlying data for some reason, and that might be slow.

There are also the things you do once and only once - adding svg groups, for example.

So the code looks at the state, and the prevState, and works out what has changed:

    if (prevState === undefined) {
      initialize();
    } else if (!_.isEqual(prevState.expensiveConfig,
                          state.expensiveConfig)) {
      draw();
    } else if (!_.isEqual(prevState.config,
                          state.config)) {
      redraw();
    } else {
        // nothing to do
    }

I’m using lodash to do object comparison - state can be deeply nested, and JavaScript doesn’t have a reliable way to do deep object comparison.
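To show why a plain === is not enough here, this is a small stand-alone sketch of the same decision logic. So that it runs under Node with no dependencies, it swaps lodash’s _.isEqual for Node’s built-in util.isDeepStrictEqual - that is a substitution for the example only, and in browser code you would keep lodash. The function name chooseUpdate, and returning a string instead of calling initialize/draw/redraw, are also just for illustration.

```javascript
const { isDeepStrictEqual } = require("util");

// Decide which kind of update is needed, mirroring the if/else chain above.
function chooseUpdate(prevState, state) {
  if (prevState === undefined) return "initialize";
  if (!isDeepStrictEqual(prevState.expensiveConfig, state.expensiveConfig)) {
    return "draw";
  }
  if (!isDeepStrictEqual(prevState.config, state.config)) {
    return "redraw";
  }
  return "nothing";
}

// Two states with identical contents but different object identities.
const a = { config: { colour: "red" }, expensiveConfig: { dates: [1, 2] } };
const b = { config: { colour: "red" }, expensiveConfig: { dates: [1, 2] } };

console.log(a.config === b.config); // prints false - === compares references
console.log(chooseUpdate(a, b));    // prints nothing
```

Because a reducer normally returns fresh objects on every change, a reference comparison would report a change on almost every render, so the cheap/expensive split only works with a deep comparison (or with careful structural sharing).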

I won’t go much into the initialize, draw and redraw functions at this stage - they are relatively straightforward. I don’t even actually use the cheap/expensive code in the demo - draw just calls redraw.

The only interesting thing to note is how to interact with the world outside D3 - using the dispatch function:

  .on("click", (node, i, nodeList) => {
    // tell the React side which data point was clicked
    dispatch({ type: "selectData", payload: node.id });
  });

How this works will be covered later.

Loading the data

The data for my demo is in a JSON file - you could just import it, but that’d load it synchronously - fine for small amounts of data, but for larger datasets I want to be able to warn the user that data is loading.

So instead of the default App component, I have a Loader, which again uses useEffect to load the initial data as a side-effect of rendering:

const Loader = () => {
  const url = `${process.env.PUBLIC_URL}/data.json`;

  const dataRef = useRef(null);

  const data = useFetch(url);
  dataRef.current = data;

  return data == null ? <div>Loading...</div> : <App dataRef={dataRef} />;
};

useFetch is a function that makes a fetch call (the modern alternative to XMLHttpRequest) to get the raw JSON data, and applies any needed postprocessing.

This again uses useEffect - see the react docs on this for more background. Effectively, the first time the Loader component is rendered, it calls useFetch, which kicks off the fetch in the background and immediately returns null - so there is no data yet, and the Loader shows <div>Loading...</div>.

useFetch looks like this:

const useFetch = (url) => {
  const [data, setData] = useState(null);

  useEffect(() => {
    async function fetchData() {
      const response = await fetch(url);
      const json = await response.json();
      // postprocessing removed for clarity
      setData(/* stuff */);
    }
    fetchData();
  }, [url]);

  return data;
};

In this code, useEffect takes a parameter [url] - this means it will only be run if the URL has changed (which should never happen in this example) so it runs once. When it has fetched the data, it calls setData which sets the data state - which triggers a re-render of the Loader (see the react docs for useState).

The second time Loader is rendered, the call to useFetch effectively does nothing, as the value of [url] has not changed. (If it changed it could get into a loop, which would be bad). But it will return the updated data value, which I put into yet another ref: dataRef and pass to the App:

<App dataRef={dataRef} />

I’m using a ref here so the App doesn’t need to check the whole data object to see if it should be re-rendered. (This may be unnecessary - I’m not clear enough about react internals to be sure what would happen if I just passed data around - it may have no real overhead?)

Showing the App

App is fairly straightforward, with a bit of magic to set up the state and dispatch mechanisms:

const App = props => {
  const { dataRef } = props;

  const [vizState, dispatch] = useReducer(
    globalDispatchReducer,
    dataRef,
    initialiseGlobalState
  );

  return (
    <div className="App">
      <header className="App-header">
        <h1>Korny&apos;s D3 React Demo</h1>
      </header>
      <Viz dataRef={dataRef} state={vizState} dispatch={dispatch} />
      <Controller dataRef={dataRef} state={vizState} dispatch={dispatch} />
      <Inspector dataRef={dataRef} state={vizState} dispatch={dispatch} />
    </div>
  );
};

The UI is basically three components: Viz, which is the D3 visualisation; Controller, for the user controls on the left panel; and Inspector, to inspect a particular data point. They all take the same parameters - dataRef for the raw data, state for the current state, and dispatch for updating the state.

State and Dispatching

State management is done through useReducer - see the react docs for more. Basically it takes three parameters:

the reducer function, globalDispatchReducer
the initial data, dataRef
an initialising function initialiseGlobalState - this allows for lazy calculation of the initial state.

The initialise function creates the initial state object - it has a shape roughly like this:

  {
    config: {
        // cheap state
    },
    expensiveConfig: {
        // expensive state
    },
    constants: {
        // state that never changes
    }
  }

As discussed earlier, I split the state into cheap and expensive, and rendering is different depending on what changes. There is also a constants section - this doesn’t really need to be in the state, but it’s useful, especially as sometimes something starts off as constant (like margins, in this example) but later might become modifiable, at which time you can move it somewhere else in the state.
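To make the three-parameter contract concrete, here is a plain-JavaScript sketch that mimics it without React. It is an illustration under assumptions, not React's implementation or the code from my repo: createReducerStore, demoReducer and demoInit are hypothetical names, and the field values are invented. The point is that the initial state is computed once as init(initialArg), and each dispatch replaces the state with reducer(state, action).

```javascript
// Hypothetical stand-in for the useReducer contract, without React.
function createReducerStore(reducer, initialArg, init) {
  // Lazy initialisation: init runs once, here, not on every render.
  let state = init(initialArg);
  return {
    getState: () => state,
    dispatch: (action) => {
      state = reducer(state, action);
    },
  };
}

// Made-up reducer, only here to exercise the sketch.
const demoReducer = (state, action) =>
  action.type === "selectData"
    ? { ...state, config: { ...state.config, selected: action.payload } }
    : state;

// Made-up initialiser producing the cheap / expensive / constants shape.
const demoInit = (dataRef) => ({
  config: { selected: null },
  expensiveConfig: { dateRange: { earliest: 2019, latest: 2020 } },
  constants: { margins: { top: 10, left: 10 }, nodeCount: dataRef.current.length },
});

const store = createReducerStore(demoReducer, { current: ["a", "b"] }, demoInit);
store.dispatch({ type: "selectData", payload: "b" });
console.log(store.getState().config); // { selected: 'b' }
```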

The globalDispatchReducer is what gets called whenever anything calls dispatch() - earlier there was an example of an onClick handler which called dispatch({ type: "selectData", payload: node.id }) - the Controller also calls dispatch whenever a user clicks a control.

globalDispatchReducer is basically a large switch statement:

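import _ from "lodash";

function globalDispatchReducer(state, action) {
  switch (action.type) {
    case "selectData": {
      // copy the whole state, then change the copy
      const result = _.cloneDeep(state);
      result.config.selected = action.payload;
      return result;
    }
    // rest removed for clarity
    default:
      // assumed fallback for this excerpt: unknown actions leave the state unchanged
      return state;
  }
}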

It takes the current state and an action - which is { type: "selectData", payload: node.id } in the example above. Whatever it returns is set as the new state, which will trigger re-rendering of any affected react components.

I’m using lodash to clone the state here - alternatively you can just use object spread syntax, such as:

      return {
        ...state,
        config: { ...state.config, selected: action.payload }
      };

However this gets hairy for deeply nested structures, as the returned object is not a deep clone of the original object - in the above example, state.expensiveConfig.dateRange would be a shared reference between the original state and the new state, rather than an actual new object. That might be OK, but it can be quite counterintuitive - it’s caught me out before, so I like to use cloneDeep and be explicit. (It’d be nice to rework this with immutable.js but that’s a rabbit hole I don’t have time for now)
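Here is a small standalone demonstration of that shared-reference behaviour. The state values are made up, and a JSON round trip stands in for lodash's cloneDeep so the snippet runs without dependencies (it only copes with plain JSON-like data, unlike cloneDeep).

```javascript
// Made-up state, using the shape described above.
const original = {
  config: { selected: null },
  expensiveConfig: { dateRange: { earliest: 2019, latest: 2020 } },
};

// Spread copy: new top-level and config objects, everything else is shared.
const shallow = {
  ...original,
  config: { ...original.config, selected: "node-1" },
};

// Deep copy. The JSON round trip is a stand-in for _.cloneDeep.
const deep = JSON.parse(JSON.stringify(original));
deep.config.selected = "node-1";

console.log(shallow.expensiveConfig === original.expensiveConfig); // true
console.log(deep.expensiveConfig === original.expensiveConfig); // false

// Mutating through the shallow copy leaks into the original state.
shallow.expensiveConfig.dateRange.latest = 2021;
console.log(original.expensiveConfig.dateRange.latest); // 2021
console.log(deep.expensiveConfig.dateRange.latest); // 2020
```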

The overall event flow

The above might be a bit confusing - in a nutshell, I pass a dispatch function to every component, including d3 renderers.

When something calls dispatch:

globalDispatchReducer is called, returning a new state
React updates the vizState state owned by the App component, so re-renders App
App in turn re-renders everything else.
Normal components are updated in standard React fashion, using virtual DOM magic so not too much gets re-rendered
the Viz component looks at the updated state and redraws whichever bits of the D3 visualisation need to be redrawn.
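That last step could look something like the sketch below. This is a guess at the shape of the logic rather than the code from the repo: redrawKind is a hypothetical helper and the state values are invented. Because the reducer deep-clones the state on every action, the sketch compares by value (via JSON.stringify, which assumes plain JSON-like state with a stable key order) rather than by reference.

```javascript
// Hypothetical helper: decide how much of the visualisation to redraw.
function redrawKind(prevState, nextState) {
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
  if (!same(prevState.expensiveConfig, nextState.expensiveConfig)) {
    return "expensive"; // e.g. recompute the layout, then redraw everything
  }
  if (!same(prevState.config, nextState.config)) {
    return "cheap"; // e.g. just restyle the existing nodes
  }
  return "none";
}

const before = {
  config: { selected: null },
  expensiveConfig: { dateRange: { earliest: 2019, latest: 2020 } },
};
const afterSelect = { ...before, config: { selected: "node-1" } };
const afterRange = {
  ...before,
  expensiveConfig: { dateRange: { earliest: 2019, latest: 2021 } },
};

console.log(redrawKind(before, afterSelect)); // cheap
console.log(redrawKind(before, afterRange)); // expensive
console.log(redrawKind(before, before)); // none
```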

All of this is surprisingly smooth - I’ve had pages with thousands of svg nodes which updated nicely as I dragged a control slider. I initially thought I’d need to find ways to bypass react for some UI updates, but so far I haven’t.

The future

I’m using this for my polyglot code tools - I intend to write more about those when I have the time.

I’d really value feedback on this post - especially as I’m not a react expert, and there are probably major things I’ve missed! Feedback via Disqus below, or via @kornys on Twitter.

Update Mar 2022

I have tweaked the repo a bit - using Typescript now, and updated versions of React, D3, eslint and prettier. Haven’t really had time to update this blog post, but hopefully it’s mostly still relevant.
