Friends and readers, forgive my long absence. As I mentioned before, I only post when I feel compelled to put my thoughts into the ether. I labor under two constraints: the first is that I have a day job that is pretty intellectually interesting[1], and I want to stay away from publishing things that are too close to that work. The second is that we have expanded the Quantum Observer family again[2], and while adding more family members does scale sublinearly, one does end up pretty tired at the end of the day.
So in recognition that “done” is the enemy of “poasted”, I bring you this Thanksgiving post. This is about an incomplete personal project that I had aspired to finish and present with a lengthy accompanying post. After about a year of chipping away at it on and off[3], I figured I’d just post my thoughts, current progress, and possible avenues for improvement, and relieve the burden on my soul.
The Context
A few years ago I heard rumors about one or two stealth startups[4] that, while not exactly QIS-related, were very intriguing nonetheless. It was hard to get any information, but I heard the term “thermodynamic AI” thrown around. At first this sounded like a bullshit neologism, but some investigation with Google Scholar revealed that there was a lot of work going into this.
I found an interesting report on a thermodynamic computing workshop, fittingly titled Thermodynamic Computing[5]. Digging through the references in that paper, I stumbled upon the paper that lends its name to the title of this post, and which consumed much of my free time in the early months of 2023:
Thermodynamic Neural Network by Todd Hylton.
Reader, when I tell you I’ve never read any paper more closely than this, I am mostly not exaggerating. It is certainly the most closely I’ve ever read a paper for fun rather than for profit. I went through this fucking thing line by line, equation by equation, over and over again.
But why? In short, I thought it would be easy[6] and fun[7] to reproduce.
The TNN paper essentially describes a neural network in which each node exchanges ‘charges’ with its neighbors. Node states are sampled from a thermal (Boltzmann) distribution based on their current charge state, and when nodes reach equilibrium they undergo an ‘irreversible’ update, in which the charges on the node are allowed to be ‘dissipated’ and the weights between the node and its neighbors are strengthened, allowing for better charge transport going forward. The paper includes links to a YouTube channel with videos of the output from Prof. Hylton’s various runs of the TNN under different conditions.
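To make that concrete, here is a minimal sketch of how I read the core update step: each node scores its candidate states against the charge arriving from its neighbors and then samples its next state from a Boltzmann distribution. The energy expression below is a toy stand-in, not the paper’s actual charge-compartment bookkeeping, and all of the names are mine.

```python
import numpy as np

def boltzmann_sample(states, energies, temperature, rng):
    """Pick a state with probability proportional to exp(-E / T)."""
    weights = np.exp(-np.asarray(energies, dtype=float) / temperature)
    return rng.choice(states, p=weights / weights.sum())

def update_node(node_charge, neighbor_charges, edge_weights, states, temperature, rng):
    """One node update: score each candidate state, then sample thermally."""
    incoming = sum(w * q for w, q in zip(edge_weights, neighbor_charges))
    # Toy energy: penalize mismatch between the candidate state and the
    # weighted charge delivered by the neighbors (NOT the paper's exact form).
    energies = [(s * node_charge - incoming) ** 2 for s in states]
    return boltzmann_sample(states, energies, temperature, rng)

rng = np.random.default_rng(42)
new_state = update_node(0.5, [1.0, -1.0, 0.2], [0.3, 0.1, 0.5], [-1, 1], 1.0, rng)
```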
The paper also links to a GitHub repo with the code to run these simulations, which I intended to check out and fiddle with so I could write a cute little post and move on with my life. The paper disclaims the state of the code, rightfully pointing out that it’s research-grade stuff and not really made for much public consumption. Unfortunately, I’m not great at parsing other people’s code, especially if I don’t know what it does. I wasn’t able to figure out which parts of the script were implementing which parts of the paper, or how faithfully. It seemed like there were tons of simplifications being made, but I could barely make heads or tails of it. Eventually I became so frustrated that I figured I would just write my own implementation in Python and stay as faithful to the text of the paper as I could. That way, in principle, it would be easy for others to tell exactly which part of my code implemented which feature of the TNN as described in the paper.
After all, how hard could it be?[8]
My Attempt
You can find my attempt at the repo associated with this Substack:
In there you will find the main file of interest `network.py` along with my plotting utilities and two scratch documents which I am not too ashamed to share with you. Currently, `scratch.py` goes through TNN creation, evolution and plotting in order, so I suggest starting there if you want to mess around with this stuff. I may go back and curate these a little better, but I think any reasonably competent person can look at those and divine how to initialize and operate the TNN.
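For flavor, here is roughly what a session might look like. I’m making up the class and method names (`ThermodynamicNeuralNetwork`, `evolve`, and so on) purely for illustration; the real entry points are whatever `scratch.py` actually calls in `network.py`.

```python
# Hypothetical workflow; the real names and signatures live in network.py and scratch.py.
from network import ThermodynamicNeuralNetwork  # class name assumed for illustration

tnn = ThermodynamicNeuralNetwork(
    connectivity="nearest_neighbor",  # or the bipartite next-nearest-neighbor default
    node_temperature=1.0,
    edge_temperature=1.0,
    state_space=[-1, 1],
)

for step in range(1_000):
    tnn.evolve()  # one round of state sampling plus irreversible updates

energies = tnn.energy_history  # then hand this to the plotting utilities
```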
The Details
I decided to rely heavily on the NetworkX Python package for creating, updating, and manipulating my TNNs. It ended up being very slow for even moderately sized nets, but the functionality of NetworkX helped me focus on implementing the logic rather than implementing various graph operations and reinventing the wheel, so to speak. I found it indispensable for making sure I was creating something conceptually identical to what was described in the TNN paper.
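For the curious, the kind of thing NetworkX buys you looks roughly like this; the attribute names here are my own, not necessarily what `network.py` uses.

```python
import networkx as nx

# Nearest-neighbor grid as the TNN substrate, with per-node and per-edge attributes.
G = nx.grid_2d_graph(10, 10)
nx.set_node_attributes(G, 0.0, "charge")
nx.set_node_attributes(G, 1.0, "temperature")
nx.set_edge_attributes(G, 0.1, "weight")

# Pulling a node's local neighborhood is a one-liner instead of hand-rolled bookkeeping.
node = (4, 4)
neighbor_charges = [G.nodes[n]["charge"] for n in G.neighbors(node)]
edge_weights = [G.edges[node, n]["weight"] for n in G.neighbors(node)]
```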
What I ended up with was a network class that appeared to function correctly, albeit slowly due to the substantial overhead that operating through NetworkX required. I am able to initialize and run a TNN, visualize the fluctuations and plot the total network energy. I could even do fun stuff like initialize the network at a high node temperature and then lower the temperature over the course of the simulation run, effectively annealing the network to some close-to-minimal energy much more rapidly than if I’d kept the temperature constant.
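The annealing trick is nothing fancier than a cooling schedule wrapped around the evolution loop; a sketch, again assuming the hypothetical API from the earlier snippet.

```python
# Linear cooling schedule: start hot, end cold, evolve at each temperature.
# Assumes the hypothetical tnn object and evolve() method sketched above.
T_start, T_end, n_steps = 5.0, 0.1, 2_000
for step in range(n_steps):
    tnn.node_temperature = T_start + (T_end - T_start) * step / (n_steps - 1)
    tnn.evolve()
```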
Unfortunately, I don’t seem to be able to qualitatively reproduce the features that Prof. Hylton observed, namely lots of fluctuations and self-organizing blobs at certain temperatures and when bias nodes are included in the TNN[9]. My TNNs appear to lock into optimal configurations very quickly, according to the visualization. My current hypothesis is that something is borked between state sampling/assignment and recording/visualization.
Here’s a short list of what the code does[10]:
Initializes a Thermodynamic Neural Network from a NetworkX graph. Two default connectivities are available: nearest-neighbor and bipartite next-nearest-neighbor. The user can also just pass a custom graph with any connectivity (I haven’t tested this functionality!).
You can specify network node and edge temperature separately.
You can specify static bias nodes with static temperatures (there’s a small sketch of what I mean just after this list).
You can specify the node state space. I usually just use [-1,1], but you could do anything. The code should correctly sample from whatever space you give it.
Evolves the state of the network and records node state and total network energy.
`scratch.py` has the infrastructure to initialize and implement all of the above, as well as record outputs and make fun little .gifs to visualize the results.
It should mostly work, but recent changes might’ve introduced some bugs.
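As promised above, here is what I mean by static bias nodes, in terms of the graph-attribute scheme from the earlier NetworkX snippet; again, the attribute names are illustrative rather than what the code actually uses.

```python
import networkx as nx

G = nx.grid_2d_graph(10, 10)
nx.set_node_attributes(G, 0.0, "charge")
nx.set_node_attributes(G, 1.0, "temperature")

# Pin a couple of nodes as bias nodes: fixed charge, fixed (static) temperature.
for n in [(0, 0), (9, 9)]:
    G.nodes[n]["is_bias"] = True
    G.nodes[n]["temperature"] = 0.5  # held constant for the whole run
    G.nodes[n]["charge"] = 1.0       # never touched by the evolution step

# The evolution loop then only updates the free nodes.
free_nodes = [n for n in G.nodes if not G.nodes[n].get("is_bias", False)]
```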
Looking Forward
If I ever end up revisiting this project, my hitlist would be something like:
Implement much of the functionality as matrix/array operations (see the sketch at the end of this list). One major hurdle to iterating my model is that it’s so slooooow to run, in part because I’m iterating over node objects initialized from the NetworkX package. This was useful conceptually when I was starting, but needs to be jettisoned now that I’m interested in performance.
Hand-solve a simple implementation. This thing is kind of challenging to debug. I think I would want to hand-iterate[11] a small (3x3? 4x4?) implementation and see if I get results consistent with the TNN code.
Investigate parallelization. Right now this all runs on a single core, but I think it can be improved substantially. You’d probably have to be careful about making conflicting updates to shared edges, but I suspect this is a solved problem in professional numerics circles.
Try a simpler formalism. Much of the TNN formalism as presented in the paper is built on a ‘forces and fluxes’ conceptualization that is pretty foreign to me[12]. I don’t see a reason why this type of model can’t be implemented by analogy to different physics that I’m more familiar with. Something with harmonic oscillators, perhaps[13]. I want to get rid of the substantial bookkeeping involved with the charge-compartment model in the paper.
Learn ML better/at all. I’m also interested in becoming an ML practitioner and gaining a much stronger appreciation for how ML models work. The TNN paper makes some claims about the advantages of a TNN vs standard neural networks, one of which is that no backprop is required. I have only the faintest idea of what this means, and no idea whether this claim holds up. Correcting this deficiency remains a long-term goal of mine.
Consider physical implementations. Part of the context in which the TNN paper was written is that thermal fluctuations are ubiquitous, and maybe they should be viewed as a resource rather than an obstacle. Normal Computing is already pursuing something like this with their s(tochastic)-bits. I’m sure there is plenty of low-hanging fruit here, just waiting to be plucked. Plus, I’m just a simple experimentalist, so eventually I must put my grubby paws on atoms rather than bits.
Actually use it as a neural network. In principle, it would be interesting to feed the network inputs and train it to do basic stuff like character recognition, or whatever the standard NN benchmarking task is.
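To gesture at what the first item on that list might look like: pull the graph into an adjacency matrix once, and replace the per-node Python loops with array operations. The energy here is the same toy form as the earlier snippet, not the paper’s charge-compartment accounting, so treat this as a performance sketch rather than a faithful TNN step.

```python
import numpy as np
import networkx as nx

G = nx.grid_2d_graph(50, 50)
A = nx.to_numpy_array(G)          # adjacency (weight) matrix, built once
rng = np.random.default_rng(0)

candidate_states = np.array([-1.0, 1.0])
states = rng.choice(candidate_states, size=A.shape[0])

# Local "field" each node feels from its neighbors: one matrix-vector product
# instead of a Python loop over NetworkX node objects.
local_field = A @ states

# Toy per-node energies for every candidate state, computed all at once.
energies = -np.outer(local_field, candidate_states)  # shape (n_nodes, n_states)

# Boltzmann probabilities and a vectorized sample for every node simultaneously.
T = 1.0
weights = np.exp(-energies / T)
probs = weights / weights.sum(axis=1, keepdims=True)
u = rng.random((A.shape[0], 1))
idx = (probs.cumsum(axis=1) < u).sum(axis=1).clip(max=candidate_states.size - 1)
new_states = candidate_states[idx]
```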
Freedom
Well, there you have it. I hope that some of you are nerd-sniped by this effort and spend entirely too much time fiddling with this project. Instead of feeling bad about not finishing a project, I will feel good about writing a blog post[14].
Happy Thanksgiving!
If you enjoyed this tale of failure, you'll love my other posts!
1. A blessing!
2. Another blessing!
3. Initially quite on, but mostly off at this point.
4. One of these ended up being Normal Computing, which released their first paper, Thermodynamic AI and the Fluctuation Frontier, in February 2023. The other is still in stealth mode, I believe.
5. You really should read this; it’s very cool. It was particularly serendipitous for me, as I was entering my quantum thermodynamics arc.
6. No.
7. Yes.
8. Spoiler alert: pretty hard! GitHub Copilot and GPT-4 really saved me a lot of time here. Also, paradoxically, they cost me a lot of time when I was too tired/lazy to sanity-check their outputs. Overall, the AIs were very useful for simple tasks or well-defined functions.
9. At this point I would normally link to a YouTube video I made of the output, but ffmpeg doesn’t seem to be playing well with moviepy and I will be God damned before I spend multiple hours on Thanksgiving weekend debugging this dumb bullshit. Have I mentioned I hate technology?
10. Or, at least, what I intended it to do.
11. Not literally by hand, but close enough. I’m not a psychopath!
12. I think it’s a chemistry/chemical engineering thing.
13. Come on, you knew it was going to be oscillators.
14. Lemons → lemonade.