Bitcoin Nodes: How Many is Enough?
Should we be worried about the declining number of nodes?
It’s fairly straightforward to see the computational strength of the Bitcoin mining infrastructure growing: just look at the graphs of the relentlessly increasing hashrate and, as a result, mining difficulty. However, miners are only part of the network infrastructure; nodes also play several important roles in supporting our use of the blockchain. Recently I’ve been trying to quantify the strength of Bitcoin’s infrastructure with respect to its full nodes. I keep tabs on the number of full nodes via Bitnodes, which recently updated its crawling algorithm to be faster and more accurate. This update caused the number of reported nodes to drop by an order of magnitude, from more than 100,000 to fewer than 10,000, because the crawler no longer counts nodes that do not accept inbound connections. Is this a cause for concern?
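To see why a change in counting method can move the total so much, here is a toy breadth-first crawl in Python. It is a hypothetical sketch, not Bitnodes’ actual code: `addr_book` stands in for the address lists that reachable nodes return when asked for peers, and `reachable` is the set of addresses that accept an inbound connection. Every address the crawl hears about lands in `seen`, but only the addresses it can actually connect to are counted.

```python
from collections import deque


def crawl(seeds, addr_book, reachable):
    """Toy breadth-first crawl over an in-memory peer graph.

    addr_book: hypothetical dict mapping a node to the peer addresses it
               advertises (a stand-in for a real getaddr/addr exchange).
    reachable: set of addresses that accept inbound connections.
    Returns (seen, counted): every address learned vs. addresses connected to.
    """
    seen = set(seeds)
    counted = set()
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if node not in reachable:
            # The connection attempt fails: the address is known but not
            # counted, and we cannot ask it for more peers.
            continue
        counted.add(node)
        for peer in addr_book.get(node, ()):
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, counted


if __name__ == "__main__":
    addr_book = {
        "A": ["B", "C", "x1", "x2"],
        "B": ["A", "x3", "x4"],
        "C": ["x1", "x5"],
    }
    seen, counted = crawl(["A"], addr_book, reachable={"A", "B", "C"})
    print(len(seen), len(counted))  # 8 3
```

Reporting `len(seen)` roughly mirrors counting every advertised address, while `len(counted)` mirrors counting only nodes that accept connections; in this toy graph the gap is already close to 3x.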
Let’s begin by defining a Bitcoin node. To be technical, a node is a running instance of a Bitcoin daemon, which can be either the Bitcoin Core reference client (Bitcoin-Qt or bitcoind) or any of a number of alternative implementations. Full nodes store an entire copy of the blockchain (19 GB and growing) and perform several distinct functions. According to Bitcoin Core developer Pieter Wuille (who often goes by the handle “Sipa”):
1) Full nodes provide lookup of historic blocks, which is necessary for new nodes synchronizing.
2) Full nodes provide filtered transaction lookup for SPV clients, which is necessary for those clients to function.
3) Full nodes validate blocks and transactions, and relay them.
1 and 2 are easy… we just need enough to satisfy reasonable (non-attack) demand. 3 is a bit harder, relay of blocks and transactions is important, but technically doesn’t require a full node. What full nodes do, however, is make sure the network is honest. And this is not so much a question of how many there are, it’s more about how hard it is to run one.
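The “filtered transaction lookup” in item 2 is, in Bitcoin Core of this era, the BIP 37 bloom filter mechanism: an SPV client loads a filter into a full node, and the node sends the client only the transactions that match it. The following is a minimal Python sketch of the filter itself, following BIP 37’s hashing scheme (32-bit MurmurHash3 seeded with `i * 0xFBA4C795 + nTweak`). The sizing formulas and the `filterload` wire message are omitted, and the class and method names are illustrative rather than taken from any library.

```python
MAX_BLOOM_FILTER_SIZE = 36_000  # bytes, per BIP 37
MAX_HASH_FUNCS = 50             # per BIP 37


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int) -> int:
    """32-bit MurmurHash3 (x86 variant), the hash BIP 37 specifies."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        k = (k * c2) & 0xFFFFFFFF
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & 0xFFFFFFFF, 15)
        h ^= (k * c2) & 0xFFFFFFFF
    # Finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


class BloomFilter:
    def __init__(self, n_bytes: int, n_hash_funcs: int, n_tweak: int = 0):
        if not (1 <= n_bytes <= MAX_BLOOM_FILTER_SIZE
                and 1 <= n_hash_funcs <= MAX_HASH_FUNCS):
            raise ValueError("filter parameters outside BIP 37 limits")
        self.data = bytearray(n_bytes)
        self.n_hash_funcs = n_hash_funcs
        self.n_tweak = n_tweak

    def _bit_indexes(self, element: bytes):
        n_bits = len(self.data) * 8
        for i in range(self.n_hash_funcs):
            seed = (i * 0xFBA4C795 + self.n_tweak) & 0xFFFFFFFF
            yield murmur3_32(element, seed) % n_bits

    def insert(self, element: bytes) -> None:
        for idx in self._bit_indexes(element):
            self.data[idx >> 3] |= 1 << (idx & 7)

    def contains(self, element: bytes) -> bool:
        return all(self.data[idx >> 3] & (1 << (idx & 7))
                   for idx in self._bit_indexes(element))
```

A wallet would insert data such as its public key hashes, and the full node would test each transaction’s data elements with `contains`. False positives are possible by design, which gives the client a little privacy, but false negatives are not.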
According to a Reddit post by the Bitcoin Foundation’s Chief Scientist, Gavin Andresen:
Most ordinary folks should NOT be running a full node. We need full nodes that are always on, have more than 8 connections (if you have only 8 then you are part of the problem, not part of the solution), and have a high-bandwidth connection to the Internet.
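Andresen’s remark about connection counts comes down to simple arithmetic. In Bitcoin Core of this period, every node opens 8 outbound connections and, by default, allows up to 125 connections in total, so only nodes that accept inbound connections can absorb everyone else’s outbound demand. The Python sketch below computes the average inbound load per listening node under those assumptions; the node counts in the example are the round figures from the Bitnodes change described above, used purely for illustration and not as a measurement.

```python
def inbound_load(total_nodes, listening_nodes, outbound_per_node=8,
                 max_connections=125):
    """Return (average inbound peers per listening node, fraction of inbound
    slots in use), assuming every node fills its outbound slots and only
    listening nodes accept inbound connections."""
    demand = total_nodes * outbound_per_node          # outbound slots to fill
    inbound_slots = max_connections - outbound_per_node
    capacity = listening_nodes * inbound_slots        # inbound slots available
    return demand / listening_nodes, demand / capacity


if __name__ == "__main__":
    avg, utilization = inbound_load(total_nodes=100_000, listening_nodes=10_000)
    print(f"{avg:.0f} inbound peers per listening node, {utilization:.0%} of slots used")
```

With those illustrative numbers each listening node carries 80 inbound peers, about 68% of its default slots, and every node that stops listening pushes the figure up for the rest.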
What Andresen is hinting at is that anyone who runs a full node that doesn’t accept incoming connections is essentially taking resources from the network without giving much back in terms of the functions listed above. If you’re familiar with BitTorrent, the analogy to “seeders” and “leechers” is apt. It’s easy to accidentally run a “leeching” node if the computer running it sits behind a router that does not forward incoming connections on Bitcoin’s port (8333 by default) to that computer. The easiest way to check whether your Bitcoin node is accepting incoming connections (seeding) is to use the “check node” tool on Bitnodes.
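A rough do-it-yourself version of that check is a plain TCP connection attempt to the node’s P2P port, 8333 on mainnet. This Python sketch only shows that something is listening on the port; it does not perform a Bitcoin `version` handshake, so it cannot confirm that a Bitcoin node is actually answering. Run it from a machine outside your own network against your public IP address to see whether your router forwards the port.

```python
import socket

MAINNET_P2P_PORT = 8333


def accepts_inbound(host: str, port: int = MAINNET_P2P_PORT,
                    timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only proves the port is open; no Bitcoin handshake is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False
```

For example, call `accepts_inbound("203.0.113.10")`, substituting your own public IP for that documentation-only placeholder address.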
As you can see here, the Bitcoin network uses a distributed architecture: each node opens 8 outbound connections to other nodes, and nodes that accept inbound connections can end up connected to 100+ peers. At this point it sounds like we want as many “seeder” nodes on the network as possible, so that they form a denser mesh that spreads the load more widely. However, Core developer Peter Todd cautioned on Reddit:
Each full node is an individual entity, for instance if one full node finds an invalid block, there is no way for it to warn other nodes that the block is fraudulent. They do provide more capacity for SPV, but it’s not provably “honest” capacity — a full node can lie about a lot of things to an SPV client and they’ll be none the wiser. For instance currently full nodes can censor transactions from SPV nodes, and there’s no way for them to know that’s happening if they are sybilled, and sybilling SPV nodes isn’t very hard.
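Todd’s point that SPV clients cannot detect censorship follows from what a Merkle proof can and cannot show. The Python sketch below builds a Bitcoin-style Merkle tree (double SHA-256, with the last hash duplicated on any level of odd length) and verifies a branch for one transaction. A valid branch proves that a transaction is in a block, but a full node that simply never sends a transaction leaves the client nothing to check. Function names are illustrative, and hashes stay in internal byte order rather than the reversed hex usually displayed.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    if len(level) % 2 == 1:
        level = level + [level[-1]]  # Bitcoin duplicates the last hash on odd levels
    return [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(txids):
    level = list(txids)
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_branch(txids, index):
    """Sibling hashes from leaf `index` up to (but excluding) the root."""
    branch, level = [], list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        branch.append(level[index ^ 1])
        level = _next_level(level)
        index //= 2
    return branch


def verify_branch(txid, branch, index, root):
    h = txid
    for sibling in branch:
        # An even index means our hash is the left child at this level.
        h = sha256d(h + sibling) if index % 2 == 0 else sha256d(sibling + h)
        index //= 2
    return h == root


if __name__ == "__main__":
    txids = [sha256d(bytes([i])) for i in range(5)]  # stand-in transaction ids
    root = merkle_root(txids)
    print(verify_branch(txids[3], merkle_branch(txids, 3), 3, root))  # True
```

Note that there is no function here that could prove a transaction is absent: the client can only check what it is given.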
Andresen, Todd, and Wuille all referenced the difficulty involved in running a full node. It’s difficult from a technical standpoint, since one must maintain hardware and software to keep the node online and up to date, and it’s resource intensive with regard to disk space and bandwidth. While there have been some calls to incentivize users to run full nodes, it may behoove us to take the opposite approach and reduce the cost of running one. Perhaps we could design a supernode / lightweight full node that can support many connections but only stores and serves up the past day / week of blocks. However, this might be a futile project, because it could be made obsolete if the Bitcoin Core developers implement Merkle Tree Pruning, which would greatly decrease the amount of blockchain data each node needs to store. Also, Todd is working on a project he’s calling Tree Chains that aims to address network scaling issues.
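The “past day / week of blocks” idea can be sized with simple arithmetic: at the 10-minute target interval the chain grows by about 144 blocks per day, or 1,008 per week, and the 1 MB block size limit of the time bounds a week of blocks at roughly 1 GB. Below is a hypothetical Python sketch of the storage side of such a node: a fixed-size window that evicts the oldest block as each new one arrives and can only serve blocks still inside the window. The class and method names are invented for illustration and do not correspond to Bitcoin Core code.

```python
from collections import OrderedDict

BLOCKS_PER_DAY = 24 * 60 // 10   # 144 at the 10-minute target spacing
MAX_BLOCK_BYTES = 1_000_000      # block size limit at the time of writing


class RecentBlockStore:
    """Keeps only the most recent `window` blocks, keyed by block hash."""

    def __init__(self, window: int = 7 * BLOCKS_PER_DAY):
        self.window = window
        self._blocks = OrderedDict()  # assumes blocks are added in chain order

    def add(self, block_hash: str, raw_block: bytes) -> None:
        self._blocks[block_hash] = raw_block
        while len(self._blocks) > self.window:
            self._blocks.popitem(last=False)  # evict the oldest block

    def get(self, block_hash: str):
        """Return the raw block, or None if it has aged out of the window."""
        return self._blocks.get(block_hash)

    def worst_case_bytes(self) -> int:
        return self.window * MAX_BLOCK_BYTES
```

A real implementation would also have to handle chain reorganizations and tell peers that it cannot serve older blocks, both of which this sketch ignores.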
It’s clear that the inner workings of full nodes are fairly complex. Given how transparent many aspects of the Bitcoin network’s infrastructure are, I’m disappointed that we can’t get more insight into the volume of requests the nodes are serving. I’m quite interested in finding a way to measure the load on the network in real time. It seems to me that if we develop a programmatic way of determining this, we could then automatically spin up new nodes on virtualized servers as needed to meet increases in demand. However, as far as I can tell, it isn’t possible to programmatically determine how busy the nodes are: any given node helps the network in myriad ways, and nodes can’t be queried for more granular statistics. What are we to do? I’m currently pursuing an idea to develop a fork of Bitcoin Core that focuses on reporting statistics about which types of requests the node handles. By deploying a number of these modified nodes throughout the network mesh, we should be able to gain more insight into the total load that the mesh is experiencing.
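As a rough illustration of the per-request statistics such a fork could collect, the Python sketch below parses a stream of Bitcoin P2P messages and tallies them by command name (`getdata`, `getheaders`, `filterload`, and so on). It relies on the standard message framing: a 4-byte network magic (`f9beb4d9` on mainnet), a 12-byte null-padded command, a 4-byte little-endian payload length, and a 4-byte checksum taken from the double SHA-256 of the payload. The function names are hypothetical, and a real fork would more likely count inside the node’s own message-processing code than re-parse raw bytes.

```python
import hashlib
import struct
from collections import Counter

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")
HEADER = struct.Struct("<4s12sI4s")  # magic, command, payload length, checksum


def checksum(payload: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def frame(command: str, payload: bytes = b"") -> bytes:
    """Wrap a payload in a P2P message header (helper for testing)."""
    cmd = command.encode("ascii").ljust(12, b"\x00")
    return HEADER.pack(MAINNET_MAGIC, cmd, len(payload), checksum(payload)) + payload


def tally_commands(stream: bytes) -> Counter:
    """Count complete messages with valid checksums in `stream`, by command."""
    counts = Counter()
    offset = 0
    while offset + HEADER.size <= len(stream):
        magic, cmd, length, check = HEADER.unpack_from(stream, offset)
        if magic != MAINNET_MAGIC:
            raise ValueError(f"bad magic at offset {offset}")
        start = offset + HEADER.size
        payload = stream[start:start + length]
        if len(payload) < length:
            break  # truncated final message; wait for more bytes
        if checksum(payload) == check:
            counts[cmd.rstrip(b"\x00").decode("ascii")] += 1
        offset = start + length
    return counts
```

Sampling these counters over time from several nodes would give a rough picture of which kinds of requests dominate the load.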
I have learned quite a bit while researching this topic, but remain unsatisfied with the answers I’ve uncovered thus far, as they have only spawned more questions. There seems to be a consensus among the community and the core developers that the network is currently under trivial load, but I would prefer to gain more insight and quantify the health of the network so that we can be proactive rather than reactive as Bitcoin use, and thus network demand, increases.
Update: after months of research and developing software to help collect and analyze internal Bitcoin node metrics, I posted my findings here.