As the project grows in complexity, and especially once it is in operation, it will become harder and harder to understand what is going on, to know what should be happening, and to observe bugs in the system. The best way to address this is to start building tools that help us analyze and understand the distributed system under both normal and faulty operating conditions.

There is an obvious challenge here: we cannot (and do not want to) monitor nodes that users set up in production. Doing so would go against our privacy principles, and users could opt out of any such monitoring anyway.

There are several options. Some of the tools would be meant for users, while others would be internal to the Testnet and perhaps opt-in for production.
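As a minimal sketch of how the Testnet versus production split could be expressed, the snippet below gates monitoring on the network a node runs on and on an explicit user choice. Everything in it is an assumption made for illustration: the names `Network`, `NodeConfig`, `telemetry_opt_in`, and `telemetry_enabled` are hypothetical, and so is the policy that Testnet nodes always report.

```python
from dataclasses import dataclass
from enum import Enum


class Network(Enum):
    """Hypothetical deployment targets for a node."""
    TESTNET = "testnet"
    PRODUCTION = "production"


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical node settings relevant to monitoring."""
    network: Network
    # Assumption: production monitoring is off unless the user turns it on.
    telemetry_opt_in: bool = False


def telemetry_enabled(config: NodeConfig) -> bool:
    """Return True if this node may report monitoring data."""
    if config.network is Network.TESTNET:
        # Assumption: Testnet tooling is internal, so Testnet nodes always report.
        return True
    # In production, report only when the user has explicitly opted in.
    return config.telemetry_opt_in
```

Defaulting `telemetry_opt_in` to `False` keeps a production node silent unless its operator decides otherwise, which is one way to stay in line with the privacy principles mentioned above.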

This is an initial list that we should start to elaborate in more detail:

Visualization tools

Instrumentation

Monitoring