Whale Songs Explained

David Ma
Published in Alliance
7 min read · Oct 3, 2023


A few weeks ago, I released a demo called 0xWhaleSongs.

This blog post outlines how it works, and how to preempt the unexpected challenges you will run into if you decide to take this demo to the next level.

If you do take it to the next level, I encourage you to apply to Alliance and DM me about it because I would love to hear more!

Refresher

Whale Songs is a Twitter account that anybody with $1M of on-chain assets on EVM chains can tweet from anonymously. They are anonymous not only to readers, but also to the server.

Of course, the $1M on-chain is an arbitrary condition. Any property that is verifiable with on-chain data is up for grabs. And for the future whales out there, I’ve also included some other groups for demo purposes.

Depending on where you’re coming from, you can either see this tech demo as A) a person doxxed by the service provider, reclaiming some form of privacy, or B) the ability to convince others of your credibility without doxxing yourself.

Often, A) is what comes to mind when “zero knowledge” is mentioned, but I also find B) fascinating when it applies to online-native (and therefore often anon-by-default) communities like message boards, or crypto-native ones like crypto Twitter.

Spartan-ECDSA

The star of the show is spartan-ecdsa by Personae Labs, an (unaudited) implementation of a zero-knowledge circuit that allows one to prove their knowledge of a private key whose public key belongs to a set of public keys without revealing which one specifically.

In practice, the circuit is slightly more complicated:

  1. Prove set membership with a merkle tree. This allows the proof to stay small as the membership set grows large.
  2. Use ECDSA message signatures instead of the raw private keys. This allows the private key to never leave the wallet.

To summarize the chart: the circuit takes (msgHash, merkleRoot) as public inputs and (signature, merkle path) as private inputs, and produces a proof that the constraints binding these inputs are satisfied.
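To make the merkle part of the circuit concrete, here is a minimal sketch of building a tree, extracting a path, and checking it against the root. It is not the spartan-ecdsa implementation: `toyHash`, `buildLevels`, `getPath`, and `verifyPath` are hypothetical names, the hash is a non-cryptographic stand-in, and in the real system the path check happens inside the zk-circuit rather than in plain code.

```typescript
// Toy stand-in for a real hash. NOT cryptographically secure; a real
// deployment would use a cryptographic, circuit-friendly hash instead.
function toyHash(input: string): string {
  let h = 5381;
  for (let i = 0; i < input.length; i++) {
    h = (h * 33 + input.charCodeAt(i)) >>> 0; // wrap to unsigned 32 bits
  }
  return h.toString(16);
}

function hashPair(left: string, right: string): string {
  return toyHash(left + "|" + right);
}

interface MerklePath {
  siblings: string[];     // sibling hash at each level, from leaf to root
  isRightNode: boolean[]; // whether our node is the right child at each level
}

// Build every level of the tree: result[0] holds the leaf hashes and the
// last entry holds the single root hash.
function buildLevels(leaves: string[]): string[][] {
  let level = leaves.map((leaf) => toyHash(leaf));
  const allLevels = [level];
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // When a level has an odd number of nodes, pair the last one with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(hashPair(level[i], right));
    }
    allLevels.push(next);
    level = next;
  }
  return allLevels;
}

// Collect the sibling hashes needed to recompute the root from one leaf.
function getPath(allLevels: string[][], leafIndex: number): MerklePath {
  const path: MerklePath = { siblings: [], isRightNode: [] };
  let idx = leafIndex;
  for (let depth = 0; depth < allLevels.length - 1; depth++) {
    const level = allLevels[depth];
    const isRight = idx % 2 === 1;
    const siblingIdx = isRight ? idx - 1 : Math.min(idx + 1, level.length - 1);
    path.siblings.push(level[siblingIdx]);
    path.isRightNode.push(isRight);
    idx = Math.floor(idx / 2);
  }
  return path;
}

// Recompute the root from a leaf and its path, then compare.
function verifyPath(leaf: string, path: MerklePath, expectedRoot: string): boolean {
  let node = toyHash(leaf);
  for (let depth = 0; depth < path.siblings.length; depth++) {
    node = path.isRightNode[depth]
      ? hashPair(path.siblings[depth], node)
      : hashPair(node, path.siblings[depth]);
  }
  return node === expectedRoot;
}

const whaleSet = ["0xaaa", "0xbbb", "0xccc", "0xddd", "0xeee"];
const whaleLevels = buildLevels(whaleSet);
const whaleRoot = whaleLevels[whaleLevels.length - 1][0];
const whalePath = getPath(whaleLevels, 4);
console.log(whalePath.siblings.length, verifyPath("0xeee", whalePath, whaleRoot)); // 3 true
```

The path holds one sibling per tree level, so its length grows with the logarithm of the set size. That is why the proof stays small as the membership set grows.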

Architecture Overview

Using the zk-circuit from the previous section, one can anonymously prove ownership of $1M of on-chain assets: collect all the on-chain addresses with such holdings, then have the user prove they own at least one of them.

The idea is to implement something like this:

  • The client wants to prove their whale status in a message.
  • In order not to leak their address, the client requests and downloads the whole set of whale public keys (or addresses).
  • The client signs the message they want to post, and finds their own public key in the merkle tree of keys to build a merkle path. These two pieces of information (the signature and the merkle path) are kept private and never sent to the server, as shown in the diagram below with shaded boxes.
  • The client generates a valid zk-proof, and sends it along with the message they want to post.
  • Finally, the server verifies the proof is valid and that the message hash matches. If everything checks out, the server tweets from 0xWhaleSongs.
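The server side of the steps above can be sketched roughly as follows. This is an illustration, not the demo's actual code: `PostSubmission`, `handlePost`, and the stubbed hasher, verifier, and tweet callback are hypothetical names. The check that the merkle root belongs to a set the server published is an assumption the list above does not spell out, but without it a client could prove membership in a set of their own making.

```typescript
// Hypothetical request shape. The signature and merkle path are deliberately
// absent: they only feed the prover on the client and never leave it.
interface PostSubmission {
  message: string;
  msgHash: string;    // public input to the circuit
  merkleRoot: string; // public input to the circuit
  proof: string;      // opaque zk-proof, e.g. hex-encoded
}

interface PostResult {
  ok: boolean;
  reason?: string;
}

type MessageHasher = (message: string) => string;
type ProofVerifier = (proof: string, msgHash: string, merkleRoot: string) => boolean;

function handlePost(
  submission: PostSubmission,
  knownRoots: string[],
  hashMessage: MessageHasher,
  verifyProof: ProofVerifier,
  tweet: (message: string) => void
): PostResult {
  // Assumed check: the root must be one the server itself published.
  if (knownRoots.indexOf(submission.merkleRoot) === -1) {
    return { ok: false, reason: "unknown merkle root" };
  }
  // The proof is about msgHash, so msgHash must match the submitted message.
  if (hashMessage(submission.message) !== submission.msgHash) {
    return { ok: false, reason: "message hash mismatch" };
  }
  if (!verifyProof(submission.proof, submission.msgHash, submission.merkleRoot)) {
    return { ok: false, reason: "invalid proof" };
  }
  tweet(submission.message);
  return { ok: true };
}

// Stubs for illustration only; a real server would call the actual verifier.
const tweetedMessages: string[] = [];
const demoHasher: MessageHasher = (m) => "hash(" + m + ")";
const demoVerifier: ProofVerifier = (proof) => proof === "valid-proof";
const recordTweet = (m: string) => { tweetedMessages.push(m); };

const acceptedPost = handlePost(
  { message: "gm", msgHash: "hash(gm)", merkleRoot: "root-1M", proof: "valid-proof" },
  ["root-1M"], demoHasher, demoVerifier, recordTweet
);
console.log(acceptedPost.ok, tweetedMessages);
```

Nothing in `PostSubmission` identifies the sender, which is the point: the server learns only that some member of the set signed this message.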

Challenges

On-chain data: Pre-calculating address sets

Collecting all the addresses with $1M is a surprisingly difficult task. The surface of possible on-chain data access patterns is very large, and only the most repeatable ones get turned into services.

I started by looking for API services that can give me a list of addresses with more than 1M USDC, USDT, DAI, or the equivalent in ETH, WBTC, stETH, etc. But it turns out very few APIs provide that service. Some partially worked and failed for large holder sets. Etherscan’s API works but isn’t cheap at $200/month.

I also tried to write a Dune query to achieve a similar result by summing ERC-20 transfer events, but spot-checking a few results did not give me confidence that this was a valid approach. I was very surprised to see that Dune had no table/spell/view for account ERC-20 balances. If a Dune wizard can educate me here, I’m all ears.
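For reference, the idea of deriving balances by summing transfer events looks roughly like this outside of SQL. It is a sketch of the general technique, not the query I ran: `TransferEvent`, `balancesFromTransfers`, and `holdersAbove` are hypothetical names, amounts are plain numbers in whole-token units (real ERC-20 amounts are 256-bit integers and would need `bigint`), and it assumes mints and burns show up as transfers from and to the zero address.

```typescript
interface TransferEvent {
  from: string;
  to: string;
  amount: number; // whole-token units for illustration only
}

const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

// Fold a token's transfer log into a balance per address.
function balancesFromTransfers(events: TransferEvent[]): { [address: string]: number } {
  const balances: { [address: string]: number } = {};
  for (let i = 0; i < events.length; i++) {
    const e = events[i];
    // Skip the zero address so mints and burns do not create a fake holder.
    if (e.from !== ZERO_ADDRESS) balances[e.from] = (balances[e.from] || 0) - e.amount;
    if (e.to !== ZERO_ADDRESS) balances[e.to] = (balances[e.to] || 0) + e.amount;
  }
  return balances;
}

// Addresses whose balance meets the threshold, sorted for stable output.
function holdersAbove(balances: { [address: string]: number }, threshold: number): string[] {
  return Object.keys(balances)
    .filter((address) => balances[address] >= threshold)
    .sort();
}

const sampleTransfers: TransferEvent[] = [
  { from: ZERO_ADDRESS, to: "0xalice", amount: 2000000 }, // mint
  { from: "0xalice", to: "0xbob", amount: 500000 },
  { from: ZERO_ADDRESS, to: "0xcarol", amount: 1200000 }, // mint
];
const sampleBalances = balancesFromTransfers(sampleTransfers);
console.log(holdersAbove(sampleBalances, 1000000)); // [ '0xalice', '0xcarol' ]
```

The result is only as good as the event log: any balance change that does not emit a transfer event is invisible to this approach.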

Lastly, I resorted to portfolio valuation apps like Zapper, Zerion, and DeBank. The upside is their coverage of assets is much more comprehensive than what I could come up with. It’s part of their value proposition.

The downside is that it makes WhaleSongs dependent on a specific valuation model for the address set: potentially flawed, but still verifiable. Some assets could also be very illiquid, and their value could be overestimated by valuation apps. This is a very common problem in finance: FTX borrowing against high-FDV shitcoin collateral, lending protocols allowing too much to be borrowed, DeFi dependencies on oracles that can be manipulated. All the same.

None of the valuation apps actually expose their API, but DeBank has been providing public account valuations as of July 2023. A miracle for WhaleSongs.

Other address sets like NFT-set owners were a lot easier to compute, requiring just a simple call to Alchemy.

ZK on Ethereum: Indexing Public Keys

Using the address circuit is about 10x more computationally intensive than using the public key circuit.

It turns out that if we build the merkle tree from Ethereum addresses, the zk-circuit has to perform a transformation on the ECDSA public key that is expensive in zk-circuit land: a keccak256 hash.

In practice, it’s the difference between requiring 5 seconds vs 1 minute to post a message on Whale Songs. That’s why it’s worthwhile to index all the public keys associated with the addresses.

Unfortunately, you cannot get the public key from an address because {public key → address} is a lossy, one-way function: the address is the last 20 bytes of the keccak256 hash of the public key. The only way to get the public key is to recover it from a signed message, and the place to find signed messages is on-chain transactions. This means we can only figure out the public keys of addresses that have sent at least one transaction.

This optimization requires the maintenance of an additional offline task dedicated to indexing public keys, alongside distinct code pathways for a fallback mode to accommodate scenarios where a user’s public key hasn’t been acquired yet.

https://github.com/djma/badge-bazaar/blob/master/pages/index.tsx#L244
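The fallback logic boils down to a lookup with a default. A minimal sketch, assuming a hypothetical `pubkeyIndex` map produced by the offline indexing task; the circuit labels and `planProof` are illustrative names, not the ones used in the repo linked above.

```typescript
// Hypothetical labels for the two proving modes described above.
type CircuitKind = "pubkey-membership" | "address-membership";

interface ProvingPlan {
  circuit: CircuitKind;
  leaf: string; // the value to look up in the corresponding merkle tree
}

// Prefer the cheaper public key circuit when the key has been indexed;
// otherwise fall back to the slower address circuit.
function planProof(address: string, pubkeyIndex: { [address: string]: string }): ProvingPlan {
  const key = address.toLowerCase();
  const pubkey = pubkeyIndex[key];
  if (pubkey !== undefined) {
    return { circuit: "pubkey-membership", leaf: pubkey };
  }
  return { circuit: "address-membership", leaf: key };
}

// Placeholder values; real entries would be full addresses and public keys.
const indexedKeys: { [address: string]: string } = { "0xabc": "04deadbeef" };
const fastPlan = planProof("0xABC", indexedKeys);
const slowPlan = planProof("0xdef", indexedKeys);
console.log(fastPlan.circuit, slowPlan.circuit); // pubkey-membership address-membership
```

An address that has never sent a transaction always lands on the fallback path, so both trees and both circuits have to be kept available.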

Heavy ZK objects: Building the tree or downloading the paths?

The initial idea was to have the client download the list of whales and build the merkle tree on the client side. However, it turns out building the whale-1M tree with ~30k addresses takes ~45 seconds in node, and therefore even longer in the browser.

Fortunately, the server can pre-generate every merkle path, but at the cost of a larger download. The list of 30k addresses is 1.3MB whereas the pre-generated paths are 32MB.

On a typical connection, downloading the 32MB of paths takes much less than the ~45 seconds needed to build the tree.
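A quick back-of-the-envelope check of that tradeoff, using the figures above. The only assumptions are decimal units (1 MB = 8 Mbit) and the 50 Mbit/s example connection; the function names are made up for this sketch.

```typescript
const PATHS_DOWNLOAD_MB = 32;  // size of the pre-generated paths
const TREE_BUILD_SECONDS = 45; // tree build time in node; the browser is slower

// Time to download a file of the given size at the given bandwidth.
function downloadSeconds(sizeMB: number, bandwidthMbps: number): number {
  return (sizeMB * 8) / bandwidthMbps;
}

// Bandwidth at which downloading takes exactly as long as building locally.
function breakEvenMbps(sizeMB: number, buildSeconds: number): number {
  return (sizeMB * 8) / buildSeconds;
}

console.log(breakEvenMbps(PATHS_DOWNLOAD_MB, TREE_BUILD_SECONDS).toFixed(1)); // 5.7
console.log(downloadSeconds(PATHS_DOWNLOAD_MB, 50).toFixed(1)); // 5.1
```

So downloading wins on any connection faster than roughly 5.7 Mbit/s, and at 50 Mbit/s it takes about 5 seconds.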

This need to serve large files adds yet another dependency to the demo. I wrapped Wasabi, the cheapest and simplest hot blob storage I could find.

Serverless function limits

The function that handles posting messages is pretty heavy. It potentially needs to download a 50MB circuit, verify a proof, upload a bunch of blobs, and finally tweet.

This takes long enough to exceed the limits of Vercel’s free tier.

I could re-architect the infra to kick off jobs asynchronously, but this goes to show the number of unexpected challenges that come with building around ZK.
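The asynchronous variant would look something like the sketch below: the request handler only records a job and returns, and a separate worker does the heavy lifting later. This is an in-memory illustration with hypothetical names (`PostJobQueue`, `enqueue`, `drain`). A real serverless deployment would need a durable queue or database, since memory is not shared between function invocations.

```typescript
type JobState = "queued" | "done" | "failed";

interface PostJob {
  id: number;
  message: string;
  state: JobState;
}

class PostJobQueue {
  private jobs: PostJob[] = [];
  private nextId = 1;

  // Called by the request handler: record the job and return right away.
  enqueue(message: string): number {
    const id = this.nextId++;
    this.jobs.push({ id: id, message: message, state: "queued" });
    return id;
  }

  // Called by a separate worker: run the heavy step for every queued job
  // and return how many jobs were processed.
  drain(heavyWork: (message: string) => void): number {
    let processed = 0;
    for (let i = 0; i < this.jobs.length; i++) {
      const job = this.jobs[i];
      if (job.state !== "queued") continue;
      try {
        heavyWork(job.message);
        job.state = "done";
      } catch (err) {
        job.state = "failed";
      }
      processed++;
    }
    return processed;
  }

  // Lets the client poll for the outcome of a job.
  stateOf(id: number): JobState | undefined {
    for (let i = 0; i < this.jobs.length; i++) {
      if (this.jobs[i].id === id) return this.jobs[i].state;
    }
    return undefined;
  }
}

const jobQueue = new PostJobQueue();
const goodJobId = jobQueue.enqueue("gm");
const badJobId = jobQueue.enqueue("bad proof");
// Stand-in for "download circuit, verify proof, upload blobs, tweet".
const processedCount = jobQueue.drain((message) => {
  if (message === "bad proof") throw new Error("verification failed");
});
console.log(processedCount, jobQueue.stateOf(goodJobId), jobQueue.stateOf(badJobId)); // 2 done failed
```

The cost of this design is that the client no longer learns immediately whether the post went through, so it has to poll or be notified.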

Where to go from here?

This is where you come in with your ideas. Going from a demo to a product takes orders of magnitude more work. Also, tweaking a small part of how the product is packaged can be the difference between useful and useless, despite the underlying piece of technology being the same. Here are a few directions to get you started:

Personae Labs experiments

  • HeyAnoun: The OG project. Any Noun NFT holder can post a message anonymously.
  • Noun Nymz: Takes HeyAnoun further by allowing users to create pseudonymz that can be reused (and therefore have multiple messages provably linked to it).
  • Cred: A service (in alpha) to create Twitter anons with verifiable claims. It uses Dune queries to generate interesting anonymity sets.

Other directions

  • PLUME: A much more thought-out research blog on nullifiers. If you plan on having pseudonymous identities that accrue reputation, or need sybil resistance, it’s definitely worth considering. 👀 ERC-7775 Alpha leak
  • Create a login system based on credentials.
  • A chatroom, subreddit, or hackernews where people can flex their credentials.
  • Whale Dating. Or just one side is a whale.
  • Use friend.tech’s twitter account ↔ address mapping to bootstrap anonymity sets. I.e. allow anybody to “tweet” from a group they pick the members of.
  • Let friend.tech key holders message the host anonymously.
  • Prove that you’re an Ethereum OG by proving that you’ve sent a transaction before 2017.
  • Prove you’re well traveled (an address that has been on many EVM chains).
  • Let exchanges or other custodians prove asset ownership split over many addresses. Would need nullifiers here.
  • Let block builders anonymously signal their reliability to searchers.

If you made it this far, apply to Alliance and DM me. I would love to chat!

GitHub for the project is here.

Thanks to DeBank for the dataset; lsankar, Personae Labs, 0xPARC, and PrivacyScaling for upstream work; and mfouda for reviewing.
