Deciphering Bitcoin Blockchain Data by Cohort Analysis | Scientific Data – Nature


This enables us to create datasets and visualizations for some key Bitcoin transaction indicators, including the daily lifespan distributions of spent transaction outputs (STXOs).

Bitcoin is a peer-to-peer electronic payment system that has rapidly grown in popularity in recent years [1,2,3,4].

Immediately after Alice’s payment to Bob on January 1, 2021, UTXOs 1–3 are converted to STXOs with lifespans of 9 years, 1.5 years, and 0.5 years and 1 day, respectively.

UTXOs 1, 2, and 3 were spent in a transaction from Alice to Bob and transformed into UTXOs 4 and 5.

To continue the analogy with population data, we say that a UTXO is born when it is created as a block reward or as the output of a transaction, and that a UTXO dies when it is spent as the input of another transaction.
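The birth/death vocabulary maps naturally onto a record with two timestamps. The sketch below is a minimal illustration, not the authors' code. It assumes each output carries an integer value in satoshis plus the `block_timestamp` and `spent_block_timestamp` fields used in the derived data table described in this article. The `Output` class and its methods are hypothetical names. A null death time marks a living UTXO, and a filled one marks an STXO.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Output:
    """Hypothetical record: a transaction output seen as an individual with a birth and maybe a death."""
    value_sat: int                             # amount in satoshis (assumed unit)
    block_timestamp: datetime                  # birth: time of the block that created the output
    spent_block_timestamp: Optional[datetime]  # death: time of the spending block, None if unspent

    @property
    def is_utxo(self) -> bool:
        """True while the output is alive (not yet spent)."""
        return self.spent_block_timestamp is None

    def lifespan_days(self) -> Optional[float]:
        """Birth-to-death duration of an STXO; None while the output is still a UTXO."""
        if self.spent_block_timestamp is None:
            return None
        return (self.spent_block_timestamp - self.block_timestamp).total_seconds() / SECONDS_PER_DAY

    def age_days(self, at: datetime) -> float:
        """Time since birth measured at `at` (meaningful while the output is alive)."""
        return (at - self.block_timestamp).total_seconds() / SECONDS_PER_DAY


# Toy example: born at noon on 2020-12-31, spent at noon on 2021-01-01.
stxo = Output(150_000_000, datetime(2020, 12, 31, 12), datetime(2021, 1, 1, 12))
print(stxo.is_utxo, stxo.lifespan_days())  # False 1.0
```

An STXO's lifespan is fixed once it dies. A UTXO's age keeps growing, which is why the age helper takes the evaluation time as a parameter.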

With over 1.6 billion historical transactions on the Bitcoin blockchain, downloading the complete Bitcoin blockchain records has become increasingly difficult and computationally intensive.

By doing so, we create datasets and visualizations for some key Bitcoin transaction indicators, including the daily lifespan distributions of STXOs as percentages.

…2021, STXOs with lifespans of less than 1 day accounted for 80% of all STXOs, while those with lifespans between 1 day and 1 month accounted for another 15%.
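A daily lifespan distribution of this kind turns one day's STXO lifespans into percentages per lifespan bin. The sketch below is a minimal, count-based version. The bin edges are hypothetical ("1 month" is taken as 30 days), and the article's own bins and weighting may differ. Weighting each STXO by its BTC value instead of counting it once would be a small change. The numbers in the example are toy values, not data from the article.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical lifespan bins as (label, exclusive upper bound in days).
BINS = [
    ("<1d", 1.0),
    ("1d-1m", 30.0),
    ("1m-1y", 365.0),
    (">=1y", float("inf")),
]


def lifespan_bin(days: float) -> str:
    """Label of the first bin whose upper bound exceeds `days`."""
    for label, upper in BINS:
        if days < upper:
            return label
    raise ValueError(f"invalid lifespan: {days!r}")  # only reached for NaN


def lifespan_distribution(lifespans_days: Iterable[float]) -> Dict[str, float]:
    """Percentage of one day's STXOs falling into each lifespan bin (each STXO counted once)."""
    counts = Counter(lifespan_bin(d) for d in lifespans_days)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label, _ in BINS}
    return {label: 100.0 * counts[label] / total for label, _ in BINS}


# Toy input: lifespans (in days) of four STXOs spent on the same day.
print(lifespan_distribution([0.5, 0.2, 10, 400]))
# {'<1d': 50.0, '1d-1m': 25.0, '1m-1y': 0.0, '>=1y': 25.0}
```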

…2021, there were approximately 200k BTCs in UTXOs less than 1 day old, used as a medium of exchange, and approximately 2 million BTCs in UTXOs more than 10 years old, either lost or held as a store of value.

Our final datasets consist of one dataset characterizing STXOs and one characterizing UTXOs, both of which are smaller than 1 MB.

While Bitcoin transaction output data are publicly available on the blockchain, we find the raw data too large to process efficiently, even on cloud computing platforms. To improve computational efficiency, we first extract only the data relevant to this study into a more manageable derived data table of only 45 GB.
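In a SQL engine, this reduction amounts to selecting only the columns the analysis needs. The Python sketch below illustrates the same idea on a stream of rows and is not the authors' query. The column names `value`, `block_timestamp`, and `spent_block_timestamp` are assumptions; the two timestamps match fields named in this article. Processing rows as a generator keeps memory use flat regardless of input size.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical column names; a raw output record carries many more fields
# (scripts, addresses, indices, ...) that the cohort analysis never reads.
NEEDED_COLUMNS = ("value", "block_timestamp", "spent_block_timestamp")


def project(rows: Iterable[Dict[str, object]]) -> Iterator[Tuple[object, ...]]:
    """Stream rows, keeping only the fields needed downstream."""
    for row in rows:
        # .get() yields None when spent_block_timestamp is absent (an unspent output)
        yield tuple(row.get(col) for col in NEEDED_COLUMNS)


raw = [{"value": 5000, "block_timestamp": "2021-02-09 10:00:00",
        "spent_block_timestamp": None, "script_hex": "<omitted>", "address": "<omitted>"}]
print(list(project(raw)))
# [(5000, '2021-02-09 10:00:00', None)]
```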

To reduce query costs, we create two partitioned tables from the derived data table: one partitioned by the date of block_timestamp and one partitioned by the date of spent_block_timestamp.
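Partitioning by date means a query about one day reads only that day's slice rather than the whole table. The in-memory analogue below groups rows into two dictionaries keyed by date. It assumes rows of (value in satoshis, block_timestamp, spent_block_timestamp) with naive datetimes interpreted as UTC. `partition_by_date` is a hypothetical helper, not a database API.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# (value in satoshis, block_timestamp, spent_block_timestamp); naive datetimes assumed UTC
Row = Tuple[int, datetime, Optional[datetime]]
Partitions = Dict[date, List[Row]]


def partition_by_date(rows: List[Row]) -> Tuple[Partitions, Partitions]:
    """Split rows into birth partitions (by creation date) and death partitions (by spending date)."""
    by_birth: Partitions = defaultdict(list)
    by_death: Partitions = defaultdict(list)
    for row in rows:
        _, born, spent = row
        by_birth[born.date()].append(row)
        if spent is not None:  # unspent outputs belong to no death partition yet
            by_death[spent.date()].append(row)
    return dict(by_birth), dict(by_death)


rows = [
    (100, datetime(2021, 1, 1, 5), datetime(2021, 1, 2, 6)),
    (200, datetime(2021, 1, 1, 9), None),
]
births, deaths = partition_by_date(rows)
print(sorted(births), sorted(deaths))
# [datetime.date(2021, 1, 1)] [datetime.date(2021, 1, 2)]
```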

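The partitioned tables match the structure of the cohort data we need: each date partition of the first table holds one birth cohort, and each date partition of the second table holds one death cohort.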

As in Task 1, we compute the total number of BTCs in UTXOs created and spent on the working date by summing the BTCs in UTXOs in the birth cohort data and in the death cohort data, respectively.
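With date-keyed partitions, the daily created and spent totals each come from reading a single partition. The sketch below assumes integer satoshi values, which avoids floating-point rounding in the sums. It also assumes the partitions are dictionaries mapping a date to rows of (value, born, spent). `daily_flows` is a hypothetical name.

```python
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, datetime, Optional[datetime]]  # (value in satoshis, born, spent)


def daily_flows(by_birth: Dict[date, List[Row]],
                by_death: Dict[date, List[Row]],
                day: date) -> Tuple[int, int]:
    """Satoshis created (birth cohort) and spent (death cohort) on `day`."""
    created = sum(value for value, _, _ in by_birth.get(day, []))
    spent = sum(value for value, _, _ in by_death.get(day, []))
    return created, spent


# Toy partitions: one output spent the next day, one still unspent.
births = {date(2021, 1, 1): [(100, datetime(2021, 1, 1, 5), datetime(2021, 1, 2, 6)),
                             (200, datetime(2021, 1, 1, 9), None)]}
deaths = {date(2021, 1, 2): [(100, datetime(2021, 1, 1, 5), datetime(2021, 1, 2, 6))]}
print(daily_flows(births, deaths, date(2021, 1, 1)))  # (300, 0)
print(daily_flows(births, deaths, date(2021, 1, 2)))  # (0, 100)
```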

Each UTXO that remains alive on a given working date must satisfy two conditions: a) its block_timestamp is earlier than the end of the working date, meaning that the UTXO was created on or before that date; and b) its spent_block_timestamp is either null, meaning that the UTXO had not been spent as of 2021-02-10, or later than the end of the working date, meaning that the UTXO was spent after the working date but on or before 2021-02-10.
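Conditions a) and b) above translate directly into a predicate. Grouping the surviving UTXOs by age then gives a daily age distribution. This is a sketch under several stated assumptions:
- naive datetimes are UTC;
- "end of the working date" is the last representable instant of that day, which is safe because block timestamps have whole-second resolution;
- ages are measured at that instant;
- values are in satoshis;
- the age bins are hypothetical, with 10 years approximated as 3,650 days.

```python
from datetime import date, datetime, time
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[int, datetime, Optional[datetime]]  # (value in satoshis, born, spent); naive UTC

# Hypothetical age bins (label, exclusive upper bound in days); None means unbounded.
AGE_BINS = [("<1d", 1), ("1d-1m", 30), ("1m-1y", 365), ("1y-10y", 3650), (">=10y", None)]


def end_of_day(day: date) -> datetime:
    """Last representable instant of `day`."""
    return datetime.combine(day, time.max)


def is_alive(born: datetime, spent: Optional[datetime], day: date) -> bool:
    end = end_of_day(day)
    created_by_then = born < end                    # condition a)
    unspent_by_then = spent is None or spent > end  # condition b)
    return created_by_then and unspent_by_then


def age_distribution(rows: Iterable[Row], day: date) -> Dict[str, int]:
    """Satoshis held in UTXOs alive on `day`, grouped by age at the end of that day."""
    end = end_of_day(day)
    dist = {label: 0 for label, _ in AGE_BINS}
    for value, born, spent in rows:
        if not is_alive(born, spent, day):
            continue
        age_days = (end - born).total_seconds() / 86_400
        for label, upper in AGE_BINS:
            if upper is None or age_days < upper:
                dist[label] += value
                break
    return dist


toy = [
    (100, datetime(2021, 2, 10, 12), None),               # born that day
    (30, datetime(2021, 2, 9), datetime(2021, 2, 11)),    # spent after the working date
    (50, datetime(2010, 1, 1), None),                     # over 10 years old
    (70, datetime(2021, 1, 1), datetime(2021, 2, 1)),     # already dead
]
print(age_distribution(toy, date(2021, 2, 10)))
# {'<1d': 100, '1d-1m': 30, '1m-1y': 0, '1y-10y': 0, '>=10y': 50}
```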

The results of our analysis are condensed into time-series data that include, for each date from 2009-01-03 to 2021-02-10, the number of BTCs in UTXOs created and spent, the weighted average lifespan, the lifespan distribution, and the age distribution.
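The weighting behind the "weighted average lifespan" is not spelled out in this excerpt. The sketch below assumes it is a value-weighted mean over the STXOs spent on a given day, so that large outputs count more than small ones. `weighted_average_lifespan` is a hypothetical helper, and the inputs are toy values.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple

SECONDS_PER_DAY = 86_400


def weighted_average_lifespan(stxos: Iterable[Tuple[int, datetime, datetime]]) -> Optional[float]:
    """Value-weighted mean lifespan, in days, of STXOs given as (value, born, spent)."""
    total_value = 0
    weighted_sum = 0.0
    for value, born, spent in stxos:
        lifespan_days = (spent - born).total_seconds() / SECONDS_PER_DAY
        weighted_sum += value * lifespan_days
        total_value += value
    return weighted_sum / total_value if total_value else None  # None: nothing spent that day


# Toy data, all spent on 2021-01-05: 1 unit held for 1 day, 3 units held for 4 days.
spent_today = [
    (1, datetime(2021, 1, 4), datetime(2021, 1, 5)),
    (3, datetime(2021, 1, 1), datetime(2021, 1, 5)),
]
print(weighted_average_lifespan(spent_today))  # 3.25
```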

We will update the visualizations as Bitcoin continues to develop, and researchers can easily reproduce our work, in part or in whole, to suit their needs.

In addition to examining Bitcoin, we apply the same cohort analysis to five other cryptocurrencies and generate twelve datasets in total.

New BTCs enter circulation only through block rewards, so the cumulative sum of block rewards equals the total number of BTCs in UTXOs, i.e., the circulating supply of BTC.
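Bitcoin's issuance schedule is 50 BTC per block, halving every 210,000 blocks, so the cumulative reward can be summed era by era. This sketch assumes "block rewards" here means only the newly issued subsidy, since transaction fees move existing BTCs rather than create new ones. The helper names are hypothetical. The schedule gives the theoretical issuance, and the ledger total can differ slightly from it (for example, the genesis block's output cannot be spent).

```python
SATS_PER_BTC = 100_000_000
HALVING_INTERVAL = 210_000


def block_subsidy(height: int) -> int:
    """Newly issued satoshis in the block at `height` (50 BTC, halved every 210,000 blocks)."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:  # shifting a 64-bit value this far would leave nothing
        return 0
    return (50 * SATS_PER_BTC) >> halvings


def cumulative_subsidy(last_height: int) -> int:
    """Total subsidy of blocks 0..last_height, summed one halving era at a time."""
    total = 0
    era_start = 0
    while era_start <= last_height:
        era_end = min(era_start + HALVING_INTERVAL - 1, last_height)
        total += (era_end - era_start + 1) * block_subsidy(era_start)
        era_start += HALVING_INTERVAL
    return total


print(cumulative_subsidy(209_999) / SATS_PER_BTC)  # 10500000.0
```

Summing per era rather than per block keeps the loop to a few dozen iterations even for very large heights.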

We also calculate the circulating supply of BTCs by summing the BTCs in UTXOs across all age cohorts, because every existing BTC is held in some UTXO of some age.
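Every satoshi enters through a birth cohort and leaves through a death cohort. The supply obtained by summing age cohorts should therefore equal cumulative creation minus cumulative spending. The sketch below checks this identity with toy data at whole-day granularity, assuming naive datetimes are UTC and values are in satoshis. Both function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, datetime, Optional[datetime]]  # (value in satoshis, block_timestamp, spent_block_timestamp)


def supply_from_flows(rows: List[Row], day: date) -> int:
    """Everything created up to and including `day`, minus everything spent by then."""
    created = sum(v for v, born, _ in rows if born.date() <= day)
    spent = sum(v for v, _, s in rows if s is not None and s.date() <= day)
    return created - spent


def age_cohorts(rows: List[Row], day: date) -> Dict[int, int]:
    """Value in UTXOs alive at the end of `day`, keyed by age in whole days."""
    cohorts: Dict[int, int] = defaultdict(int)
    for v, born, s in rows:
        if born.date() <= day and (s is None or s.date() > day):
            cohorts[(day - born.date()).days] += v
    return dict(cohorts)


rows = [
    (100, datetime(2021, 1, 1, 8), None),
    (40, datetime(2021, 1, 2, 8), datetime(2021, 1, 3, 8)),
    (60, datetime(2021, 1, 3, 8), None),
]
day = date(2021, 1, 3)
print(age_cohorts(rows, day))  # {2: 100, 0: 60}
print(supply_from_flows(rows, day), sum(age_cohorts(rows, day).values()))  # 160 160
```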

Our data can be used to construct new technical indicators for financial studies that predict cryptocurrency bubbles [21,22], measure cryptocurrency volatility and systematic risk [23,24], design investment strategies [25,26,27], and implement portfolio management [28,29].

First, although our data have a daily frequency, the same cohort analysis can produce data at higher frequencies.
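Moving to a higher frequency mostly means changing the cohort key. Instead of grouping by calendar date, each timestamp can be mapped to the start of a fixed-width bucket. The helper below is a hypothetical sketch. It assumes naive datetimes in UTC and bucket widths that divide a day evenly, so that buckets stay aligned to midnight.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # naive datetimes are assumed to be UTC


def cohort_key(ts: datetime, width: timedelta) -> datetime:
    """Start of the fixed-width cohort bucket that contains `ts`."""
    n = (ts - EPOCH) // width  # timedelta // timedelta -> int (floor division)
    return EPOCH + n * width


ts = datetime(2021, 2, 10, 13, 47)
print(cohort_key(ts, timedelta(days=1)))      # 2021-02-10 00:00:00
print(cohort_key(ts, timedelta(hours=1)))     # 2021-02-10 13:00:00
print(cohort_key(ts, timedelta(minutes=15)))  # 2021-02-10 13:45:00
```

Replacing the date key in the partitioning and cohort steps with this function yields hourly or finer cohorts without other changes to the logic.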

UTXOs might accumulate age for at least two reasons other than being held as a store of value: first, the owner has lost the private key; second, the value of the owner’s UTXOs is less than the transaction fee required to spend them.
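The second reason can be made concrete by comparing an output's value with the marginal fee needed to spend it. The sketch below is an assumption-laden illustration. It takes a fee rate in satoshis per virtual byte and assumes about 68 vbytes per input, which is typical for a P2WPKH input; legacy P2PKH inputs are roughly 148 bytes. Real costs depend on script type and the fee market. The function names are hypothetical.

```python
# Assumed typical input size (P2WPKH, about 68 vbytes); other script types differ.
DEFAULT_INPUT_VBYTES = 68


def spend_cost_sat(feerate_sat_per_vb: float, input_vbytes: int = DEFAULT_INPUT_VBYTES) -> float:
    """Marginal fee for adding one input of this size to a transaction."""
    return feerate_sat_per_vb * input_vbytes


def is_uneconomical(value_sat: int, feerate_sat_per_vb: float,
                    input_vbytes: int = DEFAULT_INPUT_VBYTES) -> bool:
    """True when spending the output would cost at least as much in fees as it is worth."""
    return value_sat <= spend_cost_sat(feerate_sat_per_vb, input_vbytes)


print(is_uneconomical(1_000, feerate_sat_per_vb=20))   # True: 1,360 sat fee exceeds the value
print(is_uneconomical(10_000, feerate_sat_per_vb=20))  # False
```

Such outputs tend to stay in the UTXO set and keep aging even if the owner still holds the key. This is why age alone cannot separate dust from a deliberate store of value.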

In the UTXO model, crypto tokens are akin to banknotes issued by central banks; in the account model, crypto tokens are akin to balances in commercial bank accounts.
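The banknote/balance analogy can be seen in how a payment is executed under each model. In the UTXO sketch, whole outputs are consumed and new ones (payment plus change) are created, mirroring the birth and death of UTXOs. The fee is implicit as inputs minus outputs. In the account sketch, balances are simply adjusted in place. The largest-first coin selection and all names are hypothetical simplifications, and amounts are abstract integer units.

```python
from typing import Dict, List, Tuple


def utxo_payment(utxos: List[int], amount: int, fee: int) -> Tuple[List[int], List[int]]:
    """UTXO model: consume whole outputs ('banknotes') and create new ones, including change."""
    inputs: List[int] = []
    total = 0
    for value in sorted(utxos, reverse=True):  # largest-first selection, one of many policies
        if total >= amount + fee:
            break
        inputs.append(value)
        total += value
    if total < amount + fee:
        raise ValueError("insufficient funds")
    change = total - amount - fee  # fee = sum(inputs) - sum(outputs)
    outputs = [amount] + ([change] if change > 0 else [])
    return inputs, outputs


def account_payment(balances: Dict[str, int], sender: str, receiver: str, amount: int, fee: int) -> None:
    """Account model: adjust balances in place, like a transfer between bank accounts."""
    if balances.get(sender, 0) < amount + fee:
        raise ValueError("insufficient funds")
    balances[sender] -= amount + fee
    balances[receiver] = balances.get(receiver, 0) + amount


print(utxo_payment([5, 3, 1], amount=6, fee=1))  # ([5, 3], [6, 1])
balances = {"alice": 10}
account_payment(balances, "alice", "bob", amount=6, fee=1)
print(balances)  # {'alice': 3, 'bob': 6}
```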

Yinhong Zhao was a visiting undergraduate student at Duke Kunshan University, where he began co-authoring this work as a research assistant to Professor Luyao Zhang during the COVID-19 pandemic.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author and the source, provide a link to the Creative Commons license, and indicate if changes were made.
