Exploring Ethereum blockchain events: Data Analysis

Preface: This is the last in a series of 3 articles dedicated to the analysis of the Ethereum blockchain events:
Part I: Data scraping (fetching and decoding data)
Part II: Data pre-processing (using the SuperRare contract as an example)
Part III: Data analysis (a SuperRare contract data analysis)

A SuperRare contract data analysis

SuperRare is one of the most famous digital art marketplaces. In this market, you can purchase at sale price artworks certified on the blockchain, but you can also make open offers or place bids in auctions. Once you own a piece you can resell it in the secondary market.

On the previous article, we’ve learned how to extract and pre-process the SuperRare event dataset (extraction of the event method names and topics) and performed some global basic data analysis.

This last article is dedicated to the data analysis of specific events emitted by its Solidity contracts.

As a recall, here is the how the (df_events) dataset looks like:

df_events.shape

(141409, 9)

df_events[['date', 'eventName', 'topic_1', 'topic_2', 'data']] \
.sample(5)

We’re focusing now on the Mint / AcceptBid / Sold events data analysis.

The “Mint” event

Actually there is no such event. The “Mint” event results from specific transfer event emitted by _mint(address _to, uint256 _tokenId) transactions.

The Transfer event has the following signature:

event Transfer(address indexed _from, address indexed _to, uint256 _tokenId)

The specificity of this Transfer/Mint event is the zero address in the _from parameter. Hence, it notify the creation of a unique token (a piece of art).

The Transfer events with value 0x0 for topic_1 (1st indexed parameter) has been categorized as Mint events in the pre-processing phase (see previous article).
From the Transfer signature, we also deduce that the creator of the token has to be extracted from the topic_2 field (2nd indexed parameter) and the token id from the data field (non indexed parameter).

Let’s build our df_mint_events dataset and extract the creator field:

df_mint_events = df_events.copy()

df_mint_events = df_mint_events[df_mint_events.eventName == "Mint"]
df_mint_events.shape

(17714, 9)

There are 17.714 mint events, hence 17.714 unique tokens.

from hexbytes import HexBytes

def decodeAddress(x):
 return HexBytes(x)[-20:].hex()

df_mint_events['creator'] = df_mint_events['topic_2'].apply(lambda x: \
        decodeAddress(x))
df_mint_events.creator.nunique()

There are 613 distinct creators.

Let’s graph the Mint events count by week:

df_mint_events \
  .set_index("datetime") \
  .resample("1W") \
  .transaction_hash \
  .count() \
  .plot()

For the unusual peak in July 2019, here is an interesting fact to know:
Robbie Barrat created AI Generated Nude Portrait #7 for the held the first ever annual Christies’s Tech Summit in London, which he intended as 300 separate frames of a single artwork. Each of the 300 frames was tokenized separately and added to redeemable ETH gift cards with directions for how to claim the 1/1 token on SuperRare (see The Lost Robbies).

The Sold / AcceptBid events

The Sold and AcceptBid events are emitted by two payable transactions: buy(uint256 _tokenId) for the purchase of the token if there is a sale price (transferring ownership to buyer and paying out the owner) and acceptBid(uint256 _tokenId) for the acceptation of the bid on the token (transferring ownership to the current bidder and paying out the owner).

From the following event signatures:

event Sold(address indexed _buyer, address indexed _seller, uint256 _amount, uint256 indexed _tokenId);

event AcceptBid(address indexed _bidder, address indexed _seller, uint256 _amount, uint256 indexed _tokenId);

we know the buyer has to be extracted from topic_1, the seller from topic_2 and the token id from topic_3 as indexed parameters.
The value of the transaction (the sale price in wei) is extracted from the data field as a non indexed parameter.

df_sale_events = df_events.copy()

df_sale_events = df_sale_events[
  (df_sale_events.eventName=="Sold") 
  | (df_sale_events.eventName=="AcceptBid")
]

df_sale_events.shape

(13021, 9)

Let’s see what is the most used method to purchase artworks (that is Sold vs AcceptBid distribution):

df_sale_events["eventName"] \
  .value_counts(normalize=True)

AcceptBid    0.752861
Sold         0.247139
Name: eventName, dtype: float64

75% of the purchases comes from the acceptBid transactions.

The value of the transaction has been converted in USD then stored in a usPrice field. Let’s graph the total sales by week:

df_sale_stats = df_sale_events \
  .set_index("datetime") \
  .usPrice \
  .resample("1w") \
  .agg(['sum','count'])

df_sale_stats['count'].plot()
df_sale_stats['sum'].plot(secondary_y=True)

Let’s now display the token highest price sales:

ax = df_sale_events[['tokenId','usPrice']] \
.nlargest(10, columns=['usPrice'], keep="first") \
.plot.barh(x="tokenId", y="usPrice")

ax.invert_yaxis()

Here are the links to these artworks:

Primary and Secondary markets Analysis

Let’s now join the mint dataframe (df_mint_events) and the sales dataframe (df_sales_events) to focus on the primary market:

df_mint_sale_events = pd.merge(
  df_mint_events,
  df_sale_events,
  how="inner",
  left_on=["creator", "tokenId"],
  right_on=["seller", "tokenId"])

Let’s graph the sales by month:

df_mint_sale_events.set_index("datetime_y") \
  .usPrice \
  .resample('1m') \
  .sum() \
  .plot()

For the cumulative sum, we get:

df_mint_sale_events \
  .set_index("datetime_y") \
  .sort_index() \
  .usPrice \
  .cumsum()\
  .plot()

On the 3rd of June 2020, the cumulative sum of primary sales hits the 1.000.000 USD. Since then, it’s quite impressive how exponential the growth is.

Note that a describe on the sum of the usPrice by creator shows some huge outliers:

s = df_mint_sale_events.groupby('creator')['usPrice'].sum()
s.describe(percentiles=[0.05, 0.25, 0.75, 0.95, 0.99])

count       525.000000
mean       9468.161905
std       23456.470991
min           1.000000
5%          174.200000
25%        1107.000000
50%        3700.000000
75%        9875.000000
95%       33806.600000
99%       79765.520000
max      418343.000000
Name: usPrice, dtype: float64

and it’s confirmed by the distribution of the usPrice earned by the creators in 20 bins:

s.value_counts(bins=20)

(-417.342, 20918.1]     471
(20918.1, 41835.2]       33
(41835.2, 62752.3]       12
(62752.3, 83669.4]        4
(83669.4, 104586.5]       2
(104586.5, 125503.6]      1
(125503.6, 146420.7]      0
(146420.7, 167337.8]      0
(167337.8, 188254.9]      1
(188254.9, 209172.0]      0
(209172.0, 230089.1]      0
(230089.1, 251006.2]      0
(251006.2, 271923.3]      0
(271923.3, 292840.4]      0
(292840.4, 313757.5]      0
(313757.5, 334674.6]      0
(334674.6, 355591.7]      0
(355591.7, 376508.8]      0
(376508.8, 397425.9]      0
(397425.9, 418343.0]      1
dtype: int64

This binning shows that most of the artists earned less than 20.918 USD and a couple of artists earns more than 104.586 USD with one huge outlier between 397.425 and 418.343 USD.
We can easily guess that those outliers should be worse in the secondary market.

Talking about the secondary market, let’s finish with its analysis:

df_sale_inout_events = pd.merge(
  df_sale_events,
  df_sale_events,
  how="inner",
  left_on=["buyer", "tokenId"],
  right_on=["seller", "tokenId"])

# Avoid back and forth to be counted twice
df_sale_inout_events = \
    df_sale_inout_events[df_sale_inout_events.datetime_y
                         > df_sale_inout_events.datetime_x]
df_sale_inout_events['profit'] = df_sale_inout_events.usPrice_y \
    - df_sale_inout_events.usPrice_x

df_sale_inout_events.shape

(1427, 33)

There is a count of 1427 resales.

Let’s display the most profitable artworks at resale:

ax = df_sale_inout_events[['tokenId','profit']] \
  .nlargest(10, columns=['profit'] ,keep="first") \
  .plot(x="tokenId", kind='barh')
ax.invert_yaxis()

And here they are (note that 10%(?) of the sales is paid to the artist when there is a resale, this hasn’t been taken into account in the final sale price):

Girl Next door Purchase price: 3 ETH (2018–10–28), Sale Price: 38 ETH (2020–11–22)
A message worth repeating Purchase price: 2 ETH (2020–02–27), Sale Price: 37.55 ETH (2020–10–16)
AI Generated Landscape Painting #4 Purchase price: 0.451 ETH ( 2018–04–30), Sale Price: 54.4 ETH (2020–07–10)
AI Generated Nude Portrait #1 Purchase price: 0.46 ETH (2018–04–05), Sale Price: 75 ETH (2020–01–18)
Rubik’s Awe Purchase price: 7 ETH (2020–02–10), Sale Price: 43.25 ETH (2020–07–27)
Bird in the Shell Purchase price: 0.25 ETH (2018–08–03), Sale Price: 26 ETH (2020–08–12)
Mrs. Skull Purchase price: 2 ETH (2019–12–03), Sale Price: 24 ETH (2020–08–04)
Some Asshole Purchase price:0.5 ETG (2018–04–25), Sale Price:25.0 ETH (2020–09–06)
I’ve got the Power Purchase Price: 5.25 ETH (2020–02–27), Sale Price: 45 ETH (2020–05–16)
M87 Black Hole Deconstruction #6 Purchase Price: 0.55 ETH (2019–05–16), Sale Price: 20 ETH (2020–09–19)

There are lots of fun facts to discover with this dataset. I’ll let you now go on.

Disclaimer: The content provided on this article is for informational purposes only and should not be relied upon in connection with a particular investment decision or be construed as an offer, recommendation or solicitation regarding any investment. The author is not endorsing any company, project, or token discussed in this article. All information is presented here “as is,” without warranty of any kind, whether express or implied, and any forward-looking statements may turn out to be wrong.

« Exploring Ethereum blockchain events : Data Pre-processing Image Steganography with Python »