Preface: This is the first in a series of 3 articles dedicated to the analysis of Ethereum blockchain events:
Part I: Data scraping (fetching and decoding data)
Part II: Data pre-processing (using the SuperRare contract as an example)
Part III: Data analysis (a SuperRare contract data analysis)
The code samples provided in this article are written in Python (except for the Solidity contract example), but you can easily translate them into your favorite language.
This article assumes you have some knowledge of the Ethereum smart contract mechanisms. A good resource to get started is the Solidity Introduction to smart contracts.
Understanding events on the blockchain
Events in Solidity are used to track the execution of a transaction that’s sent to a contract. Along with the data they carry, they are recorded as special transaction logs on the blockchain.
Usually, events provide relevant information on state changes. They can be subscribed to, and they declare indexed fields that can be searched.
Hence they are not only a way to log relevant data to the blockchain but also a way to facilitate interactions between smart contracts and their user interfaces which are listening for those events.
Let’s start our exploration by implementing a Solidity DummyContract with a transfer function emitting a Transfer event:
contract DummyContract {
event Transfer(address indexed _from, address indexed _to, uint256 _tokenId);
function transfer(address _to, uint256 _tokenId) public {
emit Transfer(msg.sender, _to, _tokenId);
}
}
In the example above, the first parameter of the Transfer event is the sender (_from), the second the receiver (_to), and the third the ID of the object to transfer (_tokenId).
After the blockchain accepts a transaction, the transaction receipt will hold the events data in the logs field.
Let’s see what the transaction receipt looks like for a DummyContract transfer transaction with hash 0xf08d18fb269a0513ab0f8efd47ae4f884aea35b2d59559fad0a6fff27c2c9f48:
curl -H "Content-Type: application/json" -X POST --data '{"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["0xf08d18fb269a0513ab0f8efd47ae4f884aea35b2d59559fad0a6fff27c2c9f48"],"id":1}' http://localhost:8545
One event has been emitted (one entry in the logs field). The parameters passed to the Transfer event are stored in the topics and data fields. We will soon cover how to decode them.
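To make the structure concrete, here is a minimal sketch of walking through the logs of a receipt once the JSON-RPC response has been parsed. The `receipt` dictionary is a hand-trimmed stand-in that keeps only the fields discussed in this article (a real receipt carries many more), and `pad32` and `summarize_logs` are hypothetical helper names.

```python
def pad32(hex_digits):
    """Left-pad hex digits (without 0x) to one 32-byte word."""
    return "0x" + hex_digits.rjust(64, "0")

# Hand-trimmed stand-in for the "result" object of eth_getTransactionReceipt.
# Assumption: only the fields discussed in this article are kept.
receipt = {
    "transactionHash": "0xf08d18fb269a0513ab0f8efd47ae4f884aea35b2d59559fad0a6fff27c2c9f48",
    "logs": [
        {
            "topics": [
                "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
                pad32("f39fd6e51aad88f6f4ce6ab8827279cfffb92266"),
                pad32("70997970c51812dc3a010c7d01b50e0d17dc79c8"),
            ],
            "data": pad32("4d2"),
        }
    ],
}

def summarize_logs(receipt):
    """Return one (topic0, indexed parameter count, data size in bytes) tuple per log."""
    summary = []
    for log in receipt["logs"]:
        topic0 = log["topics"][0]
        indexed_count = len(log["topics"]) - 1      # topic0 is the signature hash
        data_size = (len(log["data"]) - 2) // 2     # drop "0x", 2 hex chars per byte
        summary.append((topic0, indexed_count, data_size))
    return summary

print(summarize_logs(receipt))
```

Each entry of `logs` is one emitted event, so a transaction that emits several events simply yields several tuples.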
Fetching events
The easiest way to fetch some events is by getting a sample from the etherscan.io API.
Get your API key and grab the address of the contract you’re interested in, then:
import requests
import json
url = 'https://api.etherscan.io/api'
params = {
'module': 'logs',
'action': 'getLogs',
'address': CONTRACT_ADDRESS,
'apikey': API_KEY
}
r = requests.get(url, params=params)
json_data = json.loads(r.text)["result"]
You will fetch the first 1000 events for the contract you’ve chosen. See https://etherscan.io/apis#logs for details on query parameters. Note: The API has a rate limit of 5 calls per second per IP, see Etherscan API Doc.
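If the first 1000 events are not the ones you need, the same endpoint accepts a block range and a topic filter. The sketch below only builds the query parameters (no request is sent). `fromBlock`, `toBlock` and `topic0` are parameter names from the Etherscan logs API; the helper name, the block numbers and the placeholder address and key are illustrative assumptions. The `topic0` value used here is the Transfer signature hash discussed later in this article.

```python
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def build_getlogs_params(address, api_key, from_block=None, to_block=None, topic0=None):
    """Build the query parameters for an Etherscan getLogs call (hypothetical helper)."""
    params = {
        "module": "logs",
        "action": "getLogs",
        "address": address,
        "apikey": api_key,
    }
    # Optional filters are only added when provided.
    if from_block is not None:
        params["fromBlock"] = from_block
    if to_block is not None:
        params["toBlock"] = to_block
    if topic0 is not None:
        params["topic0"] = topic0
    return params

params = build_getlogs_params(
    "0xCONTRACT_ADDRESS", "API_KEY",          # placeholders
    from_block=12000000, to_block=12100000,   # illustrative block range
    topic0=TRANSFER_TOPIC0,                   # keep only Transfer events
)
print(params)
```

The resulting dictionary can be passed as `params=` to `requests.get`, exactly like the one above. Walking through successive block ranges is one way to go past the 1000-event limit while staying under the rate limit.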
Below is the fetched data loaded into a pandas DataFrame (just for pretty-printing):
import pandas as pd
df = pd.json_normalize(json_data)
df[["topics", "data", "timeStamp", "transactionHash", ]].head()
Note: In the next article (the data pre-processing part of this series), I’ll describe how to get all contract events from a BigQuery public dataset along with some tips to reduce query cost.
Understanding event content
Let’s detail the most important fields of our sample: timeStamp, transactionHash, topics and data.
The timeStamp field refers to the timestamp of the block containing the transaction in which the event was emitted.
The transactionHash field refers to the hash of the transaction.
The topics (array) field encodes the event signature and its indexed keys/parameters. For a regular (non-anonymous) event, there is a maximum of 4 topics: the signature hash plus up to 3 indexed parameters.
The data field encodes the non-indexed parameters.
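The timeStamp value is easier to work with once converted to a datetime. Here is a minimal sketch, assuming the API returns it as a hexadecimal string such as `0x5f5e1000` (check this against your own response) and falling back to decimal otherwise. The helper name and the sample value are made up for illustration.

```python
from datetime import datetime, timezone

def parse_timestamp(value):
    """Convert a block timestamp given as a hex ('0x...') or decimal string to a UTC datetime."""
    seconds = int(value, 16) if value.startswith("0x") else int(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Illustrative value, not taken from a real log.
print(parse_timestamp("0x5f5e1000").isoformat())
```

Keeping the result timezone-aware (UTC) avoids surprises when the events are later grouped by day or hour.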
The first element of the topics array is the Keccak-256 hash, in hexadecimal, of the event signature: the event name followed by the parenthesized list of its parameter types (uint256, address, string, etc.).
For example, our Transfer(address,address,uint256) will have the following signature hash: 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
Let’s check this with a piece of code:
from Crypto.Hash import keccak
k = keccak.new(digest_bits=256)
k.update(b'Transfer(address,address,uint256)')
print('0x'+k.hexdigest())
0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
The next elements in the topics array, along with the data field, encode the values passed to the event: indexed parameters go into topics, non-indexed parameters into data.
Let’s check this with our DummyContract example. Let’s say you initiate a transfer transaction to transfer the token with ID 9343 to the address 0x70997970C51812dc3A010C7d01b50e0d17dc79C8.
Remember how I defined the Transfer event emitted by the transfer function in the contract:
event Transfer(address indexed _from, address indexed _to, uint256 _tokenId)
Hence there will be 3 topics:
- topic0: the Keccak-256 hash of the ASCII form of the signature Transfer(address,address,uint256): 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef
- topic1: the _from address parameter, that is the address calling the transfer function (your address), a 20-byte hexadecimal value left-padded to 32 bytes: 0x000000000000000000000000f39fd6e51aad88f6f4ce6ab8827279cfffb92266
- topic2: the _to address parameter: 0x00000000000000000000000070997970c51812dc3a010c7d01b50e0d17dc79c8
The data field encodes the non-indexed parameter, _tokenId, a uint256 value (9343, that is 0x247f in hexadecimal) left-padded to 32 bytes:
0x000000000000000000000000000000000000000000000000000000000000247f
These are the same topics you see in the transaction receipt from the previous section (in that receipt, the data field holds the token ID 1234, that is 0x4d2):
"data": "0x00000000000000000000000000000000000000000000000000000000000004d2",
"topics": [
"0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
"0x000000000000000000000000f39fd6e51aad88f6f4ce6ab8827279cfffb92266",
"0x00000000000000000000000070997970c51812dc3a010c7d01b50e0d17dc79c8"
]
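Putting the pieces together, the log above can be decoded with nothing but the standard library. This is a minimal sketch: the function name and the layout of the returned dictionary are illustrative choices, and it only handles this particular Transfer(address,address,uint256) layout.

```python
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log):
    """Decode a Transfer(address indexed, address indexed, uint256) log entry."""
    topics = log["topics"]
    if topics[0] != TRANSFER_TOPIC0:
        raise ValueError("not a Transfer(address,address,uint256) event")
    return {
        # An address is the last 20 bytes (40 hex characters) of the 32-byte topic.
        "_from": "0x" + topics[1][-40:],
        "_to": "0x" + topics[2][-40:],
        # The only non-indexed parameter fills the whole data field.
        "_tokenId": int(log["data"], 16),
    }

# Same values as the log above, rebuilt with explicit left-padding to 32 bytes.
log = {
    "data": "0x" + "4d2".rjust(64, "0"),
    "topics": [
        TRANSFER_TOPIC0,
        "0x" + "f39fd6e51aad88f6f4ce6ab8827279cfffb92266".rjust(64, "0"),
        "0x" + "70997970c51812dc3a010c7d01b50e0d17dc79c8".rjust(64, "0"),
    ],
}
print(decode_transfer(log))
```

Checking topic0 first matters in practice: a contract usually emits several event types, and the same decoding logic applied to another event would silently return garbage.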
Decoding event keys/parameters
Since the goal of this article is to deep dive into data carried by events, you won’t use libraries like web3.py for decoding event parameters into human-readable information (anyway, I find web3.py to be rather tedious to use in this context compared to web3.js).
From the event signature, you have the most important information: the type of each parameter, so you know how to decode each one by converting it to the right format. The last missing pieces of information, which you may have deduced from the previous section, are the following:
- There are only 256-bit (32-byte) words
- Bytes and strings are right-padded; other types are left-padded
- Everything is expressed in hexadecimal format
Let’s start with the easiest one, decoding integers. For this, just use the int function:
int("0x00000000000000000000000000000000000000000000000000000000000004d2", 16)
1234
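The `int` call above is enough for unsigned types (uint8 to uint256). Signed types (int8 to int256) are stored in two's complement and sign-extended to 32 bytes, so a negative value shows up as a word starting with `f` digits. A small sketch of how this could be handled, with a hypothetical helper name:

```python
def decode_int256(word_hex):
    """Decode a 32-byte hex word as a signed (two's complement) integer."""
    digits = word_hex[2:] if word_hex.startswith("0x") else word_hex
    raw = bytes.fromhex(digits)
    return int.from_bytes(raw, byteorder="big", signed=True)

print(decode_int256("0x" + "f" * 64))               # every bit set
print(decode_int256("0x" + "4d2".rjust(64, "0")))   # same word as above
```

For positive values both approaches agree, so the signed variant only needs to be used when the event signature declares an `int` type.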
For addresses, there is nothing to decode: Ethereum addresses are 20-byte hexadecimal numbers. Just extract the last 20 bytes from the 32-byte word. With the help of a thin wrapper called HexBytes to easily go back and forth between bytes and hexadecimal strings, a piece of code like this should do the trick:
from hexbytes import HexBytes
s = "0x00000000000000000000000070997970c51812dc3a010c7d01b50e0d17dc79c8"
HexBytes(s)[-20:].hex()
'0x70997970c51812dc3a010c7d01b50e0d17dc79c8'
A boolean is encoded as an unsigned integer of value zero (false) or one (true). Just decode the integer and cast it to a boolean.
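As a sketch, with a hypothetical helper name, this could look like the following. Rejecting anything other than 0 or 1 is an extra safety check, not something the encoding requires of a decoder.

```python
def decode_bool(word_hex):
    """Decode a 32-byte hex word as a Solidity bool (0 is False, 1 is True)."""
    value = int(word_hex, 16)
    if value not in (0, 1):
        raise ValueError("not a valid ABI-encoded bool")
    return bool(value)

print(decode_bool("0x" + "1".rjust(64, "0")))
print(decode_bool("0x" + "0" * 64))
```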
Note that the data field can hold a concatenation of values. To get each value, split the data into 32-byte chunks and decode each chunk according to its respective type.
Below is an example with a data field holding 2 integers:
def chunkData(l, n):
    # Split l into consecutive chunks of n items (here: 32-byte words)
    return [l[i:i+n] for i in range(0, len(l), n)]
data = "0x000000000000000000000000000000000000000000000000016345785d8a00000000000000000000000000000000000000000000000000000000000000001ce6"
print([int(x.hex(),16) for x in chunkData(HexBytes(data), 32)])
[100000000000000000, 7398]
Notice that I’ve talked about data as a concatenation of values and not an array of values, so there is no confusion with the array data types.
Talking about arrays, decoding dynamic types is the tricky part.
Bytes, strings and arrays can be of dynamic type. A dynamic type is encoded the following way: in the part dedicated to a given dynamic parameter, the first word holds the offset at which the parameter value starts; then, at this offset, a first word holds the length of the value, followed by the value itself encoded on one or more words.
I’ll try to illustrate this with an example. Suppose the event DummyEvent(address, bytes) is emitted with a random address and an IPFS URL as parameters. The data field value in the transaction receipt will be:
0x000000000000000000000000f39fd6e51aad88f6f4ce6ab8827279cfffb92266
0000000000000000000000000000000000000000000000000000000000000040
000000000000000000000000000000000000000000000000000000000000004a
68747470733a2f2f697066732e7069787572612e696f2f697066732f516d5038
6f3578335961564e36544377555075685369516861706a356450333143416967
45326d7a743362335a4200000000000000000000000000000000000000000000
Let’s read each 32-byte word.
First word is the value of the first parameter, the address:
0x000000000000000000000000f39fd6e51aad88f6f4ce6ab8827279cfffb92266
Second word is the offset of the second parameter (0x40, that is 64 bytes): 0x0000000000000000000000000000000000000000000000000000000000000040
Third word (at location 0x40) is the length of the second parameter (0x4a, that is 74 bytes):
0x000000000000000000000000000000000000000000000000000000000000004a
The remaining words encode the IPFS URL.
Let’s decode the IPFS URL with the following code:
data = HexBytes(data)
# Get the offset of the 2nd parameter
start = int(data[32:64].hex(), 16)
# Go to the offset and extract the size of the value of the 2nd parameter (encoded in the first 32-bytes starting at offset)
size = int(data[start:][:32].hex(), 16)
# Go to offset+32 bytes, extract and decode into string
print(data[start+32:][:size].decode('utf-8'))
https://ipfs.pixura.io/ipfs/QmP8o5x3YaVN6TCwUPuhSiQhapj5dP31CAigE2mzt3b3ZB
Note: Since a topic can only hold 32 bytes of data, objects like strings, bytes, arrays or structs are usually not indexed. When they are, the Keccak-256 hash of their encoded value is stored as the topic instead of the value itself (see Encoding of indexed event parameters).
For more information and examples, see Use of Dynamic Types in the Solidity Contract ABI Specification.
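The offset and length logic described above generalizes to any position in the data field. Here is a standard-library-only sketch that rebuilds the DummyEvent(address, bytes) data from its parts and reads the dynamic value back. The helper name is hypothetical, and the sketch assumes every parameter occupies exactly one head word (true for elementary types and for dynamic bytes/string, not for static arrays or tuples).

```python
def decode_dynamic_bytes(data_hex, position):
    """Return the raw bytes of the dynamic parameter whose head word sits at `position`."""
    digits = data_hex[2:] if data_hex.startswith("0x") else data_hex
    data = bytes.fromhex(digits)
    # Head word: offset (in bytes, from the start of data) of the encoded value.
    offset = int.from_bytes(data[32 * position: 32 * (position + 1)], "big")
    # First word at the offset: length of the value in bytes.
    length = int.from_bytes(data[offset: offset + 32], "big")
    # The value follows, right-padded with zeros up to a multiple of 32 bytes.
    return data[offset + 32: offset + 32 + length]

# Rebuild the DummyEvent(address, bytes) data field from its parts.
url = b"https://ipfs.pixura.io/ipfs/QmP8o5x3YaVN6TCwUPuhSiQhapj5dP31CAigE2mzt3b3ZB"
padded_len = -(-len(url) // 32) * 32   # round up to a multiple of 32 bytes
data = (
    "0x"
    + "f39fd6e51aad88f6f4ce6ab8827279cfffb92266".rjust(64, "0")  # address, left-padded
    + format(0x40, "064x")                                       # offset of the bytes value
    + format(len(url), "064x")                                   # length of the bytes value
    + url.hex().ljust(padded_len * 2, "0")                       # value, right-padded
)
print(decode_dynamic_bytes(data, 1).decode("utf-8"))
```

Building the data by hand makes the two padding directions visible: `rjust` for the address and the integers, `ljust` for the bytes value.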
Getting the contract event specifications (ABI)
So the key in decoding event data is getting the event method signature.
Obviously, if you have access to the contract source, you have access to the signatures. But besides getting and reading the contract, there is a more convenient way to get the schema of a contract: getting its ABI. In Solidity contracts, the ABI (application binary interface) is the interface that defines the methods for encoding/decoding data into/out of the machine code. It lists how functions/events are called and in which binary format (see Contract ABI Specification).
The JSON representation of my DummyContract ABI looks like this:
[
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"name": "_from",
"type": "address"
},
{
"indexed": true,
"name": "_to",
"type": "address"
},
{
"indexed": false,
"name": "_tokenId",
"type": "uint256"
}
],
"name": "Transfer",
"type": "event"
},
{
"inputs": [
{
"name": "_to",
"type": "address"
},
{
"name": "_tokenId",
"type": "uint256"
}
],
"name": "transfer",
"outputs": [],
"stateMutability": "nonpayable",
"type": "function"
}
]
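From such an ABI, the event signatures (and which parameters end up in topics or in data) can be derived mechanically. Here is a minimal sketch using a compacted copy of the ABI above. The function name is hypothetical, and the sketch ignores tuple (struct) parameters, whose canonical type has to be expanded from their components.

```python
import json

# Compacted copy of the DummyContract ABI shown above.
abi_json = """
[
  {"anonymous": false,
   "inputs": [
     {"indexed": true, "name": "_from", "type": "address"},
     {"indexed": true, "name": "_to", "type": "address"},
     {"indexed": false, "name": "_tokenId", "type": "uint256"}
   ],
   "name": "Transfer", "type": "event"},
  {"inputs": [
     {"name": "_to", "type": "address"},
     {"name": "_tokenId", "type": "uint256"}
   ],
   "name": "transfer", "outputs": [], "stateMutability": "nonpayable", "type": "function"}
]
"""

def event_signatures(abi):
    """Map each event name to (signature, indexed names, non-indexed names)."""
    result = {}
    for entry in abi:
        if entry["type"] != "event":
            continue  # skip functions, constructors, etc.
        types = ",".join(item["type"] for item in entry["inputs"])
        indexed = [item["name"] for item in entry["inputs"] if item["indexed"]]
        non_indexed = [item["name"] for item in entry["inputs"] if not item["indexed"]]
        result[entry["name"]] = (f"{entry['name']}({types})", indexed, non_indexed)
    return result

print(event_signatures(json.loads(abi_json)))
```

Hashing each signature string with Keccak-256 then gives the topic0 value to match against the fetched logs.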
For a given contract address, you can find the ABI on Etherscan or with a call to their API:
https://api.etherscan.io/api?module=contract&action=getabi&address=CONTRACT_ADDRESS&apikey=API_KEY
Unfortunately, this isn’t always reliable, as the uploaded ABI may be out of date.
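One detail is easy to miss when calling this endpoint. The sketch below assumes that the `result` field of the response is itself a JSON-encoded string, so it has to be parsed a second time, and that `status` is `"1"` on success. It runs on a hard-coded, shortened sample response (no request is sent); the sample and the helper name are illustrative.

```python
import json

def parse_getabi_response(response_text):
    """Extract the ABI (a list of dicts) from the text of a getabi response."""
    payload = json.loads(response_text)
    if payload.get("status") != "1":
        raise ValueError(f"ABI not available: {payload.get('result')}")
    # Assumption: "result" holds the ABI as a JSON string, hence the second parse.
    return json.loads(payload["result"])

# Shortened, hand-written sample response (not fetched from the API).
sample = json.dumps({
    "status": "1",
    "message": "OK",
    "result": json.dumps(
        [{"name": "Transfer", "type": "event", "inputs": [], "anonymous": False}]
    ),
})
abi = parse_getabi_response(sample)
print([entry["name"] for entry in abi if entry["type"] == "event"])
```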
If you work on a smart contract with a public project, you can download that project and get the ABI by using Truffle or solc.
If the ABI is not available, there are online tools to look up event signatures, such as https://www.4byte.directory/event-signatures/.
In the worst-case scenario (smart contract only available as bytecode), you can explore event transactions with the transaction hashes and get the event signatures on the Etherscan website (under the Logs tab) or via their API.
To go even deeper into blockchain events, this article is pretty good: Understanding event logs on the Ethereum blockchain
References:
https://docs.soliditylang.org/en/latest/introduction-to-smart-contracts.html
https://docs.soliditylang.org/en/latest/contracts.html#events
https://docs.soliditylang.org/en/v0.5.3/abi-spec.html#types
https://docs.soliditylang.org/en/v0.5.3/abi-spec.html
https://medium.com/mycrypto/understanding-event-logs-on-the-ethereum-blockchain-f4ae7ba50378
Tools:
https://www.4byte.directory/event-signatures/
https://emn178.github.io/online-tools/keccak_256.html