

Department of Informatics, Blockchain and Distributed Ledger Technologies

Blockchain Data Parser and Storage

Level: MAP
Responsible Person: Mostafa Chegeni
Keywords: UTXO-based blockchains, Data parser, BlockSci library
Programming Skills: C++, Python, PostgreSQL

Blockchain technology has gained significant attention in recent years due to its decentralized and transparent nature. Extracting and analyzing data from blockchain networks can provide valuable insights for various applications, including forensic investigations, market analysis, and security research.
This project aims to develop a blockchain data parser in Python and store the parsed data in a PostgreSQL database. The parser will be based on the blockchain analysis library BlockSci [1-3], which is written in C++. This project will provide an accessible and efficient solution for blockchain data analysis, contribute to the advancement of blockchain research, and enable the development of applications that rely on blockchain data.

The primary objectives of this project are as follows:
a. Extract the blockchain data parser from BlockSci.
b. Implement a blockchain data parser in Python based on the extracted parser.
c. Develop a PostgreSQL database to store the parsed data.
d. Parse and store the data of UTXO-based blockchains such as Bitcoin and Litecoin.

The project will be divided into the following steps:
Step 1: Extracting the Blockchain Data Parser from BlockSci
a. Perform a detailed study of the BlockSci library to understand its architecture and functionality.
b. Identify the key components and algorithms used in the BlockSci data parser.
c. Extract the relevant code sections from the BlockSci library.

Step 2: Implementing a Blockchain Data Parser in Python
a. Design a Python-based data parser architecture that replicates the functionality of the code extracted from BlockSci.
b. Utilize Python libraries and modules, such as Pandas and NumPy, to handle data structures and perform efficient data parsing.
c. Develop Python data parsing algorithms that can handle UTXO-based blockchains like Bitcoin and Litecoin.
d. Test the Python-based parser extensively on sample blockchain data to ensure accurate and efficient parsing.
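
As a possible starting point for Step 2, the sketch below decodes a raw Bitcoin-style transaction and block in pure Python. It assumes the standard Bitcoin serialization (little-endian integers, CompactSize length prefixes, optional SegWit marker and flag), which Litecoin reuses for these fields. It does not handle extensions such as Litecoin's MWEB, nor the network-magic and size framing that precedes each block in `blk*.dat` files. All class and function names are hypothetical and are not taken from BlockSci.

```python
import hashlib
import struct
from dataclasses import dataclass, field


@dataclass
class TxIn:
    prev_txid: str          # spent transaction id, hex in display (byte-reversed) order
    prev_index: int         # output index within the spent transaction
    script_sig: bytes
    sequence: int
    witness: list = field(default_factory=list)


@dataclass
class TxOut:
    value: int              # amount in the smallest unit (satoshis for Bitcoin)
    script_pubkey: bytes


@dataclass
class Transaction:
    txid: str
    version: int
    inputs: list
    outputs: list
    locktime: int


class Reader:
    """Sequential little-endian reader over an in-memory byte buffer."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data, self.pos = data, pos

    def read(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("unexpected end of data")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint(self, fmt: str) -> int:
        return struct.unpack(fmt, self.read(struct.calcsize(fmt)))[0]

    def varint(self) -> int:
        # CompactSize: one byte below 0xFD, otherwise a 2-, 4- or 8-byte integer follows.
        first = self.read(1)[0]
        if first < 0xFD:
            return first
        return self.uint({0xFD: "<H", 0xFE: "<I", 0xFF: "<Q"}[first])


def sha256d(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def parse_transaction(r: Reader) -> Transaction:
    start = r.pos
    version = r.uint("<i")
    # SegWit serialization inserts a 0x00 marker and 0x01 flag after the version.
    segwit = r.data[r.pos:r.pos + 2] == b"\x00\x01"
    if segwit:
        r.read(2)
    body_start = r.pos
    inputs = []
    for _ in range(r.varint()):
        prev_txid = r.read(32)[::-1].hex()
        prev_index = r.uint("<I")
        script_sig = r.read(r.varint())
        inputs.append(TxIn(prev_txid, prev_index, script_sig, r.uint("<I")))
    outputs = []
    for _ in range(r.varint()):
        value = r.uint("<q")
        outputs.append(TxOut(value, r.read(r.varint())))
    body_end = r.pos
    if segwit:
        for txin in inputs:
            txin.witness = [r.read(r.varint()) for _ in range(r.varint())]
    locktime_bytes = r.read(4)
    # The txid hashes the serialization without marker, flag and witness data.
    stripped = r.data[start:start + 4] + r.data[body_start:body_end] + locktime_bytes
    return Transaction(sha256d(stripped)[::-1].hex(), version, inputs, outputs,
                       struct.unpack("<I", locktime_bytes)[0])


def parse_block(raw: bytes):
    """Parse an 80-byte block header followed by its transactions."""
    r = Reader(raw)
    header = r.read(80)
    block_hash = sha256d(header)[::-1].hex()
    prev_hash = header[4:36][::-1].hex()
    timestamp = struct.unpack("<I", header[68:72])[0]
    txs = [parse_transaction(r) for _ in range(r.varint())]
    return block_hash, prev_hash, timestamp, txs
```

Computing the txid over the stripped serialization is what lets a SegWit transaction hash to the same identifier as its legacy form; the witness bytes only affect the separate wtxid. A production parser would additionally stream block files instead of loading them fully into memory.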

Step 3: Developing a PostgreSQL Database to Store the Parsed Data
a. Set up a PostgreSQL database to store the parsed blockchain data.
b. Design the database schema and tables following the data organization principles of Cardano DB Sync [4], a widely used tool that synchronizes Cardano blockchain data into a PostgreSQL database.
Note: The data organization principles of Cardano DB Sync will guide the schema and table design, adapted to the data storage and retrieval requirements of this project.
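
One possible starting schema for Step 3 is sketched below as DDL strings in Python, so that it can be applied from the same codebase as the parser. It is loosely inspired by the block, transaction, output, and input tables of Cardano DB Sync (integer surrogate keys, inputs that point at the transaction and output index they spend), but the column set is an assumption adapted to Bitcoin-style UTXOs, not a copy of the Cardano DB Sync schema. The table and column names are hypothetical.

```python
import sqlite3

# Hypothetical starting schema for Bitcoin-style UTXO data, loosely inspired by
# the block / tx / tx_out / tx_in tables of Cardano DB Sync. Rows reference
# each other through integer surrogate keys assigned by the loader.
SCHEMA = [
    """CREATE TABLE IF NOT EXISTS block (
        id          BIGINT PRIMARY KEY,
        hash        BYTEA NOT NULL UNIQUE,
        block_no    BIGINT NOT NULL,             -- block height
        previous_id BIGINT REFERENCES block (id),
        time        TIMESTAMP NOT NULL,
        tx_count    INTEGER NOT NULL
    )""",
    """CREATE TABLE IF NOT EXISTS tx (
        id          BIGINT PRIMARY KEY,
        hash        BYTEA NOT NULL,              -- not UNIQUE: see BIP 30 duplicates
        block_id    BIGINT NOT NULL REFERENCES block (id),
        block_index INTEGER NOT NULL,            -- position of the tx in its block
        out_sum     BIGINT NOT NULL              -- total output value in satoshis
    )""",
    """CREATE TABLE IF NOT EXISTS tx_out (
        id            BIGINT PRIMARY KEY,
        tx_id         BIGINT NOT NULL REFERENCES tx (id),
        output_index  INTEGER NOT NULL,
        value         BIGINT NOT NULL,
        script_pubkey BYTEA NOT NULL,
        UNIQUE (tx_id, output_index)
    )""",
    """CREATE TABLE IF NOT EXISTS tx_in (
        id           BIGINT PRIMARY KEY,
        tx_in_id     BIGINT NOT NULL REFERENCES tx (id),  -- transaction containing this input
        tx_out_id    BIGINT REFERENCES tx (id),           -- transaction whose output is spent; NULL for coinbase
        tx_out_index INTEGER,                             -- index of the spent output
        sequence_no  BIGINT NOT NULL
    )""",
]


def create_schema(conn) -> None:
    """Run the DDL through any DB-API 2.0 connection (e.g. psycopg2 or sqlite3)."""
    cur = conn.cursor()
    for statement in SCHEMA:
        cur.execute(statement)
    conn.commit()


if __name__ == "__main__":
    # Syntax smoke test against in-memory SQLite; the real target is PostgreSQL.
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(sorted(r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")))
```

Storing the spent output as `(tx_out_id, tx_out_index)` mirrors how a UTXO input names the outpoint it consumes; resolving the raw previous txid to `tx_out_id` would happen at load time. Transaction hashes are deliberately not declared UNIQUE because Bitcoin's history contains two pairs of coinbase transactions with identical txids, the situation addressed by BIP 30.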

Step 4: Parsing and Storing Data in PostgreSQL Database
a. Develop Python scripts to populate the PostgreSQL database with the parsed blockchain data, ensuring that the data is stored in a format consistent with the Cardano DB Sync-inspired data organization.
b. Implement indexing and optimization techniques to enhance query performance for data retrieval and analysis within the data organization framework.
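
For Step 4, a common pattern is to bulk-insert rows in batches and to create secondary indexes only after the initial load, so that each insert does not pay for index maintenance. The sketch below shows that pattern through the generic DB-API 2.0 interface, so the same functions can be used with psycopg2 (placeholder `%s`) or, for local testing, sqlite3 (placeholder `?`). The table, column, and index names are hypothetical and assume a schema with `tx`, `tx_out`, and `tx_in` tables.

```python
import sqlite3
from typing import Iterable, Iterator, Sequence

# Hypothetical secondary indexes, created only after the bulk load so that the
# initial inserts do not pay for index maintenance row by row.
INDEXES = [
    "CREATE INDEX IF NOT EXISTS idx_tx_hash ON tx (hash)",
    "CREATE INDEX IF NOT EXISTS idx_tx_block_id ON tx (block_id)",
    "CREATE INDEX IF NOT EXISTS idx_tx_out_tx_id ON tx_out (tx_id)",
    "CREATE INDEX IF NOT EXISTS idx_tx_in_spent ON tx_in (tx_out_id, tx_out_index)",
]


def batched(rows: Iterable[tuple], size: int) -> Iterator[list]:
    """Yield lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_insert(conn, table: str, columns: Sequence[str], rows: Iterable[tuple],
                placeholder: str = "%s", batch_size: int = 10_000) -> int:
    """Insert rows through a DB-API 2.0 connection; returns the row count.

    `placeholder` is "%s" for psycopg2 and "?" for sqlite3. `table` and
    `columns` are pasted into the SQL text, so they must be trusted constants.
    """
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join([placeholder] * len(columns)))
    cur = conn.cursor()
    total = 0
    for batch in batched(rows, batch_size):
        cur.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total


def create_indexes(conn, statements: Sequence[str] = INDEXES) -> None:
    """Build indexes after loading, then refresh planner statistics."""
    cur = conn.cursor()
    for statement in statements:
        cur.execute(statement)
    cur.execute("ANALYZE")
    conn.commit()


if __name__ == "__main__":
    # Local demo with in-memory SQLite; against PostgreSQL pass a psycopg2
    # connection and keep the default "%s" placeholder.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tx (id BIGINT, hash BLOB, block_id BIGINT)")
    rows = ((i, bytes([i]), i // 10) for i in range(25))
    print(bulk_insert(conn, "tx", ["id", "hash", "block_id"], rows,
                      placeholder="?", batch_size=10))
    create_indexes(conn, INDEXES[:2])  # only the tx table exists in this demo
```

For full Bitcoin or Litecoin histories, PostgreSQL's `COPY` (for example through psycopg2's `copy_expert`) or `psycopg2.extras.execute_values` is usually considerably faster than `executemany`; the batching and deferred-indexing structure stays the same.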

The project timeline is estimated as follows:
- Weeks 1-2: Study the BlockSci library and its data parser.
- Weeks 3-4: Extract and validate the blockchain data parser from BlockSci.
- Weeks 5-8: Design and implement the Python-based data parser.
- Weeks 9-12: Set up the PostgreSQL database and design the schema.
- Weeks 13-16: Develop Python scripts for parsing and storing data in the database.
- Weeks 17-20: Test the system and optimize performance.
- Weeks 21-24: Prepare the documentation and final report.

[1] Kalodner, H., Möser, M., Lee, K., Goldfeder, S., Plattner, M., Chator, A. and Narayanan, A., 2020. BlockSci: Design and applications of a blockchain analysis platform. In 29th USENIX Security Symposium.
[4] Cardano DB Sync documentation: dbsync/blob/master/doc/