From Idea to Execution: Building an Algorithmic Crypto Trading Bot (Part 2)
Crafting the Foundations — Building Utilities and First Attempts
In the previous article, we explored why I started this crypto trading bot project and how I gathered and structured historical data. Now it’s time to see how everything fits together. This second part focuses on the behind-the-scenes utilities — like the data loading pipeline and the initial attempt to train and validate a trading strategy.
Why Focus on Utilities?
Building a trading bot isn’t just about writing a fancy algorithm. You need a robust foundation to handle:
- Data loading and preprocessing
- Configuration management (so you don’t hardcode everything)
- Logging of runs, errors, and performance metrics
- Backtesting so you can gauge how well your bot might do in real markets
I discovered these essentials the hard way. The more my project grew, the more crucial it became to have reusable utilities I could rely on — especially once I moved toward distributed training (we’ll get to that in Part 4).
DataLoader: The Heart of Data Management
One of the first utilities I built was a dedicated class to manage data import, resampling, and indicator calculations. Here's a snippet from src/data_loader.py that shows how I load 1-minute data and resample it into multiple timeframes:
# data_loader.py (excerpt)
import os

import pandas as pd
import pandas_ta as ta

class DataLoader:
    def __init__(self):
        self.tick_data = {}
        self.timeframes = ['1min', '5min', '15min', '30min', '1h', '4h', '1d']
        self.base_timeframe = '1min'
        self.data_folder = 'output_parquet/'

    def import_ticks(self):
        # Load the 1-minute base data and index it by timestamp
        tick_data_1m = pd.read_parquet(
            os.path.join(self.data_folder, 'BTCUSDT-tick-1min.parquet')
        )
        tick_data_1m['timestamp'] = pd.to_datetime(tick_data_1m['timestamp'])
        tick_data_1m.set_index('timestamp', inplace=True)
        tick_data_1m.sort_index(inplace=True)
        # (Additional date filtering omitted for brevity)
        self.tick_data[self.base_timeframe] = tick_data_1m

    def resample_data(self):
        # Derive every higher timeframe from the 1-minute base data
        base_data = self.tick_data[self.base_timeframe]
        for tf in self.timeframes:
            if tf == self.base_timeframe:
                continue
            resampled_data = base_data.resample(tf).agg({
                'open': 'first',
                'high': 'max',
                'low': 'min',
                'close': 'last',
                'volume': 'sum'
            }).dropna()
            self.tick_data[tf] = resampled_data
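For context, here is how these two methods fit together in practice. A minimal usage sketch, assuming the Parquet file from above is in place:

loader = DataLoader()
loader.import_ticks()    # load the 1-minute base data from Parquet
loader.resample_data()   # derive the 5min/15min/.../1d views

# Each timeframe is now its own OHLCV DataFrame
df_4h = loader.tick_data['4h']
print(df_4h.tail())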
Memory Constraints
One challenge I faced early on was RAM usage. Loading a full year of 1-second or 1-minute data with dozens of indicators can easily exceed the limits of a typical desktop. To mitigate this:
- I used Parquet (columnar format) to store compressed data.
- I selectively loaded timeframes (not all at once; see the sketch after this list).
- I used distributed methods later when I needed to scale (Part 4 discusses that in detail).
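To illustrate that selective-loading point: because Parquet is columnar, you can pull in only the columns you actually need rather than the whole file. A minimal sketch, assuming the OHLCV schema from Part 1 (adjust the column names to your own file):

import pandas as pd

# Read only the OHLCV columns; any extra indicator columns stay on disk
df = pd.read_parquet(
    'output_parquet/BTCUSDT-tick-1min.parquet',
    columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'],
)

# Downcasting 64-bit floats to 32-bit roughly halves memory usage
float_cols = df.select_dtypes('float64').columns
df[float_cols] = df[float_cols].astype('float32')

print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB in memory")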
Configuration: A Single Source of Truth
Rather than hardcode dates, file paths, or hyperparameters, I created a YAML configuration file and a helper class Config in src/config_loader.py:
# config_loader.py (excerpt)
import yaml
import os

class Config:
    def __init__(self, config_path: str = 'config.yaml'):
        if not os.path.exists(config_path):
            raise FileNotFoundError(f"Configuration file {config_path} not found.")
        with open(config_path, 'r') as file:
            self.config = yaml.safe_load(file)

    def get(self, section: str, key: str = None):
        # Return a single key within a section, or the whole section
        if key:
            return self.config.get(section, {}).get(key)
        return self.config.get(section, {})
This approach let me quickly change the start or end date, tweak model thresholds, or adjust timeframes without digging through multiple scripts.
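To make that concrete, here is a short usage sketch. The YAML shown in the comment is hypothetical (the section and key names are illustrative, not my exact file), and the import path depends on your project layout:

from src.config_loader import Config

# A hypothetical config.yaml:
#
# data:
#   start_date: '2022-01-01'
#   end_date: '2022-12-31'
# strategy:
#   threshold_buy: 0.6
#   threshold_sell: 0.6

config = Config('config.yaml')
start_date = config.get('data', 'start_date')   # single key
strategy_cfg = config.get('strategy')           # whole section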
First Attempt at a Trading Strategy
At this stage, I wanted a neural network (or simple model) that looked at indicators and decided whether to buy or sell. The naive approach:
- Two Models — One model for “buy signals,” another for “sell signals” (a training sketch follows this list).
- Threshold — If the buy model output was above a threshold, we went long. If the sell model output was above a threshold, we closed or reversed the position.
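Before wiring these into a strategy class, the two models themselves can start out as simple scikit-learn classifiers. A minimal sketch of the idea, with dummy data standing in for real indicators and labels (the labeling scheme here is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Dummy stand-ins: in practice X holds indicator values per timestep,
# and y_buy / y_sell flag timesteps where entering / exiting paid off
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y_buy = (rng.random(1000) > 0.5).astype(int)
y_sell = (rng.random(1000) > 0.5).astype(int)

model_buy = LogisticRegression(max_iter=1000).fit(X, y_buy)
model_sell = LogisticRegression(max_iter=1000).fit(X, y_sell)

# Probabilities that get compared against the strategy thresholds
buy_prob = model_buy.predict_proba(X[-1:])[0, 1]
sell_prob = model_sell.predict_proba(X[-1:])[0, 1]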
Snippet: TradingStrategy (First Iteration)
Below is a condensed excerpt of the TradingStrategy class from src/trading_strategy.py, illustrating how I attempted to manage buy/sell signals:
# trading_strategy.py (excerpt)
class TradingStrategy:
    def __init__(self, data_loader, config, model_buy, model_sell):
        self.data_loader = data_loader
        self.config = config
        self.model_buy = model_buy
        self.model_sell = model_sell
        # Config.get takes (section, key); the 'strategy' section name is
        # illustrative, with fallbacks applied when a key is missing
        self.threshold_buy = config.get('strategy', 'threshold_buy') or 0.6
        self.threshold_sell = config.get('strategy', 'threshold_sell') or 0.6
        # Set up initial balances, last price, etc.
        self.initial_balance = config.get('strategy', 'initial_balance') or 100000000
        self.available_balance = self.initial_balance
        # More initialization...

    def calculate_profit(self):
        # Simulate trades until read_next_trade() signals end of data (-1)
        while self.last_price >= 0:
            self.read_next_trade()
            if self.last_price == -1:
                return self.write_orders()
            # Evaluate indicators for this timestep
            if self.current_timestamp in self.features.index:
                indicator_values = self.features.loc[self.current_timestamp].values
                buy_prob = self.analyze_buy(indicator_values)
                sell_prob = self.analyze_sell(indicator_values)
                if self.current_order['quantity'] == 0 and buy_prob > self.threshold_buy:
                    self.place_order_buy(self.order_qty)
                elif self.current_order['quantity'] != 0 and sell_prob > self.threshold_sell:
                    self.place_order_sell(abs(self.current_order['quantity']))
        return self.write_orders()
Why This Approach Stalled
I ran into a few hurdles:
- Training Issues: The models weren't producing reliable buy/sell signals; they needed better-labeled data or a more robust reinforcement learning approach.
- Indicator Explosion: Loading too many indicators simultaneously bogged down memory and CPU time, especially for year-long data.
- Overfitting: The model occasionally memorized segments of historical data, leading to poor generalization.
Validating Performance
Even with an imperfect strategy, I still needed a quick way to measure performance. My validation consisted of:
- Genetic Algorithm Fitness: Each “individual” in the GA population represented a unique set of indicator parameters (e.g., RSI length, MACD fast/slow).
- Neural Network-Level Backtest: For each individual's configuration, I ran a simulation. The Simulation class in src/simulation.py simply calls the strategy's calculate_profit():
# src/simulation.py
class Simulation:
    def __init__(self, strategy):
        self.strategy = strategy

    def run(self):
        result = self.strategy.calculate_profit()
        return result
Result: The final account balance (or total profit) after the backtest became the “fitness” metric.
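To show how the pieces connect, here is a minimal sketch of that fitness loop with the geneticalgorithm library. The parameter bounds are illustrative, build_models_from_params is a hypothetical helper that maps one individual's parameters to trained buy/sell models, and I assume run() returns the final balance:

import numpy as np
from geneticalgorithm import geneticalgorithm as ga

def fitness(params):
    # params: e.g. [rsi_length, macd_fast, macd_slow] for one individual
    model_buy, model_sell = build_models_from_params(params)  # hypothetical
    strategy = TradingStrategy(data_loader, config, model_buy, model_sell)
    final_balance = Simulation(strategy).run()
    # The library minimizes, so negate the profit
    return -(final_balance - strategy.initial_balance)

# Illustrative bounds: RSI length, MACD fast, MACD slow
varbound = np.array([[5, 30], [5, 20], [20, 60]])
model = ga(function=fitness,
           dimension=3,
           variable_type='int',
           variable_boundaries=varbound)
model.run()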
Early Results
- The GA framework generated many parameter sets, but the neural net didn't converge on meaningful buy/sell signals. I had to re-think how to combine these two methods, ultimately deciding to switch from the geneticalgorithm library to DEAP and from a naive feed-forward approach to a reinforcement learning agent (more on that in Part 3).
Key Libraries & Dependencies
- pandas & pandas_ta: For data manipulation and indicator calculation.
- geneticalgorithm (later replaced by DEAP): Handled parameter tuning via evolutionary approaches.
- PyArrow & Parquet: Efficient file formats for reading/writing large datasets.
- NumPy: For numerical computations and array manipulations.
- scikit-learn: For initial “buy vs. sell” classification models (e.g., LogisticRegression).
Lessons Learned (So Far)
- Utility Organization: Creating separate files/classes (data_loader.py, config_loader.py, etc.) saved me endless hours of troubleshooting.
- Memory Limitations Are Real: Precomputing indicators for large timeframes can be resource-intensive.
- Naive Buy/Sell Approaches: Basic threshold-based neural nets may not be enough. A more advanced or dedicated RL approach can pay off.
- Logging: Although not shown in detail here, logging to a dedicated file or console stream is invaluable for debugging runs, especially when you move to multi-processing or distributed setups (a minimal setup is sketched below).
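For reference, here is a minimal sketch of the kind of logging setup I mean, using only Python's standard library (the file name and format string are illustrative):

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(processName)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler('trading_bot.log'),  # persistent run log
        logging.StreamHandler(),                 # mirror to the console
    ],
)

logger = logging.getLogger(__name__)
logger.info('Backtest started with balance=%s', 100_000_000)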
Wrapping Up Part 2
In this article, we covered how I set up the foundational utilities — data loading, configuration management, and my first attempt at a naive buy/sell neural network. Despite early limitations, these structures proved essential as the project evolved.
Up Next (Part 3): I'll share how I swapped out the geneticalgorithm library for DEAP and built a more sophisticated reinforcement learning agent. That pivot changed the entire trajectory of the bot's evolution—one step closer to a truly adaptive trading system!
What’s Next?
- Part 3: Leveling Up — Replacing Genetic Algorithm with DEAP and Reinforcement Learning
- Part 4: Scaling Up — Distributed Training for Maximum Efficiency
Have questions about structuring your utilities or want to share your own first attempts? Feel free to drop a comment below!