From Idea to Execution: Building an Algorithmic Crypto Trading Bot (Part 2)

Crafting the Foundations — Building Utilities and First Attempts

Teo Miscia

In the previous article, we explored why I started this crypto trading bot project and how I gathered and structured historical data. Now it’s time to see how everything fits together. This second part focuses on the behind-the-scenes utilities — like the data loading pipeline and the initial attempt to train and validate a trading strategy.

Why Focus on Utilities?

Building a trading bot isn’t just about writing a fancy algorithm. You need a robust foundation to handle:

  • Data loading and preprocessing
  • Configuration management (so you don’t hardcode everything)
  • Logging of runs, errors, and performance metrics
  • Backtesting so you can gauge how well your bot might do in real markets

I discovered these essentials the hard way. The more my project grew, the more crucial it became to have reusable utilities I could rely on — especially once I moved toward distributed training (we’ll get to that in Part 4).

DataLoader: The Heart of Data Management

One of the first utilities I built was a dedicated class to manage data import, resampling, and indicator calculations. Here’s a snippet from src/data_loader.py that shows how I load 1-minute data and resample it into multiple timeframes:

# data_loader.py (excerpt)
import os

import pandas as pd
import pandas_ta as ta


class DataLoader:
    def __init__(self):
        self.tick_data = {}
        self.timeframes = ['1min', '5min', '15min', '30min', '1h', '4h', '1d']
        self.base_timeframe = '1min'
        self.data_folder = 'output_parquet/'

    def import_ticks(self):
        tick_data_1m = pd.read_parquet(
            os.path.join(self.data_folder, 'BTCUSDT-tick-1min.parquet')
        )
        tick_data_1m['timestamp'] = pd.to_datetime(tick_data_1m['timestamp'])
        tick_data_1m.set_index('timestamp', inplace=True)
        tick_data_1m.sort_index(inplace=True)
        # (Additional date filtering omitted for brevity)
        self.tick_data[self.base_timeframe] = tick_data_1m

    def resample_data(self):
        base_data = self.tick_data[self.base_timeframe]
        for tf in self.timeframes:
            if tf == self.base_timeframe:
                continue
            resampled_data = base_data.resample(tf).agg({
                'open': 'first',
                'high': 'max',
                'low': 'min',
                'close': 'last',
                'volume': 'sum'
            }).dropna()
            self.tick_data[tf] = resampled_data
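To make the resampling step concrete, here is a small self-contained sketch (synthetic data, not the project's parquet files) that applies the same OHLCV aggregation rules the DataLoader uses:

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute OHLCV bars standing in for the real parquet data
idx = pd.date_range("2024-01-01", periods=60, freq="1min")
rng = np.random.default_rng(0)
close = 40000 + rng.normal(0, 10, 60).cumsum()
df = pd.DataFrame({
    "open": close, "high": close + 5, "low": close - 5,
    "close": close, "volume": rng.uniform(1, 3, 60),
}, index=idx)

# The same aggregation spec used in DataLoader.resample_data()
ohlcv = {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}
df_15m = df.resample("15min").agg(ohlcv).dropna()

print(df_15m.shape)  # 60 one-minute bars collapse into 4 fifteen-minute bars
```

A cheap sanity check after any timeframe conversion: total volume must be identical before and after the resample.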

Memory Constraints

One challenge I faced early on was RAM usage. Loading a full year of 1-second or 1-minute data with dozens of indicators can easily exceed the limits of a typical desktop. To mitigate this:

  • I used Parquet (columnar format) to store compressed data.
  • I selectively loaded timeframes (not all at once).
  • I used distributed methods later when I needed to scale (Part 4 discusses that in detail).
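As an illustration of the memory point, here is a small sketch (a synthetic frame, not the project's data) of how much simply downcasting numeric columns from float64 to float32 saves:

```python
import numpy as np
import pandas as pd

# One day of 1-minute OHLCV bars; pandas defaults to float64
n = 1440
df = pd.DataFrame(
    np.random.default_rng(1).uniform(30_000, 41_000, size=(n, 5)),
    columns=["open", "high", "low", "close", "volume"],
)
before = df.memory_usage(deep=True).sum()

# float32 keeps enough precision for prices at this scale and
# roughly halves the footprint of every numeric column
df32 = df.astype(np.float32)
after = df32.memory_usage(deep=True).sum()

print(f"{before} -> {after} bytes")
```

Combined with Parquet's column pruning (`pd.read_parquet(..., columns=[...])`), this kind of trimming keeps a year of minute bars workable on a desktop.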

Configuration: A Single Source of Truth

Rather than hardcode dates, file paths, or hyperparameters, I created a YAML configuration file and a helper class Config in src/config_loader.py:

# config_loader.py (excerpt)
import yaml
import os


class Config:
    def __init__(self, config_path: str = 'config.yaml'):
        if not os.path.exists(config_path):
            raise FileNotFoundError(f"Configuration file {config_path} not found.")
        with open(config_path, 'r') as file:
            self.config = yaml.safe_load(file)

    def get(self, section: str, key: str = None):
        if key:
            return self.config.get(section, {}).get(key)
        return self.config.get(section, {})

This approach let me quickly change the start or end date, tweak model thresholds, or adjust timeframes without digging through multiple scripts.
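For illustration, here is a hypothetical config.yaml (the keys are examples I chose, not the project's actual file) exercised against a trimmed copy of the Config class so the snippet runs on its own:

```python
import os
import tempfile

import yaml

# Hypothetical config.yaml contents, mirroring the kinds of settings described
CONFIG_TEXT = """
data:
  start_date: "2023-01-01"
  end_date: "2023-12-31"
strategy:
  threshold_buy: 0.6
  threshold_sell: 0.6
"""

class Config:
    """Trimmed copy of src/config_loader.py for a self-contained demo."""
    def __init__(self, config_path: str = "config.yaml"):
        with open(config_path, "r") as file:
            self.config = yaml.safe_load(file)

    def get(self, section: str, key: str = None):
        if key:
            return self.config.get(section, {}).get(key)
        return self.config.get(section, {})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "w") as f:
        f.write(CONFIG_TEXT)
    cfg = Config(path)
    print(cfg.get("strategy", "threshold_buy"))  # 0.6
```

Note that `get()` falls back to an empty dict for a missing section, so callers never trip over a `None` when a section is absent.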

First Attempt at a Trading Strategy

At this stage, I wanted a neural network (or simple model) that looked at indicators and decided whether to buy or sell. The naive approach:

  1. Two Models — One model for “buy signals,” another for “sell signals.”
  2. Threshold — If the buy model output was above a threshold, we went long. If the sell model output was above a threshold, we closed or reversed the position.

Snippet: TradingStrategy First Iteration

Below is a condensed excerpt of the TradingStrategy class from src/trading_strategy.py, illustrating how I attempted to manage buy/sell signals:

# trading_strategy.py (excerpt)
class TradingStrategy:
    def __init__(self, data_loader, config, model_buy, model_sell):
        self.data_loader = data_loader
        self.config = config
        self.model_buy = model_buy
        self.model_sell = model_sell
        self.threshold_buy = config.get('threshold_buy', 0.6)
        self.threshold_sell = config.get('threshold_sell', 0.6)
        # Set up initial balances, last price, etc.
        self.initial_balance = config.get('initial_balance', 100000000)
        self.available_balance = self.initial_balance
        # More initialization...

    def calculate_profit(self):
        # Simulate trades
        while self.last_price >= 0:
            self.read_next_trade()
            if self.last_price == -1:
                return self.write_orders()
            # Evaluate indicators for this timestep
            if self.current_timestamp in self.features.index:
                indicator_values = self.features.loc[self.current_timestamp].values
                buy_prob = self.analyze_buy(indicator_values)
                sell_prob = self.analyze_sell(indicator_values)
                if self.current_order['quantity'] == 0 and buy_prob > self.threshold_buy:
                    self.place_order_buy(self.order_qty)
                elif self.current_order['quantity'] != 0 and sell_prob > self.threshold_sell:
                    self.place_order_sell(abs(self.current_order['quantity']))
        return self.write_orders()
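The entry/exit gate inside that loop can be distilled into a few lines. This is a standalone restatement (the function name and return values are mine, not from the project): enter only when flat, exit only when holding.

```python
def decide(position_qty, buy_prob, sell_prob,
           threshold_buy=0.6, threshold_sell=0.6):
    """Mirror the gate in calculate_profit(): open a position only
    when flat, close it only when one is held."""
    if position_qty == 0 and buy_prob > threshold_buy:
        return "buy"
    if position_qty != 0 and sell_prob > threshold_sell:
        return "sell"
    return "hold"

print(decide(0, 0.8, 0.1))  # buy: flat and the buy model is confident
print(decide(5, 0.2, 0.9))  # sell: holding and the sell model is confident
print(decide(0, 0.5, 0.9))  # hold: flat, but the buy signal is too weak
```

One consequence of this gate worth noticing: a strong sell signal while flat is simply ignored, so the naive version never opens short positions.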

Why This Approach Stalled

I ran into a few hurdles:

  • Training Issues: It wasn’t producing reliable buy/sell signals. The model needed better labeled data or a more robust reinforcement approach.
  • Indicator Explosion: Loading too many indicators simultaneously bogged down memory and CPU time, especially for year-long data.
  • Overfitting: The model occasionally memorized segments of historical data, leading to poor generalization.

Validating Performance

Even with an imperfect strategy, I still needed a quick way to measure performance. My validation consisted of:

  1. Genetic Algorithm Fitness: Each “individual” in the GA population represented a unique set of indicator parameters (e.g., RSI length, MACD fast/slow).
  2. Neural Network-Level Backtest: For each individual’s configuration, I ran a simulation. The Simulation class in src/simulation.py simply calls the strategy’s calculate_profit():
# src/simulation.py

class Simulation:
    def __init__(self, strategy):
        self.strategy = strategy

    def run(self):
        result = self.strategy.calculate_profit()
        return result

Result: The final account balance (or total profit) after the backtest became the “fitness” metric.
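To show how the pieces wire together, here is a toy fitness function in the shape a GA expects: decode the individual's parameter vector, run a backtest, and return a score to minimise. The run_backtest body is a deterministic stand-in I made up for illustration, not the project's simulation:

```python
import numpy as np

def run_backtest(rsi_length, macd_fast, macd_slow):
    """Stand-in for Simulation.run(); the real project would call
    TradingStrategy.calculate_profit() here. Toy scoring only."""
    return float(
        100_000
        - (rsi_length - 14) ** 2 * 50            # prefer RSI near 14
        - abs(macd_slow - macd_fast - 14) * 100  # prefer a 14-bar MACD spread
    )

def fitness(individual):
    """GA fitness: most GA libraries minimise, so negate the balance."""
    rsi_length, macd_fast, macd_slow = (int(round(x)) for x in individual)
    return -run_backtest(rsi_length, macd_fast, macd_slow)

# Brute-force stand-in for the GA's search over the same parameter space
candidates = [np.array([r, f, s])
              for r in (10, 14, 21) for f in (8, 12) for s in (21, 26)]
best = min(candidates, key=fitness)
print(best)  # the parameter set with the highest simulated balance
```

Swapping the brute-force `min` for a real evolutionary loop changes nothing about the fitness contract, which is what made moving between GA libraries later relatively painless.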

Early Results

The GA framework generated many parameter sets, but the neural net didn't converge on meaningful buy/sell signals. I had to rethink how to combine these two methods, ultimately deciding to switch from the geneticalgorithm library to DEAP and from a naive feed-forward approach to a reinforcement learning agent (more on that in Part 3).

Key Libraries & Dependencies

  • pandas & pandas_ta: For data manipulation and indicator calculation.
  • geneticalgorithm (later replaced by DEAP): Handled parameter tuning via evolutionary approaches.
  • PyArrow & Parquet: Efficient file formats for reading/writing large datasets.
  • NumPy: For numerical computations and array manipulations.
  • scikit-learn: For initial “buy vs. sell” classification models (e.g., LogisticRegression).

Lessons Learned (So Far)

  1. Utility Organization: Creating separate files/classes (data_loader.py, config_loader.py, etc.) saved me endless hours of troubleshooting.
  2. Memory Limitations Are Real: Precomputing indicators for large timeframes can be resource-intensive.
  3. Naive Buy/Sell Approaches: Basic threshold-based neural nets may not be enough. A more advanced or dedicated RL approach can pay off.
  4. Logging: Although not shown in detail here, logging to a dedicated file or console stream is invaluable for debugging runs (especially when you move to multi-processing or distributed setups).

Wrapping Up Part 2

In this article, we covered how I set up the foundational utilities — data loading, configuration management, and my first attempt at a naive buy/sell neural network. Despite early limitations, these structures proved essential as the project evolved.

Up Next (Part 3): I’ll share how I swapped out the geneticalgorithm library for DEAP and built a more sophisticated reinforcement learning agent. That pivot changed the entire trajectory of the bot’s evolution — one step closer to a truly adaptive trading system!

What’s Next?

  • Part 3: Leveling Up — Replacing Genetic Algorithm with DEAP and Reinforcement Learning
  • Part 4: Scaling Up — Distributed Training for Maximum Efficiency

Have questions about structuring your utilities or want to share your own first attempts? Feel free to drop a comment below!
