EPFL logo

decentralizepy

decentralizepy is a framework for running distributed applications (particularly machine learning) on top of arbitrary topologies (decentralized, federated, parameter server). It was primarily conceived for evaluating scientific ideas on several aspects of distributed learning (communication efficiency, privacy, data heterogeneity, etc.).

Setting up decentralizepy

  • Fork the repository.

  • Clone and enter your local repository.

  • Check if you have uv installed:

    uv --version
    
  • If not, install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  • Install decentralizepy for development (choose PyTorch profile: cpu, cuda, or rocm):

    # For CPU-only PyTorch
    uv sync --extra dev --extra cpu
    
    # For CUDA (NVIDIA GPU) PyTorch
    uv sync --extra dev --extra cuda
    
    # For ROCm (AMD GPU) PyTorch
    uv sync --extra dev --extra rocm
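
  • (Optional) Verify that the backend you selected was installed. This is a quick sanity check, assuming PyTorch is importable as torch:

    uv run --no-sync python -c "import torch; print(torch.__version__)"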
    
  • Download CIFAR-10 using download_dataset.py:

    uv run --no-sync python download_dataset.py
    
  • (Optional) Download other datasets from LEAF and place them in eval/data/; for example:
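
    FEMNIST can be generated with LEAF's preprocessing scripts and copied in. This is only a sketch: the flags follow LEAF's documentation, and the target layout under eval/data/ is an assumption, so adapt it to the dataset class you use.

    git clone https://github.com/TalwalkarLab/leaf.git
    cd leaf/data/femnist
    # Sample a non-IID split (flags per LEAF's README)
    ./preprocess.sh -s niid --sf 0.05 -k 0 -t sample
    # Copy the generated splits into decentralizepy (layout assumed)
    cp -r data/train data/test /path/to/decentralizepy/eval/data/femnist/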

Running the code

  • Follow the tutorial in tutorial/, or:

  • Generate a new graph file with the required topology using generate_graph.py:

    uv run --no-sync python generate_graph.py --help
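
    As an illustration only, an invocation could look like the following; the flag names here are hypothetical, so check the --help output for the real ones:

    # Hypothetical flags -- verify against --help before running
    uv run --no-sync python generate_graph.py --graph-type regular --nodes 16 --degree 4 --output 16_regular.edges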
    
  • Choose and modify one of the config files in eval/{step,epoch}_configs.

  • Modify the dataset paths and addresses_filepath in the config file, as sketched below.
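
    A minimal sketch of the fields to edit. The section and key names follow the shipped example configs but are assumptions here, so verify them against the file you chose and keep everything else unchanged:

    # Illustrative values only -- adapt the paths to your machines
    [DATASET]
    dataset_package = decentralizepy.datasets.CIFAR10
    dataset_class = CIFAR10
    train_dir = /absolute/path/to/eval/data/cifar10
    test_dir = /absolute/path/to/eval/data/cifar10

    [COMMUNICATION]
    addresses_filepath = /absolute/path/to/ip_addr_machines.json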

  • In eval/run.sh, modify arguments as required.

  • Execute eval/run.sh on all the machines simultaneously; one way to launch it everywhere is sketched below. A synchronization barrier at startup ensures that all processes start training together.
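
    For instance, a parallel launch over ssh. This sketch assumes passwordless ssh, a hosts.txt file listing one machine per line, and the repository checked out at the same path on every machine:

    # hosts.txt and the checkout path are assumptions -- adapt them
    while read -r host; do
        ssh "$host" "cd /path/to/decentralizepy && bash eval/run.sh" &
    done < hosts.txt
    wait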

Linting and Type Checking

  • ruff is used for linting and formatting. Run it with:

    uv run --no-sync ruff check .
    uv run --no-sync ruff format .
    
  • basedpyright is used for type checking; it is configured in pyproject.toml under [tool.basedpyright].
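
    Assuming the dev extra installs basedpyright (check pyproject.toml), it can be run the same way:

    uv run --no-sync basedpyright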

Citing

Cite us as:

@inproceedings{decentralizepy,
    author = {Dhasade, Akash and Kermarrec, Anne-Marie and Pires, Rafael and Sharma, Rishi and Vujasinovic, Milos},
    title = {Decentralized Learning Made Easy with DecentralizePy},
    year = {2023},
    isbn = {9798400700842},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3578356.3592587},
    doi = {10.1145/3578356.3592587},
    booktitle = {Proceedings of the 3rd Workshop on Machine Learning and Systems},
    pages = {34--41},
    numpages = {8},
    keywords = {peer-to-peer, distributed systems, machine learning, middleware, decentralized learning, network topology},
    location = {Rome, Italy},
    series = {EuroMLSys '23}
}

Built with DecentralizePy

Epidemic Learning

  • Tutorial: tutorial/EpidemicLearning
  • Source files: src/node/EpidemicLearning/
  • Epidemic Learning paper
  • Cite: "Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, and Rishi Sharma. Epidemic Learning: Boosting Decentralized Learning with Randomized Communication. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023."

Get More for Less in Decentralized Learning Systems

  • Tutorial: tutorial/JWINS
  • Source files: src/sharing/JWINS/
  • Get More for Less in Decentralized Learning Systems paper
  • Cite: "Akash Dhasade, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma, Jeffrey Wigger, and Milos Vujasinovic. Get More for Less in Decentralized Learning Systems. In IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 2023."

Contributing

  • Use ruff for linting and formatting (configured in pyproject.toml).

  • While in the root directory of the repository, before committing changes, run:

    uv run --no-sync ruff check --fix .
    uv run --no-sync ruff format .
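
    To enforce this automatically, a minimal sketch of a local git pre-commit hook (not part of the repository) that runs the same checks:

    #!/bin/sh
    # Save as .git/hooks/pre-commit and mark it executable (chmod +x).
    # Abort the commit if linting or formatting checks fail.
    uv run --no-sync ruff check . || exit 1
    uv run --no-sync ruff format --check . || exit 1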
    

Modules

The following are the modules of decentralizepy:

Node

  • The manager: process-level optimizations.

Dataset

  • Static

Training

  • Heterogeneity. How much do I want to work?

Graph

  • Static. Who are my neighbours? Topologies.

Mapping

  • Naming: maps the globally unique process ids <-> (machine_id, local_rank); see the sketch below.
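
    As an illustration only (a hypothetical sketch, not the actual decentralizepy.mappings API), a linear mapping between the two id schemes could look like:

    # Hypothetical linear mapping: uid <-> (machine_id, local_rank),
    # assuming every machine runs the same number of processes.
    def uid(machine_id: int, local_rank: int, procs_per_machine: int) -> int:
        return machine_id * procs_per_machine + local_rank

    def machine_and_rank(uid: int, procs_per_machine: int) -> tuple[int, int]:
        return divmod(uid, procs_per_machine)  # (machine_id, local_rank)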

Sharing

  • Leverage Redundancy. Privacy. Optimizations in model and data sharing.

Communication

  • IPC/network level. Compression. Privacy. Reliability.

Model

  • Learning Model