Friend-Or-Foe

Welcome to the Friend or Foe page!

HuggingFace bioRxiv


FriendOrFoe is a collection of environmental datasets obtained from metabolic modeling of microbial communities from the AGORA and CARVEME collections. It gathers 64 tabular datasets (16 for AGORA with 100 additional compounds, 16 for AGORA with 50 additional compounds, 16 for CARVEME with 100 additional compounds, and 16 for CARVEME with 50 additional compounds), constructed by studying more than 10,000 pairs of microbes via Flux Balance Analysis. The collection can be investigated with four machine learning frameworks. The code underlying the metabolic modeling process is available here. Running the MATLAB code requires a Gurobi academic license.

Repository structure

Getting started

Download the data from our HuggingFace repo: https://huggingface.co/datasets/powidla/Friend-Or-Foe

from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "powidla/Friend-Or-Foe"

# File paths within the repo
X_train_ID = "Classification/AGORA/100/BC-I/X_train_BC-I-100.csv"
X_val_ID = "Classification/AGORA/100/BC-I/X_val_BC-I-100.csv"
X_test_ID = "Classification/AGORA/100/BC-I/X_test_BC-I-100.csv"

y_train_ID = "Classification/AGORA/100/BC-I/y_train_BC-I-100.csv"
y_val_ID = "Classification/AGORA/100/BC-I/y_val_BC-I-100.csv"
y_test_ID = "Classification/AGORA/100/BC-I/y_test_BC-I-100.csv"

# Download and load CSVs as pandas DataFrames
X_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_train_ID, repo_type="dataset"))
X_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_val_ID, repo_type="dataset"))
X_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_test_ID, repo_type="dataset"))

y_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_train_ID, repo_type="dataset"))
y_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_val_ID, repo_type="dataset"))
y_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_test_ID, repo_type="dataset"))
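The six file paths above share one pattern, so they can be generated rather than typed by hand. The sketch below is a minimal illustration, not part of the repository: `split_filenames` and `load_splits` are hypothetical helper names, and the code assumes that every dataset follows the same naming scheme as the BC-I example (`<X|y>_<split>_<dataset>-<group>.csv`), which is only confirmed here for `Classification/AGORA/100/BC-I`.

```python
from typing import Dict

REPO_ID = "powidla/Friend-Or-Foe"
SPLITS = ("train", "val", "test")


def split_filenames(task: str, collection: str, group: str, dataset: str) -> Dict[str, str]:
    """Build the in-repo paths of the six split files for one dataset.

    Assumption: all datasets are named like the BC-I example, i.e.
    <task>/<collection>/<group>/<dataset>/<X|y>_<split>_<dataset>-<group>.csv
    """
    prefix = f"{task}/{collection}/{group}/{dataset}"
    return {
        f"{kind}_{split}": f"{prefix}/{kind}_{split}_{dataset}-{group}.csv"
        for kind in ("X", "y")
        for split in SPLITS
    }


def load_splits(task: str, collection: str, group: str, dataset: str) -> dict:
    """Download the six CSVs and return them as a dict of pandas DataFrames."""
    # Imported lazily so that split_filenames works without these packages.
    import pandas as pd
    from huggingface_hub import hf_hub_download

    return {
        key: pd.read_csv(
            hf_hub_download(repo_id=REPO_ID, filename=name, repo_type="dataset")
        )
        for key, name in split_filenames(task, collection, group, dataset).items()
    }


if __name__ == "__main__":
    # Print the paths that would be requested for the BC-I example.
    for key, name in split_filenames("Classification", "AGORA", "100", "BC-I").items():
        print(key, "->", name)
```

With these helpers, `load_splits("Classification", "AGORA", "100", "BC-I")["X_train"]` would correspond to `X_train` in the snippet above. If a dataset uses a different file naming, `hf_hub_download` will fail for that path, so it is worth checking the file listing on the HuggingFace page first.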

Baseline Demo Notebooks

Quickstart notebook

We provide an end-to-end example of how to predict competitive and cooperative interactions with TabNet.

Examples

The notebooks contain a simple example of using baseline models for predicting microbial interactions.

Reproducing the results

To run the commands below for the supervised models, organize the data path as follows:

FOFdata/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv

For example,

FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv

The scripts below assume that the FOFdata folder, once created, follows the structure above.
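Before launching a training script, it can help to confirm that the expected files sit where the layout above says they should. The following is a small sketch under stated assumptions: `fof_csv_path` and `missing_files` are hypothetical helpers, not part of the repository, and the file name `y_train_GR-III` in the usage example is a guess by analogy with `X_train_GR-III`.

```python
from pathlib import Path
from typing import Iterable, List


def fof_csv_path(root, task: str, collection: str, group: str,
                 dataset: str, name: str) -> Path:
    """Return <root>/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv."""
    return Path(root) / task / collection / group / dataset / "csv" / f"{name}.csv"


def missing_files(root, task: str, collection: str, group: str,
                  dataset: str, names: Iterable[str]) -> List[Path]:
    """List the expected CSV files that are not present on disk."""
    expected = [fof_csv_path(root, task, collection, group, dataset, n) for n in names]
    return [p for p in expected if not p.is_file()]


if __name__ == "__main__":
    # The second file name is an assumption made for illustration.
    names = ["X_train_GR-III", "y_train_GR-III"]
    for path in missing_files("FOFdata", "Regression", "CARVEME", "50", "GR-III", names):
        print("missing:", path)
```

Calling `fof_csv_path("FOFdata", "Regression", "CARVEME", "50", "GR-III", "X_train_GR-III")` reproduces the example path given above.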

Supervised models

TabM

To train and test TabM we followed an example. We downloaded the data into the FOFdata folder.

mamba env create -f tabm.yaml
mkdir FOFdata
python main.py 

FT-Transformer

To train and test FT-Transformer we followed an example.

mamba env create -f ft.yaml
mkdir FOFdata
python main.py 

TabNet

To train and test TabNet we followed instructions from the package.

mamba env create -f tabnet.yaml
mkdir FOFdata
python main.py 

GBDTs

We evaluate XGBoost, LightGBM and CatBoost as our baselines here.

mamba env create -f gbdts.yaml
mkdir FOFdata
python main.py 

Unsupervised models

mamba env create -f uns.yaml
mkdir FOFdata
python main.py 

Generative models

TVAE, CTGAN and TabDDPM

To test TVAE, CTGAN and TabDDPM we used the synthcity package and adapted the officially provided examples. We calculated $\alpha$-Precision and $\beta$-Recall using eval_statistical from synthcity.metrics.

mamba env create -f synthcity.yaml
cd FOFdata
python main.py --tvae
python main.py --ctgan
python main.py --ddpm

TabDiff

To train and test TabDiff we followed the guidelines. The example we used for the AGORA50 dataset is below:

git clone https://github.com/MinkaiXu/TabDiff
mamba env create -f tabdiff.yaml
cd data
mkdir GenAGORA50
python process_dataset.py --dataname GenAGORA50
python main.py --dataname GenAGORA50 --mode train --no_wandb --non_learnable_schedule --exp_name GenAGORA50

Alternatively, skip preprocessing by downloading the files from here.

To evaluate the model and calculate metrics:

mamba env create -f synthcity.yaml
cd Info
cp info.json
python main.py --dataname GenAGORA50 --mode test --report --no_wandb

License

FriendOrFoe is released under the Apache 2.0 license, which covers both the code in the associated GitHub repo and the data hosted on HuggingFace. The LICENSE file can be found in the repo's top-level directory.