FriendOrFoe is a collection of environmental datasets obtained from metabolic modeling of microbial communities drawn from the AGORA and CARVEME collections. FriendOrFoe gathers 64 tabular datasets (16 for AGORA with 100 additional compounds, 16 for AGORA with 50 additional compounds, 16 for CARVEME with 100 additional compounds, 16 for CARVEME with 50 additional compounds), which were constructed by studying more than 10 000 pairs of microbes via Flux Balance Analysis. The collection can be studied with four machine learning frameworks. The code underlying the metabolic modeling process is available here. Running the MATLAB code requires a Gurobi Academic License.
.json files with final metrics
.json files for the experiments
Download the data from our HuggingFace repo: https://huggingface.co/datasets/powidla/Friend-Or-Foe
from huggingface_hub import hf_hub_download
import pandas as pd
REPO_ID = "powidla/Friend-Or-Foe"
# File paths within the repo
X_train_ID = "Classification/AGORA/100/BC-I/X_train_BC-I-100.csv"
X_val_ID = "Classification/AGORA/100/BC-I/X_val_BC-I-100.csv"
X_test_ID = "Classification/AGORA/100/BC-I/X_test_BC-I-100.csv"
y_train_ID = "Classification/AGORA/100/BC-I/y_train_BC-I-100.csv"
y_val_ID = "Classification/AGORA/100/BC-I/y_val_BC-I-100.csv"
y_test_ID = "Classification/AGORA/100/BC-I/y_test_BC-I-100.csv"
# Download and load CSVs as pandas DataFrames
X_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_train_ID, repo_type="dataset"))
X_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_val_ID, repo_type="dataset"))
X_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_test_ID, repo_type="dataset"))
y_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_train_ID, repo_type="dataset"))
y_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_val_ID, repo_type="dataset"))
y_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_test_ID, repo_type="dataset"))
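The six file paths above follow a visible pattern, so they can be generated instead of typed out. The helper below is a minimal sketch: `hf_split_paths` is a hypothetical name, and the naming scheme `<part>_<split>_<Dataset>-<Group>.csv` is inferred only from the BC-I example shown here, so it may not hold for every dataset in the repo.

```python
def hf_split_paths(task, collection, group, dataset):
    """Build repo-relative CSV paths for the X/y train, val, and test splits.

    Assumption: files are named <part>_<split>_<dataset>-<group>.csv,
    as in the BC-I example above. Check the repo before relying on this.
    """
    base = f"{task}/{collection}/{group}/{dataset}"
    return {
        f"{part}_{split}": f"{base}/{part}_{split}_{dataset}-{group}.csv"
        for part in ("X", "y")
        for split in ("train", "val", "test")
    }


paths = hf_split_paths("Classification", "AGORA", "100", "BC-I")
for key, path in paths.items():
    print(key, path)
```

Each value can be passed as `filename` to `hf_hub_download`, exactly as in the snippet above.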
We provide an end-to-end example of how to predict competitive and cooperative interactions with TabNet.
The notebooks contain a simple example of using baseline models for predicting microbial interactions.
To run the commands below for the supervised models, organize the data path as follows:
FOFdata/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv
For example,
FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv
The scripts below assume that this structure holds after the FOFdata folder is created.
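Before launching a training script, it can help to confirm that the expected files are in place. The sketch below builds a path from the template above and reports any files that are missing; `local_csv_path` and `missing_files` are hypothetical helpers, not part of the repository, and the file names passed in are only illustrative.

```python
from pathlib import Path


def local_csv_path(root, task, collection, group, dataset, name):
    """Return <root>/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv."""
    return Path(root) / task / collection / group / dataset / "csv" / f"{name}.csv"


def missing_files(root, task, collection, group, dataset, names):
    """List the expected CSV paths that do not exist on disk."""
    expected = [
        local_csv_path(root, task, collection, group, dataset, name)
        for name in names
    ]
    return [path for path in expected if not path.is_file()]


example = local_csv_path(
    "FOFdata", "Regression", "CARVEME", "50", "GR-III", "X_train_GR-III"
)
print(example.as_posix())

# Illustrative check: these names mirror the example above and are assumptions.
for path in missing_files(
    "FOFdata", "Regression", "CARVEME", "50", "GR-III",
    ["X_train_GR-III", "y_train_GR-III"],
):
    print("missing:", path.as_posix())
```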
To train and test TabM we followed an example. We downloaded the data into the FOFdata folder.
mamba env create -f tabm.yaml
mkdir FOFdata
python main.py
To train and test FT-Transformer we followed an example.
mamba env create -f ft.yaml
mkdir FOFdata
python main.py
To train and test TabNet we followed instructions from the package.
mamba env create -f tabnet.yaml
mkdir FOFdata
python main.py
We evaluate XGBoost, LightGBM, and CatBoost as our baselines here.
mamba env create -f gbdts.yaml
mkdir FOFdata
python main.py
mamba env create -f uns.yaml
mkdir FOFdata
python main.py
To test TVAE, CTGAN and TabDDPM we used the synthcity package and adapted the officially provided examples. We calculated $\alpha$-Precision and $\beta$-Recall using eval_statistical from synthcity.metrics.
mamba env create -f synthcity.yaml
cd FOFdata
python main.py --tvae
python main.py --ctgan
python main.py --ddpm
To train and test TabDiff we followed the guidelines. The example we used for the AGORA50 dataset is below:
git clone https://github.com/MinkaiXu/TabDiff
mamba env create -f tabdiff.yaml
cd data
mkdir GenAGORA50
python process_dataset.py --dataname GenAGORA50
python main.py --dataname GenAGORA50 --mode train --no_wandb --non_learnable_schedule --exp_name GenAGORA50
Alternatively, skip preprocessing by downloading the files from here.
To evaluate and calculate metrics:
mamba env create -f synthcity.yaml
cd Info
cp info.json
python main.py --dataname GenAGORA50 --mode test --report --no_wandb
FriendOrFoe is under the Apache 2.0 license for code found on the associated GitHub repo and for the data hosted on HuggingFace. The LICENSE file for the repo can be found in the top-level directory.