Matrix Titans
Zenix
Virtual Lab — Control Panel
lab account
Unified System Dashboard

System Dashboard

Unified overview of Lab
Active machines
86
▲ 12 today
Users
251
▲ 8 this week
Services running
9 / 9
all components healthy
Idle resources
14
machines · found by AI agent
System services health
identity service
VPN
Console gateway
Compute cluster · 3 nodes
Cluster storage · OK
SLURM · GPU queue near limit
Backup service
Metrics · monitoring
Profile E · Big Data stack

Cluster capacity

3 nodes · Compute cluster
CPU (656 cores): 412 / 656
RAM (3.7 TB): 2.3 / 3.7 TB
Cluster storage (42 TB): 9.1 / 42 TB
GPU (4× NVIDIA L4): 3 / 4 in SLURM queue
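The used / total pairs in the capacity panel above can be turned into utilization percentages. The following is a minimal sketch; the dictionary keys and the function names are illustrative and not part of the dashboard.

```python
# Used / total figures copied from the "Cluster capacity" panel.
CAPACITY = {
    "CPU cores": (412, 656),
    "RAM (TB)": (2.3, 3.7),
    "Cluster storage (TB)": (9.1, 42),
    "GPU (L4)": (3, 4),
}


def utilization(used: float, total: float) -> float:
    """Return used/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * used / total, 1)


def utilization_report(capacity: dict) -> dict:
    """Map each resource name to its utilization percentage."""
    return {name: utilization(used, total) for name, (used, total) in capacity.items()}
```

Calling `utilization_report(CAPACITY)` gives roughly 63% for CPU, 62% for RAM, 22% for cluster storage and 75% for GPU.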

Active alerts

Alertmanager
GPU queue near limit
slurm-compute-2 · 3 jobs waiting over 20 min
Storage disk usage — 86%
Recommend checking storage datasets
SMTP not configured
Password emails not sent — manual distribution
SLURM cluster · live
Recent activity
Imported class “Big Data 2026” — 45 users
bulk-import · profile E · directory + VM + VPN + console — all ready
8 min ago
Built image profile-f-nn v1 — moved to testing channel
Image builder · yoel-admin
41 min ago
AI agent: proposal to free 14 idle machines
awaiting approval · estimated saving — 224 GB RAM
1 h ago
Backup service · nightly backup of all VMs completed
02:13 · 251 machines · zstd · retention 7d/4w/6m
7 h ago
Identity source. Creating a user runs the full pipeline: account → machine → network → access.
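The stage order described above (account, then machine, then network, then access) can be sketched as a simple sequential pipeline. This is an illustration only: the `Workspace` class, the `handlers` mapping and `create_user` are hypothetical names, and the machine naming pattern is inferred from the student rows shown in this dashboard.

```python
from dataclasses import dataclass, field

# Stage order taken from the panel text: account -> machine -> network -> access.
PIPELINE = ("account", "machine", "network", "access")


@dataclass
class Workspace:
    login: str
    profile: str                                  # single-letter profile code, e.g. "B"
    completed: list = field(default_factory=list)

    @property
    def machine_name(self) -> str:
        # Pattern inferred from student rows such as vm-rivka.gad-b (an assumption).
        return f"vm-{self.login}-{self.profile.lower()}"


def create_user(login: str, profile: str, handlers: dict) -> Workspace:
    """Run every stage in order; an exception from a handler aborts the remaining stages."""
    ws = Workspace(login, profile)
    for stage in PIPELINE:
        handlers[stage](ws)                       # hypothetical per-stage callables
        ws.completed.append(stage)
    return ws
```

With no-op handlers, `create_user("rivka.gad", "B", handlers)` yields a workspace whose `machine_name` is `vm-rivka.gad-b` and whose `completed` list holds all four stages.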
+ Create user
⬆ Import CSV
251 users · 7 profiles · directory
User | Login | Role | Profile | Machine | Status | Last login
Ron Azulai | ron.azulai | student | C · System Programming | vm-ron.azulai-c | active | today, 09:14
Rivka Gad | rivka.gad | student | B · Networks | vm-rivka.gad-b | active | today, 08:52
Abraham Lincoln | abraham.lincoln | professor | C · System Programming | vm-abraham.lincoln-prof-c | active | yesterday, 17:30
Noa Rigel | noa.rigel | student | E · Big Data | vm-noa.rigel-e | machine off | 2 days ago
Raul Cohen | raul.cohen | student | A · Operating Systems | vm-raul.cohen-a | active | today, 10:01
Yoel Mangubi | ymangubi | student | B · Networks | vm-ymangubi-b | first login — password change |
Last 100 user operations (create / reset / bulk). Passwords stay retrievable here even if the browser lost the create-page render. VPN configs come from the VPN portal, which requires an admin login.
Time (UTC) | Admin | Action | User | Password | VPN · export
loading…
+ Deploy machine
86 active · 165 off · compute nodes
Machine | Profile | Node | CPU / RAM | GPU | Status | Uptime
vm-raul.cohen-a | A · Operating Systems | node-1 | 4 / 16 GB | | running | 2 h 41 min
vm-rivka.gad-b | B · Networks | node-2 | 4 / 16 GB | | running | 1 h 18 min
vm-noa.rigel-e | E · Big Data | node-1 | 2 / 8 GB | | off |
vm-dana.levi-f | F · Neural Networks | node-3 | 4 / 16 GB | 1× L4 (SLURM) | running | 52 min
yarn-master | infrastructure · Profile E | node-1 | 4 / 16 GB | | infra | 6 d
vm-amit.bar-c | C · System Programming | node-2 | 4 / 16 GB | | idle 3 days | 72 h
★ Key new capability: images used to be built manually via the console. Click “Create image” to walk through the wizard.
+ Create image
7 images · build engine: Packer
🧱 Golden image builder
✕ Close
1 Base
2 Name
3 Packages
4 Configuration
5 Security
6 Manifest

Step 1 · OS and image base

The builder is multi-platform — an image can be Linux or Windows. Build from scratch or inherit from an existing one (the child gets all the parent’s packages and settings).

🐧 Linux · profile-a-os — Operating Systems (inherit)

Ubuntu 24.04 · desktop + dev stack · v3 · production channel

🐧 Linux · golden-base — clean Ubuntu 24.04 LTS

Bare minimum, no desktop

🪟 Windows · win-11-base — Windows 11

Windows ISO + autounattend.xml · RDP access via gateway

🪟 Windows · win-server-base — Windows Server 2025

For server and engineering courses
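Step 1 states that a child image gets all of its parent's packages. One way to picture that is a walk up the `base` chain that merges package lists, parents first. The sketch below is an assumption about how such a merge could work, not the builder's actual logic; the `base` links for golden-base, profile-a-os and profile-f-nn match the ones shown in this dashboard, while the package lists are short hypothetical placeholders.

```python
# Base links as shown in the dashboard; package lists are hypothetical placeholders.
IMAGES = {
    "golden-base": {"base": None, "packages": ["openssh-server"]},
    "profile-a-os": {"base": "golden-base", "packages": ["gcc", "gdb"]},
    "profile-f-nn": {"base": "profile-a-os", "packages": ["python3", "cuda-toolkit"]},
}


def resolve_packages(name: str, images: dict) -> list:
    """Walk the base chain up from `name`, then merge package lists without duplicates."""
    chain, seen = [], set()
    while name is not None:
        if name in seen:
            raise ValueError(f"inheritance cycle at {name}")
        seen.add(name)
        chain.append(name)
        name = images[name]["base"]
    merged = []
    for image in reversed(chain):                 # parents first, child last
        for pkg in images[image]["packages"]:
            if pkg not in merged:
                merged.append(pkg)
    return merged
```

Here `resolve_packages("profile-f-nn", IMAGES)` returns the golden-base package, then the profile-a-os packages, then the child's own.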

Step 2 · Image name and code

The code is used as the profile identifier when provisioning machines.

Step 3 · Packages

The catalog below is a set of suggestions, not a restriction. Any program a lecturer asks for can be added in the field below. Parent packages are already included.

python3 · CUDA Toolkit · cuDNN · PyTorch · TensorFlow · JupyterLab · scikit-learn · pandas · Wireshark · Docker · VS Code · MATLAB · LibreOffice

Step 4 · Configuration

Core workspace integrations.

NFS mounts

/data/datasets and /data/results from storage

SLURM client

Access to the GPU queue for courses F/G

Auto-start JupyterLab on login

Working environment opens straight in the browser

Step 5 · Security

The baseline hardening profile is applied automatically — it cannot be disabled (protection against human error).

Baseline hardening profile — per image OS

Linux: linux-baseline (SSSD enumerate=false, GDM, suspend, hidden local admin, sudoers…). Windows: windows-baseline (GPO / Local Security Policy). Applied automatically.

Smoke test after build

Boot and core-service check before publishing

Step 6 · Image manifest

The final declarative spec. Versioned, reproducible, can be edited later.

image: profile-f-nn
version: 1
os_family: linux   # linux | windows
base: profile-a-os
description: "Neural Networks BSc — GPU course"
packages:
  - python3
  - cuda-toolkit
  - cudnn
  - python3-torch
  - python3-tensorflow
config:
  nfs_mounts: true
  slurm_client: true
  jupyter_autostart: false
hardening_profile: baseline-v3
smoke_test: true
channel: testing
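Because the manifest is declarative, it can be checked before a build starts. The following is a minimal sketch that validates a manifest already loaded into a Python dict. The set of required keys and the rule that `version` is a positive integer are assumptions made for illustration; the allowed `os_family` values come from the manifest comment, and the two channel names are the ones that appear in this dashboard.

```python
# Assumed required keys; the real builder may require more or fewer.
REQUIRED_KEYS = {"image", "version", "os_family", "base", "packages", "channel"}
OS_FAMILIES = {"linux", "windows"}            # from the manifest comment: linux | windows
CHANNELS = {"testing", "production"}          # the two channels named in this dashboard


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - manifest.keys())]
    if manifest.get("os_family") not in OS_FAMILIES:
        problems.append("os_family must be linux or windows")
    if manifest.get("channel") not in CHANNELS:
        problems.append("channel must be testing or production")
    version = manifest.get("version")
    if not isinstance(version, int) or version < 1:
        problems.append("version must be a positive integer")
    if not manifest.get("packages"):
        problems.append("packages must be a non-empty list")
    return problems
```

A dict carrying the values shown in the manifest above passes with an empty list, while one with an unknown `os_family` gets a single problem entry.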
← Back
Step 1 of 6
Next →
A

profile-a-os

Operating Systems
v3 · base: golden-base · 28 packages
production channel
✎ Edit · ⧉ Versions
B

profile-b-networks

Computer Networks
v2 · base: profile-a-os · 34 packages
production channel
✎ Edit · ⧉ Versions
E

profile-e-bigdata

Big Data Engineering
v2 · base: golden-base · 51 packages
production channel
✎ Edit · ⧉ Versions
Course directories on /lab-data/courses/<faculty>/<code>/. Each course gets materials/assignments/submissions[+datasets for E/F/G] with group ACLs (course-<faculty>-staff / -students). Students of profile-<x> auto-enrolled in the faculty's student group on VM creation.
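The directory and group naming described above can be sketched as a small helper. It only builds names and creates nothing on disk. The function name and the example faculty and course codes are hypothetical; the path root, subdirectory names, dataset rule and group name patterns are taken from the description.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("/lab-data/courses")
BASE_DIRS = ("materials", "assignments", "submissions")
DATASET_PROFILES = {"E", "F", "G"}            # profiles that also get a datasets directory


def course_layout(faculty: str, code: str, profile: str) -> dict:
    """Return the directory paths and ACL group names for one course."""
    course_root = ROOT / faculty / code
    subdirs = list(BASE_DIRS)
    if profile.upper() in DATASET_PROFILES:
        subdirs.append("datasets")
    return {
        "paths": [str(course_root / d) for d in subdirs],
        "staff_group": f"course-{faculty}-staff",
        "students_group": f"course-{faculty}-students",
    }
```

For a hypothetical faculty `cs` and course code `bd101` on profile E, the layout has four paths ending in `/lab-data/courses/cs/bd101/datasets`, plus the groups `course-cs-staff` and `course-cs-students`.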
+ Create course
Loading…
Faculty | Code | Path | Datasets | Actions
Access is managed in two layers: access policy (who can log into which machine) + virtualization RBAC (who manages which VM). The dashboard merges them into one matrix.
+ Create role
One change applies to all identities, networks and machines

Who can do what

visual RBAC editor
Capability | Student | Professor | Administrator
Log into own machine (HBAC)
Log into another student’s machine
Power-manage own machine
Create / delete users
Build golden image
Access infrastructure VMs
View audit log
“Student” role
218
users
“Professor” role
12
users
“Administrator” role
3
users
★ Unified overview of the whole lab system. Each service is a real component of the lab, visible and managed from one dashboard.
Platform services
🔑

Identity service

identity-server · VM 101
Identity directory: LDAP, Kerberos, DNS, password policies, HBAC, sudo.
251 identities · 3 groups · OK
🛡️

VPN

gateway · VM 102
Remote access + firewall role segmentation (students / professors / admin).
251 peers · endpoint 213.8.245.150 · OK
🖥️

Console gateway

VM 106
VDI gateway: desktop in the browser, SSO, no heavy client.
86 active sessions · OK
⚙️

Compute cluster

compute nodes
Virtualization cluster: 3 nodes, HA for core VMs, quorum 3/3.
251 VMs · 656 cores · 3.7 TB RAM · OK

SLURM + GPU

slurm-login + 2 compute · 4× L4
GPU queue for courses F/G. GPU access via srun/sbatch, not direct passthrough.
3 / 4 GPU in use · 3 jobs queued · load
💾

Cluster storage

storage host
Storage: cluster storage for VM disks (42 TB) + NFS file storage for home folders (101 TB).
Cluster storage OK · OK
🗄️

Backup service

backup service · VM 200
Backups: daily 02:00, all VMs, retention 7d/4w/6m.
Last backup 02:13 · success · OK
📊

Monitoring stack

monitor · VM 103
Monitoring: 31 scrape targets, 10 alert rules, 4 dashboards.
31 / 31 targets · 2 alerts · OK
🧩

Profile E · Big Data

VM 220–225 · 6 machines
Big Data stack: HDFS/YARN, 2× Spark worker, Kafka, Airflow, DataHub.
6 / 6 services running · OK
↗ Open service Web UI ⟳ Restart 📄 Logs
Network access: VPN terminates on gateway VM 102, traffic is segmented by firewall per role. Student machines run on the internal network.

VPN server

running
Endpoint: 213.8.245.150:51820
Tunnel: 10.200.0.0/22
Peers: 251 · active now: 86

Firewall · firewall

drop policy
Rules: 28
Role groups: admins profs students
A student sees over SSH only their own VM

internal network · machine network

OK
Network: 172.16.0.0/16 (stunet)
Type: isolated · spans 3 nodes
DHCP/DNS: Identity service 10.11.150.10
VPN peers
Peer | VPN address | Role / segment | Access | Status
ron.azulai | 10.200.1.21 | students | portal · own VM · DNS | handshake 2 min ago
abraham.lincoln | 10.200.1.9 | professors | portal · all student VMs · SLURM | handshake 8 min ago
yoel-admin | 10.200.1.7 | administrator | full access | handshake 1 min ago
noa.rigel | 10.200.1.34 | students | portal · own VM · DNS | not connected 2 days
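The tunnel range and peer addresses listed above can be cross-checked with Python's standard `ipaddress` module. This sketch confirms that each listed peer falls inside 10.200.0.0/22 and counts how many addresses that range can hold; the helper names are illustrative.

```python
import ipaddress

TUNNEL = ipaddress.ip_network("10.200.0.0/22")    # tunnel range from the VPN server card
PEERS = {                                          # addresses from the VPN peers table
    "ron.azulai": "10.200.1.21",
    "abraham.lincoln": "10.200.1.9",
    "yoel-admin": "10.200.1.7",
    "noa.rigel": "10.200.1.34",
}


def usable_hosts(network) -> int:
    """Assignable addresses, excluding the network and broadcast addresses."""
    return network.num_addresses - 2


def peers_outside(network, peers: dict) -> list:
    """Names of peers whose address is not inside the given network."""
    return [name for name, addr in peers.items()
            if ipaddress.ip_address(addr) not in network]
```

A /22 holds 1022 usable addresses, which leaves ample headroom over the 251 peers reported, and `peers_outside(TUNNEL, PEERS)` returns an empty list for the four rows shown.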

cluster storage

HEALTH_OK
VM disks: 9.1 / 42 TB
9 OSD · replication ×2 · cluster-wide

NFS · lab-data

ONLINE
Home folders + datasets: 2.2 / 101 TB
RAIDZ2 · NFS exports for machines

NFS exports

active
/student-home — home folders
/datasets — course data (ro)
/results — work results
Backups · Backup service
Job | Schedule | Volume | Retention | Last run | Status
backup-daily · all VMs | daily 02:00 | 251 VMs · zstd | 7d / 4w / 6m | today 02:13 | success
restore-test · verification | weekly | 1 VM sampled | | 2 days ago | passed · 745 MB/s
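The retention value `7d / 4w / 6m` is compact enough to parse mechanically. The sketch below assumes the suffixes mean daily, weekly and monthly copies to keep; that reading is an assumption, since the dashboard does not spell it out, and the function name is illustrative.

```python
import re

# Assumed meaning of the suffixes; the dashboard does not define them.
UNITS = {"d": "daily", "w": "weekly", "m": "monthly"}


def parse_retention(spec: str) -> dict:
    """Parse a spec such as '7d/4w/6m' into a mapping of period name to count."""
    result = {}
    for part in spec.replace(" ", "").split("/"):
        match = re.fullmatch(r"(\d+)([dwm])", part)
        if match is None:
            raise ValueError(f"unrecognised retention part: {part!r}")
        count, unit = match.groups()
        result[UNITS[unit]] = int(count)
    return result
```

Both spellings used in this dashboard, `7d/4w/6m` and `7d / 4w / 6m`, parse to the same mapping.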
▶ Run backup now ⟲ Restore from backup
Long operations (bulk import, image build) run as asynchronous jobs — the portal does not freeze.
completed · bulk-import · class “Big Data 2026” · 8 min ago
[10:31:02] start · profile E · 45 CSV rows
[10:34:48] directory: 45 identities created
[10:38:11] 45 machines deployed · 45 VPN peers · 45 console gateway connections
[10:38:12] ✓ done — 45 workspaces in 7 min 10 s
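The duration reported on the last line of the bulk-import log can be recomputed from its first and last timestamps. This is a small sketch with an illustrative function name; the log text is abbreviated from the job output above, and the calculation assumes the job starts and ends on the same calendar day.

```python
import re
from datetime import datetime

# Abbreviated copy of the bulk-import job log.
LOG = (
    "[10:31:02] start · profile E · 45 CSV rows\n"
    "[10:34:48] directory: 45 identities created\n"
    "[10:38:11] 45 machines deployed\n"
    "[10:38:12] done\n"
)


def job_duration_seconds(log: str) -> int:
    """Seconds between the first and last [HH:MM:SS] stamp in the log."""
    stamps = re.findall(r"\[(\d{2}:\d{2}:\d{2})\]", log)
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    first, last = (datetime.strptime(s, "%H:%M:%S") for s in (stamps[0], stamps[-1]))
    return int((last - first).total_seconds())
```

For this log the result is 430 seconds, which matches the 7 min 10 s stated in the job output.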
completed · image-build · profile-f-nn v1 · 41 min ago
[09:52] cloning base profile-a-os
[09:56] installing packages: cuda-toolkit, torch, tensorflow
[10:03] applied hardening profile baseline-v3
[10:06] ✓ image built · testing channel · smoke test passed
Live monitoring dashboards embedded via proxy. Use ↗ links at bottom for the full UI.
Scrape targets
loading…
Active alerts
loading…
Compute nodes online
loading…
Push notifications
in development · Phase 2
Hypervisor Node Exporter cluster storage Blackbox GPU / SLURM Active alerts ()
Big Data module (Spark/YARN/HDFS, Kafka, Airflow, DataHub).
YARN nodes
loading…
CPU cores
loading…
RAM
loading…
HDFS capacity
loading…

Running Spark applications

loading…

HDFS storage

NameNode
loading…

HDFS browser

Go
click Go to browse

Airflow recent DAG runs

loading…

Kafka topics

loading…

Kafka UI (AKHQ)

embedded

Spark History Server

completed Spark applications

Data catalog (DataHub)

embedded
Cluster · Queue (B.2.2) · Reservations (B.2.3) · Nodes (B.2.4) · Notebooks (B.2.5) · History (B.2.6) · QOS & Accounts (B.2.7) · Containers (NEW) · Settings (NEW)
Partitions
UP / total
Nodes
idle / total
Active jobs
in queue
Reservations
active
Partitions
loading partitions…
Nodes
loading nodes…
Aggregated capacity
loading aggregates…
Live GPU usage NEW
loading GPU metrics…
Refresh
Idle machines
14
CPU < 5% over 3 days
Orphaned resources
6
VPN peers without a machine
Can be reclaimed
224 GB
RAM · + 56 CPU cores

Resource reclamation report

generated by AI agent · weekly
Object | Observation | Proposal |
vm-amit.bar-c | idle 3 days · CPU 1% | power off | Approve
11 student machines (course A — finished) | no session opened for 9 days | power off group | Approve
vm-dana.test-e | 16 GB allocated · 3 GB used | shrink to 8 GB | Approve
6 VPN peers | machine deleted, peer remains | delete peers | Approve
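The idle rule (CPU below 5% over 3 days) and the reclaimable totals can be sketched as a filter plus a sum. The `Machine` fields and function names are hypothetical, and treating the rule as an average CPU load over the window is an assumption; the 16 GB / 4 core machine size comes from the machine list in this dashboard.

```python
from dataclasses import dataclass

CPU_THRESHOLD_PCT = 5.0       # "CPU < 5%"
IDLE_WINDOW_DAYS = 3          # "over 3 days"


@dataclass
class Machine:
    name: str
    avg_cpu_pct: float        # assumed: average CPU load over the observation window
    window_days: int          # length of that window in days
    ram_gb: int
    cores: int
    keep_alive: bool = False  # machines tagged keep-alive are excluded


def idle_candidates(machines: list) -> list:
    """Machines under the CPU threshold for a full window, skipping keep-alive tags."""
    return [
        m for m in machines
        if not m.keep_alive
        and m.window_days >= IDLE_WINDOW_DAYS
        and m.avg_cpu_pct < CPU_THRESHOLD_PCT
    ]


def reclaimable(machines: list) -> tuple:
    """Total (RAM in GB, CPU cores) held by the given machines."""
    return sum(m.ram_gb for m in machines), sum(m.cores for m in machines)
```

Fourteen idle machines of 16 GB and 4 cores each add up to 224 GB of RAM and 56 cores, the totals shown in the reclamation summary.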
⬇ Export
Time | Who | Action | Object | Result
10:38:12 | yoel-admin | bulk-import | class “Big Data 2026” | success · 45
10:06:44 | yoel-admin | image-build | profile-f-nn v1 | success
09:21:03 | AI agent | reset-password (auto) | ron.azulai | success
08:55:17 | admin | rbac-modify | “Professor” role | success
02:13:40 | system | backup-daily | 251 VMs | success
yesterday 14:02 | AI agent | idle-scan | whole cluster | 14 findings
The AI agent is the platform’s last layer (Phase 6). It works on a “proposes — human approves” model. This is a preview.
🟢 Auto

Low risk, allowed by policy — acts itself, writes to audit

🟡 Proposal

Medium risk — prepares a proposal, waits for approval

🔴 Escalation

Complex case — hands to a human with a summary
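The three tiers above can be sketched as a small classification function. The risk labels passed in (`low`, `medium`, anything else) and the choice to downgrade a low-risk request that policy does not allow to a proposal are assumptions made for illustration, not documented agent behavior.

```python
from enum import Enum


class Tier(Enum):
    AUTO = "auto"                 # low risk, allowed by policy: act and write to audit
    PROPOSAL = "proposal"         # medium risk: prepare a proposal, wait for approval
    ESCALATION = "escalation"     # complex case: hand to a human with a summary


def classify(risk: str, allowed_by_policy: bool) -> Tier:
    """Map a hypothetical risk label to one of the three tiers."""
    if risk == "low" and allowed_by_policy:
        return Tier.AUTO
    if risk in ("low", "medium"):
        # Assumption: low risk without policy approval still needs a human decision.
        return Tier.PROPOSAL
    return Tier.ESCALATION
```

Under these assumptions a password reset would classify as `Tier.AUTO` when policy allows it, and freeing idle machines as `Tier.PROPOSAL`.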

Incoming proposals PREVIEW
🟡 proposal · Free 14 idle machines · cluster scan
14 machines with CPU load < 5% for over 3 days; no session in 9 days. Frees ≈ 224 GB RAM and 56 cores. Machines tagged keep-alive are excluded.
✓ Approve · ✕ Reject
🟡 proposal · Request: “need a profile B machine for a new student” · email · 11:04
Classified as “onboarding”. Requester — professor of course B (identity verified). Pipeline ready: identity + profile-b machine + VPN + access. Run it?
✓ Approve · ✕ Reject
🟢 done auto · Password reset · ron.azulai · 09:21
Repetitive low-risk request, identity verified — executed automatically within policy. New password sent to the user, action recorded in audit.
Your personal workspace VM. Click “Open console” to launch the console gateway HTML5 client, and close the browser tab when done. The VM keeps running, so your work persists. Your files are mounted via NFS and survive even if the VM gets re-provisioned.
loading…
Refresh Change password
★ Instructor view. Sees only their own course — can open a course, upload a class roster CSV, see their students. Infrastructure tabs are not available to them.
My course
B · Networks
Itzhak Nudler
Students on the course
50
all workspaces provisioned
Machines active now
18
of 50
+ Open new course
⬆ Upload roster CSV
scope: instructor’s courses only
Student | Login | Machine | Status | Last login
Rivka Gad | rivka.gad | vm-rivka.gad-b | active | today, 08:52
Yoel Mangubi | ymangubi | vm-ymangubi-b | first login |
Dana Levi | dana.levi | vm-dana.levi-b | machine off | yesterday, 14:03