Mephisto GPU Cluster

Mephisto GPU Cluster Hardware Configuration

  • Network Configuration

    • 1x Mellanox 18 Port QDR 40Gb/s InfiniBand Switch MIS5023Q-1BFR
    • 1x TP-Link 24 Port Gigabit Ethernet Switch TL-SG1024
  • 1x Master Node: mephisto

    • 1x Supermicro SuperServer 7046GT-TRF-FC407
    • 1x Supermicro Motherboard X8DTG-QF
    • 2x Intel Xeon Six-Core CPU X5650 @ 2.67GHz with 12 Cores
    • 1x 16GB DDR3 PC1333 ECC RAM
    • 1x Samsung SSD 840 256GB Solid State Boot Disk
    • 1x Mellanox 40Gb/s QDR InfiniBand Board MHQH19B-XTR
    • 4x Nvidia GeForce GTX 480 GPU with 6GB GDDR5 RAM and 1920 Cores
    • 1x Adaptec RAID Controller 6805 with 8x 6Gb/s SATA/SAS Ports
    • 4x Western Digital RE4 2TB Hard Disk Drive: RAID5 Array with 6TB
    • 4x OCZ Vertex3 240GB Solid State Disk: RAID5 Array with 720GB
  • 5x Compute Node "Tesla": compute-0-0 . . . compute-0-4

    • 5x Supermicro SuperServer 7046GT-TRF-FC407
    • 5x Supermicro Motherboard X8DTG-QF
    • 10x Intel Xeon Six-Core CPU X5650 @ 2.67GHz with 60 Cores
    • 5x 96GB DDR3 PC1333 ECC RAM with a total of 480GB DDR3 RAM
    • 5x OCZ Vertex3 120GB Solid State Boot Disk
    • 5x Mellanox 40Gb/s QDR InfiniBand Board MHQH19B-XTR
    • 20x Nvidia Tesla C2070 GPU with 120GB GDDR5 ECC RAM and 8960 CUDA Cores
  • 1x Compute Node "Kepler": compute-0-5
    • 1x Supermicro SuperServer 7047GR-TPRF
    • 1x Supermicro Motherboard X9DRG-QF
    • 2x Intel Xeon Eight-Core CPU E5-2650 @ 2.00GHz with 16 Cores
    • 1x 256GB DDR3 PC1600 ECC RAM
    • 1x OCZ Vertex4 128GB Solid State Boot Disk
    • 1x Mellanox 40Gb/s QDR InfiniBand Board MCX353A-QCBT
    • 4x Nvidia Tesla K20 GPU with 20GB GDDR5 ECC RAM and 9984 CUDA Cores
  • 1x Compute Node "Phi": compute-0-6
    • 1x Supermicro Motherboard X9DRG-QF
    • 1x 256GB DDR3 PC1600 ECC RAM
    • 1x OCZ Vertex4 256GB Solid State Boot Disk
    • 1x Mellanox 40Gb/s QDR InfiniBand Board MCX353A-QCBT
    • 1x Intel Xeon Phi 5110P with 8GB GDDR5 RAM and 60 CPU Cores

Rocks Cluster Software and File System Configuration

  • Rocks Cluster Distribution

    • Very flexible cluster management software
    • Based on the CentOS 6.3 operating system
    • Simple compute node installation and management
  • File System Configuration: NFS automount, EXT3 file system

    • 6TB RAID5 HDD array for home directories
    • 720GB RAID5 SSD array for high performance cluster directories

How to Use the GPU Cluster?

  • Get your GPU computing project approved!

    • Contact: Gundolf Haase or Manfred Liebmann
    • Email: gundolf.haase@uni-graz.at; manfred.liebmann@uni-graz.at
  • Access the GPU cluster within the University of Graz network:

    • Login: ssh username@mephisto.uni-graz.at (see the example below)
  • Software

    • Compilers: nvcc, g++, mpicxx, gfortran, . . .
    • Oracle Grid Engine 6.2u5 batch processing system: qsub myjob.sh, qstat, . . .
    • Scalasca: Parallel performance analysis tool
    • OpenMPI 1.6.5 and 1.8.2, MVAPICH2 2.0, Mellanox OFED 2.2.0, Nvidia CUDA 6.0, PGI 15.1
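
A typical session from within the university network might look like this:

ssh username@mephisto.uni-graz.at
nvcc --version
mpicxx --version
qstat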

Oracle Grid Engine Batch Processing System

  • Design goals for the queueing system

    • Maximum performance for all users!
    • Minimum interference between parallel jobs using CPU or GPU resources
    • Not trivial to achieve out of the box
  • Solution for the mephisto GPU cluster

    • Queue with one slot per compute server! Only 5 slots are available on the cluster (see the example below).
    • One slot provides 2 Six-Core CPUs and 4 GPUs!
    • Only parallel jobs are supposed to be run on the cluster!
    • Modified submission scripts are required for good performance.
    • Parallel jobs can run with full hardware utilization and fine-grained control.
    • The high-bandwidth 1.44Tb/s InfiniBand switch is the only shared resource.
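
The queue and slot usage can be inspected, and jobs can be removed, with the standard Grid Engine commands, for example:

qstat -f
qstat -u [username]
qdel [jobID]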

Account Configuration

Update the paths in the .bashrc file!

.bashrc

# CUDA & SCALASCA
export PATH=$PATH:/share/apps/cuda-6.0/bin:/share/apps/scalasca/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/share/apps/cuda-6.0/lib64:/share/apps/scalasca/lib

# PGI
export PATH=/share/apps/pgi/linux86-64/15.1/bin:$PATH
export MANPATH=$MANPATH:/share/apps/pgi/linux86-64/15.1/man
export LM_LICENSE_FILE=$LM_LICENSE_FILE:/share/apps/pgi/license.dat

# PGI MPICH
#export PATH=/share/apps/pgi/linux86-64/15.1/mpi/mpich/bin:$PATH
#export MANPATH=$MANPATH:/share/apps/pgi/linux86-64/15.1/mpi/mpich/man

# PGI MVAPICH
#export PATH=/share/apps/pgi/linux86-64/15.1/mpi/mvapich/bin:$PATH
#export MANPATH=$MANPATH:/share/apps/pgi/linux86-64/15.1/mpi/mvapich/man
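
The updated settings can be applied to the current shell and verified, for example:

source ~/.bashrc
which nvcc
nvcc --version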


Simple Submission Script for CPU Jobs

Example script directory: /share/apps/

Output: [scriptname].o[jobID]

The simple CPU submission script configures 12 CPU slots per compute node; the resulting hostfile is shown after the script.

qsub simple.sh

#!/bin/sh -f
#$ -V
#$ -cwd
#$ -j y
#$ -pe mpi 4

while read line; do
echo $line "slots=12"
done < $TMPDIR/machines > $TMPDIR/hostfile
cat $TMPDIR/hostfile

mpirun --np 48 --hostfile $TMPDIR/hostfile ./armocpu armo16.inp
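
With -pe mpi 4 the Grid Engine allocates four compute servers, so the generated OpenMPI hostfile contains one entry per node; 4 nodes with 12 slots each match the 48 MPI processes started by mpirun. The allocated hostnames vary, for example:

compute-0-0 slots=12
compute-0-1 slots=12
compute-0-2 slots=12
compute-0-3 slots=12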


Advanced Submission Script for CPU Jobs

The advanced CPU submission script configures an MPI rank-to-CPU-core mapping with 12 cores per compute node; the resulting rankfile is shown after the script.

qsub cpu.sh

#!/bin/sh -f
#$ -V
#$ -cwd
#$ -j y
#$ -pe mpi 4

rank=0;
while read line; do
for i in 0 1 2 3 4 5 6 7 8 9 10 11; do
echo "rank" $rank"="$line "slot="$i
let "rank += 1"
done
done < $TMPDIR/machines > $TMPDIR/rankfile
cat $TMPDIR/rankfile

mpirun --np 48 --hostfile $TMPDIR/machines --rankfile $TMPDIR/rankfile ./armocpu armo16.inp
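
The generated rankfile pins 12 consecutive MPI ranks to the 12 CPU cores of each allocated node; 4 nodes with 12 ranks each match the 48 MPI processes started by mpirun. Its first lines look like this (the allocated hostnames vary):

rank 0=compute-0-0 slot=0
rank 1=compute-0-0 slot=1
. . .
rank 11=compute-0-0 slot=11
rank 12=compute-0-1 slot=0
. . .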


Submission Script for GPU Jobs

The GPU submission script configures an MPI rank-to-CPU-core mapping for 4 CPU cores and 4 GPU boards per compute node; the resulting rankfile is shown after the script.

qsub gpu.sh

#!/bin/sh -f
#$ -V
#$ -cwd
#$ -j y
#$ -pe mpi 4

rank=0;
while read line; do
for i in 0 3 6 9; do
echo "rank" $rank"="$line "slot="$i
let "rank += 1"
done
done < $TMPDIR/machines > $TMPDIR/rankfile
cat $TMPDIR/rankfile

mpirun --np 16 --hostfile $TMPDIR/machines --rankfile $TMPDIR/rankfile ./armogpu armo16.inp
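
Here each node contributes only four ranks, pinned to cores 0, 3, 6, and 9, so that every rank can drive one of the four GPU boards; 4 nodes with 4 ranks each match the 16 MPI processes started by mpirun. The rankfile starts like this (the allocated hostnames vary):

rank 0=compute-0-0 slot=0
rank 1=compute-0-0 slot=3
rank 2=compute-0-0 slot=6
rank 3=compute-0-0 slot=9
rank 4=compute-0-1 slot=0
. . .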


Submission Script for PHI Jobs

A simple Phi submission script for offload mode; a possible build step is shown after the script.

qsub phi.sh

#!/bin/sh -f
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -j y
#$ -pe phi 1

./exe-offload.x
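
In offload mode the executable starts on the host CPU and offloads marked code regions to the Xeon Phi card. A possible build step, assuming a source file with Intel offload pragmas and the Intel compiler environment described below, could look like this:

source /share/apps/intel/bin/compilervars.sh intel64
icpc -O2 -o exe-offload.x [source].cpp   # [source].cpp is a placeholder for the application source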


Red Hat Developer Toolset

The Red Hat Developer Toolset 1.1 and 2.0 are installed on the cluster and can be activated with the scl utility.

scl enable devtoolset-1.1 bash

scl enable devtoolset-2 bash
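
The active compiler inside the toolset shell can be verified, for example:

scl enable devtoolset-2 bash
gcc --version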


Intel Parallel Studio XE 2015

Intel Parallel Studio XE 2015 (Professional Edition) is installed on the cluster and can be activated by sourcing the compiler environment script.

source /share/apps/intel/bin/compilervars.sh intel64
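
Afterwards the Intel compilers are available in the current shell, for example:

icc --version
icpc --version
ifort --version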