Setting up an MPI cluster on Ubuntu involves the following steps:
1. Install OpenMPI on all machines.
$sudo apt-get install libopenmpi-dev openmpi-bin openmpi-doc
2. Install OpenSSH on all machines.
$sudo apt-get install openssh-server openssh-client
3. Select one machine as the master, say M; the others will be slaves, say S.
4. Now we need to make sure M can log on to all S machines without a password.
Create a user with the same name on all machines, say 'fawad'.
Log in on M as 'fawad' and generate a public/private key pair.
$ssh-keygen -t dsa
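This command generates the files id_dsa and id_dsa.pub in the ~/.ssh folder. id_dsa is the private key and id_dsa.pub is the public key. Keep the private key on M and copy the public key to all the slave machines. Note that OpenSSH 7.0 and later disable DSA keys by default; on such systems generate an RSA key with 'ssh-keygen -t rsa' instead and use id_rsa and id_rsa.pub wherever id_dsa and id_dsa.pub appear below.
5. Use scp to copy id_dsa.pub to the home directory of fawad on all S machines.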
6. On each S machine:
Log in as fawad.
$mkdir -p ~/.ssh
$cat ~/id_dsa.pub >> ~/.ssh/authorized_keys
7. M will now be able to log on to all S machines without a password.
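Steps 5 and 6 can be wrapped in a small shell function so that the key is appended only once and the permissions are tight enough for sshd to accept the file. This is only a sketch, not part of the original setup: the function name install_key and its optional second argument are made up for illustration, and it assumes the public key has already been copied to the slave.

```shell
# install_key PUBKEY_FILE [SSH_DIR]
# Hypothetical helper: appends PUBKEY_FILE to SSH_DIR/authorized_keys
# (SSH_DIR defaults to ~/.ssh) unless that exact key line is already there.
install_key() {
    pubkey_file=$1
    ssh_dir=${2:-"$HOME/.ssh"}

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"

    # -x matches whole lines, -F treats the key as a fixed string,
    # -f reads the pattern from the key file; append only if no match
    if ! grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

On each S machine this would be called as 'install_key ~/id_dsa.pub'. Running it a second time leaves authorized_keys unchanged.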
8. On the M machine:
Write an MPI program and save it. In my case I used the following code and saved it as hello_nodes.c
--------------hello_nodes.c--------------------
/*
  A simple example program using MPI - Hello World
  The program consists of one receiver process and N-1 sender
  processes. The sender processes send a message consisting
  of their hostname to the receiver. The receiver process prints
  out the values it receives in the messages.
  Compile the program with 'mpicc hello_nodes.c -o hello_nodes'
  To run the program on four processors do 'mpiexec -n 4 ./hello_nodes'
*/
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    const int tag = 42;               /* Message tag */
    int id, ntasks, source_id, err, i;
    MPI_Status status;
    /* Message array; MPI_Get_processor_name needs at least this many chars */
    char msg[MPI_MAX_PROCESSOR_NAME];

    err = MPI_Init(&argc, &argv);     /* Initialize MPI */
    if (err != MPI_SUCCESS) {
        printf("MPI_Init failed!\n");
        exit(1);
    }
    err = MPI_Comm_size(MPI_COMM_WORLD, &ntasks);  /* Get nr of tasks */
    if (err != MPI_SUCCESS) {
        printf("MPI_Comm_size failed!\n");
        exit(1);
    }
    err = MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* Get id of this process */
    if (err != MPI_SUCCESS) {
        printf("MPI_Comm_rank failed!\n");
        exit(1);
    }
    /* Check that we run on at least two processors */
    if (ntasks < 2) {
        printf("You have to use at least 2 processors to run this program\n");
        MPI_Finalize();               /* Quit if there is only one processor */
        exit(0);
    }
    /* Process 0 (the receiver) does this */
    if (id == 0) {
        int length;
        MPI_Get_processor_name(msg, &length);  /* Get name of this processor */
        printf("Hello World from process %d running on %s\n", id, msg);
        for (i = 1; i < ntasks; i++) {
            /* Receive a message from any sender */
            err = MPI_Recv(msg, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, MPI_ANY_SOURCE,
                           tag, MPI_COMM_WORLD, &status);
            if (err != MPI_SUCCESS) {
                printf("Error in MPI_Recv!\n");
                exit(1);
            }
            source_id = status.MPI_SOURCE;     /* Get id of sender */
            printf("Hello World from process %d running on %s\n", source_id, msg);
        }
    }
    /* Processes 1 to N-1 (the senders) do this */
    else {
        int length;
        MPI_Get_processor_name(msg, &length);  /* Get name of this processor */
        /* length excludes the terminating '\0', so send length+1 chars */
        err = MPI_Send(msg, length + 1, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
        if (err != MPI_SUCCESS) {
            printf("Process %i: Error in MPI_Send!\n", id);
            exit(1);
        }
    }
    err = MPI_Finalize();             /* Terminate MPI */
    if (err != MPI_SUCCESS) {
        printf("Error in MPI_Finalize!\n");
        exit(1);
    }
    if (id == 0) printf("Ready\n");
    exit(0);
}
-------------------------------------------
9. Compile the code.
$ mpicc hello_nodes.c -o hello_nodes
10. Run the code on the same machine.
fawad@fawad-virtual-machine:~$ mpirun -np 4 hello_nodes
Hello World from process 0 running on fawad-virtual-machine
Hello World from process 2 running on fawad-virtual-machine
Hello World from process 1 running on fawad-virtual-machine
Hello World from process 3 running on fawad-virtual-machine
Ready
11. Create a file called hostfile, which contains the slave computers in the MPI cluster:
fawad@fawad-virtual-machine:~$ cat hostfile
172.16.92.130
172.16.92.129
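An Open MPI hostfile line may also carry an optional 'slots=N' after the host, which tells mpirun how many processes to place on that machine. The following is a sketch of generating such a file; the function name make_hostfile and the slot count of 4 are assumptions chosen for illustration, while the two addresses are the ones listed above.

```shell
# make_hostfile SLOTS HOST...
# Hypothetical helper: prints one "host slots=N" line per host given.
make_hostfile() {
    slots=$1
    shift
    for host in "$@"; do
        printf '%s slots=%s\n' "$host" "$slots"
    done
}

# Write a hostfile for the two slaves, 4 processes each (assumed value)
make_hostfile 4 172.16.92.130 172.16.92.129 > hostfile
```

The slot count would normally match the number of cores on each slave.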
12. Copy the compiled binary hello_nodes to the same location on all slave machines using scp.
13. Then run the following command to run the job on all slave machines:
fawad@fawad-virtual-machine:~$ mpirun --hostfile hostfile -np 10 hello_nodes
Hello World from process 0 running on ubuntu
Hello World from process 6 running on ubuntu
Hello World from process 4 running on ubuntu
Hello World from process 8 running on ubuntu
Hello World from process 3 running on fawad-virtual-machine
Hello World from process 2 running on ubuntu
Hello World from process 9 running on fawad-virtual-machine
Hello World from process 5 running on fawad-virtual-machine
Hello World from process 7 running on fawad-virtual-machine
Hello World from process 1 running on fawad-virtual-machine
Ready
DONE :).
The next step is to set up an NFS share so that we do not have to copy the binary file to every machine each time.
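Until a shared filesystem is in place, the copy in step 12 can at least be scripted by looping over the hostfile. A rough sketch follows; the function name push_binary and the COPY variable used for dry runs are invented here, and it relies on the passwordless login set up earlier.

```shell
# push_binary FILE HOSTFILE
# Hypothetical helper: copies FILE to the same path on every host listed
# in HOSTFILE. A relative FILE is resolved against the remote home directory.
# Set COPY=echo for a dry run that only prints the arguments.
push_binary() {
    file=$1
    hostfile=$2
    # Take the first field of each line, so "host slots=N" entries also work
    while read -r host rest; do
        case $host in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # stdin comes from /dev/null so the copy cannot eat the hostfile
        ${COPY:-scp} "$file" "$host:$file" < /dev/null
    done < "$hostfile"
}
```

After recompiling on M, 'push_binary hello_nodes hostfile' run from the home directory would refresh the binary on every slave.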