Manual for the setup of Whisper on HPC

The use of Whisper is currently being piloted by the DCC. We are building a highly secure environment in which you can safely use this AI-based tool. We kindly ask for your patience while we set this up for you.

If you are interested in using this tool and you are not already in contact with the DCC, please feel free to send us a message at dcc@rug.nl. We will put you on our waiting list and will contact you as soon as the pilot is over. Once the pilot is complete, we will make the tool available on demand.

Introduction

This guide takes you through the steps to set up a personal system of speech-to-text transcription on University of Groningen infrastructure (for UG staff and students) on the basis of the OpenAI Whisper automatic speech recognition (ASR) model running on the Hábrók High Performance Computing (HPC) cluster.

Transcribing spoken audio to text is usually a very time-consuming manual process. The UG offers a licensed version of F4 Transkript on the University Workplace as an aid for manual transcription, but doesn't offer automatic speech recognition software.
This guide is offered by the DCC to help researchers process their research data as efficiently as possible, while optimizing data protection (keeping their audio files on UG storage instead of sending them to cloud services). For technical aspects, the service is supported by the Data Science and HPC team of the CIT.
If you wish to read more about the detailed functionality of Whisper, please refer to the manual in the Whisper Git repository.

Because audio is highly sensitive data, our advice is to access this tool by requesting a Virtual Research Workspace (VRW). To request a VRW that can access Whisper, please use the form found on the linked webpage and specify that you need transcription capabilities.

(N.B.: As stated above, we are as of now not able to give you access to the tool due to an ongoing pilot. You can still contact us at dcc@rug.nl and we will put you on our waiting list. When the pilot is over, we will contact you again to onboard you.)

Should you have any further questions on the use or initial set up of Whisper on Hábrók HPC, please contact the DCC at dcc@rug.nl.

Setting up on the VRW and HPC

The guide below will take you through some basic steps to start running automatic transcription jobs on the high performance computing cluster from a Windows computer. In short it entails:

  • In the same way, you should also create an output folder in the same place by inputting:
    • mkdir $HOME/whisper_output
  • If this step of the installation succeeded, then you should be able to display the new folders in MobaXterm as shown in the figures below.

  • You can also use the left-hand folder navigation to check if the new folders are there.
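As a minimal sketch, both folders can be created and checked in one go. It assumes the input folder is named whisper_audio, which is the name the batch script later in this guide reads from; the -p option only makes the command safe to repeat.

```shell
#!/bin/bash
# Create the input and output folders in the home directory.
# -p makes the command safe to repeat: existing folders are left untouched.
mkdir -p "$HOME/whisper_audio" "$HOME/whisper_output"

# List both folders; an error message here means a folder is missing.
ls -d "$HOME/whisper_audio" "$HOME/whisper_output"
```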

Build a Virtual Environment and install Whisper

This step is only needed for the first time you set up Whisper. After you have installed the program for the first time, you can skip directly to the next part of the guide to run the program.

When logged in to your session in the MobaXterm terminal, or in any other terminal, you will have a prompt where you can enter commands. In order to run Whisper, you will need to create the proper environment in your HPC session. To do so, copy the grey highlighted lines below one by one into your terminal and run them separately.

Note: To copy text into the terminal, ctrl+V will not work. Use either the right mouse click, then select paste from the drop-down menu or, if you have a mouse wheel, click on the terminal with the mouse wheel to paste the text directly.

Steps to follow to install whisper:

  • First, you need to load a module that whisper needs in order to run. To do so, copy-paste the line highlighted in grey below into the terminal, as shown in the figure.

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0

  • Then you need to create the virtual environment where you will install whisper. Copy-paste the line below into the terminal.

python3 -m venv $HOME/.envs/whisper

  • Now, activate the newly created environment, by copy-pasting the line below.

source $HOME/.envs/whisper/bin/activate

  • Before you install whisper, you need to make sure to have the latest version of some programs. Copy the two lines below separately into the terminal, as shown in the figures:

pip install --upgrade pip


pip install --upgrade wheel

  • Finally, you can install whisper by running the command below.

pip install git+https://github.com/openai/whisper.git 

  • If everything went well, this is the screen you expect to see.

  • As a final step, type deactivate into the terminal, then press "enter". After this initial installation, you will not need to activate the whisper environment manually anymore.

  • If you wish to fully close the environment and also close the HPC session directly, type exit instead of deactivate.

Note: The version numbers displayed in this guide for the programs you have installed and upgraded reflect the most recent versions at the time this guide was written. The numbers might change in your case, as newer versions might have been released in the time since.
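If you are unsure whether the installation succeeded, a small check like the sketch below may help. The function name check_whisper_env is hypothetical; the check assumes that a successful installation leaves an activate script and a whisper program in the bin folder of the virtual environment, which is what the steps above normally produce.

```shell
#!/bin/bash
# check_whisper_env DIR: succeed only if DIR looks like a virtual
# environment that contains the whisper command-line program.
# (Hypothetical helper, not part of Whisper or of the cluster software.)
check_whisper_env() {
    local env_dir="$1"
    if [ ! -f "$env_dir/bin/activate" ]; then
        echo "No virtual environment found in $env_dir"
        return 1
    fi
    if [ ! -x "$env_dir/bin/whisper" ]; then
        echo "Virtual environment found, but whisper is not installed in it"
        return 1
    fi
    echo "Whisper environment looks complete: $env_dir"
}

# Report on the environment path used in this guide.
# "|| true" keeps the check informational instead of signalling an error.
check_whisper_env "$HOME/.envs/whisper" || true
```

If the check reports a problem, repeating the installation steps in this section is usually the simplest fix.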

 

Create the script to run the transcription job

We will run Whisper using a script in order to facilitate the use of the tool. Follow the steps here to set up the script and run it. To learn more about the content of the script file itself, read the section "Content of the batch script" further down in this guide.

In order to run the script, you will first have to create it. Open your text editor of choice and copy the grey highlighted code below into the new file. Save the file with the name: whisper_runall.sh.


#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/

You can now close the editor.

The example below uses the "vi" text editor found in HPC to create the script. Follow the instructions in the figures to create the script using this text editor:

  • Type or copy vi whisper_runall.sh into the terminal, then press "enter". This will create a new and empty file that you can edit. The terminal is going to change what is displayed when "vi" starts.

  • Press the "i" key on your keyboard to enable editing of the file. Check the figure to know if the editor is in the correct mode.

  • Click on the terminal with the mouse wheel to paste the content of the script into the file. If you see the message in the figure displayed, click "OK" to complete the pasting. The content of the script can be found at the top of this section.

  • Double-check that the content of the script is correct. If it is, it should look exactly like in the picture below.

Note: The colors displayed are also important, because they show that the editor recognizes the words in the text as script commands.

  • Finally, to save the file and exit from "vi", first press "esc" on your keyboard. Then type :wq directly on your keyboard. The input should be displayed at the bottom of the terminal, as shown in the figure. Press enter to execute the command. The colon lets the editor know that a command is coming, the w stands for "write", and the q stands for "quit".

  • If you want to make sure that the script has been saved, type ls into the terminal, then press "enter". If the script has been saved, it should now appear in the list of files in your home directory.
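If you prefer not to use an editor, the same file can also be written directly from the terminal. The sketch below is one possible alternative, not the method shown in the figures: it uses a shell "here-document" to write the script text from this section into whisper_runall.sh in the current folder, then asks bash to check the file for syntax errors without running it.

```shell
#!/bin/bash
# Write the batch script without opening an editor.
# The quoted 'EOF' stops the shell from expanding $HOME while writing,
# so the file contains the same text as the script shown in this section.
cat > whisper_runall.sh <<'EOF'
#!/bin/bash
#SBATCH --time=08:00:00
#SBATCH --gpus-per-node=1
#SBATCH --mem=16000

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0
source $HOME/.envs/whisper/bin/activate
whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/
EOF

# Syntax check only: bash -n parses the file without running any command.
bash -n whisper_runall.sh && echo "whisper_runall.sh written and parses cleanly"
```

Note that this overwrites any existing whisper_runall.sh in the folder where you run it.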

 

Run your transcription job

Now that you have your general script ready, all you have to do to run it is copy the command below into your terminal and press enter:

sbatch whisper_runall.sh

The terminal will then confirm that your job has been received and assign it a jobID. The three messages below will appear in your terminal. Please keep the jobID handy, as it is the quickest way to check how the job is going and whether it ran successfully. The figure below shows what the terminal will look like upon successfully launching the script.

sbatch: Job sent to gpu partition
sbatch: Request for one of the default GPU types added
Submitted batch job <jobID>

Once the job is complete, you will find the transcribed audio in the output folder you specified in the batch script. 

Note: The script shown here will run Whisper on all audio files present in the folder "whisper_audio". Please make sure to only have the files that you wish to transcribe in that folder.
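Before submitting a job, it can be useful to see which files are in the input folder. The sketch below is one way to do that; the function name list_inputs is hypothetical and the function only prints file names, it does not start a transcription.

```shell
#!/bin/bash
# list_inputs DIR: print the regular files directly inside DIR, one per line.
# (Hypothetical helper for a quick look before running sbatch.)
list_inputs() {
    local dir="$1" f
    for f in "$dir"/*; do
        # Only regular files are printed. Subfolders are skipped in this
        # listing, although the "*" in the batch script would match them too.
        [ -f "$f" ] && echo "$f"
    done
    return 0
}

# Show what is currently in the input folder used in this guide.
list_inputs "$HOME/whisper_audio"
```

If the list contains files you do not want to transcribe, move them out of the folder before submitting the job.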

To check if the HPC has finished with your job, type the following:

squeue -u <your_pnumber>

In the list that will appear, look for the jobID and under the column "ST", read the letter written there. "PD" means the job is waiting for resources to be available, "R" means that the job is running, "CG" that the job is completing. If your job does not appear in the list, it means that the HPC cluster is done processing your audio. To check if the job finished successfully, type the following line then press enter, as shown in the figure below:

jobinfo <jobID>

This will show a list of all the information related to the job. What you want to check is if the "Reserved walltime" is greater than the "Used walltime", and if the "State" parameter says "COMPLETED". If it does, it means that the HPC is done processing your audio and that the job ran correctly.
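The state codes described above can also be looked up with a small helper, sketched below. The function names describe_state and job_state are hypothetical. job_state relies on the standard squeue options --noheader, --format and --jobs and therefore only works on the cluster; describe_state works anywhere.

```shell
#!/bin/bash
# describe_state CODE: translate a code from the "ST" column of squeue
# into the meaning given in this guide. (Hypothetical helper.)
describe_state() {
    case "$1" in
        PD) echo "waiting for resources" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        "") echo "no longer in the queue" ;;
        *)  echo "other state: $1" ;;
    esac
}

# job_state JOBID: print the short state code of one job, or nothing once
# the job has left the queue. Needs squeue, so it only works on the cluster.
job_state() {
    squeue --noheader --format=%t --jobs="$1" 2>/dev/null
}

# Usage on the cluster (replace <jobID> with the number sbatch printed):
#   describe_state "$(job_state <jobID>)"
describe_state "PD"
describe_state ""
```

A job that is no longer in the queue has finished, but not necessarily successfully, so the jobinfo check described above is still needed.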

Content of the batch script

The batch script you created and ran is the starting point for all your jobs relating to Whisper. Below is a brief explanation of the different lines present in the file. Please read the next steps carefully if you wish to modify the content of the script. For convenience's sake, also make sure to always run the script through "sbatch", rather than run the steps separately by hand.

#!/bin/bash

  • This first line is used to tell the cluster what it should use to interpret/run the script. Do not change!

The next three lines specify certain parameters for the batch script:

#SBATCH --time=08:00:00

  • This line specifies the maximum time your job will run on the cluster. The format is hh:mm:ss. The example asks for a maximum of 8 hours, which should be plenty of time to cover most jobs. Should you run into longer processing times, this is the parameter you want to change.

#SBATCH --gpus-per-node=1

  • This line tells the cluster that the script is asking for 1 GPU to be allocated to this job. For Whisper, 1 GPU is more than enough to run the transcription, so please do not modify this parameter.

#SBATCH --mem=16000

  • This line specifies the amount of memory (RAM) requested for this job. The value is given in megabytes, so in the default case the script asks for 16GB of RAM to be allocated.

The next two lines make sure that the virtual environment and the dependencies that Whisper needs to run are correctly loaded:

module load PyTorch/1.12.1-foss-2022a-CUDA-11.7.0

  • This line loads the program packages that Whisper needs to run. Please be sure to not modify it, otherwise the script is not going to load the correct dependencies.

source $HOME/.envs/whisper/bin/activate

  • This line activates the virtual environment for Whisper. As it is part of the script, you won't have to deactivate the environment once the script is launched. Once again, leave this part of the script unchanged.

Finally, the last line is the actual command to run Whisper:

whisper $HOME/whisper_audio/* --model large-v2 --output_dir $HOME/whisper_output/

  • If you wish to modify the location of the input audio, replace "$HOME/whisper_audio/*" with the path to your own folder. Please remember to add an asterisk (*) at the end of the path to let the program know that you wish to process all files present in the folder you selected. In the same way, modify the path after --output_dir if you wish to change the location of the output directory. Finally, if you wish to change the model used, change the value after --model. Please consult the Whisper manual before changing the model.
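One way to keep such changes easy to review is to put the three adjustable values in variables at the top of the script. The sketch below is only an illustration under that assumption: the variable names AUDIO_DIR, OUTPUT_DIR and MODEL and the function print_whisper_command are hypothetical, and the function only prints the resulting command so that you can inspect it before using it in the batch script.

```shell
#!/bin/bash
# Hypothetical variable names; only the whisper options come from this guide.
AUDIO_DIR="$HOME/whisper_audio"     # folder with the audio files
OUTPUT_DIR="$HOME/whisper_output"   # folder for the transcripts
MODEL="large-v2"                    # value passed to --model

# print_whisper_command: show the command line without running it.
# Inside double quotes the "*" is printed literally, not expanded.
print_whisper_command() {
    echo "whisper $AUDIO_DIR/* --model $MODEL --output_dir $OUTPUT_DIR/"
}

print_whisper_command

# In the batch script, the last line would then read:
#   whisper "$AUDIO_DIR"/* --model "$MODEL" --output_dir "$OUTPUT_DIR"/
```

Quoting the variables, with the asterisk outside the quotes, keeps paths that contain spaces intact while still matching all files in the folder.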

Manage your data

Last but not least, it is important that you ensure proper management of your data. HPC is a computing cluster and therefore only intended for the short term storage of mutable data. In order to ensure proper performance, data should not be stored long-term in the cluster. Therefore, after you have processed your files, you should move your processed data to a more suitable location (like University network storage), where you can perform quality control and any text editing. The source audio files and their transcription are considered potentially sensitive data and should not be retained on HPC, but kept in a secure location for future reference. 
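If the destination storage is reachable as a folder from the terminal, moving the results could look like the sketch below. This is an assumption, since how your storage is mounted depends on your workspace. The function name copy_and_verify is hypothetical, and the destination path in the usage comment is a placeholder that you need to replace.

```shell
#!/bin/bash
# copy_and_verify SRC DEST: copy the contents of SRC into DEST, then compare
# the two folders. Succeeds only if both folders end up with identical content.
# (Hypothetical helper; nothing is deleted from SRC.)
copy_and_verify() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # "SRC/." copies the contents of SRC, including hidden files.
    cp -R "$src"/. "$dest"/ || return 1
    # diff -r returns 0 only when no file differs or is missing on either side.
    diff -r "$src" "$dest" >/dev/null
}

# Usage (the destination is a placeholder, not a real location):
#   copy_and_verify "$HOME/whisper_output" "/path/to/your/secure/storage" \
#       && echo "Copy verified; the files on HPC can now be removed."
```

Removing the files from the cluster only after the comparison succeeds reduces the risk of losing a transcript during the transfer.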

MobaXterm data transfer workaround

Sometimes, when attempting to transfer data to or from the Hábrók HPC cluster using MobaXterm, the transfer is registered, but the progress bar stays at 0% and the transfer never starts. There is an easy workaround to resolve this issue described in the steps below.

  1. If you are already signed into Hábrók, the first step is to close the connection. To do so, type exit into the terminal, then press the enter key twice.
  2. Right-click the recorded session, then select "Edit session".
  3. Navigate to the Advanced SSH settings, and change the transfer protocol to "SCP (enhanced speed)".
  4. Confirm your changes, then log back into Hábrók by double-clicking the session.
  5. Log back out of Hábrók without doing anything else. To do so, type exit into the terminal, then press the enter key twice.
  6. Repeat steps 2-4, but change the transfer protocol back to "SFTP protocol".
  7. Log back into Hábrók.
  8. Your transfer should now work.
