18 Nov 2021 - Datathon
Last year when I was preparing Huawei Cloud for our annual Singapore Healthcare AI Datathon event, I did not think of writing a summary, which could have greatly helped the setup this year.
This year, I am setting up AWS as one of the cloud environments, so I decided to write down the process.
We provide one EC2 instance for every team to carry out their computing tasks.
First of all, AWS has a default vCPU limit for every account, which in our case is not enough. A limit increase needs to be requested. Here is the user guide for this.
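As a rough sketch of how the number to request can be estimated: on-demand limits are counted in vCPUs per instance family, so the figure is the number of instances times the vCPUs of the chosen type. The team counts below are hypothetical; the per-instance vCPU counts (48 and 32) are those of the g4dn.12xlarge and m5.8xlarge types used later in this post.

```shell
#!/usr/bin/env bash
# Estimate the on-demand vCPU quota to request for one instance family.
# Usage: required_vcpus <number_of_instances> <vcpus_per_instance>
required_vcpus() {
  local instances=$1 vcpus_each=$2
  echo $(( instances * vcpus_each ))
}

# Hypothetical team counts, for illustration only.
GPU_TEAMS=10   # one g4dn.12xlarge (48 vCPUs) per GPU team
CPU_TEAMS=5    # one m5.8xlarge (32 vCPUs) per CPU-only team

echo "G and VT instances: $(required_vcpus "$GPU_TEAMS" 48) vCPUs"
echo "Standard instances: $(required_vcpus "$CPU_TEAMS" 32) vCPUs"
```

The G and the M families fall under separate quotas, so two requests are needed. It is probably worth adding some headroom for the first instance and any spare machines.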
To pick the right instance type, there are a few considerations.
For this year’s datathon, we chose AWS’s g4dn.12xlarge instance, which has 4 NVIDIA T4 GPUs, 48 vCPUs and 192 GiB of system memory. g4dn.12xlarge also comes with 900 GB of NVMe SSD instance storage that can be mounted. Take note that the data on this NVMe SSD storage is lost when the instance is stopped, so don’t keep anything other than temporary data on it unless the instance will be kept running until the end of the datathon. For teams that are not using GPUs, we chose m5.8xlarge instances.
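Mounting the NVMe storage could look roughly like the sketch below. The device name `/dev/nvme1n1` and the mount point `/mnt/scratch` are assumptions, so check the output of `lsblk` on the instance first. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Format and mount the instance-store NVMe volume as scratch space.
# DEVICE and MOUNT_POINT are assumptions: confirm the device with `lsblk`.
DEVICE=${DEVICE:-/dev/nvme1n1}
MOUNT_POINT=${MOUNT_POINT:-/mnt/scratch}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

mount_scratch() {
  run sudo mkfs -t ext4 "$DEVICE"               # erases whatever is on the device
  run sudo mkdir -p "$MOUNT_POINT"
  run sudo mount "$DEVICE" "$MOUNT_POINT"
  run sudo chown "$(id -un)" "$MOUNT_POINT"     # let the login user write to it
}

mount_scratch
```

Because the volume comes back empty after a stop and start, the same steps have to be repeated whenever the instance is restarted that way.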
Launching the first instance (example of GPU instance)
g4dn.12xlarge
Setting up the environment
sudo apt update
source activate python3 && pip install jupyterlab
conda install -c conda-forge r-irkernel
mimic-cxr-jpg dataset directly to the machine
After S3 is set up
AmazonS3ReadOnlyAccess and AmazonSSMManagedInstanceCore and associate to the EC2 in the console
After RDS is set up
Backup the instance
Launching more instances
We build our EHR databases on RDS for every team to query from. This year, we provide MIMIC-IV with ED and CXR, as well as eICU.
Preparing a backup of the databases
pg_dump each database into a single .sql file
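A minimal sketch of the dump step, assuming the databases are named `mimiciv` and `eicu` and live on a local server reachable as user `postgres` (all three are assumptions). `--no-owner` is my own addition here, on the reasoning that the roles on the source server will not exist on the target. The script prints the commands by default; set `DRY_RUN=0` to run them.

```shell
#!/usr/bin/env bash
# Dump each database into its own plain-SQL file (<dbname>.sql).
# Database names, host and user are assumptions; adjust to the source server.
DATABASES=${DATABASES:-"mimiciv eicu"}
SRC_HOST=${SRC_HOST:-localhost}
SRC_USER=${SRC_USER:-postgres}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

dump_all() {
  local db
  # $DATABASES is left unquoted on purpose so it splits into one name per word.
  for db in $DATABASES; do
    run pg_dump -h "$SRC_HOST" -U "$SRC_USER" -d "$db" --no-owner -f "$db.sql"
  done
}

dump_all
```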
The eicu database on our server does not have views, so a separate dump from eicu-views.pgsql needs to be imported
Create an EC2
postgresql & postgresql-client on the EC2 to enable psql command (example for PostgreSQL 12)
scp the .sql backup files from your own machine to the EC2
Create an RDS
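For the SSH tunneling used in this part, a local port forward through the EC2 is one way to reach an RDS that is not publicly accessible. This is only a sketch: the host names, the key file, the `ubuntu` login user and local port 5433 are all hypothetical placeholders. The command is printed by default; set `DRY_RUN=0` to open the tunnel.

```shell
#!/usr/bin/env bash
# Forward a local port to the RDS endpoint through the EC2 instance.
# Every value below is a hypothetical placeholder.
EC2_HOST=${EC2_HOST:-ec2-host.example.com}
RDS_HOST=${RDS_HOST:-mydb.example.rds.amazonaws.com}
KEY_FILE=${KEY_FILE:-datathon.pem}
LOCAL_PORT=${LOCAL_PORT:-5433}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

open_tunnel() {
  # -N: do not run a remote command, only forward the port
  # -L: local_port:target_host:target_port
  run ssh -i "$KEY_FILE" -N -L "$LOCAL_PORT:$RDS_HOST:5432" "ubuntu@$EC2_HOST"
}

open_tunnel
```

While the tunnel is open, a client on your own machine can connect to `localhost` on the forwarded port as if it were the RDS endpoint.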
pgadmin4 by SSH tunneling using the EC2
Importing the database backup from EC2 (based on official guide from AWS)
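The import could be sketched as follows, run from the EC2 where the .sql files were copied. The RDS endpoint and the `postgres` user are hypothetical placeholders. A plain-SQL dump made without `--create` does not create the database itself, so this sketch assumes an empty target database has to be created first. Commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Restore a plain-SQL dump (<dbname>.sql) into the RDS instance.
# RDS_HOST and RDS_USER are hypothetical placeholders.
RDS_HOST=${RDS_HOST:-mydb.example.ap-southeast-1.rds.amazonaws.com}
RDS_USER=${RDS_USER:-postgres}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

import_db() {
  local db=$1
  run createdb -h "$RDS_HOST" -U "$RDS_USER" "$db"             # empty target database
  run psql -h "$RDS_HOST" -U "$RDS_USER" -d "$db" -f "$db.sql" # replay the dump
}

import_db eicu
```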
psql from the EC2
Roles management
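As an illustration only, a read-only role could be set up with SQL along these lines. The exact privileges we granted are not listed in this post, so the read-only choice, the `public` schema and the placeholder password are all assumptions. The function just prints the SQL so it can be reviewed before use.

```shell
#!/usr/bin/env bash
# Print the SQL for a read-only login role on one schema of one database.
# Usage: readonly_role_sql <role> <database> <schema>
# The read-only privilege set and the password are assumptions.
readonly_role_sql() {
  local role=$1 db=$2 schema=$3
  cat <<SQL
CREATE ROLE $role LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE $db TO $role;
GRANT USAGE ON SCHEMA $schema TO $role;
GRANT SELECT ON ALL TABLES IN SCHEMA $schema TO $role;
SQL
}

readonly_role_sql team eicu public
```

The output can be piped into `psql` connected to the RDS as the master user once it looks right.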
team and teammaster roles, grant necessary privileges
We put our medical imaging datasets in S3 for teams to download if they need them. (reference)
AmazonS3FullAccess, download the access key
access key ID and secret access key from the IAM role created with AmazonS3FullAccess (default region name: ap-southeast-1, default output format: json)
aws s3 cp local_file s3://bucket_name
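For a whole dataset directory rather than a single file, the upload could be sketched as below. The local directory `./mimic-cxr-jpg` is a hypothetical path and `bucket_name` is the same placeholder as above. It assumes `aws configure` has already been run with the access key. Commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Upload a local dataset directory to S3, then list the bucket to check it.
# BUCKET and SRC_DIR are placeholders; credentials come from `aws configure`.
BUCKET=${BUCKET:-bucket_name}
SRC_DIR=${SRC_DIR:-./mimic-cxr-jpg}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upload_dataset() {
  # --recursive copies the directory tree under a prefix named after it
  run aws s3 cp "$SRC_DIR" "s3://$BUCKET/$(basename "$SRC_DIR")/" --recursive
  run aws s3 ls "s3://$BUCKET/"
}

upload_dataset
```

Teams can then pull the data onto their EC2 by running `aws s3 cp` with the source and destination swapped.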