Batch Imaging Workflow¶
The batch_imaging
workflow is a proof of concept of integrating a scientific
workflow with the SDP prototype. It simulates visibilities and images them using
RASCIL, with Dask as the execution engine.
The workflow simulates SKA1-Low visibility data over a range of hour angles from -30 to +30 degrees and adds phase errors. The visibilities are then calibrated and imaged using the ICAL pipeline.
The workflow creates buffer reservations for storing the visibilities and images.
Parameters¶
The workflow parameters are:
n_workers
: number of Dask workers to deploy
freq_min
: minimum frequency (in hertz)
freq_max
: maximum frequency (in hertz)
nfreqwin
: number of frequency windows
ntimes
: number of time samples
rmax
: maximum distance of stations to include from the array centre (in metres)
ra
: right ascension of the phase centre (in degrees)
dec
: declination of the phase centre (in degrees)
buffer_vis
: name of the buffer reservation to store visibilities
buffer_img
: name of the buffer reservation to store images
For example:
{
"n_workers": 4,
"freq_min": 0.9e8,
"freq_max": 1.1e8,
"nfreqwin": 8,
"ntimes": 5,
"rmax": 750.0,
"ra": 0.0,
"dec": -30.0,
"buffer_vis": "buff-pb-mvp01-20200523-00001-vis",
"buffer_img": "buff-pb-mvp01-20200523-00001-img"
}
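As a rough illustration of how these parameters divide the band, the frequency windows can be taken as evenly spaced between freq_min and freq_max (a sketch only; the workflow computes the channelisation internally via RASCIL):

import numpy as np

# Evenly spaced frequency windows between freq_min and freq_max.
# This mirrors the assumed channelisation; the workflow itself
# derives it internally through RASCIL.
freq_min, freq_max, nfreqwin = 0.9e8, 1.1e8, 8
frequencies = np.linspace(freq_min, freq_max, nfreqwin)
print(frequencies / 1e6)  # channel centres in MHz, from 90.0 to 110.0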
Running the workflow¶
If using Minikube, make sure to increase the memory size (minimum 16 GB):
minikube start --memory=16g
Once the sdp-prototype is running, start an iTango shell with:
kubectl exec -it itango-tango-base-sdp-prototype -- /venv/bin/itango3
First, obtain a handle to a subarray device with:
d = DeviceProxy('mid_sdp/elt/subarray_1')
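Before configuring anything, it can help to confirm the device is reachable. These are standard Tango calls; the obsState attribute is assumed to follow the SKA subarray state model:

# Standard Tango sanity checks (obsState per the SKA subarray state model).
print(d.state())    # device state, e.g. ON
print(d.obsState)   # observation state before AssignResources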
Create a configuration string for the scheduling block instance. This contains
one real-time processing block, which uses the test_realtime
workflow as a
placeholder, and one batch processing block containing the batch_imaging
workflow, which uses the example parameters from above:
config_sbi = '''
{
"id": "sbi-mvp01-20200523-00000",
"max_length": 21600.0,
"scan_types": [
{
"id": "science",
"channels": [
{"count": 8, "start": 0, "stride": 1, "freq_min": 0.9e8, "freq_max": 1.1e8, "link_map": [[0,0]]}
]
}
],
"processing_blocks": [
{
"id": "pb-mvp01-20200523-00000",
"workflow": {"type": "realtime", "id": "test_realtime", "version": "0.2.0"},
"parameters": {}
},
{
"id": "pb-mvp01-20200523-00001",
"workflow": {"type": "batch", "id": "batch_imaging", "version": "0.1.0"},
"parameters": {
"n_workers": 4,
"freq_min": 0.9e8,
"freq_max": 1.1e8,
"nfreqwin": 8,
"ntimes": 5,
"rmax": 750.0,
"ra": 0.0,
"dec": -30.0,
"buffer_vis": "buff-pb-mvp01-20200523-00001-vis",
"buffer_img": "buff-pb-mvp01-20200523-00001-img"
},
"dependencies": [
{"pb_id": "pb-mvp01-20200523-00000", "type": ["none"]}
]
}
]
}
'''
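Since the configuration is passed as a plain JSON string, it is worth parsing it locally first; this is ordinary Python and catches malformed JSON before it reaches the device:

import json

# json.loads raises an error on malformed JSON, so a failure here
# means the string would also be rejected by the subarray device.
sbi = json.loads(config_sbi)
print(sbi["processing_blocks"][1]["workflow"]["id"])  # batch_imaging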
The scheduling block instance is created by the AssignResources
command:
d.AssignResources(config_sbi)
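To confirm the command took effect, read the subarray's observation state again (assuming, as above, the standard obsState attribute):

# After AssignResources the subarray is expected to report IDLE.
print(d.obsState)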
You can run the subarray commands as normal, but the batch processing does not
start until you end the real-time processing with the ReleaseResources
command:
d.ReleaseResources()
You can watch the pods and persistent volume claims (for the buffer reservations) being deployed with:
watch kubectl get pod,pvc -n sdp
At this stage you should see a pod called
proc-pb-mvp01-20200523-00001-workflow-...
with status RUNNING. To see
the logs, run:
kubectl logs <pod-name> -n sdp
The output should look like this:
INFO:batch_imaging:Claimed processing block pb-mvp01-20200523-00001
INFO:batch_imaging:Waiting for resources to be available
INFO:batch_imaging:Resources are available
INFO:batch_imaging:Creating buffer reservations
INFO:batch_imaging:Deploying Dask EE
INFO:batch_imaging:Running simulation pipeline
INFO:batch_imaging:Running ICAL pipeline
...
Accessing the data¶
The buffer reservations are realised as Kubernetes persistent volume claims. Kubernetes should automatically create persistent volumes to satisfy them. The name of the corresponding persistent volume appears in the output of:
kubectl get pvc -n sdp
The location of the persistent volume in the filesystem is shown in the output of:
kubectl describe pv <pv-name>
If you are running Kubernetes with Minikube in a VM, you need to log in to it first to gain access to the files:
minikube ssh
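Once inside, the images can be inspected directly at the persistent volume's path. The sketch below is one way to do this, assuming the ICAL pipeline writes its images as FITS files under the image buffer (as RASCIL pipelines typically do); the actual file names and mount path are not guaranteed:

from glob import glob
from astropy.io import fits

# Assumption: the image buffer holds FITS images written by the ICAL
# pipeline. Substitute the path reported by `kubectl describe pv`.
for path in sorted(glob("/path/to/pv/*.fits")):
    with fits.open(path) as hdus:
        print(path, hdus[0].data.shape)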