Outputs Overview¶
This page gives details on the output files that the pipeline writes to disk.
Pipeline Run Output Overview¶
The output for a pipeline run will be located in the pipeline working directory, which is defined at the pipeline configuration stage (see Pipeline Configuration). A sub-directory will exist for each pipeline run that contains the output products for the run.
Note
If you do not administrate your system or do not have access to a vast-tools
notebook interface, please contact your system admin to confirm the working directory and how to best access the files.
The pipeline uses the Apache Parquet file format to write results to disk. Details on how to read these files can be found below in Reading the Outputs.
Below is the output structure for a pipeline run named new-test-data
when the pipeline run option measurements.write_arrow_files
has been set to True
and the working directory is named pipeline-runs
(see File Details for descriptions):
pipeline-runs
├── new-test-data
│  ├── associations.parquet
│  ├── bands.parquet
│  ├── config.yaml
│  ├── config_prev.yaml
│  ├── forced_measurements_VAST_0127-73A_EPOCH01_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_0127-73A_EPOCH05x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_0127-73A_EPOCH06x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118+00A_EPOCH01_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118+00A_EPOCH02_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118+00A_EPOCH03x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118+00A_EPOCH05x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118+00A_EPOCH06x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH01_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH02_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH03x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH05x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH06x_I_cutout_fits.parquet
│  ├── forced_measurements_VAST_2118-06A_EPOCH12_I_cutout_fits.parquet
│  ├── images.parquet
│  ├── log.txt
│  ├── measurements.arrow
│  ├── measurement_pairs.arrow
│  ├── measurement_pairs.parquet
│  ├── relations.parquet
│  ├── skyregions.parquet
│  └── sources.parquet
Arrow Files¶
Large pipeline runs (hundreds of images) mean that to read the measurements, hundreds of parquet files need to be read in, and can contain millions of rows. This can be slow using libraries such as pandas, and also consumes a lot of system memory. A solution to this is to save all the measurements associated with the pipeline run into one single file in the Apache Arrow format.
The library vaex
is able to open .arrow
files in an out-of-core context so the memory footprint is hugely reduced along with the reading of the file being very fast. The two-epoch measurement pairs are also saved to arrow format due to the same reasons. See Reading with vaex for further details on using vaex
.
Note
At the time of development vaex
could not open parquets in an out-of-core context. This will be reviewed in the future if such functionality is added to vaex
.
Tip
The arrow files can be generated after a run has successfully completed (must be done by an administrator, refer to the admin command createmaeasarrow
).
To enable the arrow files to be produced, the option measurements.write_arrow_files
is required to be set to True
in the pipeline run config.
Image Data¶
The data for the images ingested into the pipeline is also stored in the pipeline working directory under the subdirectory images
:
pipeline-runs
├── images
│  ├── VAST_0127-73A_EPOCH01_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_0127-73A_EPOCH05x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_0127-73A_EPOCH06x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118+00A_EPOCH01_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118+00A_EPOCH02_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118+00A_EPOCH03x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118+00A_EPOCH05x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118+00A_EPOCH06x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118-06A_EPOCH01_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118-06A_EPOCH02_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118-06A_EPOCH03x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118-06A_EPOCH05x_I_cutout_fits
│  │  └── measurements.parquet
│  ├── VAST_2118-06A_EPOCH06x_I_cutout_fits
│  │  └── measurements.parquet
│  └── VAST_2118-06A_EPOCH12_I_cutout_fits
│  └── measurements.parquet
Here, for each image, the selavy measurements that have been ingested are stored in the parquet format under a subdirectory of the respective image name.
File Details¶
File | Description |
---|---|
associations.parquet | Contains the association information between sources and measurements. |
bands.parquet | Contains the information of the bands associated with the pipeline run. |
config.yaml | The pipeline run configuration file. |
config_prev.yaml | The previous pipeline run configuration file used by the add image mode. |
forced_measurements*.parquet | Multiple files that contain the forced measurements extracted from the respective image denoted in the filename. |
images.parquet | Contains the information of the images processed in the pipeline run. |
log.txt | The log file of the pipeline run. |
measurements.arrow | An Apache Arrow format file containing all the measurements associated with the pipeline run (see Arrow Files). |
measurement_pairs.arrow | An Apache Arrow format file containing all the measurement pair metrics (see Arrow Files). |
measurement_pairs.parquet | Contains all the measurement pairs metrics. |
relations.parquet | Contains the relation information between sources. |
skyregions.parquet | Contains the sky region information of the pipeline run. |
sources.parquet | Contains all the sources resulting from teh pipeline run. |
Created: March 15, 2021