DataCatalog: tigapics

import requests
import pandas as pd

import src.utils as ut

# Set up the root path of the application
project_path = ut.project_path()

# Load the metadata

meta_filename = [
    f"{ut.project_path(1)}/meta/mosquito_alert/tigapics.json",
    f"{ut.project_path(2)}/meta_ipynb/tigapics.html",
]
metadata = ut.load_metadata(meta_filename)

# Get contentUrl from metadata file
ut.info_meta(metadata)

Dataset: tigapics_mosquitoalert

1. Distribution by image download from the Mosquito Alert webserver

This distribution downloads individual pictures (adult mosquitoes and breeding sites) shown on the Mosquito Alert public map, given a picture ID.

# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=0
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)

To download a picture from the Mosquito Alert public map, we need its file name (ID hash plus file extension). The simplest way to find one is to look it up visually on the map, as in the following example:

# Set a picture file name (ID hash + extension) and build its URL
ID_PICNAME = "a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg"
contentUrl_pic = contentUrl.format(ID_PICNAME=ID_PICNAME)

This ID corresponds to the adult mosquito picture shown below, which is available on the Mosquito Alert public map.

Aedes albopictus
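The file name used in this example combines an ID hash with a file extension, and the hash has the layout of a UUID. A minimal sketch that splits a file name into those two parts and sanity-checks the hash before building a URL is shown here. The assumption that every picture ID is a UUID is inferred from this single example, not stated by the source, and `split_picname` is a hypothetical helper.

```python
import os
import uuid


def split_picname(picname: str) -> tuple[str, str]:
    """Split a picture file name into (ID hash, extension).

    Assumption: the ID hash is a UUID string, as in the example ID above.
    Raises ValueError if there is no extension or the hash is not a UUID.
    """
    stem, ext = os.path.splitext(picname)
    if not ext:
        raise ValueError(f"missing file extension in {picname!r}")
    uuid.UUID(stem)  # raises ValueError if stem is not a valid UUID string
    return stem, ext.lstrip(".").lower()


print(split_picname("a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg"))
```

Checking the name locally catches copy-paste mistakes before they turn into failed HTTP requests.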

Alternatively, picture IDs can be obtained from the reports dataset: its measured variable movelab_annotation is a dictionary whose key photo_html gives the photo href (ID) of each report.
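As a sketch of that route, the helper below extracts picture file names from a `photo_html` value. It assumes `photo_html` is an HTML fragment containing one or more `<a href="...">` links whose last path segment is the picture file name; the sample string is made up for illustration and is not copied from the reports dataset.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def picnames_from_photo_html(photo_html: str) -> list[str]:
    """Return the file names (last URL path segment) linked in photo_html.

    Assumption: photo_html is an HTML fragment with <a href="..."> links
    pointing at picture files.
    """
    parser = _HrefCollector()
    parser.feed(photo_html)
    parser.close()
    return [posixpath.basename(urlparse(h).path) for h in parser.hrefs]


# Hypothetical photo_html value, for illustration only
sample = (
    '<a href="https://example.org/media/tigapics/'
    'a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg" target="_blank">'
    '<img src="https://example.org/media/tigapics_small/'
    'a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg"></a>'
)
print(picnames_from_photo_html(sample))
```

Using an HTML parser rather than a regular expression keeps the extraction robust to attribute order and quoting.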

# Download the picture and save it (raise on HTTP errors)
r = requests.get(contentUrl_pic, timeout=60)
r.raise_for_status()
with open(f"{path}/{ID_PICNAME}", "wb") as f:
    f.write(r.content)

Dataset: tigapics_labels_bioimagearchive

1. Distribution by labels download from the BioStudies repository

This distribution downloads the tab-separated labels file that describes the tigapics_bioimagearchive dataset.

# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=1
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
# Download the labels file and save it (raise on HTTP errors)
r = requests.get(contentUrl, timeout=60)
r.raise_for_status()
with open(f"{path}/labels.tsv", "wb") as f:
    f.write(r.content)

# Get the labels into a dataframe
df_labels = pd.read_csv(f"{path}/labels.tsv", sep="\t")
df_labels.head()
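Each entry of the `Files` column encodes a picture as `CLASS/ID_PICNAME`, which is how the single-image download in the next dataset splits it. A minimal sketch, assuming that layout holds for every row, splits the column into separate class and file-name columns so pictures can be counted per class. `split_files_column` is a hypothetical helper and the toy dataframe is made up for illustration; with the real data you would call `split_files_column(df_labels)`.

```python
import pandas as pd


def split_files_column(df: pd.DataFrame, column: str = "Files") -> pd.DataFrame:
    """Return a copy of df with 'class' and 'picname' columns parsed from column.

    Assumption: every entry has the form 'CLASS/ID_PICNAME' (exactly one '/').
    """
    parts = df[column].str.split("/", n=1, expand=True)
    out = df.copy()
    out["class"] = parts[0]
    out["picname"] = parts[1]
    return out


# Toy labels table, for illustration only
toy = pd.DataFrame(
    {"Files": ["Culex/aaa.jpg", "Aedes_albopictus/bbb.jpg", "Culex/ccc.jpg"]}
)
labels = split_files_column(toy)
print(labels["class"].value_counts())
```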

Dataset: tigapics_bioimagearchive

1. Distribution by single image download from the BioImage Archive repository

This distribution downloads individual pictures of adult mosquitoes, suitable for machine-learning classification tasks.

# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=2
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)

To download a picture from the Mosquito Alert BioImage Archive, we need its class (which may correspond to a species) and its file-name ID. Both are provided by the tigapics_labels_bioimagearchive dataset. As an example, we take the first entry of the labels dataframe.

# Take the class and file name from the first label entry and build the picture URL
CLASS, ID_PICNAME = df_labels.iloc[0]["Files"].split("/")
contentUrl_pic = contentUrl.format(CLASS=CLASS, ID_PICNAME=ID_PICNAME)
# Download the picture and save it (raise on HTTP errors)
r = requests.get(contentUrl_pic, timeout=60)
r.raise_for_status()
with open(f"{path}/{ID_PICNAME}", "wb") as f:
    f.write(r.content)
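The same label-to-URL mapping can be applied to several label entries at once. A minimal sketch, assuming `contentUrl` is a `str.format` template with `{CLASS}` and `{ID_PICNAME}` placeholders (as the `contentUrl.format(...)` call above implies); `label_urls` is a hypothetical helper and the template URL is a placeholder, not the real BioImage Archive address.

```python
def label_urls(files: list[str], template: str) -> list[tuple[str, str, str]]:
    """Map 'CLASS/ID_PICNAME' entries to (class, picname, url) tuples.

    Assumption: template contains {CLASS} and {ID_PICNAME} placeholders.
    """
    rows = []
    for entry in files:
        cls, picname = entry.split("/")  # ValueError unless exactly one '/'
        rows.append((cls, picname, template.format(CLASS=cls, ID_PICNAME=picname)))
    return rows


# Hypothetical template and entries, for illustration only
template = "https://example.org/files/{CLASS}/{ID_PICNAME}"
for row in label_urls(["Culex/aaa.jpg", "Aedes_japonicus/bbb.jpg"], template):
    print(row)
```

With the real data, something like `label_urls(df_labels["Files"].head(10).tolist(), contentUrl)` would give the URLs for the first ten pictures.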

2. Distribution by image-chunks download from the BioStudies repository

This distribution downloads pictures of adult mosquitoes in chunks, one zip archive per class (classes may correspond to species).

# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=1, idx_hasPart=2
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)

To download a chunk from the Mosquito Alert BioImage Archive, we need to choose one of the available classes: Aedes_albopictus, Aedes_aegypti, Aedes_japonicus, Aedes_koreicus, Japonicus_koreicus, Complex, Culex, Other_species and Not_sure.
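Because the class name is substituted directly into the URL, a typo only shows up as a failed download. A minimal sketch that checks the name against the nine classes listed above before formatting the URL; `chunk_url` is a hypothetical helper, the template is a placeholder, and it assumes `contentUrl` has a single `{CLASS}` placeholder as the `contentUrl.format(CLASS=CLASS)` call implies.

```python
AVAILABLE_CLASSES = (
    "Aedes_albopictus",
    "Aedes_aegypti",
    "Aedes_japonicus",
    "Aedes_koreicus",
    "Japonicus_koreicus",
    "Complex",
    "Culex",
    "Other_species",
    "Not_sure",
)


def chunk_url(template: str, cls: str) -> str:
    """Format the chunk URL for one class, rejecting unknown class names."""
    if cls not in AVAILABLE_CLASSES:
        raise ValueError(f"unknown class {cls!r}; expected one of {AVAILABLE_CLASSES}")
    return template.format(CLASS=cls)


# Hypothetical template, for illustration only
print(chunk_url("https://example.org/chunks/{CLASS}.zip", "Aedes_japonicus"))
```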

# Set up the class to download
CLASS = "Aedes_japonicus"
contentUrl_pic = contentUrl.format(CLASS=CLASS)
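# Download the class chunk (a zip archive) in streamed pieces and save it
with requests.get(contentUrl_pic, stream=True, timeout=60) as r:
    r.raise_for_status()
    with open(f"{path}/{CLASS}.zip", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)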
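The class chunk is saved as a zip archive. A minimal sketch for unpacking it with the standard-library `zipfile` module follows; `extract_chunk` is a hypothetical helper, and since the source does not describe the archive's internal layout, the function simply returns the member file names it finds. If the server returned an error page instead of an archive, `zipfile.ZipFile` raises `BadZipFile`.

```python
import zipfile
from pathlib import Path


def extract_chunk(zip_path: str, dest_dir: str) -> list[str]:
    """Extract every member of zip_path into dest_dir; return the file member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first member with a CRC error, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in archive: {bad}")
        zf.extractall(dest)
        # Skip directory entries, which end with '/'
        return [name for name in zf.namelist() if not name.endswith("/")]
```

For the chunk downloaded above, a call could look like `extract_chunk(f"{path}/{CLASS}.zip", f"{path}/{CLASS}")`.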