DataCatalog: tigapics
import requests
import pandas as pd
import src.utils as ut
# Setup the root path of the application
project_path = ut.project_path()
# Load the metadata
meta_filename = [
    f"{ut.project_path(1)}/meta/mosquito_alert/tigapics.json",
    f"{ut.project_path(2)}/meta_ipynb/tigapics.html",
]
metadata = ut.load_metadata(meta_filename)
# Get contentUrl from metadata file
ut.info_meta(metadata)
Dataset: tigapics_mosquitoalert¶
1. Distribution by image download from MosquitoAlert webserver¶
This distribution lets you download individual pictures (adults and breeding sites) that can be viewed on the Mosquito Alert map web server, given a picture ID.
# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=0
)
# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
To get a picture from the Mosquito Alert public map, we need its file name (ID hash plus file extension). The simplest way to find one is to browse the map visually, as in the example below:
# Set a picture file name (ID) and build the corresponding URL
ID_PICNAME = "a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg"
contentUrl_pic = contentUrl.format(ID_PICNAME=ID_PICNAME)
Note that this particular ID corresponds to the adult mosquito picture shown below, which is available on the Mosquito Alert public map.
Another way to obtain a picture ID is from the reports dataset, where the measured variable movelab_annotation is a dictionary whose key photo_html gives the photo href (ID) associated with a given report.
# Download the picture and save it
r = requests.get(contentUrl_pic)
r.raise_for_status()  # stop here rather than saving an error page as an image
with open(f"{path}/{ID_PICNAME}", "wb") as f:
    f.write(r.content)
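The reports-based route mentioned above can be sketched as follows. This is only an illustration: it assumes that photo_html is an HTML fragment containing an `<a href="...">` link to the picture, which the text above does not confirm. The helper names `_HrefCollector` and `picnames_from_photo_html`, and the sample fragment with its path, are hypothetical.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(
                value for name, value in attrs if name == "href" and value
            )


def picnames_from_photo_html(photo_html):
    """Return the file names referenced by the links in a photo_html fragment."""
    parser = _HrefCollector()
    parser.feed(photo_html)
    parser.close()
    # Keep only the last path component, i.e. the picture file name
    return [basename(urlparse(href).path) for href in parser.hrefs]


# Hypothetical fragment: the real photo_html layout and path may differ
sample = '<a href="/hypothetical/path/a67c2ad2-09b6-4dbe-9cd0-2536a91e17f3.jpg">photo</a>'
print(picnames_from_photo_html(sample))
```

Each returned file name could then be passed as ID_PICNAME to contentUrl.format, as done for the hand-picked example.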
Dataset: tigapics_labels_bioimagearchive¶
1. Distribution by image download from BioStudies repository¶
This distribution lets you download the labels that describe the tigapics_bioimagearchive dataset.
# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=1
)
# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
# Download the file list of labels and save it
r = requests.get(contentUrl)
r.raise_for_status()  # stop here rather than saving an error page as labels
with open(f"{path}/labels.tsv", "wb") as f:
    f.write(r.content)
# Get the labels into a dataframe
df_labels = pd.read_csv(f"{path}/labels.tsv", sep="\t")
df_labels.head()
Dataset: tigapics_bioimagearchive¶
1. Distribution by single image download from the BioImage Archive repository¶
This distribution lets you download individual pictures of adult mosquitoes, which are useful for machine-learning classification tasks.
# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=0, idx_hasPart=2
)
# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
To get a picture from the Mosquito Alert BioImage Archive, we need its species class and its file name (ID). The tigapics_labels_bioimagearchive dataset provides this information. As an example, we take the first entry of the labels dataset.
# Set up the picture ID and get the relative URL address
CLASS, ID_PICNAME = df_labels.iloc[0]["Files"].split("/")
contentUrl_pic = contentUrl.format(CLASS=CLASS, ID_PICNAME=ID_PICNAME)
# Download the picture and save it
r = requests.get(contentUrl_pic)
r.raise_for_status()  # stop here rather than saving an error page as an image
with open(f"{path}/{ID_PICNAME}", "wb") as f:
    f.write(r.content)
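Before downloading more than one picture, it can help to see how the label entries are distributed across classes. The sketch below assumes, as the split on "/" above suggests, that every value in the Files column has the form CLASS/ID_PICNAME. The helper names `split_label_entry` and `count_by_class` and the sample entries are hypothetical; with the real labels one would pass `df_labels["Files"]` instead.

```python
from collections import Counter


def split_label_entry(entry):
    """Split a 'CLASS/ID_PICNAME' entry into its (class, file name) parts."""
    class_name, sep, pic_name = entry.partition("/")
    if not sep or not class_name or not pic_name:
        raise ValueError(f"unexpected label entry: {entry!r}")
    return class_name, pic_name


def count_by_class(entries):
    """Count how many label entries fall into each class."""
    return Counter(split_label_entry(entry)[0] for entry in entries)


# Hypothetical entries standing in for df_labels["Files"]
entries = [
    "Aedes_albopictus/pic_0001.jpg",
    "Aedes_albopictus/pic_0002.jpg",
    "Culex/pic_0003.jpg",
]
print(count_by_class(entries))
```

Raising on malformed entries, rather than skipping them silently, makes it obvious when the assumed CLASS/ID_PICNAME layout does not hold.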
2. Distribution by image-chunks download from BioStudies repository¶
This distribution lets you download pictures of adult mosquitoes in chunks, one per class. A class may correspond to a species.
# Get metadata
contentUrl, dataset_name, distr_name = ut.get_meta(
    metadata, idx_distribution=1, idx_hasPart=2
)
# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
To download a chunk of pictures from the Mosquito Alert BioImage Archive, we need to know which classes are available. One may choose among the following classes: Aedes_albopictus, Aedes_aegypti, Aedes_japonicus, Aedes_koreicus, Japonicus_koreicus, Complex, Culex, Other_species and Not_sure.
# Set up the class to download
CLASS = "Aedes_japonicus"
contentUrl_pic = contentUrl.format(CLASS=CLASS)
# Download the zip archive of the chosen class and save it
r = requests.get(contentUrl_pic)
r.raise_for_status()  # stop here rather than saving an error page as a zip
with open(f"{path}/{CLASS}.zip", "wb") as f:
    f.write(r.content)
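Once the class archive is saved, its pictures still need to be unpacked. The following is a minimal sketch using the standard-library `zipfile` module. The helper name `extract_images` and the set of accepted file extensions are assumptions, and the internal folder layout of the real archive is not described above, so member paths are kept as they appear in the zip.

```python
import zipfile
from pathlib import Path

# Assumed picture extensions; adjust if the archive uses other formats
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def extract_images(zip_path, dest_dir):
    """Extract the image files of a zip archive into dest_dir.

    Returns the archive member names that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue
            if Path(member.filename).suffix.lower() in IMAGE_SUFFIXES:
                archive.extract(member, dest)
                extracted.append(member.filename)
    return extracted
```

In the notebook this could be called as `extract_images(f"{path}/{CLASS}.zip", f"{path}/{CLASS}")`, which unpacks the pictures next to the downloaded archive.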