Datasets
The NTCIR19-Lifelog task provides two main datasets. Participants can choose which dataset(s) to use for their submissions. The Lifelog tasks use the same dataset and submission process as NTCIR18-Lifelog6.
CASTLE Dataset
The CASTLE (Collaborative Analytics and Search Through Lifelog Events) dataset contains multimodal collaborative session data including audio, video, chat transcripts, and interaction logs. This dataset enables research on semantic access and conversation segmentation in collaborative environments.
Dataset Link: https://huggingface.co/datasets/CASTLE-Dataset/CASTLE2024
Description: The CASTLE2024 dataset includes multi-modal recordings of collaborative sessions, featuring rich annotations and metadata for semantic search and segmentation tasks. It supports both CSAT (Semantic Access) and CAST-Seg (Conversation Segmentation) sub-tasks.
Supported Sub-tasks: CSAT (CASTLE Semantic Access Task), CAST-Seg (Conversation Segmentation), Recipe Generation
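For participants who prefer to fetch the CASTLE data programmatically, the following is a minimal sketch using the `huggingface_hub` Python package. The repository ID is taken from the dataset link above. The function name `download_castle`, the local directory name, and the file-pattern filter are illustrative assumptions, and the repository may require authentication or acceptance of access terms on Hugging Face before a download succeeds.

```python
# Sketch: download (part of) the CASTLE2024 dataset repository from Hugging Face.
# Assumes the package is installed: pip install huggingface_hub

REPO_ID = "CASTLE-Dataset/CASTLE2024"  # from the dataset link above


def download_castle(local_dir="castle2024", allow_patterns=None):
    """Download the dataset repository and return the local path.

    local_dir and allow_patterns are illustrative choices, not part of the
    task instructions. allow_patterns=None downloads every file.
    """
    # Imported here so this module loads even if the package is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",          # the link points at a dataset repository
        local_dir=local_dir,
        allow_patterns=allow_patterns,
    )
```

Calling `download_castle(allow_patterns=["*.md"])` would fetch only Markdown files such as a dataset card, which can be a quick way to check access before downloading the full recordings.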
Lifelog Dataset (LSC'22-24 Dataset)
NTCIR19-Lifelog reuses the same dataset as NTCIR18-Lifelog6: the LSC'22-24 dataset, a multimodal dataset spanning 18 months of data from one active lifelogger. This comprehensive dataset supports research on semantic access, recipe generation, and personal information retrieval.
Dataset Access: Available after registration and completion of required agreement forms. Please email ntcirlifelog@gmail.com with completed agreement forms as described below.
Dataset Components: The dataset consists of three password-protected files:
- Core Image Dataset: 18 months of wearable camera images, fully redacted and anonymized, at 1024 x 768 resolution, captured using a Narrative Clip device. These images were collected during 2019-2020. All faces and readable text have been removed, and certain scenes and activities have been manually filtered out to respect local privacy requirements.
- Metadata: Textual metadata representing time, locations, and other contextual information for the collection.
- Visual Concepts: Extracted from the non-redacted version of the visual dataset, including:
  - attribute_top{i}: The attribute of the scene detected automatically from the image.
  - category_top{i}: The category of the scene detected automatically from the image.
  - category_top{i}_score: The confidence score of the scene prediction output.
  - concept_class{i}: Objects detected automatically from the image (using the object category list of 2014-2017 COCO datasets with 80 labels).
  - concept_score_top{i}: The confidence score of the object detection output.
  - concept_bbox_top{i}: The bounding box of the detected object in the format of {top_x top_y bottom_x bottom_y}.
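The object-detection fields listed above can be read with a few lines of Python. The sketch below is illustrative only: it assumes each metadata record is available as a dictionary of strings, that the rank index `{i}` is written as a plain number starting at 1, and that bounding-box coordinates are separated by whitespace. The actual file layout and index format should be checked against the released files. The names `Detection`, `parse_bbox`, and `detections_from_row` are hypothetical helpers, not part of the dataset.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (top_x, top_y, bottom_x, bottom_y)


class Detection(NamedTuple):
    label: str
    score: float
    bbox: Optional[BBox]


def parse_bbox(text: str) -> Optional[BBox]:
    """Parse 'top_x top_y bottom_x bottom_y' into four floats, or None if malformed."""
    # Surrounding braces/brackets are stripped in case the raw value includes them.
    parts = text.strip("{}[]() ").split()
    if len(parts) != 4:
        return None
    try:
        x1, y1, x2, y2 = (float(p) for p in parts)
    except ValueError:
        return None
    return (x1, y1, x2, y2)


def detections_from_row(
    row: Dict[str, str],
    ranks: Sequence[str] = ("1", "2", "3"),  # assumed index format; adjust as needed
    min_score: float = 0.0,
) -> List[Detection]:
    """Collect detected objects from one record, keeping those at or above min_score."""
    detections = []
    for i in ranks:
        label = (row.get(f"concept_class{i}") or "").strip()
        if not label:
            continue  # no object recorded at this rank
        try:
            score = float(row.get(f"concept_score_top{i}") or 0.0)
        except ValueError:
            score = 0.0  # an unparseable score is treated as zero confidence
        if score < min_score:
            continue
        bbox = parse_bbox(row.get(f"concept_bbox_top{i}") or "")
        detections.append(Detection(label, score, bbox))
    return detections
```

Passing a different `ranks` sequence (for example zero-padded strings) adapts the helper if the released column names use another index format.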
Supported Sub-tasks: LSAT (Lifelog Semantic Access Task)
Access Requirements: NTCIR-Lifelog participants are required to sign two forms to access the datasets:
- Organisation Agreement Form: To be signed by the organisation to which the participants belong. This form must be signed and sent by email to NTCIR-Lifelog organisers (ntcirlifelog@gmail.com) in PDF format.
Download Organisation Agreement Form (PDF)
- Individual Agreement Form: To be signed by each individual researcher wishing to use the NTCIR-Lifelog data collection. This form must be filed by the participating organisation, but it does not need to be sent to the lifelog organisers unless requested at a later date.
Download Individual Agreement Form (PDF)
Upon completion of this process, participants will be sent details of how to access the dataset. Please note that participants are also expected to register on the NTCIR-19 Website in order to participate.
Dataset Selection
Participants can choose to work with one or both datasets depending on their research interests and the sub-tasks they plan to participate in:
- CASTLE Dataset: Required for CSAT, CAST-Seg, and Recipe Generation sub-tasks.
- Lifelog Dataset: Required for LSAT sub-task.
- Both Datasets: Participants can work with both datasets if they plan to participate in multiple sub-tasks across different tracks.
When registering, please indicate which dataset(s) you plan to use so that we can provide appropriate access and support.
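The dataset requirements listed above can be summarised as a small lookup, which may help when planning participation in several sub-tasks. This is only a sketch; the names `REQUIRED_DATASET` and `datasets_for` are hypothetical, and the mapping simply mirrors the list in this section.

```python
# Sub-task -> dataset required, as stated in the Dataset Selection list.
REQUIRED_DATASET = {
    "CSAT": "CASTLE",
    "CAST-Seg": "CASTLE",
    "Recipe Generation": "CASTLE",
    "LSAT": "Lifelog",
}


def datasets_for(subtasks):
    """Return the sorted list of datasets needed for the given sub-tasks."""
    unknown = [s for s in subtasks if s not in REQUIRED_DATASET]
    if unknown:
        raise ValueError(f"Unknown sub-task(s): {unknown}")
    # A set removes duplicates when several sub-tasks share a dataset.
    return sorted({REQUIRED_DATASET[s] for s in subtasks})
```

For example, `datasets_for(["CSAT", "LSAT"])` names both datasets, while `datasets_for(["CSAT", "CAST-Seg"])` names only the CASTLE dataset.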