Datasets

1. Language Model Assisted Text Compression and Decompression

We developed a method that uses a language model to condense a text into a compressed “primitive version” and then uses a separate language model to recover the meaning of the original text. This two-step process aims to preserve the semantics of the text during compression and to regenerate a detailed approximation of the original content during decompression. Applications include: storage, persisting “semantic” text content that preserves the idea of the original text; reducing embedding-search false positives in traditional retrieval-augmented generation (RAG); and improving the accuracy of needle-in-a-haystack operations.
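The two-step process described above can be sketched roughly as follows. This is a minimal illustration, not the downloadable implementation: the `LLM` type, the prompt wording, and the function names `compress`, `decompress`, and `round_trip` are all assumptions, and the models are passed in as plain callables so that any client library can be plugged in.

```python
from typing import Callable, Tuple

# Hypothetical type: any function that maps a prompt string to a model reply.
LLM = Callable[[str], str]

# Illustrative prompts only; the wording is an assumption.
COMPRESS_PROMPT = (
    "Rewrite the text below as the shortest 'primitive version' "
    "that keeps its meaning:\n\n{text}"
)
DECOMPRESS_PROMPT = (
    "Expand the compressed text below into full, fluent prose:\n\n{text}"
)


def compress(text: str, compressor: LLM) -> str:
    """Ask the first model for a condensed 'primitive version' of the text."""
    return compressor(COMPRESS_PROMPT.format(text=text)).strip()


def decompress(primitive: str, decompressor: LLM) -> str:
    """Ask a separate model to recover an approximation of the original."""
    return decompressor(DECOMPRESS_PROMPT.format(text=primitive)).strip()


def round_trip(text: str, compressor: LLM, decompressor: LLM) -> Tuple[str, str, float]:
    """Compress, then decompress, and report the character-level size ratio."""
    primitive = compress(text, compressor)
    restored = decompress(primitive, decompressor)
    ratio = len(primitive) / max(len(text), 1)
    return primitive, restored, ratio
```

Keeping the two models as separate arguments mirrors the description above, where decompression is performed by a different model from the one that compressed the text. Whether the restored text preserves the original meaning would still need to be checked separately, for example with an embedding similarity score.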

Click here to download the implementation code.

2. Activity Images for Human Activity Recognition

Deep Human Activity Recognition with Localisation of Wearable Sensors

Lawal, I. A. and Bano, S., IEEE Access.

The dataset consists of frequency (activity) images generated from raw tri-axial accelerometer and gyroscope signals for different human activities (walking, running, standing, laying, climbing and jumping) recorded at seven on-body locations: head, chest, arm, shin, waist, wrist and thigh. Each sample is a three-channel image of size 28×28×3.
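One way such a frequency image could be built from a tri-axial signal is sketched below. This is an assumption-heavy illustration, not the procedure from the paper: the frame length, the use of non-overlapping frames, the mapping of one sensor axis to one image channel, and the function name `activity_image` are all hypothetical choices made so that the output has the 28×28×3 shape stated above.

```python
import numpy as np

FRAMES = 28      # image rows: consecutive time frames (assumed)
FRAME_LEN = 54   # rfft of 54 samples yields 54 // 2 + 1 = 28 frequency bins


def activity_image(window: np.ndarray) -> np.ndarray:
    """Turn a (FRAMES * FRAME_LEN, 3) tri-axial window into a (28, 28, 3) image."""
    expected = (FRAMES * FRAME_LEN, 3)
    if window.shape != expected:
        raise ValueError(f"expected shape {expected}, got {window.shape}")

    # Split each axis into non-overlapping frames: shape (28, 54, 3).
    frames = window.reshape(FRAMES, FRAME_LEN, 3)

    # Remove the per-frame mean so the constant (e.g. gravity) term
    # does not dominate the spectrum.
    frames = frames - frames.mean(axis=1, keepdims=True)

    # Magnitude spectrum of every frame: shape (28, 28, 3).
    spectrum = np.abs(np.fft.rfft(frames, axis=1))

    # Scale each channel to [0, 1]; all-zero channels are left as zeros.
    peak = spectrum.max(axis=(0, 1), keepdims=True)
    return spectrum / np.where(peak > 0, peak, 1.0)
```

Each row of the resulting image is the spectrum of one time frame, so a steady periodic activity shows up as a vertical stripe at its dominant frequency bin.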

Click here to download the dataset and implementation code

3. AI-Based Translator for Norwegian Cycling Terminology

Norwegian Translation of Bicycle Terminology Using Custom Named-Entity Recognition and Neural Machine Translation

Hellebust, D., & Lawal, I. A. (2023), Electronics, 12(10), pp. 23-34

We created a custom AI translator for bicycle terminology from English to Norwegian. We developed a custom dataset of 1000 manually translated sentences to fine-tune a translation model.

Click here to download the dataset and implementation code

4. Handwritten Digit Recognition

Recognition of Handwritten Arabic (Indian) Numerals Using Freeman’s Chain Codes and Abductive Network Classifiers

Lawal, I. A., Radwan E. A., and Sabri A. M., International Conference on Pattern Recognition

We studied the performance of an abductive network architecture on a dataset of 21,120 samples of handwritten digits (0-9) produced by 44 writers. We developed a new feature set based on histograms of the chain codes of contour points.
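The idea of a chain-code histogram can be illustrated with a generic Freeman chain code over an ordered contour. This is a minimal sketch under stated assumptions, not the feature extractor from the paper: the direction numbering (code 0 pointing east, counter-clockwise, y pointing up), the normalisation, and the function names `chain_code` and `chain_code_histogram` are illustrative, and the contour is assumed to be already extracted as an ordered list of 8-connected points.

```python
# 8-connected Freeman directions: (dx, dy) step -> code, counter-clockwise
# starting from east, with y pointing up (assumed convention).
FREEMAN_CODES = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour):
    """Convert an ordered list of (x, y) contour points into Freeman codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_CODES:
            raise ValueError(f"points {(x0, y0)} and {(x1, y1)} are not 8-connected neighbours")
        codes.append(FREEMAN_CODES[step])
    return codes


def chain_code_histogram(codes):
    """Return the relative frequency of each of the 8 directions."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    total = len(codes)
    return [c / total for c in counts] if total else [0.0] * 8
```

For a closed unit square traced as `[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]`, the codes are `[0, 2, 4, 6]`, so the histogram places a quarter of its mass on each of the four axis-aligned directions. Normalising by the number of steps makes the feature independent of contour length.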

Click here to download the implementation code for generating the contour points features.