In a recent study published in the journal NPJ Digital Medicineresearchers use the United Kingdom (UK) Biobank's large-scale accelerometer dataset, consisting of 700,000 person-days of unlabeled data, to monitor physical activity levels more accurately and generalizably. I built the model.
Research: Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. Image credit: sutadimages / Shutterstock
background
The healthcare sector is rapidly seeing the development and use of wearable devices with sensors that can be used for health and fitness tracking, remote patient monitoring, clinical trials requiring real-time data, early disease detection, and personalized medicine. Increased has. , conducting a large-scale health study. These devices provide summary metrics about movement, sleep quality, steps, pace, and sedentary time. However, reliable algorithms are required to obtain information about human activities from data collected by sensors.
Fields such as natural language processing and computer vision have made significant advances due to the availability of surplus data to train these learning models, but lack large datasets that can be used to train algorithms. This has constrained progress in developing reliable and reliable models. Accurately recognize human activities. The lack of sufficient data to train these models has also confounded research findings on deep learning models, with deep learning models performing no better than traditional methods such as simple statistics. suggests.
About research
In this study, researchers used the UK Biobank accelerometer dataset to train a deep learning model to accurately recognize physical activity. UK Biobank conducted a large-scale accelerometer study that attracted nearly 500,000 participants. More than 100,000 of these participants wore accelerometers on their wrists for a week in a natural environment rather than a lab-based environment. This yielded approximately 700,00 person-days of free-living human movement data.
Overview of the proposed self-supervised learning pipeline. Step 1 involves multitasking self-supervised learning on his 700,000 person-days of data from the UK Biobank. Step 2 evaluates the usefulness of the pre-trained network on eight benchmark human activity recognition baselines via transfer learning.
The researchers used a self-supervised learning approach. This has been used successfully for examples such as generative pre-training transformers and GPT. Recent research has explored the analysis of data from wearable sensors using a number of self-supervised learning approaches, including mask reconstruction, multi-task self-monitoring, bootstrapping, and contrastive learning. In this study, we apply a multi-task self-monitoring approach to a large UK biobank dataset and explore how pre-trained models can be generalized to a wide range of health and clinically important activity-based datasets. was shown.
A multi-task self-supervised learning method was first applied to the UK Biobank's large-scale accelerometer dataset to train a deep convolutional neural network. We then used eight benchmark datasets to evaluate the performance of the pre-trained neural network and assess the quality of its representation of different populations and activity types.
The labeled dataset was used to evaluate the success of the model in transfer learning. Furthermore, this study also used a weighted sampling technique to avoid the problem of low-information, low-motion periods. Real-world data collected from motion sensors has periods of inactivity, and such static signals do not change during transformation, creating problems for self-supervised learning tasks. Therefore, to improve the convergence and stability of the training process, the researchers applied a weighted sampling approach that proportionally samples the data window and uses the standard deviation of those samples for analysis.
result
As a result, we tested the model trained in this study on eight benchmark datasets and found that it outperformed the baseline with a median relative improvement of 24.4%. Additionally, the model can be generalized across a wide range of motion sensor devices, living environments, cohorts, and external datasets.
We also find that our multi-task self-monitoring pre-training method is effective in improving downstream recognition of human activities, even on small unlabeled datasets. Self-supervised pre-training may also perform better than supervised methods.
The researchers said their study demonstrated that multi-task self-supervised learning methods can be applied to datasets from wearable sensors and that deep learning algorithms can be used to build accurate and generalizable activity recognition models. Ta.
The team of researchers will also release the pre-trained models to the digital health research community and build high-performance models based on them for use in various other areas involving limited labeled data. I made it possible to build.
conclusion
In summary, in this study, we used a large unlabeled dataset from UK Biobank consisting of accelerometer data to pre-train a deep learning model through a self-supervised approach. These pre-trained models performed above baseline levels in accurately analyzing motion sensor data across datasets with different cohorts, sensor devices, and living environments. The researchers believe these models can be built and used for a variety of scenarios involving limited amounts of labeled data.
Reference magazines:
- Yuan, H., Chan, S., Creagh, A. P., Tong, C., Acquah, A., Clifton, D. A., and Doherty, A. (2024). Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. Npj digital medicine, 7(1), 91. DOI: 10.1038/s41746024010623, https://www.nature.com/articles/s41746-024-01062-3