A crowd-sourced list of medical datasets for education


Click here for the list of medical datasets.

While making exercises to teach machine learning for healthcare, I’ve been spending a lot of time actively looking for appropriate datasets.

I came across some curated lists of medical datasets on GitHub, but their focus was typically on research rather than education, and the lists were more comprehensive than curated.

So I decided to create one with a specific focus on education, which means the datasets on it are (1) vetted for quality and (2) easily accessible. You can access it here.

I only have direct exposure to a limited subset of health datasets, so for this to work the effort has to be crowd-sourced. I’ve populated the repository with some initial datasets, but I’m leaving it as a space that I hope others will contribute to.

If you’ve worked with a dataset that you found helpful and don’t see it on the list, please consider adding it as per the contributor guidelines.

If you’re an educator looking to create educational content around medical data, I hope this repository can be a helpful resource.