
Eclipse mailing lists

The Eclipse mailing lists dump is an extract of all emails posted on the Eclipse mailing lists, provided as a single CSV file or as per-project mbox files.

  • Download the Eclipse mailing lists dataset [ CSV ].
  • Check the documentation for the dataset here (HTML). For reproducibility we also provide the R Markdown document for the dataset analysis and documentation.
  • Download the mbox files [ see the list ]

These datasets are published under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence.

The CSV extract

This dataset is a dump of all posts sent to all mailing lists hosted at the Eclipse Forge. It includes only the list name, post ID, sent date, author name and address, and post subject; message bodies are omitted.

Although this is public data (the mailing lists can be browsed on the official mailman page), all data has been anonymised to prevent any misuse. The privacy issues identified, along with the anonymisation process, are covered in a dedicated document.

Downloads

  • Download the Eclipse mailing lists dataset here.
    • Content: roughly 400K entries, 6 attributes
    • Size: 12M compressed, 63M raw
  • Check the documentation for the dataset here. For reproducibility we also provide the R Markdown document for the dataset analysis and documentation.
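
The figures above (roughly 400K entries, 6 attributes) can be checked after download with a short shell sketch. It assumes the extract is gzip-compressed, that its first line is a header, and that no field contains an embedded newline or a comma inside the header; the function name `csv_summary` is hypothetical and not part of the dataset.

```shell
#!/bin/sh
# Hypothetical helper: summarise a gzip-compressed CSV extract.
# Assumptions: gzip compression, a header on the first line,
# and no embedded newlines inside fields.
csv_summary() {
    file="$1"
    # Number of comma-separated fields in the header line.
    cols=$(gzip -dc "$file" | head -n 1 | awk -F',' '{ print NF }')
    # Number of data rows, i.e. every line after the header.
    rows=$(gzip -dc "$file" | awk 'NR > 1 { n++ } END { print n + 0 }')
    echo "columns=$cols rows=$rows"
}
```

Calling `csv_summary` on the downloaded archive should report 6 columns and a row count close to 400K if the assumptions hold.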

Project mboxes

This dataset provides all Eclipse mailing lists as mbox files, compressed using gzip. The exhaustive list of downloads is as follows:


Note: this list was obtained through the following command, run in the directory that contains the mbox archives:

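# Append one Markdown list item per archive to list.txt.
for i in *.mbox.gz; do
     s=$(du -sh "$i" | cut -f1)   # human-readable size of the archive
     echo "* [${i%%.mbox.gz}]($i) (size: $s)" >> list.txt
done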
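
A downloaded archive can be inspected without unpacking it to disk. The sketch below counts the messages in one gzip-compressed mbox by counting its "From " separator lines. It assumes body lines starting with "From " are escaped as ">From ", as is conventional for mbox files; if they are not, the count is an overestimate. The function name `count_messages` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the messages in a gzip-compressed mbox.
# Each message in an mbox starts with a separator line beginning with "From ".
# Assumption: body lines starting with "From " are escaped as ">From ".
count_messages() {
    gzip -dc "$1" | grep -c '^From '
}
```

For example, `for f in *.mbox.gz; do echo "$f: $(count_messages "$f")"; done` prints a per-list message count for every archive in the current directory.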