On Wednesday morning we presented “Forensic Clusters: Advanced Processing with Open Source Software” at the 2012 DoD CyberCrime Conference in Atlanta. This wasn’t a talk about clustering related items (although we did touch on that briefly), but about building clusters of servers to scale up to the storage and processing demands of large-scale evidence sets.
In a nutshell, we’ve used Apache Hadoop and Apache HBase as the foundation for a new way of processing evidence files, with some key assists from The Sleuthkit, and a deep bench of other open source projects. By using Hadoop, we’re able to stripe evidence data across multiple machines without creating a storage bottleneck, and we’re able to process the evidence in parallel.
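To make the idea concrete, here is a toy sketch (in Python, not our actual Hadoop/Java code) of the split–map–reduce pattern described above. The chunk size, function names, and the choice of SHA-256 hashing as the per-block task are illustrative assumptions; in the real system, HDFS handles the striping and Hadoop schedules the per-block work across nodes.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, for illustration; HDFS blocks are typically 64-128 MB

def split_into_chunks(data, size=CHUNK_SIZE):
    # HDFS stripes an evidence file into fixed-size blocks across data nodes,
    # so no single machine becomes a storage bottleneck
    return [data[i:i + size] for i in range(0, len(data), size)]

def map_hash(chunk):
    # map phase: each node independently processes the blocks it holds
    # (here the "processing" is just a SHA-256 digest, as an example task)
    return hashlib.sha256(chunk).hexdigest()

def reduce_digests(digests):
    # reduce phase: per-block results are merged into one evidence-level record
    return hashlib.sha256("".join(digests).encode()).hexdigest()

evidence = b"raw bytes of an evidence image"
# in the cluster this list comprehension runs in parallel, one chunk per worker
block_results = [map_hash(c) for c in split_into_chunks(evidence)]
print(reduce_digests(block_results))
```

Because each map call touches only its own chunk, the work scales out with the number of machines; the reduce step is the only point where results are brought back together.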
The project is still in the prototype phase, but it has already proven itself a viable approach. While ripping apart an evidence file with 20 machines is a lot of fun, we’re even more excited about using intensive processing algorithms (like clustering graphics and documents) with all the CPU cycles we can now harness, and about being able to warehouse evidence files over time for comparative analysis.
You can download our slides (PDF) from the presentation, but feel free to give us a ring or drop us a line if you’d like to know more.