Data Driven Security

Data Driven Security - Episode 10

October 24, 2014


In this episode, Jay & Bob have a community discussion with John Langton & Alex Baker about their security data analysis & visualization startup, VisiTrend, and take a look at what's made the headlines in the data science community since the last show.

Resources / people featured in the episode + link insights from VisiTrend:

VisiTrend - @visitrend
https://twitter.com/visitrend
https://visitrend.com/

VERIS/VCDB general vis - we have a tree map version of the actors, actions, assets, and attributes breakdown which better shows the distribution of events (description on snapshot).
Snapshot - https://visitrend.com/cyber/snapshot/snap.html?543acc01e4b0e3434852f71d

VERIS/VCDB clustering - each square is an event in the data set. Squares are first grouped based on # of employees (e.g. companies with 1k employees will be grouped together), and then based on industry. Squares are colored based on clustering output - we found 7 clusters. We will provide more detail on what defines these clusters in a blog post. It's interesting to see that particular industries do have particular attack types according to clustering, shown by blocks of similar color.
Snapshot - https://visitrend.com/cyber/snapshot/snap.html?543acac5e4b0e3434852f71b

Honeypot overview - this is really cool (I think). Black, square nodes are the honeypots. Node size is based on the # of packets they're sending. Computers that use more distinct ports are colored red (the big red guy doing a massive port scan drowns out the others). The force-directed layout clusters nodes if they hit the same honeypots. For instance, click a node in an "outer ring" twice to highlight the honeypot it's hitting, and there will be just one. All other nodes in that ring hit the same one. Double click one of the center nodes and you'll see they're hitting all of the honeypots. The treemap groups nodes according to subnet addressing. The timeline view shows a time-based histogram of incoming packets, colored by destination port. The red guy is selected in the snapshot, so you can see that he blasts all the honeypots at roughly the same time.
Snapshot - can be posted and viewed without logging in: https://visitrend.com/cyber/snapshot/snap.html?543accefe4b0e3434852f720
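
The per-node quantities described above (packet count, number of distinct ports, and which honeypots each source hits) can be derived from raw packet records. Here is a minimal sketch in Python, assuming a hypothetical record layout of (source IP, honeypot name, destination port). The field names, function names, and sample data are illustrative only and are not VisiTrend's actual schema or method.

```python
from collections import defaultdict

# Hypothetical packet records: (source_ip, honeypot, dest_port).
packets = [
    ("10.0.0.1", "hp-a", 22),
    ("10.0.0.1", "hp-b", 23),
    ("10.0.0.1", "hp-c", 1433),
    ("10.0.0.2", "hp-a", 1433),
    ("10.0.0.3", "hp-a", 1433),
    ("10.0.0.3", "hp-a", 1433),
]


def summarize_sources(records):
    """Return per-source packet count, distinct ports, and honeypots hit."""
    summary = defaultdict(
        lambda: {"packets": 0, "ports": set(), "honeypots": set()}
    )
    for src, honeypot, port in records:
        entry = summary[src]
        entry["packets"] += 1          # could drive node size
        entry["ports"].add(port)       # distinct ports could drive color
        entry["honeypots"].add(honeypot)
    return dict(summary)


def group_by_honeypots(summary):
    """Group sources that hit exactly the same set of honeypots."""
    groups = defaultdict(list)
    for src, entry in summary.items():
        groups[frozenset(entry["honeypots"])].append(src)
    return {key: sorted(srcs) for key, srcs in groups.items()}


summary = summarize_sources(packets)
groups = group_by_honeypots(summary)
```

In this toy data, the two sources that only touch `hp-a` end up in one group (an "outer ring"), while the source that touches every honeypot sits in its own group, loosely mirroring what the force-directed layout shows visually.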

Honeypot port highlighting - Square nodes are attackers, and circle nodes are ports. The size of a port node is how many times packets were sent to that port. Mouse over the big purple circle and you see port 1433 is the most popular. You could double click it to see all machines hitting that port. There are two color layers for the node-link graph, and you can toggle between them. They both show a version of variability over time (more red = more variable port usage). The treemap shows subnet addressing again but colors a green heat map based on the # of different ports each machine uses. Size is based on the # of packets they send.
Snapshot - can be posted and viewed without logging in: https://visitrend.com/cyber/snapshot/snap.html?543acebce4b0e3434852f722
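
The port sizing and "who is hitting this port" lookup described above boil down to simple counting. Here is a rough sketch in Python, assuming hypothetical (source IP, destination port) pairs, one per packet. The sample data is made up for illustration and is not taken from the honeypot data set.

```python
from collections import Counter, defaultdict

# Hypothetical (source_ip, dest_port) pairs, one per packet.
packets = [
    ("10.0.0.1", 1433),
    ("10.0.0.2", 1433),
    ("10.0.0.2", 22),
    ("10.0.0.3", 1433),
    ("10.0.0.3", 3389),
]

# Packets per destination port (could drive the port node size).
port_counts = Counter(port for _, port in packets)

# Which machines hit each port (the "double click" lookup).
sources_by_port = defaultdict(set)
for src, port in packets:
    sources_by_port[port].add(src)

# The most popular port and how many packets it received.
top_port, top_count = port_counts.most_common(1)[0]
```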

Finally, a great mentor and visionary pioneer of InfoVis named Matt Ward passed away last weekend. He wrote the most recent, comprehensive infovis book with some other really big guys in the field, including Keim and Grinstein. Here's a link to the book: http://www.idvbook.com/

Data Science Headlines

Data science can't be point and click
http://simplystatistics.org/2014/10/09/data-science-cant-be-point-and-click/

In-depth introduction to machine learning in 15 hours of expert videos
http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/

Data Playlists
http://schoolofdata.org/2014/09/25/data-playlists/

Running RStudio via Docker in the Cloud
http://www.magesblog.com/2014/09/running-rstudio-via-docker-in-cloud.html

Building a DGA Classifier (in R) - Parts 1-3
http://datadrivensecurity.info/blog/posts/2014/Sep/dga-part1/
http://datadrivensecurity.info/blog/posts/2014/Oct/dga-part2/
http://datadrivensecurity.info/blog/posts/2014/Oct/dga-part3/


A podcast on the journey to discovery and decision making through data in information security by Bob Rudis and Jay Jacobs.