The Mythical Data Scientist Shortage

Data Informed

April 25, 2013

In a recent meeting with roughly 100 CIOs and other IT executives of Global 2000 companies, the topic of data scientists came up. Everyone seemed to be looking for data scientists, and everyone agreed that finding talent was tough, a conclusion supported by recent NewVantage Partners survey data. Intriguingly, most also felt that a newly hired data scientist would lack the business context for asking the right questions of enterprise data.

In other words, enterprises may be looking in the wrong place for data science talent, setting up their Big Data projects to fail.

Part of the problem lies in the very name "Big Data." Enterprises become so intent on the sheer volume of data being collected that they lose sight of the much more essential act of intelligently querying the data for insights. In other words, the goal of the data scientist isn't to ask bigger questions, but rather to ask better questions.

To do this, context is key.

Gartner analyst Svetlana Sicular highlighted this in her analysis of rising demand for data scientists, arguing that "Organizations already have people who know their own data better than mystical data scientists." As such, enterprises should look within for expertise because "Learning Hadoop is easier than learning the company's business."

And yet so many don't, despite the crushing need for data talent. According to a 2011 McKinsey Global Institute report, the United States must boost its data-savvy graduates by 60 percent, given that roughly 500,000 data science jobs await, leaving the U.S. 190,000 qualified data scientists short by 2018.

Many new data professionals are expected to come from graduating students, as EMC found in its survey, depicted in the pie chart above. While this is a good long-term source for talent, the better source today is behind the firewall of one's enterprise.

This is the best way to ensure an enterprise's data science team is deeply integrated into the business, rather than a foreign body that "gets data but not our business." As renowned statistician Nate Silver argues in his book The Signal and the Noise:

"[N]umbers have no way of speaking for themselves. We speak for them.... If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn't. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine--but a relatively constant amount of objective truth."

In other words, more data equals more noise, not necessarily more signal. To get at the "truth" in our enterprise data, we need to be equipped to ask the right sort of questions, which generally means having familiarity with the business itself, and not merely abstract data.

The best data scientists, then, and the best data science teams, will be those that best function as a welcome extension to existing teams, rather than an outside body holding court on enterprise data.

Is there a data scientist shortage? Perhaps. But that may be because we keep looking in the wrong place: outside our own organizations.

Matt Asay is vice president of corporate strategy at 10gen, the company behind the MongoDB NoSQL database. With more than a decade spent in open source, Matt is a recognized open source advocate and board member emeritus of the Open Source Initiative.

The post The Mythical Data Scientist Shortage appeared first on Data Informed.

Dedham, MA

Big Data and Analytics in the Enterprise