Mixing the Perfect Big Data Elixir for IT Professionals in 2014
January 13, 2014
• How should enterprise IT go about supporting and maybe even driving data and analytics projects in 2014?
• Familiarity with the likes of YARN, NoSQL, Python and ETL, among many other tools and tactics, will help IT succeed with big data this year and in the years to come.
Welcome to 2014. Time for a fresh start and of course a look at how the enterprise data and analytics market will evolve as the new year unfolds. Frankly, you don’t have to look very far on the ol’ interweb to see numerous predictions and prognostications from yours truly and many, many others. Here’s a short list for those keeping score at home:
• Mobility dominates user experience development efforts
• BI in the cloud emerges for large orgs and large data sets
• Regular business users will morph into data scientists
• The public cloud scales mightily (Amazon Redshift, Google BigQuery, etc.)
• Data governance, security and privacy concerns take a front seat
• Engineered systems reach the mid-market
• Small data (day-to-day operational stuff) becomes a key differentiator
• Open Data informs corporate decision-making processes
We could easily go on and on with this sort of soothsaying, surely a powerful testament to the energy fizzing about in the enterprise data and analytics market. But what about enterprise IT itself? How will all of these trends change the way IT goes about supporting, nay, driving data and analytics projects in 2014? In my opinion, IT professionals should set these uber-trends aside and instead focus on a few key technologies and tactics that together can form an elixir for successful big data projects. Here are a few of them, in no particular order:
• Python. R will certainly remain the heavyweight champ among data scientists, but IT professionals will increasingly find the Python programming language playing an important role as analytics solutions seek to put true analytics tools in the hands of business owners.
• NoSQL. IT professionals will need to come to terms with an increasingly fragmented and complicated database marketplace in which structured and unstructured data stores must coexist. NoSQL in particular will play an important role in 2014 as a means of unifying, or at least supporting, analytics and visualization solutions.
• YARN. Any organization considering Apache Hadoop had better get used to Apache Hadoop NextGen MapReduce (YARN), which makes Hadoop all the more adaptable and useful for a wider range of workloads, allowing it to move beyond basic MapReduce functions and serve as a general data processing layer.
• ETL. IT professionals should be prepared to push and pull huge amounts of data across domains and through the firewall itself as organizations look to extract, transform and load (ETL) data from public cloud services in near-real time. The reverse applies as well: data will need to flow outward in support of open data projects.
• IoT. Interest in Cisco’s vision for an Internet of Things (IoT), along with a continued march toward software-defined networking (SDN), will greatly influence core data center technologies (switches, servers, etc.), requiring IT to architect in support of massive amounts of operational data.
• HPC. High performance computing (HPC) know-how will become a must as many in-memory database vendors start shipping extremely high-performance systems capable of actually running big data analytics at scale and in real time.
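To make the ETL item a little more concrete, here is a minimal Python sketch of the three steps. It assumes a hypothetical JSON payload standing in for a response from a public cloud service and uses SQLite from Python's standard library as a stand-in for the analytics store; the field names, the `sales` table and its schema are illustrative only, not any particular vendor's format.

```python
import json
import sqlite3

# Hypothetical payload standing in for a public cloud service response;
# a real pipeline would fetch this from the provider's API.
RAW_PAYLOAD = """
[
  {"id": "1", "region": "us-east", "revenue": "1200.50"},
  {"id": "2", "region": "EU-WEST", "revenue": "980.00"},
  {"id": "3", "region": "us-east", "revenue": null}
]
"""


def extract(payload: str) -> list:
    """Extract: parse the raw JSON text into a list of dict records."""
    return json.loads(payload)


def transform(records: list) -> list:
    """Transform: drop incomplete rows and normalize types and casing."""
    rows = []
    for rec in records:
        if rec.get("revenue") is None:
            continue  # skip records missing the measure we want to analyze
        rows.append((int(rec["id"]), rec["region"].lower(), float(rec["revenue"])))
    return rows


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into a local analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, region TEXT, revenue REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if a batch is re-run.
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_PAYLOAD)), conn)
    for row in conn.execute(
        "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
    ):
        print(row)
```

A near-real-time pipeline would typically run these same three steps repeatedly on small batches rather than once, but the shape of each step stays the same.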