“Big data is not a precise term; rather it’s a characterization of the never ending accumulation of all kinds of data, most of it unstructured. It describes data sets that are growing exponentially and that are too large, too raw or too unstructured for analysis using relational database techniques. Whether terabytes or petabytes, the precise amount is less the issue than where the data ends up and how it is used.” (Quoted from EMC’s report “Big Data: Big Opportunities to Create Business Value”.)
As mobile networks, cloud computing, and internet technology expanded rapidly, more and more kinds of information appeared. In the past, terabytes of data could be a disaster for a company, because they meant high storage costs and a need for high-performance CPUs. Today, however, companies are discovering insights in this data that they had not considered before. They have started to use data analytics technology to find business value in terabytes or petabytes of data. What once looked like a disaster now looks like a big opportunity.
Data is not limited to structured data. When we talk about big data, it can be categorized into three types: structured data, unstructured data, and semistructured data (see Chart I).
As the internet and the mobile internet developed rapidly, the volume of unstructured and semistructured data exploded. For example, a bank could analyze unstructured data to find out why customer churn has increased. Most definitions of big data focus on the size of the data.
However, size, or volume, is not the only characteristic of big data. There are two others: variety and velocity. Variety means that big data is generated from many different sources, and that data types are no longer limited to structured data stored in structured databases. According to EMC’s report, most big data is unstructured. Velocity means the speed at which data is produced. Data can come from anywhere at any time: mobile phones, sensors, devices, manufacturing machines, and so on. These streams of data are generated in real time, which means a company must act at the same speed.
Type | Description
Structured data | Structured data is organized in a fixed structure and can be read and stored directly by a computer. It typically lives in a structured database that stores specific data in columns and rows.
Unstructured data | Unstructured data is data without an identified structure, for example video, audio, pictures, and text. It is also called loosely structured data.
Semistructured data | Semistructured data is organized into semantic entities. The size and type of the data within one group can differ. Examples are XML and RSS feeds. This kind of data tries to reconcile the real world with computer-based databases.
Chart I. Three types of data.
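The three types in Chart I can be made concrete with a short Python sketch. It is only an illustration: the customer records, the RSS items, and the complaint text are invented for this example and do not come from any of the cited reports.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns and rows, as in a relational table.
# (Invented sample records, for illustration only.)
structured_source = "customer_id,balance\n1001,2500.00\n1002,310.75\n"
rows = list(csv.DictReader(io.StringIO(structured_source)))

# Semistructured: tagged entities whose fields may differ from item to item.
# The second item has no <category>, which a fixed table would not allow.
semistructured_source = """
<rss><channel>
  <item><title>Rates update</title><category>news</category></item>
  <item><title>New branch opens</title></item>
</channel></rss>
"""
root = ET.fromstring(semistructured_source.strip())
items = [{child.tag: child.text for child in item} for item in root.iter("item")]

# Unstructured: free text with no predefined fields; any structure
# has to be derived by the analysis itself (here, a naive word split).
unstructured_source = "I have waited two weeks for a reply and I am closing my account."
words = unstructured_source.lower().split()

print(rows)
print(items)
print(words)
```

The structured rows all share the same keys, the semistructured items carry their own field names, and the unstructured text yields only a list of words until further analysis is applied.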
Big data analytics
Big data analytics is not a single technique. It is a term that covers many technologies (see Exhibit I). Depending on an enterprise’s requirements, each program uses different technologies to analyze data. As big data has developed, however, some of these techniques have become especially popular and useful. According to Exhibit II, advanced analytics, visualization, real time, in-memory databases, and unstructured data show strong-to-moderate commitment and strong potential growth.
Traditional techniques, such as OLAP tools and hand-coded SQL, have gradually lost their place. When a bank wants to find out why customer churn has increased, or a marketing department decides to push precisely targeted advertisements to its customers, it needs to analyze customer behavior. This data comes from customer service emails, phone call records, sales interview reports, login data from mobile devices, and so on. Almost none of it can be analyzed with traditional data analytics techniques. That is why the new techniques are developing so rapidly.
How does a company adopt big data analytics?
According to the article “Big Data, Analytics and the Path from Insights to Value,” published in MIT Sloan Management Review, companies that use big data analytics fall into three stages (see Exhibit II). For most companies, it is easy to establish an enterprise data warehouse (EDW). However, interpreting the data and finding business value in it is the most crucial factor. In addition, many techniques and tools sit behind the term big data. For any company that decides to adopt big data analytics, the leading obstacle is a lack of understanding of how to use analytics to improve the business. The article gives five recommendations to any company that wants to adopt big data analytics.
1. Think big. Focus on the biggest and highest-value opportunities, and narrow down the options.
2. Start in the middle. Within each opportunity, start with questions, not data. Companies tend to collect data and information first. In fact, starting with questions helps a company keep narrowing the scope and identify the most valuable direction.
3. Make analytics come alive. Once the problem is defined, the company needs to apply analytics, choosing the appropriate tools to analyze the data.
4. Add, don’t detract. Use centralized analytics, because every analysis is connected.
5. Build the parts, plan the whole. Big data comes from everywhere, and it will keep growing larger and more complex. Building the data infrastructure is crucial for big data analytics.
Big Data, Big Opportunity
When a company decides to take on big data, every department is involved. Big data is not only the responsibility of the IT department or the analysts. In fact, big data analytics needs information and help from sales, marketing, R&D, IT, and even external sources. Today, a number of companies have entered the big data market. The following chart lists some large organizations that have adopted big data analytics. Some of them also provide big data services to other companies.
These organizations are just the tip of the iceberg. As big data has turned from a Blue Ocean into a Red Ocean, some of these organizations have become service providers. This is becoming a trend in the big data field. Big data requires expensive hardware and labor, and not every company can afford that. Big data also involves many different computer technologies, and not everyone understands all of them. For that reason, more and more companies will seek big data services from outside. Using an external big data platform or tools can save the cost of building an entirely new technical team. What these companies need to do is identify the problem, narrow down the scope, and send their requirements to the service provider. When they receive the analysis results, they can use them to decide on the next action.
Furthermore, these service providers will not focus only on big companies. The new trend is to offer individual customers a friendly interface and an easy-to-use product. What lies behind big data will remain a mystery to most people, but the face, or terminal, of big data will become more and more friendly and simple. One example is Twithink, a program created by an MIT group that provides customized Twitter behavior analysis for customers. The program draws conclusions by analyzing unstructured information on Twitter. The group collects gender, location, time, keywords, images, and so on from tweets, and then analyzes this data with certain algorithms to draw conclusions. Their previous study covered the 2012 election. Their latest study, on the gun control discussion, is still in progress.
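To give a feel for what keyword analysis of unstructured text involves, here is a minimal Python sketch that counts keyword frequencies in short messages. It is not Twithink’s actual method, which the source does not describe. The sample messages, the `STOP_WORDS` set, and the `keyword_counts` function are all hypothetical and exist only for this illustration.

```python
import re
from collections import Counter

# Made-up messages for illustration only; these are not real tweets.
sample_tweets = [
    "Long lines at the polls today #election",
    "Watching the election debate tonight",
    "Debate night! Who is watching?",
]

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "at", "is", "who", "a", "to"}


def keyword_counts(texts):
    """Count how often each non-stop-word token appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts


counts = keyword_counts(sample_tweets)
# The three most frequent keywords, with their counts.
print(counts.most_common(3))
```

Even this simple count turns free text into a table of keywords and frequencies, which is the kind of structured result that further analysis can build on.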
Problems and threats
Although big data offers many opportunities and advantages for enterprises, it also has disadvantages. The first crucial problem is invasion of privacy. After you search for a product on Amazon, the next time you log in you will find products you may be interested in, which Amazon has pushed to you. This is called precise advertising. However, you may not even know when Amazon collected your information. Another example is Google Analytics: companies embed code in their websites to collect visitors’ internet behavior.
These things happen every day and everywhere. It is hard to say whether this practice is right or wrong, and some uses may be good. However, if personal data is sold or published, it will affect individuals’ daily lives and become a serious problem. The second problem is the validity of information. The article “With Big Data Comes Big Responsibilities” points out that “big data sets are never complete”. If the data is insufficient, the analysis result will be invalid or distorted. Invalid information could lead a company in the wrong direction and cause a big loss. Thus, big data has two sides. How to use it to create more value for the company is the first consideration for all managers.
1. “Office 2013 Brings BI, Big Data to Windows 8 Tablets.” ZDNet. N.p., n.d. Web. 25 Jan. 2013.
2. “Big Recognition for IBM Big Data.” Smarter Computing Blog. N.p., n.d. Web. 25 Jan. 2013.
3. “Big Data.” Wikipedia. Wikimedia Foundation, 26 Jan. 2013. Web. 26 Jan. 2013.
4. “Structured Data.” Webopedia. N.p., n.d. Web. 26 Jan. 2013.
5. “Unstructured Data.” Webopedia. N.p., n.d. Web. 26 Jan. 2013.
6. Group of EMC. Big Data: Big Opportunities to Create Business Value. Rep. EMC, n.d. Web. 26 Jan. 2013.
7. Philip Russom. Big Data Analytics. N.p.: TDWI, 2011. Print.
8. Lavalle, Steve. “Big Data, Analytics and the Path from Insights to Value.” MIT Sloan Management Review Winter 2011: 21-31. Web.
9. “大数据已成红海？！全球十四个大数据公司全面盘点！” [Has Big Data Become a Red Ocean?! A Full Review of Fourteen Global Big Data Companies!]. N.p., n.d. Web. 26 Jan. 2013.
10. “IBM InfoSphere Platform Big Data, Information Integration, Data Warehousing, Master Data Management, Lifecycle Management & Data Security.” IBM InfoSphere Platform Big Data, Information Integration, Data Warehousing, Master Data Management, Lifecycle Management & Data Security. N.p., n.d. Web. 26 Jan. 2013.
11. “Amazon Web Services, Cloud Computing: Compute, Storage, Database.” Amazon Web Services, Cloud Computing: Compute, Storage, Database. N.p., n.d. Web. 26 Jan. 2013.
12. “Oracle Big Data Appliance.” Oracle Big Data Appliance. N.p., n.d. Web. 26 Jan. 2013.
13. “Google BigQuery.” Google BigQuery. N.p., n.d. Web. 26 Jan. 2013.
14. “EMC Greenplum Data Computing Appliance – Data Warehousing, Data Analytics (FW).” EMC Greenplum Data Computing Appliance – Data Warehousing, Data Analytics (FW). N.p., n.d. Web. 26 Jan. 2013.
15. “Teradata.” Data Appliance, Data Warehouse, Business Intelligence. N.p., n.d. Web. 26 Jan. 2013.
16. “Twithinks.” TwiThinks. N.p., n.d. Web. 26 Jan. 2013.
17. Erica Naone. With Big Data Comes Big Responsibilities. N.p.: MIT Technology Review, 2011.