A Summary of the Enterprise Big Data Platform
With the rapid development and wide use of the mobile Internet, social networks, the Internet of Things and cloud computing, data is expanding explosively, and the era of big data has clearly arrived. The big data owned by enterprises is gradually becoming an important factor of production. Facing these new opportunities and challenges, enterprises should revitalize their data assets inside and outside the enterprise and fully explore the value of that data. This lets them support management decisions and precision marketing toward markets and customers. In this way, enterprises can reshape their core competitiveness, expand their business markets, gain new momentum for revenue growth, and become data-driven enterprises whose core competitiveness lies in big data collection, storage, analysis and use.

There is no doubt that the construction and operation of a big data platform is a key action for achieving these goals, and it is also a core task and objective of enterprise information technology construction. By building a big data platform, enterprises can integrate and optimize resources; unify data collection, data processing, data sharing and data services; improve operating efficiency; and release the value of data to the greatest extent. Finally, the data governance goal of “storing in one place, controlling uniformly, applying in several places and showing value” can be realized.

1 The construction goals of the big data platform
The enterprise data platform is an important part of the information technology support system, with the capability to aggregate and process data from several internal systems and from external domain systems.
At the same time, through unified management and encapsulation of the data assets formed in data processing, it provides system support for enterprise data management, data services and application system management.
The big data center should have the following three features:
“Enterprise-level” data center: the big data center obtains data resources from internal and external systems, integrates and processes them, and generates an enterprise-level unified data view. Its data services are available to every system and enterprise application area, providing cross-business, cross-department and cross-system data support.
“Intelligent” data center: its data is complete and follows standard specifications; the platform is fully functional; data is shared sufficiently; applications are intelligent and adaptable; system resources can be scaled flexibly; and the system architecture can be configured dynamically.
“Management and operation” data center: the enterprise big data platform should play a pivotal role in enterprise data management and operation, providing integrated support for enterprise data collection, transformation, processing, application, control, and operation and maintenance. It achieves seamless convergence and close coordination between the data management process and every business process, building an operating ecosystem of data value.

2 The principles of construction
1. The principle of integration
In terms of system technical architecture, we should bring data from municipal-level systems into the scope of data collection, and achieve unified data storage and sharing across the whole province.
In terms of system functional architecture, we should provide shared data access and support personalized development of data applications, covering both data usage and data application development.
2. The principle of advancement
In terms of system architecture, the enterprise big data platform should be advanced and expandable. It should be deployed correctly and be appropriately forward-looking, so that it meets the enterprise's long-term development requirements and provides a stable, smoothly evolving system framework for introducing source data and building applications.
In terms of system technology, we should select advanced products and technologies to build a data center platform with a flexible architecture, a smoothly expandable structure and flexible resource configuration.
In terms of system architecture, we should separate the big data processing and analysis platform from the big data application platform in the data center, and adopt component-based, modular technology so that the platform has a stable kernel together with a highly open and flexible extension capability.
In terms of data delivery services, we should separate data from applications following the advanced SOA concept, provide fast and convenient application delivery, and let all kinds of users quickly obtain the data services they need through the data service platform.
3. The principle of intelligence
We should gradually shift from traditional, report-focused BI data applications and services to a new, insight-oriented big data platform. We should improve the efficiency and relevance of data analysis, processing and application; make full use of data as a driving force; raise the level of data applications and data services; and embody the principle of intelligence in system aspects such as perception, scheduling and triggering.
In terms of system architecture, we should use distributed computing and distributed storage technology to provide rapid, in-depth processing of massive data.
It should meet the requirements of an intelligent system for flexible data processing, data storage and intelligent dispatching.
In terms of data management, the platform aggregates and processes multi-form, massive data into data assets. Supported by a comprehensive, flexible and reliable data asset model and management system, it should store and manage the growing data assets intelligently, and intelligently guide and trigger the data usage process.
In terms of data application, supported by a scenario-based, standardized and process-oriented application system and by flexible, easy-to-use application development tools, it should be able to sense how data applications are developed and used. Through the system's self-learning and self-service data encapsulation capabilities, it should also provide intelligent guidance on the use of data applications.
In terms of system management, it should provide visual management, unified scheduling and process monitoring of data processing, data encapsulation, data application development and data application usage, forming the intelligent nerve center of the big data platform.
4. The principle of openness
The service objects of the enterprise big data platform: the data center serves a wide range of objects, including internal management staff, personnel and operators in core positions, as well as external partners, group customers and public customers. It also serves internal and external information systems and IT platforms across all areas and businesses.
The data center can set different data openness levels according to data security standards, service content and SLA levels.
The service fields of the big data platform: as the coverage of source data and the scope of data services continue to expand, the platform can provide multi-domain data applications, such as management-oriented, market-oriented and external customer-oriented applications.
The network protocols, hardware interfaces and data interfaces used by the big data platform should conform to open industry standards. The platform should provide external data services for the enterprise's business systems and support the construction of open data applications through standardized data services such as data encapsulation.
5. The principle of security
The big data platform should have a unified and comprehensive security mechanism to ensure data security and system security.
Data security: the data stored in the big data platform is internal and external data from the business and management lines that has been aggregated and refined, and it is highly significant for management decision support, production operations and business expansion. Data access and openness levels should therefore be strictly controlled. In addition, differentiated security access policies and a hierarchical management mechanism are needed for different service objects and data content, to ensure the security of asset information in storage, access and transmission.
6. The principle of usability
Usability of data applications: data assets are visualized, and the unified data assets are easy to understand and apply.
At the same time, the platform provides a friendly UI so that users have a good experience when using application data services, which requires an easy-to-use human-computer interaction interface and a variety of flexible presentation modes.
Usability of operational management: the operational management functions are thorough and intuitive. With accurate, detailed functional descriptions and clear logic, they are easy to understand and learn, so users can immediately find the menu they need on entering the interface and carry out operation and maintenance tasks.
7. The principle of overall planning with phased, iterative evolution
The planning and implementation of the big data platform should be combined with the enterprise's transformation strategy to achieve breakthrough innovation that is advanced, intelligent and forward-looking.
The specific evolution process should take into account several factors, including business demands at different stages, the IT environment and the maturity of the data center.
There are three stages: building the data center's service capability, building the big data platform's service capability, and building the big data platform's operation service capability.

3 System planning
3.1 The system capability framework
The construction of a big data platform is a complex systematic project. Its capability framework includes the following:
Build a big data management platform that supports big data collection, processing, storage and services
It builds a big data platform with hierarchical processing and functional decoupling, using a functional and technical architecture based on distributed processing.
It uses several big data processing technologies, such as Hadoop, real-time stream processing, distributed databases and NoSQL databases, to build the capability to process massive internal and external structured, unstructured and streaming data in different forms.
It deploys storage resources reasonably and implements hierarchical storage management for the enterprise big data platform.
It uses application data encapsulation technology to provide system support for external data services.
It implements data quality management to guarantee the accuracy of the big data platform.
It implements system and information security management to guarantee the stable, highly available operation and the information security of the enterprise big data platform.
Aggregate internal and external source data, and organize polymorphic big data effectively
It aggregates data from the company's internal and external systems through a unified data collection interface.
This system data includes valuable source data from the company's internal systems and business platforms, the external Internet, social statistics and competitor data.
It supports the collection of structured, unstructured and real-time data.
It supports the collection and storage of massive data.
Build enterprise-level data assets, and support sharing and reuse of core data
It processes big data with advanced processing technology and achieves unified management of the basic data assets.
It builds advanced data models according to the principle of intelligence, to support continuous processing from basic data assets to application data assets.
It establishes systematic long-term storage of data assets, together with processing and conversion procedures for big data, providing multi-dimensional control mechanisms and system management support that meet requirements for storage efficiency, time, access and application.
Build standardized data service capabilities to meet the company's operational data requirements
It builds standardized data service processes and service standards based on unified data assets.
It provides data support for the company's internal and external systems and users through standardized data services.
Build a data application system, and support the mining and application of data value
Supported by the enterprise big data platform, it builds processes for mining and applying big data value, and promotes data applications in business operations to demonstrate the value of data.
For internal management: it serves the enterprise's requirements for fine-grained management and raises the level of business operations management.
For customers: it improves customer perception and strengthens precision marketing and personalized customer service.
For partners: it increases the value of the enterprise's data by providing high-value data service products to partners.
3.2 Overall framework of the system
Based on the goals and principles of constructing the big data platform, the platform introduces source data and aggregates the polymorphic, massive data of the big data era. Through its data processing capability, it forms centralized enterprise-level data assets. Through data asset processing, data encapsulation, and the construction of a data management system and a data application system, it also provides intelligent support for data applications and for the development and use of data products, supporting fine-grained management and all-service operation.
The overall framework of the big data platform has five parts: data aggregation, data assets, data application, system management and access portals. Enterprise-level basic data assets and application data assets are formed by aggregating polymorphic, massive source data. On top of the data assets, the platform provides data services and supports data applications and the development of data products through data mining and encapsulation. In addition, through system portals it supports data analysis and business decision-making for all kinds of enterprise staff, provides data service capabilities to internal business support systems and the platforms of each business, and provides intelligent data value products to external users, such as partners and group customers, to help improve their operational capability.
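The data flow described above (source data aggregated into basic data assets, refined into application data assets, and exposed through data services) can be illustrated with a minimal in-memory sketch. It is only an illustration under stated assumptions, not the platform's implementation: the class names, the `kind` field and the example records are hypothetical, and a real deployment would sit on components such as HDFS, Hive or HBase rather than Python dictionaries.

```python
# Hypothetical, in-memory sketch of the framework's data flow:
# aggregation -> basic data assets -> application data assets -> data service.
# All names and records here are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SourceRecord:
    system: str   # originating internal or external system (illustrative)
    kind: str     # "structured" or "unstructured"
    payload: Any


@dataclass
class DataPlatform:
    # Basic data assets: every aggregated record, grouped by data kind.
    basic_assets: Dict[str, List[SourceRecord]] = field(
        default_factory=lambda: defaultdict(list))
    # Application data assets: named views derived from the basic assets.
    application_assets: Dict[str, List[Any]] = field(default_factory=dict)

    def aggregate(self, records: List[SourceRecord]) -> None:
        """Data aggregation: collect all source records in one place."""
        for record in records:
            self.basic_assets[record.kind].append(record)

    def build_application_asset(
            self, name: str, keep: Callable[[SourceRecord], bool]) -> None:
        """Data assets: derive an application data asset from basic assets."""
        self.application_assets[name] = [
            record.payload
            for records in self.basic_assets.values()
            for record in records
            if keep(record)
        ]

    def serve(self, name: str) -> List[Any]:
        """Data service: the single entry point applications use to read assets."""
        if name not in self.application_assets:
            raise KeyError(f"unknown data service: {name}")
        return list(self.application_assets[name])  # copy, so callers cannot mutate


# Usage: aggregate three records and publish one application data asset.
platform = DataPlatform()
platform.aggregate([
    SourceRecord("crm", "structured", {"customer": 1, "spend": 120}),
    SourceRecord("web", "unstructured", "free-text product review"),
    SourceRecord("billing", "structured", {"customer": 2, "spend": 40}),
])
platform.build_application_asset(
    "high_value_customers",
    lambda r: r.kind == "structured" and r.payload["spend"] > 100,
)
print(platform.serve("high_value_customers"))  # [{'customer': 1, 'spend': 120}]
```

Keeping `serve` as the only read path mirrors the goal of “storing in one place, controlling uniformly, applying in several places”: applications read published data services and never touch the basic data assets directly.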
The overall framework is as follows:
Figure 3-1 Overall structure of the big data platform
【Data aggregation】
For data aggregation, we should aggregate the company's internal data and the data of its external systems, forming a routine data aggregation mechanism.
According to the data types and interface types of the source data, we should build matching interfaces and processing capabilities to aggregate structured/unstructured and real-time/non-real-time data, which provides the basic data for forming centralized, unified data assets.
【Data assets】
For the construction of data assets, we should build a structured, model-based data asset architecture for organization, management and storage. According to business requirements (hot data / cold data), we should provide different classification management and storage environments for data assets. By introducing NoSQL databases alongside relational databases, an efficient architecture is formed that has comprehensive management support capabilities for data assets.
There are two kinds of data assets: basic data assets and application data assets.
【Data application】
According to their form, data applications are of three types: data applications, data products and data services.
Data service: data encapsulation based on the data assets, providing standardized data service interfaces for business systems, opening data capabilities as services and enabling effective interaction among systems.
Data product: a high-value data product built on the data assets through analysis and mining and provided to the enterprise's customers, such as external partners.
Data application: this includes city-level data processing applications based on data marts and provincial data applications formed by continuously exploring data value based on province-level data assets.
Through sharing and interaction between provincial-level and city-level data, it provides value-adding application support for integrated operation across the two levels and provides data analysis support for the company's business decision-making and management.
【System management】
System management is responsible for the unified management and scheduling of data assets and data applications on the enterprise big data platform. It builds visual management at the data level, provides end-to-end data traceability, data quality management, and service process monitoring and management, and provides the basic guarantees for the open operation of the data center.
【Access portals】
There are three portals: the internal portal for provincial and municipal business personnel, the external portal through which data products are operated for external users, and the management portal for the management personnel of the enterprise big data platform.
The internal portal: it provides a unified access portal for business staff in provinces and cities and for company leaders, with a customized personal workbench. City-level data applications are accessed and managed in a unified way through the internal portal, which is opened to personnel in each city area.
The external portal: it provides an independent, secure portal for external users. It offers data operation products to external enterprises and individual users in the form of a product shelf, making it easy for external customers to obtain their data products and enhance the business operation capabilities of their enterprises or themselves.
The management portal: it is a portal for the enterprise data platform's management personnel and data service developers.
Through the portal, management personnel perform data asset management, data service management, self-service assembly of data services, daily system operation and maintenance, and data quality control, while data service developers carry out self-service development, registration and management of data services, building a shared platform for developing high-value data services.
3.3 Functional architecture of the big data platform
Big data base platform
The data collection platform is the platform for unified collection of business data. Its access layer consists of web crawlers and interface file collection. Through the data acquisition sublayer, it provides standardized and efficient data services to the data layer so that business data can be fully shared.
Data storage and processing is the platform for the unified organization and centralized management of business data. The data layer consists of data storage and data processing components. Components are program modules that implement specific functions, such as HDFS, MapReduce, Hive, YARN and HBase components.
The data bus enables data interaction among the internal modules through the data integration platform or ETL.
Big data capability platform
It builds on the data analysis capabilities of the upper data layer and focuses on different business products. Data service capability components are packaged using technologies such as Java and Jobs; they include keyword analysis, customer identification, label product and behavior analysis components. Together they provide data service capabilities and platform service capabilities.
Big data management platform
The big data management platform uniformly manages and monitors the big data capability products and the application platform.
Its management layer consists of system security, data quality, job scheduling, and operation and maintenance monitoring. By establishing network security, data privacy protection, scheduling mechanisms, and operation and maintenance monitoring and management, it provides management safeguards for the platforms.
Big data application platform
The big data application platform implements the application functions of the big data capability products and application platform. Its functions are accessed through standardized service interfaces, each encapsulated from one or more components according to certain rules and standards. A service is the object invoked by presentation logic and by the entities that take part in business processes, and services can also call other services to complete business functions.

Information construction is a gradual, continuous process of improvement that cannot be achieved overnight, especially for the big data platform as part of the enterprise architecture. We need both a far-sighted strategic vision and a down-to-earth approach: combine current requirements with long-term planning and development, focus on the key priorities and implement them step by step, so as to strongly support precise management and brand operation.