Research on Cloud Storage Technology for NetCDF Physical Ocean Data
Submitted: 2019-05-08  Revised: 2019-07-10
Keywords: physical ocean data  NetCDF  HDFS  parallel computing framework  Spark
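Xia Wei, College of Geomatics, Shandong University of Science and Technology
Ai Bo, College of Geomatics, Shandong University of Science and Technology
Yang Yingzhao, College of Geomatics, Shandong University of Science and Technology
Shang Hengshuai, Qingdao Yuehai Information Service Co., Ltd.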
      Physical ocean data are multidimensional, spatio-temporal, and massive, and are mainly stored in the structured NetCDF file format. In a distributed environment, however, structured files pose problems: data blocks are difficult to address and block boundaries are difficult to determine, which restricts storage and use in big-data scenarios. This paper presents a cloud storage scheme for NetCDF physical ocean data based on HDFS and Spark. First, HDFS distributed storage is used to store and manage the physical ocean data. Second, a data fragmentation scheme based on the Spark parallel computing framework is designed, in which the addresses of NetCDF file blocks in the distributed environment are obtained through a copy-read interface, enabling efficient storage, query, and analysis of the data. Taking 100 years of data from the China Sea as an example, a statistical experiment on the wave height-period scatter diagram is carried out. With billions of records, the method shortens query and analysis time from 2300 seconds on the centralized file system to less than 50 seconds on the cloud computing system, an efficiency improvement of about 95% over centralized file storage, which verifies the correctness and effectiveness of the method.
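
The wave height-period scatter diagram used in the experiment is a joint occurrence table: each record is assigned to a (height bin, period bin) cell and the cells are counted. The following is a minimal sketch of that statistic in a per-fragment count followed by a merge, which is the pattern a Spark job would parallelize. It is not the authors' implementation. The bin widths, the function names, and the sample records are illustrative assumptions, and plain Python lists stand in for the NetCDF fragments read from HDFS.

```python
from collections import Counter
from functools import reduce

# Assumed bin widths; the abstract does not state the ones used in the study.
HEIGHT_BIN_M = 0.5   # wave height bin width, metres
PERIOD_BIN_S = 1.0   # wave period bin width, seconds


def bin_key(height_m, period_s):
    """Map one (height, period) record to its scatter-diagram cell index."""
    return (int(height_m // HEIGHT_BIN_M), int(period_s // PERIOD_BIN_S))


def count_partition(records):
    """'Map' step: count records per cell within one data fragment."""
    return Counter(bin_key(h, t) for h, t in records)


def merge_counts(a, b):
    """'Reduce' step: combine the cell counts of two fragments."""
    return a + b


def scatter_diagram(partitions):
    """Joint occurrence counts over all fragments."""
    partials = (count_partition(p) for p in partitions)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical sample records: (wave height in m, wave period in s).
    partitions = [
        [(0.3, 4.2), (1.2, 5.8), (1.4, 5.1)],
        [(0.4, 4.9), (2.6, 7.3)],
    ]
    for (hi, ti), n in sorted(scatter_diagram(partitions).items()):
        h_lo, t_lo = hi * HEIGHT_BIN_M, ti * PERIOD_BIN_S
        print(f"H [{h_lo:.1f}, {h_lo + HEIGHT_BIN_M:.1f}) m, "
              f"T [{t_lo:.1f}, {t_lo + PERIOD_BIN_S:.1f}) s: {n}")
```

Because each fragment is counted independently and the partial counts are merged by addition, the work scales with the number of fragments that can be processed in parallel, which is the property the fragmentation scheme relies on.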
