Research on Cloud Storage Technology for NetCDF Physical Ocean Data
Received: 2019-05-08  Revised: 2019-07-10
DOI:
Keywords: physical ocean data; NetCDF; HDFS; parallel computing framework; Spark
Funding: National Key Research and Development Program of China (2017YFC1405006); National Natural Science Foundation of China (41401529)
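Author  Affiliation  E-mail
Xia Wei (夏伟)  College of Geomatics, Shandong University of Science and Technology  xiaweiname@163.com
Ai Bo (艾波)  College of Geomatics, Shandong University of Science and Technology  aibogis@163.com
Yang Yingzhao (杨应召)  College of Geomatics, Shandong University of Science and Technology
Shang Hengshuai (尚恒帅)  Qingdao Yuehai Information Service Co., Ltd.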
Abstract:
      Physical ocean data are multidimensional, spatio-temporal, and massive, and are stored mainly in the NetCDF structured file format. In a distributed environment, however, structured files make data blocks hard to address and block boundaries hard to determine, which limits storage and use in big data scenarios. This paper designs a cloud storage scheme for NetCDF physical ocean data based on HDFS and Spark. First, HDFS distributed storage is used to store and manage the physical ocean data. Second, a data partitioning scheme based on the Spark parallel computing framework is designed, and the read interface is overridden to obtain the addresses of NetCDF file data blocks in the distributed environment, which enables efficient storage, query, and analysis of physical ocean data. A wave height-period scatter diagram statistics experiment was carried out on 100 years of physical ocean data for Chinese sea areas. The results show that, with hundreds of millions of records, the method shortens query and analysis time from 2300 seconds under centralized file storage to under 50 seconds, an efficiency improvement of more than 95% over centralized file storage, which verifies the correctness and effectiveness of the method.
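
The wave height-period scatter diagram used in the experiment is a joint frequency table: each record is assigned to a wave height bin and a period bin, and the occurrences per cell are counted. The following is a minimal sketch of that statistic, not the authors' implementation. It uses plain Python in place of Spark, and the bin widths (0.5 m, 1 s), the function names, and the sample records are all illustrative assumptions that do not come from the paper. Each partition is counted independently and the partial counts are then merged, which is the same map-then-reduce shape a partitioned Spark job would follow.

```python
from collections import Counter
from functools import reduce
import math

# Bin widths are illustrative assumptions; the source does not state them.
HEIGHT_BIN_M = 0.5
PERIOD_BIN_S = 1.0


def bin_index(value, width):
    """Return k such that value lies in the half-open bin [k*width, (k+1)*width)."""
    return math.floor(value / width)


def partial_scatter(records):
    """Count (height bin, period bin) occurrences within one partition."""
    counts = Counter()
    for height_m, period_s in records:
        key = (bin_index(height_m, HEIGHT_BIN_M), bin_index(period_s, PERIOD_BIN_S))
        counts[key] += 1
    return counts


def scatter_diagram(partitions):
    """Merge the per-partition counts into one joint frequency table."""
    return reduce(lambda a, b: a + b, map(partial_scatter, partitions), Counter())


# Hypothetical (wave height in m, wave period in s) records split into two partitions.
partitions = [
    [(0.4, 3.2), (0.7, 3.9), (1.2, 5.1)],
    [(0.6, 3.5), (1.4, 5.8), (2.6, 7.0)],
]
table = scatter_diagram(partitions)
```

Because the per-partition counters are merged by addition, the result does not depend on how the records are split, which is what allows the counting step to run in parallel over data blocks.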