Quarterly Publication

Document Type: Original Article

Authors

1 Government Information Headquarters Inspur Software Group Company Ltd, Jinan, China.

2 Department of Computer Engineering, University of Guilan, Rasht, Iran.

3 Department of Industrial Management, Allameh Tabatabai University, Tehran, Iran.

10.22105/bdcv.2022.325253.1040

Abstract

As data sets grow in scale, reliability and availability become principal concerns for data access and processing. In addition, big data poses challenges for storage and management. This paper reviews the important issues related to massive storage systems, distributed storage systems, and big data storage mechanisms. We then present several analysis models used in big data and describe their structure in detail.
