Implementation of an automated performance analysis tool for UPC applications

Authors

  • N.E. Andreev
  • K.E. Afanasiev

Keywords:

PGAS
Unified Parallel C
Scalasca
instrumentation
tracing
performance analysis
parallel programming

Abstract

This paper describes the implementation of a performance analysis tool for applications written in Unified Parallel C (UPC). UPC follows the partitioned global address space (PGAS) programming model, a relatively new parallel programming paradigm. The tool's main feature is support for automated analysis, which distinguishes it from existing tools. Several components of the Scalasca package were reused in the implementation.


Published

2011-05-10

Issue

Section

Section 2. Programming


References

  1. Wolf F., Mohr B. Specifying performance properties of parallel applications using compound events // Parallel and Distributed Computing Practices. 2001. 4, N 3. 301-317.
  2. Geimer M., Wolf F., Wylie B., Abraham E., Becker D., Mohr B. The Scalasca performance toolset architecture // Concurrency and Computation: Practice and Experience. 2010. 22, N 6. 702-719.
  3. High Productivity Computing Systems Program [Electronic resource] (http://www.highproductivity.org/).
  4. Bell C., Bonachea D., Nishtala R., Yelick K. Optimizing bandwidth limited problems using one-sided communication and overlap // Proc. 20th International Parallel & Distributed Processing Symposium. Rhodes Island, 2006.
    doi 10.1109/IPDPS.2006.1639320
  5. El-Ghazawi T. UPC Language Specifications [Electronic resource] (http://www.gwu.edu/~upc/documentation.html).
  6. LBNL, UC Berkeley. Berkeley UPC User’s Guide [Electronic resource] // (http://upc.lbl.gov/docs/user/index.shtml).
  7. Su H.H., Billingsley M., George A. Parallel performance wizard: a performance analysis tool for partitioned global-address-space programming // Conference on Supercomputing. Miami, 2006. 1-8.
  8. Leko A. Performance Analysis Strategies [Electronic resource] // (http://www.hcs.ufl.edu/prj/upcgroup/upcperf/documents/20050302-AnalysisDraft.pdf).
  9. Wolf F., Mohr B., Bhatia N., Hermanns M.A., Geimer M. EPILOG binary trace-data format. Tech. Rep. FZJ-ZAM-IB-2004-06. Forschungszentrum Julich, University of Tennessee, 2004.
  10. Su H., Bonachea D., Leko A., Sherburne H., Billingsley III M., George A. GASP! A standardized performance analysis tool interface for global address space programming models // Proc. of Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA06). Umea, Sweden, June 18-21, 2006. 450-459.
  11. Leko A., Sherburne H., Su H., Golden B., George A.D. Practical Experiences with Modern Parallel Performance Analysis Tools: An Evaluation [Electronic resource] // (http://www.hcs.ufl.edu/upc/archive/toolevals/WhitepaperEval-Summary.pdf).
  12. Wolf F., Mohr B. EARL - a programmable and extensible toolkit for analyzing event traces of message passing programs // Proc. of the 7th International Conference on High Performance Computing and Networking Europe (HPCN). Amsterdam, 1999. 503-512.
  13. Wolf F., Bhatia N. EARL - API documentation: high-level trace access library. Tech. Rep. ICL-UT-04-03. Forschungszentrum Julich, University of Tennessee, 2004.
  14. Wolf F., Song F. CUBE - User Manual. Tech. Rep. ICL-UT-04-01. Forschungszentrum Julich, University of Tennessee, 2004.
  15. Korzh A.A. Results of scaling the NPB UA benchmark to thousands of cores of the Blue Gene/P supercomputer using a PGAS extension of OpenMP // Numerical Methods and Programming. 2010. 11, N 1. 164-174 (in Russian).
  16. Andryushin D.V., Semenov A.S. A study of an implementation of the Survey Propagation algorithm for solving the Boolean satisfiability problem (SAT problem) in UPC // Proc. of the International Supercomputing Conference "Scientific Service on the Internet: Supercomputing Centers and Problems". Moscow: Moscow State University Press, 2010. 133-135 (in Russian).
  17. Johnson A. Unified parallel C within computational fluid dynamics applications on the Cray X1 // Proc. of the Cray User’s Group Conference. Albuquerque, 2005. 1-9.
  18. Beech-Brandt J. Applications of UPC [Electronic resource] // (http://www.nesc.ac.uk/talks/892/applicationsofupc.pdf).
  19. Gordon B., Nguyen N. Overview and Analysis of UPC as a Tool in Cryptanalysis. Tech. Rep. FL 32611. High-performance Computing and Simulation (HCS) Research Laboratory, Department of Electrical and Computer Engineering, University of Florida, 2003.
  20. Cristian F. Probabilistic clock synchronization // Distributed Computing. Berlin: Springer Verlag, 1989. 146-158.
  21. Hoefler T., Schneider T., Lumsdaine A. Characterizing the influence of system noise on large-scale applications by simulation // Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC’10). New Orleans, 2010. 1-11.
  22. Rabenseifner R. The controlled logical clock - a global time for trace based software monitoring of parallel applications in workstation clusters // Proc. of the Fifth Euromicro Workshop on Parallel and Distributed Processing (PDP’97). London, 1997. 477-484.
  23. Andreev N.E., Afanasiev K.E. Automated performance analysis of parallel programs as a way to increase developer productivity // Information Technologies and Mathematical Modeling (ITMM-2010). Proc. of the IX All-Russian Scientific and Practical Conference with International Participation. Anzhero-Sudzhensk, November 19-20, 2010. Tomsk: Tomsk State University, 2010. Part 2. 121-125 (in Russian).
  24. Andreev N.E., Afanasiev K.E. A set of inefficient behavior patterns for the PGAS programming model, using the UPC language as an example // Computational Technologies (in press; in Russian).
  25. Andreev N.E. Methods for automated performance analysis of parallel programs // Vestnik of Novosibirsk State University. 2009. 7, N 1. 16-25 (in Russian).
  26. Bonachea D. Proposal for Extending the UPC Memory Copy Library Functions, v2.0 [Electronic resource] // (http://upc.lbl.gov/publications/upc_memcpy.pdf).