An experimental evaluation of imbalanced learning and time-series validation in the context of CI/CD prediction

Conference item


Liu, Bohan, Zhang, He, Yang, Lanxin, Dong, Liming, Shen, Haifeng and Song, Kaiwen. (2020) An experimental evaluation of imbalanced learning and time-series validation in the context of CI/CD prediction. EASE 2020, April 15-17, 2020, Trondheim, Norway. Norway: Association for Computing Machinery. pp. 21 - 30 https://doi.org/10.1145/3383219.3383222
Authors: Liu, Bohan, Zhang, He, Yang, Lanxin, Dong, Liming, Shen, Haifeng and Song, Kaiwen
Abstract

Background: Machine Learning (ML) has been widely used as a powerful tool to support Software Engineering (SE). The fundamental assumptions about data characteristics that specific ML methods require have to be carefully considered before those methods are applied in SE. Within the context of Continuous Integration (CI) and Continuous Deployment (CD) practices, two vital data characteristics are prone to violate these assumptions in SE research. First, the logs generated during CI/CD and used for training are imbalanced data, which is contrary to the principles of common balanced classifiers; second, these logs are also time-series data, which violates the assumption of cross-validation. Objective: We aim to systematically study the two data characteristics and further provide a comprehensive evaluation of predictive CI/CD with data from real projects. Method: We conduct an experimental study that evaluates 67 CI/CD predictive models using both cross-validation and time-series-validation. Results: Our evaluation shows that cross-validation makes the evaluation of the models optimistic in most cases, although there are a few counter-examples as well. The performance of the top 10 imbalanced models is better than that of the balanced models in the prediction of failed builds, even for balanced data. The degree of data imbalance has a negative impact on prediction performance. Conclusion: In research and practice, the assumptions of the various ML methods should be seriously considered for the validity of research. Even if it is used only to compare the relative performance of models, cross-validation may not be applicable to problems with time-series features. The research community needs to revisit the evaluation results reported in some existing research.
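The contrast between the two validation schemes named in the abstract can be illustrated with a minimal sketch. It assumes the build logs are indexed in chronological order and uses a simple forward-chaining split as the time-series scheme; the abstract does not describe the exact procedure used in the paper, so that choice is an assumption. The function names (`k_fold_splits`, `forward_chaining_splits`, `leaks_future`) are hypothetical and exist only for this illustration.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Plain k-fold over chronologically indexed samples (no shuffling)."""
    fold = n_samples // k
    for i in range(k):
        start = i * fold
        # The last fold absorbs any remainder.
        stop = n_samples if i == k - 1 else start + fold
        test = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        yield train, test


def forward_chaining_splits(n_samples: int, k: int) -> Iterator[Split]:
    """Assumed time-series scheme: train on a prefix, test on the next block."""
    fold = n_samples // (k + 1)
    for i in range(1, k + 1):
        train = list(range(0, i * fold))
        stop = n_samples if i == k else (i + 1) * fold
        test = list(range(i * fold, stop))
        yield train, test


def leaks_future(train: List[int], test: List[int]) -> bool:
    """True if any training sample is newer than the oldest test sample."""
    return max(train) > min(test)


if __name__ == "__main__":
    for name, splitter in [("k-fold", k_fold_splits),
                           ("forward chaining", forward_chaining_splits)]:
        flags = [leaks_future(tr, te) for tr, te in splitter(12, 3)]
        print(name, flags)
```

With 12 chronologically ordered builds and 3 folds, k-fold trains on builds that are newer than the test builds in every fold except the last, whereas the forward-chaining split never does. This is the kind of look-ahead that could make cross-validated scores optimistic on time-ordered data.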

Keywords: continuous integration; continuous deployment; time-series-validation; cross-validation; imbalanced learning
Year: 2020
Journal: EASE '20: Proceedings of the Evaluation and Assessment in Software Engineering
Publisher: Association for Computing Machinery
Digital Object Identifier (DOI): https://doi.org/10.1145/3383219.3383222
Open access: Open access
Page range: 21 - 30
Research Group: Peter Faber Business School
Additional information

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Place of publication: Norway
Permalink: https://acuresearchbank.acu.edu.au/item/8v6q3/an-experimental-evaluation-of-imbalanced-learning-and-time-series-validation-in-the-context-of-ci-cd-prediction

