Associate Professor Gustavo Batista
Lecturer

  • 2016. Habilitation. Computer Science. University of São Paulo at São Carlos, Brazil.
  • 2003. Doctor of Philosophy (PhD). Computer Science. University of São Paulo at São Carlos, Brazil.
  • 1997. Master in Science (MSc). Computer Science. University of São Paulo at São Carlos, Brazil.
  • 1994. Bachelor (BS). Computer Science. São Paulo State University, Brazil.
Engineering
Computer Science and Engineering

I joined UNSW as an associate professor in 2018, after working for more than ten years at the University of São Paulo (USP). From 2010 to 2012, I was a visiting researcher at the University of California, Riverside (UCR), working in Prof. Eamonn Keogh's laboratory.

During my stay at UCR, I continued my work on time series analysis, particularly developing methods for classification and clustering of time-oriented data. Together with Dr Keogh, I proposed the first complexity-invariant time series distance, as well as speed-up techniques for comparing massive amounts of time series data under warping.

More recently, I have worked with data streams, particularly classification under label latency, and have proposed efficient unsupervised methods to detect concept drifts and to learn in the presence of these changes in the data distribution.

My research is motivated by applying Machine Learning in practice. My approach is to work on challenging applications that help my students and me identify gaps in the literature, or assumptions in the state of the art that do not hold for our applications. This approach often leads to contributions both to Computer Science and to the application areas.

One instance of this approach is the challenge of deploying classification algorithms on embedded devices. For example, I have developed lightweight models that can run in environments with severe power restrictions, such as satellites and sensors. One notable application is the development of sensors that automatically classify insects in flight, allowing the creation of surveillance systems for disease vectors, invasive species and pests. I have also developed EmbML, a Machine Learning tool that converts scikit-learn and Weka classifiers into C++ code tailored to run on low-power microcontrollers, such as those found in the Arduino family.

In recent years, I have worked actively in the area of Machine Learning Quantification, developing new algorithms to count events accurately. These developments have led to the proposal of a novel Data Mining task known as One-class Quantification, as well as a family of efficient quantification algorithms.

The impact of my research can be measured by the number of recent papers citing my research articles. According to Google Scholar, my papers have more than 9,000 citations, with more than 1,000 citations in 2020.

Phone
+61-2-9385 1607
Location
Room 510L, Building J17, School of Computer Science and Engineering, University of New South Wales, NSW 2052
  • Book Chapters | 2022
    2022, 'Geographic Context-Based Stacking Learning for Election Prediction from Socio-economic Data', in Intelligent Systems, Springer International Publishing, pp. 641 - 656, http://dx.doi.org/10.1007/978-3-031-21686-2_44
    Book Chapters | 2021
    2021, 'TECNOLOGIA NO MONITORAMENTO AMBIENTAL DE MOSQUITOS TRANSMISSORES DE DOENÇAS: QUAIS SÃO OS DESAFIOS? UMA BREVE REVISÃO', in INDICADORES BIOLÓGICOS DE QUALIDADE EM AMBIENTES AQUÁTICOS CONTINENTAIS: MÉTRICAS E RECORTES PARA ANÁLISES, RFB Editora, http://dx.doi.org/10.46898/rfb.9786558891321.8
    Book Chapters | 2019
    2019, 'One-Class Quantification', in Machine Learning and Knowledge Discovery in Databases, Springer Nature, pp. 273 - 289, http://dx.doi.org/10.1007/978-3-030-10925-7_17
    Book Chapters | 2014
    2014, 'Time series classification with motifs and characteristics', in Soft Computing for Business Intelligence, Springer, Berlin, Heidelberg, pp. 125 - 138
  • Journal articles | 2023
    2023, 'Dynamic Inference from IoT Traffic Flows under Concept Drifts in Residential ISP Networks', IEEE Internet of Things Journal, pp. 1 - 1, http://dx.doi.org/10.1109/JIOT.2023.3265012
    Journal articles | 2022
    2022, 'An Open-Source Tool for Classification Models in Resource-Constrained Hardware', IEEE Sensors Journal, 22, pp. 544 - 554, http://dx.doi.org/10.1109/JSEN.2021.3128130
    Journal articles | 2022
    2022, 'Hierarchical classification of pollinating flying insects under changing environments', Ecological Informatics, 70, http://dx.doi.org/10.1016/j.ecoinf.2022.101751
    Journal articles | 2022
    2022, 'Time Series Prediction via Similarity Search: Exploring Invariances, Distance Measures and Ensemble Functions', IEEE Access, 10, pp. 78022 - 78043, http://dx.doi.org/10.1109/ACCESS.2022.3192849
    Journal articles | 2021
    2021, 'COVID-Safe Spatial Occupancy Monitoring Using OFDM-Based Features and Passive WiFi Samples', ACM Transactions on Management Information Systems, 12, http://dx.doi.org/10.1145/3472668
    Journal articles | 2021
    2021, 'Changes in the wing-beat frequency of bees and wasps depending on environmental conditions: a study with optical sensors', Apidologie, 52, pp. 731 - 748, http://dx.doi.org/10.1007/s13592-021-00860-y
    Journal articles | 2021
    2021, 'The impact of body size on Aedes [Stegomyia] aegypti wingbeat frequency: implications for mosquito identification', Medical and Veterinary Entomology, 35, pp. 617 - 624, http://dx.doi.org/10.1111/mve.12540
    Journal articles | 2020
    2020, 'Challenges in benchmarking stream learning algorithms with real-world data', Data Mining and Knowledge Discovery, 34, pp. 1805 - 1858, http://dx.doi.org/10.1007/s10618-020-00698-5
    Journal articles | 2020
    2020, 'Quantifying With Only Positive Training Data', arXiv preprint arXiv:2004.10356
    Journal articles | 2019
    2019, 'Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model', Information Sciences, 484, pp. 302 - 337
    Journal articles | 2019
    2019, 'Fast similarity matrix profile for music analysis and exploration', IEEE Transactions on Multimedia, 21, pp. 29 - 38, http://dx.doi.org/10.1109/TMM.2018.2849563
    Journal articles | 2018
    2018, 'Combining instance selection and self-training to improve data stream quantification', Journal of the Brazilian Computer Society, 24, pp. 12 - 12, http://dx.doi.org/10.1186/s13173-018-0076-0
    Journal articles | 2018
    2018, 'Speeding up similarity search under dynamic time warping by pruning unpromising alignments', Data Mining and Knowledge Discovery, 32, pp. 988 - 1016
    Journal articles | 2017
    2017, 'Unsupervised active learning techniques for labeling training sets: An experimental evaluation on sequential data', Intelligent Data Analysis, 21, pp. 1061 - 1095
    Journal articles | 2015
    Silva DF; Souza VMA; Ellis DPW; Keogh EJ; Batista GEAPA, 2015, 'Exploring low cost laser sensors to identify flying insect species', Journal of Intelligent & Robotic Systems, 80, pp. 313 - 330
    Journal articles | 2015
    2015, 'Class imbalance revisited: a new experimental setup to assess the performance of treatment methods', Knowledge and Information Systems, 45, pp. 247 - 270
    Journal articles | 2015
    2015, 'ENIAC 2013 Special Issue', Journal of Intelligent and Robotic Systems: Theory and Applications, 80, pp. 225 - 226, http://dx.doi.org/10.1007/s10846-015-0260-9
    Journal articles | 2015
    2015, 'Exploring Low Cost Laser Sensors to Identify Flying Insect Species: Evaluation of Machine Learning and Signal Processing Methods', Journal of Intelligent and Robotic Systems: Theory and Applications, 80, pp. 313 - 330, http://dx.doi.org/10.1007/s10846-014-0168-9
    Journal articles | 2014
    2014, 'CID: an efficient complexity-invariant distance for time series', Data Mining and Knowledge Discovery, 28, pp. 634 - 669
    Journal articles | 2014
    2014, 'Coping with highly imbalanced datasets: A case study with definition extraction in a multilingual setting', Natural Language Engineering, 20, pp. 327 - 359
    Journal articles | 2014
    2014, 'Flying Insect Classification with Inexpensive Sensors', Journal of Insect Behavior, 27, pp. 657 - 677, http://dx.doi.org/10.1007/s10905-014-9454-4
    Journal articles | 2014
    2014, 'Flying insect detection and classification with inexpensive sensors', JoVE (Journal of Visualized Experiments), pp. e52111 - e52111
    Journal articles | 2014
    2014, 'ICMC-USP time series prediction repository', Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brasil. URL https://goo.gl/uzxGZJ
    Journal articles | 2013
    2013, 'A comparative study between MFCC and LSF coefficients in automatic recognition of isolated digits pronounced in Portuguese and English', Acta Scientiarum. Technology, 35, pp. 621 - 628
    Journal articles | 2013
    2013, 'Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping', ACM Transactions on Knowledge Discovery from Data (TKDD), 7, pp. 1 - 31, https://www.ncbi.nlm.nih.gov/pubmed/31607834
    Journal articles | 2012
    2012, 'A complexity-invariant measure based on fractal dimension for time series classification', International Journal of Natural Computing Research (IJNCR), 3, pp. 59 - 73
    Journal articles | 2011
    2011, 'A hybrid approach to learn with imbalanced classes using evolutionary algorithms', Logic Journal of IGPL, 19, pp. 293 - 293
    Journal articles | 2011
    2011, 'A survey on graphical methods for classification predictive performance evaluation', IEEE Transactions on Knowledge and Data Engineering, 23, pp. 1601 - 1618, http://dx.doi.org/10.1109/TKDE.2011.59
    Journal articles | 2010
    2010, 'A survey on graphical methods for classification predictive performance evaluation', IEEE Transactions on Knowledge and Data Engineering, pp. 1 - 1
    Journal articles | 2008
    2008, 'Curvas ROC para avaliação de classificadores', Revista IEEE América Latina, 6, pp. 215 - 222
    Journal articles | 2008
    2008, 'Evaluating classifiers using ROC curves', IEEE Latin America Transactions, 6, pp. 215 - 222, http://dx.doi.org/10.1109/TLA.2008.4609920
    Journal articles | 2006
    2006, 'A Comparison of Methods for Rule Subset Selection Applied to Associative Classification.', Inteligencia artificial: Revista Iberoamericana de Inteligencia Artificial, 10, pp. 29 - 35
    Journal articles | 2005
    2005, 'Balancing strategies and class overlapping', Advances in Intelligent Data Analysis VI, pp. 741 - 741
    Journal articles | 2004
    2004, 'A study of the behavior of several methods for balancing machine learning training data', ACM SIGKDD Explorations Newsletter, 6, pp. 20 - 29
    Journal articles | 2004
    2004, 'Applying genetic and symbolic learning algorithms to extract rules from artificial neural networks', MICAI 2004: Advances in Artificial Intelligence, pp. 833 - 843
    Journal articles | 2004
    2004, 'Class imbalances versus class overlapping: an analysis of a learning system behavior', MICAI 2004: Advances in Artificial Intelligence, pp. 312 - 321
    Journal articles | 2003
    2003, 'An analysis of four missing data treatment methods for supervised learning', Applied Artificial Intelligence, 17, pp. 519 - 533
    Journal articles | 2003
    2003, 'Descrição da arquitetura e do projeto do ambiente computacional DISCOVER LEARNING ENVIRONMENT—DLE', Relatório Técnico do ICMC/USP
    Journal articles | 2003
    2003, 'Experimental comparison of K-nearest neighbour and mean or mode imputation methods with the internal strategies used by C4.5 and CN2 to treat missing data', University of São Paulo
    Journal articles | 2002
    2002, 'A Study of K-Nearest Neighbour as an Imputation Method.', HIS, 87, pp. 48 - 48
    Journal articles | 2002
    2002, 'K-Nearest Neighbour as Imputation Method: Experimental Results', Technical report, ICMC-USP
    Journal articles | 2002
    2002, 'Learning with Skewed Class Distributions', Advances in Logic, Artificial Intelligence, and Robotics: LAPTEC 2002, 85, pp. 173 - 173
    Journal articles | 2000
    2000, 'Applying one-sided selection to unbalanced datasets', MICAI 2000: Advances in Artificial Intelligence, pp. 315 - 325
    Journal articles | 1997
    1997, 'Um ambiente de avaliação de algoritmos de aprendizado de máquina utilizando exemplos', Dissertação de Mestrado, ICMC-USP
  • Working Papers | 2022
    2022, AdIoTack: Quantifying and Refining Resilience of Decision Tree Ensemble Inference Models against Adversarial Volumetric Attacks on IoT Networks, arXiv, https://arxiv.org/pdf/2203.09792.pdf
  • Conference Papers | 2022
    2022, 'Classifying Time-Series of IoT Flow Activity using Deep Learning and Intransitive Features', in International Conference on Software, Knowledge Information, Industrial Management and Applications, SKIMA, pp. 192 - 197, http://dx.doi.org/10.1109/SKIMA57145.2022.10029420
    Conference Papers | 2022
    2022, 'Update Compression for Deep Neural Networks on the Edge', in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE, pp. 3075 - 3085, presented at 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 19 June 2022 - 20 June 2022, http://dx.doi.org/10.1109/CVPRW56347.2022.00347
    Conference Papers | 2021
    2021, 'A Graph-Based Spatial Cross-Validation Approach for Assessing Models Learned with Selected Features to Understand Election Results', in Proceedings - 20th IEEE International Conference on Machine Learning and Applications, ICMLA 2021, IEEE, pp. 909 - 915, presented at 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), 13 December 2021 - 16 December 2021, http://dx.doi.org/10.1109/ICMLA52953.2021.00150
    Conference Papers | 2021
    2021, 'Accurately Quantifying under Score Variability', in Proceedings - IEEE International Conference on Data Mining, ICDM, IEEE, pp. 1228 - 1233, presented at 2021 IEEE International Conference on Data Mining (ICDM), 07 December 2021 - 10 December 2021, http://dx.doi.org/10.1109/ICDM51629.2021.00149
    Conference Papers | 2021
    2021, 'Passive WiFi CSI Sensing Based Machine Learning Framework for COVID-Safe Occupancy Monitoring', in 2021 IEEE International Conference on Communications Workshops, ICC Workshops 2021 - Proceedings, IEEE, presented at 2021 IEEE International Conference on Communications Workshops (ICC Workshops), 14 June 2021 - 23 June 2021, http://dx.doi.org/10.1109/ICCWorkshops50388.2021.9473673
    Conference Papers | 2021
    2021, 'Pitfalls in Quantification Assessment', in CEUR Workshop Proceedings
    Preprints | 2021
    2021, An Open-Source Tool for Classification Models in Resource-Constrained Hardware, http://dx.doi.org/10.48550/arxiv.2105.05983
    Conference Papers | 2020
    2020, 'Accurately quantifying a billion instances per second', in Proceedings - 2020 IEEE 7th International Conference on Data Science and Advanced Analytics, DSAA 2020, pp. 1 - 10, http://dx.doi.org/10.1109/DSAA49011.2020.00012
    Conference Papers | 2020
    2020, 'Algorithm recommendation for data streams', in Proceedings - International Conference on Pattern Recognition, IEEE, pp. 6073 - 6080, presented at 2020 25th International Conference on Pattern Recognition (ICPR), 10 January 2021 - 15 January 2021, http://dx.doi.org/10.1109/ICPR48806.2021.9411923
    Conference Papers | 2020
    2020, 'Brazilian Presidential Elections: Analysing Voting Patterns in Time and Space Using a Simple Data Science Pipeline', in Anais do Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2020), Sociedade Brasileira de Computação, presented at Symposium on Knowledge Discovery, Mining and Learning, http://dx.doi.org/10.5753/kdmile.2020.11979
    Conference Papers | 2020
    2020, 'Melhorando a Acurácia da Detecção de Lavagem de Dinheiro na Rede Bitcoin', in Anais XXXVIII Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC 2020), Sociedade Brasileira de Computação, presented at Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos, http://dx.doi.org/10.5753/sbrc.2020.12321
    Conference Papers | 2020
    2020, 'The Importance of the Test Set Size in Quantification Assessment', in IJCAI, IJCAI, YOKOHAMA, pp. 2640 - 2646, presented at Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track, YOKOHAMA, http://dx.doi.org/10.24963/ijcai.2020/366
    Preprints | 2020
    2020, Challenges in Benchmarking Stream Learning Algorithms with Real-world Data, http://dx.doi.org/10.48550/arxiv.2005.00113
    Preprints | 2020
    2020, Quantifying With Only Positive Training Data, http://dx.doi.org/10.48550/arxiv.2004.10356
    Conference Papers | 2019
    2019, 'DyS: a Framework for Mixture Models in Quantification', in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)
    Conference Papers | 2019
    2019, 'EmbML Tool: Supporting the use of supervised learning algorithms in low-cost embedded systems', in Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI, pp. 1633 - 1637, http://dx.doi.org/10.1109/ICTAI.2019.00238
    Conference Papers | 2018
    2018, 'A Fuzzy Classifier for Data Streams with Infinitely Delayed Labels', in 23rd Iberoamerican Congress on Pattern Recognition (CIARP)
    Conference Papers | 2018
    2018, 'Classifying and counting with recurrent contexts', in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1983 - 1992
    Conference Papers | 2018
    2018, 'Elastic time series motifs and discords', in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 237 - 242
    Conference Papers | 2018
    2018, 'Evaluating stream classifiers with delayed labels information', in Proceedings - 2018 Brazilian Conference on Intelligent Systems, BRACIS 2018, pp. 408 - 413, http://dx.doi.org/10.1109/BRACIS.2018.00077
    Conference Papers | 2018
    2018, 'Large-Scale Similarity-Based Time Series Mining', in Anais do Concurso de Teses e Dissertações da SBC (CTD-SBC), Sociedade Brasileira de Computação - SBC, presented at XXXI Concurso de Teses e Dissertações da SBC, http://dx.doi.org/10.5753/ctd.2018.3656
    Conference Papers | 2018
    2018, 'On the Need of Class Ratio Insensitive Drift Tests for Data Streams', in Second International Workshop on Learning with Imbalanced Domains: Theory and Applications, pp. 110 - 124
    Conference Papers | 2018
    2018, 'One-class quantification', in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Dublin, Ireland, pp. 273 - 289, presented at ECML PKDD 2018, Dublin, Ireland, 10 September 2018 - 14 September 2018, http://dx.doi.org/10.1007/978-3-030-10925-7
    Conference Papers | 2018
    2018, 'Towards Hierarchical Classification of Data Streams', in 23rd Iberoamerican Congress on Pattern Recognition (CIARP), pp. 314 - 322
    Conference Papers | 2018
    2018, 'Unsupervised context switch for classification tasks on data streams with recurrent concepts', in Proceedings of the 33rd Annual ACM Symposium on Applied Computing, pp. 518 - 524
    Conference Papers | 2017
    2017, 'Quantification in data streams: Initial results', in 2017 Brazilian Conference on Intelligent Systems (BRACIS), IEEE, pp. 43 - 48
    Conference Papers | 2016
    2016, 'Constrained Local and Global Consistency for semi-supervised learning', in 2016 23rd International Conference on Pattern Recognition (ICPR), IEEE, pp. 1689 - 1694
    Conference Papers | 2016
    2016, 'Fast unsupervised online drift detection using incremental kolmogorov-smirnov test', in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1545 - 1554
    Conference Papers | 2016
    2016, 'Improved Time Series Classification with Representation Diversity and SVM', in 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 1 - 6, presented at 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 18 December 2016 - 20 December 2016, http://dx.doi.org/10.1109/icmla.2016.0010
    Conference Papers | 2016
    2016, 'On the effect of endpoints on dynamic time warping', in SIGKDD Workshop on Mining and Learning from Time Series II, San Francisco, CA. Association for Computing Machinery-ACM
    Conference Papers | 2016
    2016, 'Prefix and Suffix Invariant Dynamic Time Warping', in 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, pp. 1209 - 1214, presented at 2016 IEEE 16th International Conference on Data Mining (ICDM), 12 December 2016 - 15 December 2016, http://dx.doi.org/10.1109/icdm.2016.0161
    Conference Papers | 2016
    2016, 'SiMPle: Assessing music similarity using subsequences joins', in Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016, pp. 23 - 29
    Conference Papers | 2016
    2016, 'Speeding up all-pairwise dynamic time warping matrix calculation', in Proceedings of the 2016 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, pp. 837 - 845
    Conference Papers | 2015
    2015, 'A study of the use of complexity measures in the similarity search process adopted by knn algorithm for time series prediction', in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 45 - 51
    Conference Papers | 2015
    2015, 'An experimental analysis on time series transductive classification on graphs', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8
    Conference Papers | 2015
    2015, 'Automatic classification of drum sounds with indefinite pitch', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8
    Conference Papers | 2015
    2015, 'Classification of Evolving Data Streams with Infinitely Delayed Labels', in IEEE International Conference on Machine Learning & Applications (ICMLA), pp. 214 - 219
    Conference Papers | 2015
    2015, 'Data Stream Classification Guided by Clustering on Nonstationary Environments and Extreme Verification Latency', in SIAM International Conference on Data Mining (SDM), pp. 873 - 881
    Conference Papers | 2015
    2015, 'Effective insect recognition using a stacked autoencoder with maximum correntropy criterion', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 7
    Conference Papers | 2015
    2015, 'Igmm-cd: a gaussian mixture classification algorithm for data streams with concept drifts', in 2015 Brazilian Conference on Intelligent Systems (BRACIS), IEEE, pp. 55 - 61
    Conference Papers | 2015
    2015, 'Music Shapelets for Fast Cover Song Recognition.', in ISMIR, pp. 441 - 447
    Conference Papers | 2015
    2015, 'Robust multi-class graph transduction with higher order regularization', in 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1 - 8
    Conference Papers | 2015
    2015, 'Time series classification with representation ensembles', in International Symposium on Intelligent Data Analysis, Springer, Cham, pp. 108 - 119
    Other | 2015
    2015, Nonstationary environments-archive
    Other | 2015
    2015, The UCR time series classification archive
    Conference Papers | 2014
    2014, 'Adding diversity to rank examples in anytime nearest neighbor classification', in 2014 13th International Conference on Machine Learning and Applications, IEEE, pp. 129 - 134
    Conference Papers | 2014
    2014, 'Extracting texture features for time series classification', in 2014 22nd International Conference on Pattern Recognition, IEEE, pp. 1425 - 1430
    Conference Papers | 2014
    2014, 'Music Classification by Transductive Learning Using Bipartite Heterogeneous Networks', in International Society of Music Information Retrieval Conference (ISMIR)
    Conference Papers | 2014
    2014, 'Time series transductive classification on imbalanced data sets: an experimental study', in 2014 22nd International Conference on Pattern Recognition, IEEE, pp. 3780 - 3785
    Preprints | 2014
    2014, Flying Insect Classification with Inexpensive Sensors, http://dx.doi.org/10.48550/arxiv.1403.2654
    Conference Papers | 2013
    2013, 'A comparative study of algorithms for recommending given names', in 2013 2nd International Conference on Informatics and Applications, ICIA 2013, pp. 66 - 71, http://dx.doi.org/10.1109/ICoIA.2013.6650231
    Conference Papers | 2013
    2013, 'A video compression-based approach to measure music structural similarity', in International Society for Music Information Retrieval Conference, pp. 95 - 10
    Conference Papers | 2013
    2013, 'An empirical comparison of dissimilarity measures for time series classification', in 2013 Brazilian Conference on Intelligent Systems, IEEE, pp. 82 - 88
    Conference Papers | 2013
    2013, 'Applying machine learning and audio analysis techniques to insect recognition in intelligent traps', in 2013 12th International Conference on Machine Learning and Applications, IEEE, pp. 99 - 104
    Conference Papers | 2013
    2013, 'Classification of data streams applied to insect recognition: Initial results', in 2013 Brazilian Conference on Intelligent Systems, IEEE, pp. 76 - 81
    Conference Papers | 2013
    2013, 'DTW-D: time series semi-supervised learning from a single example', in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 383 - 391
    Conference Papers | 2013
    2013, 'Data mining a trillion time series subsequences under dynamic time warping', in Twenty-Third International Joint Conference on Artificial Intelligence
    Conference Papers | 2013
    2013, 'Improving the recommendation of given names by using contextual information', in CEUR Workshop Proceedings, pp. 61 - 72
    Conference Papers | 2013
    2013, 'Influence of graph construction on semi-supervised learning', in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, Heidelberg, pp. 160 - 175
    Conference Papers | 2013
    2013, 'Time Series Classification using Motifs and Characteristics Extraction: A Case Study on ECG Databases', in Proceedings of the Fourth International Workshop on Knowledge Discovery, Knowledge Management and Decision Support, Atlantis Press, presented at Fourth International Workshop on Knowledge Discovery, Knowledge Management and Decision Support, 06 November 2013 - 08 November 2013, http://dx.doi.org/10.2991/.2013.40
    Conference Papers | 2013
    2013, 'Time series classification using compression distance of recurrence plots', in 2013 IEEE 13th International Conference on Data Mining, IEEE, pp. 687 - 696
    Conference Papers | 2012
    2012, 'A novel approximation to dynamic time warping allows anytime clustering of massive time series datasets', in SIAM International Conference on Data Mining, pp. 999 - 1010
    Conference Papers | 2012
    2012, 'An experimental design to evaluate class imbalance treatment methods', in 2012 11th International Conference on Machine Learning and Applications, IEEE, pp. 95 - 101
    Conference Papers | 2012
    2012, 'Searching and mining trillions of time series subsequences under dynamic time warping', in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 262 - 270
    Conference Papers | 2012
    2012, 'Spoken digit recognition in portuguese using line spectral frequencies', in Ibero-American Conference on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 241 - 250
    Conference Papers | 2011
    2011, 'A Complexity-Invariant Distance Measure for Time Series', in SDM-2011: Proceedings of SIAM International Conference on Data Mining
    Conference Papers | 2011
    2011, 'SIGKDD demo: sensors and software to allow computational entomology, an emerging application of data mining', in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp. 761 - 764
    Conference Papers | 2011
    2011, 'Towards automatic classification on flying insects using inexpensive sensors', in 2011 10th International Conference on Machine Learning and Applications and Workshops, IEEE, pp. 364 - 369
    Conference Papers | 2010
    2010, 'Classification of Live Moths Combining Texture, Color and Shape Primitives', in 2010 Ninth International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 903 - 906
    Conference Papers | 2010
    2010, 'Discovering Knowledge Rules with Multi-Objective Evolutionary Computing', in 2010 Ninth International Conference on Machine Learning and Applications (ICMLA), IEEE, pp. 119 - 124
    Conference Papers | 2009
    2009, 'Data mining with imbalanced class distributions: concepts and methods.', in IICAI, pp. 359 - 376
    Conference Papers | 2009
    2009, 'How k-Nearest Neighbor Parameters Affect its Performance', in X Argentine Symposium on Artificial Intelligence
    Conference Papers | 2008
    2008, 'A study with class imbalance and random sampling for a decision tree learning system', in IFIP International Conference on Artificial Intelligence in Theory and Practice, Springer, Boston, MA, pp. 131 - 140
    Conference Papers | 2008
    2008, 'Evaluating Ranking Composition Methods for Multi-Objective Optimization of Knowledge Rules', in 2008 Eighth International Conference on Hybrid Intelligent Systems (HIS’08), IEEE, pp. 537 - 542
    Conference Papers | 2008
    2008, 'Missing value imputation using a semi-supervised rank aggregation approach', in Brazilian Symposium on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 217 - 226
    Conference Papers | 2005
    2005, 'Balancing strategies and class overlapping', in Famili AF; Kok JN; Pena JM; Siebes A; Feelders A (eds.), Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), SPRINGER-VERLAG BERLIN, Madrid, SPAIN, pp. 24 - 35, presented at 6th International Symposium on Intelligent Data Analysis, Madrid, SPAIN, 08 September 2005 - 10 September 2005, http://dx.doi.org/10.1007/11552253_3
    Conference Papers | 2005
    2005, 'Multi-view semi-supervised learning: An approach to obtain different views from text datasets', in Proceeding of the 2005 conference on Advances in Logic Based Intelligent Systems: Selected Papers of LAPTEC 2005, IOS Press, pp. 97 - 104, IOS Press
    Conference Papers | 2004
    2004, 'Improving rule induction precision for automated annotation by balancing skewed data sets', in International Symposium on Knowledge Exploration in Life Science Informatics, Springer, Berlin, Heidelberg, pp. 20 - 32, Springer, Berlin, Heidelberg
    Conference Papers | 2004
    2004, 'Learning with class skews and small disjuncts', in Brazilian Symposium on Artificial Intelligence, Springer, Berlin, Heidelberg, pp. 296 - 306
    Conference Papers | 2003
    2003, 'Balancing training data for automated annotation of keywords: a case study', in Proceedings of the Second Brazilian Workshop on Bioinformatics, pp. 35 - 43
    Theses / Dissertations | 2003
    2003, Pré-processamento de dados em aprendizado de máquinas supervisionado., Tese (Doutorado)-Instituto de Ciências Matemáticas e de Computação …
    Conference Papers | 2002
    2002, 'Splice junction recognition using machine learning techniques', in Proceedings of the First Brazilian Workshop on Bioinformatics, Citeseer, pp. 32 - 39
    Conference Papers | 2002
    2002, 'The influence of noisy patterns on the performance of learning methods in the splice junction recognition problem', in Proceedings of the VII Brazilian Symposium on Neural Networks (SBRN 2002), IEEE, pp. 31 - 36
    Conference Papers | 2001
    2001, 'A study of K-nearest neighbour as a model-based method to treat missing data', in Argentine Symposium on Artificial Intelligence
    Conference Papers | 2000
    2000, 'A computational environment for extracting rules from databases', in Ebecken N; Brebbia CA (eds.), Management Information Systems, WIT Press, Cambridge University, Cambridge, England, pp. 321 - 330, presented at 2nd International Conference on Data Mining, Cambridge University, Cambridge, England, 05 July 2000 - 07 July 2000, http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000166319000032&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=891bb5ab6ba270e68a29b250adbe88d1

Grant funding as principal investigator

  • 2017 – 2019: FAPESP e-Science Research Grant. Intelligent Traps and Sensors: an Innovative Approach to Control Insect Pests and Disease Vectors. $55,000.
  • 2016 – 2019: USAID Combating Zika and Future Threats Grand Challenge. An Intelligent Trap and Mobile Application to Motivate Local Mosquito Control Activities. $500,000.
  • 2017 – 2019: CNPq Research Fellow. Novel Approaches in Machine Learning Applied to Automatic Insect Recognition. $25,000.
  • 2015 – 2016: Google LA Research Award. Controlling Dengue Fever Mosquitoes using Intelligent Sensors and Traps. $24,000.
  • 2012 – 2014: FAPESP Research Grant. Complexity-invariance for Classification, Clustering and Motif Discovery in Time Series. $30,000.
  • 2013 – 2015: FAPESP-CALDO International Cooperation Grant. Research on Geospatial Marine Biology Data Mining using Time Series, Text Mining and Visualization (with Stan Matwin co-PI for NSERC). $20,000.
  • 2013 – 2015: FAPESP-CNPq Research Grant. Intelligent Sensors for Controlling Agricultural Pests and Disease-vector Insects. $55,000.
  • 2014 – 2017: CNPq Universal Research Grant. Real-time Monitoring of Insect Pests in Agriculture and the Environment. $25,000.
  • 2014 – 2017: FAPESP New Frontiers Grant. Time Series Classification Algorithms Applied to Embedded Systems. $30,000.
  • 2007 – 2009: FAPESP Research Grant. Machine Learning and Class Imbalance. $10,000.
     

  • 2020. Best Research Paper Award. IEEE International Conference on Data Science and Advanced Analytics (IEEE-DSAA).
  • 2017 – 2020. Research Fellow, level 2. National Council for Scientific and Technological Development, CNPq.
  • 2014 – 2017. Research Fellow, level 2. National Council for Scientific and Technological Development, CNPq.
  • 2015 – 2016. Google Research Award in Latin America. Google Inc.
  • 2012. Best Research Paper Award. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM-KDD).

I have worked in Machine Learning during my entire career. My main contributions to the field are the following:

Quantification: I have developed counting algorithms that are robust to the changes in data distribution that occur in real-world applications. The algorithms developed by my research group, such as those of the DyS family, are among the most accurate. We recently developed an ultra-fast counting algorithm that performs similarly to the state of the art. This algorithm received the Best Research Paper Award at DSAA-2020.

Time Series Mining: I have created algorithms to classify and cluster time-oriented data under different invariances, such as warping. These developments led to the UCR suite, a framework for time series matching under warping that received the KDD Best Research Paper Award in 2012. More recently, we further improved the search speed of the UCR suite, creating the UCR-USP suite. I also proposed the first time series distance invariant to complexity.
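
As a rough illustration of what invariance to complexity means, the sketch below follows the commonly cited formulation of the complexity-invariant distance: the Euclidean distance between two equal-length series is multiplied by the ratio of their complexity estimates, where the complexity of a series is estimated from the length of the line obtained by "stretching" it. The function names and the handling of constant (zero-complexity) series are assumptions made for this example, not a reference implementation.

```python
import math


def euclidean(q, c):
    """Plain Euclidean distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity_estimate(series):
    """Complexity estimate: root of the summed squared consecutive differences."""
    return math.sqrt(
        sum((series[i + 1] - series[i]) ** 2 for i in range(len(series) - 1))
    )


def cid(q, c):
    """Euclidean distance scaled by a complexity correction factor (>= 1)."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    ed = euclidean(q, c)
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    low, high = min(ce_q, ce_c), max(ce_q, ce_c)
    if high == 0:
        # Assumption: two constant series have equal complexity, so no correction.
        return ed
    if low == 0:
        # Assumption: a constant series versus a varying one is maximally different.
        return math.inf
    return ed * (high / low)
```

Two series with similar complexity keep roughly their Euclidean distance, while a smooth series compared with a jagged one is pushed further apart.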

Class imbalance: My initial research involved the development and assessment of methods to deal with class-imbalanced data. It focused on the challenges of learning from imbalanced data, including the scenarios in which skewed distributions impose difficulties for classifiers. My articles figure among the most cited on the topic, including the ACM SIGKDD paper of 2004 with more than 2,500 citations.

Missing data imputation: During my PhD, I worked with data preprocessing techniques, including missing data imputation methods. I developed k-nearest neighbour (k-NN) as a flexible technique for missing data imputation and demonstrated its efficacy compared with other state-of-the-art techniques. k-NN is currently one of the most used imputation algorithms due to its simple implementation, its ability to deal with missing data in multiple attributes and its capacity to work with continuous and discrete features.
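
The general idea of k-NN imputation can be sketched in a few lines. The example below is a minimal illustration, not the implementation from my thesis: it assumes purely numeric features, marks missing values with `None`, measures distance only over the attributes both rows have observed, and fills each gap with the mean of the k nearest donors. The function name `knn_impute` and these design choices are assumptions for this sketch; a discrete feature would typically use the most frequent neighbour value instead of the mean.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries using the mean of the k nearest rows with that value."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor must be a different row with attribute j observed.
                if m == i or other[j] is None:
                    continue
                # Compare only attributes observed in both rows.
                shared = [
                    (a, b)
                    for a, b in zip(row, other)
                    if a is not None and b is not None
                ]
                if not shared:
                    continue
                # Normalise by the number of shared attributes so rows with
                # different amounts of missing data remain comparable.
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [donor_value for _, donor_value in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Because distances are computed from the original rows rather than from already imputed values, the order in which gaps are filled does not affect the result.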

My Research Supervision

  • Tiago Pinho da Silva, PhD student: Election Forensics: Detecting Irregularities in Electoral Data Under Spatial Non-Stationarity.
  • Antonio Parmezan, PhD student: Hierarchical Classification of Data Streams.