Tool speeds up data preparation by up to 16 times, reducing the time and cost needed to carry out the task

NEC Corporation recently launched a new data analytics library called FireDucks (*1). The main feature of this new library is its processing speed when compared, for example, with the most widely used library in the world, “pandas”. FireDucks performs the data preparation needed for data analysis up to 16 times faster than existing libraries and software (*2), which significantly reduces both the time spent on data analysis and the computational cost. The beta version of FireDucks is now available for free online (https://fireducks-dev.github.io/).

In recent years, it has become easier than ever to collect large volumes of data, including sales information from point-of-sale (POS) terminals, e-commerce data and financial transaction data. To extract valuable analytical results from this data, there is a growing need for data scientists to perform their analysis using artificial intelligence (AI) and machine learning (ML).

However, to prepare for data analysis, large data sets need to be preprocessed. It is estimated that data scientists spend approximately 45% (*3) of their time preparing data, and this has become a huge problem. Furthermore, the increase in data volume and the evolution of AI and ML have led to an increase in computational complexity. As a result, higher computing costs (e.g. cloud costs) and the consequent increase in energy consumption and CO2 emissions have also become problematic.

Given this, NEC set out to develop FireDucks, software designed to speed up data manipulation that is fully compatible with the “pandas” library, so existing code does not need to be refactored. To develop it, NEC drew on the high-performance programming technology and acceleration expertise it has built up over more than thirty years of developing supercomputers.

By making the beta version of FireDucks available free of charge to the general public, NEC hopes to help reduce data scientists' working hours and to contribute to solving environmental issues through energy conservation and lower CO2 emissions.

Features

  • Accelerated performance

To use FireDucks, simply replace the “pandas” library in your Python code with FireDucks. The same data manipulation code then runs up to 16 times faster and, on average, about five times faster (*2). This reduces the total time data scientists spend working by approximately 30% (*4).
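As a minimal sketch of that one-line change, the snippet below imports `fireducks.pandas` under the usual `pd` alias, which is the import path given in the FireDucks documentation, and falls back to plain pandas when FireDucks is not installed. The sample data and the `revenue_by_store` helper are hypothetical and exist only for illustration.

```python
# Only the import changes; the rest is ordinary pandas-style code.
try:
    import fireducks.pandas as pd  # accelerated drop-in, if installed
except ImportError:
    import pandas as pd  # fallback so this sketch still runs

def revenue_by_store(sales):
    """Total revenue per store (hypothetical helper for illustration)."""
    sales = sales.assign(revenue=sales["units"] * sales["unit_price"])
    return sales.groupby("store")["revenue"].sum().sort_index()

# Made-up sample data, not taken from any benchmark.
sales = pd.DataFrame({
    "store": ["A", "B", "A", "C", "B"],
    "units": [3, 1, 2, 5, 4],
    "unit_price": [10.0, 20.0, 10.0, 2.0, 20.0],
})
totals = revenue_by_store(sales)
print(totals)
```

Any speed-up depends on the workload and data size; a five-row frame like this one only shows that the code itself does not change.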

Parallel utilization of all cores and reduced computation are the main reasons for this level of acceleration. FireDucks utilizes each core of a multi-core CPU to efficiently process large data sets in parallel.
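The general pattern behind this kind of parallelism (split the data, process each piece on its own worker, combine the partial results) can be sketched as follows. This is not FireDucks' implementation: `split_into_chunks` and `parallel_sum` are hypothetical names, and threads are used only to keep the sketch self-contained. In CPython, pure-Python work in threads does not run truly in parallel, whereas a native library can do the per-core work in compiled code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(values, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal slices."""
    n_chunks = max(1, min(n_chunks, len(values)))
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def parallel_sum(values, n_workers=None):
    """Sum each chunk on its own worker, then combine the partial sums."""
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

data = list(range(1_000_000))
total = parallel_sum(data)
print(total)
```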

Furthermore, rather than executing operations in exactly the order and scope written in the program, FireDucks first identifies which data are actually needed to produce the results and processes only those data. This, in turn, speeds up processing.
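The idea of deferring work and computing only what the result needs can be illustrated with a toy example. The `LazyTable` class below is hypothetical and is not how FireDucks is implemented. It records derived columns as recipes and, when a column is requested, evaluates only that column and its dependencies.

```python
class LazyTable:
    """Records derived columns as recipes; computes only what is requested."""

    def __init__(self, columns):
        self._base = dict(columns)  # column name -> list of values
        self._recipes = {}          # derived name -> (input names, function)
        self.computed = []          # derived columns actually evaluated

    def derive(self, name, inputs, func):
        """Register a derived column without computing it yet."""
        self._recipes[name] = (tuple(inputs), func)
        return self

    def collect(self, name):
        """Evaluate one column, pulling in only the inputs it depends on."""
        if name in self._base:
            return self._base[name]
        inputs, func = self._recipes[name]
        args = [self.collect(dep) for dep in inputs]
        self.computed.append(name)
        return [func(*row) for row in zip(*args)]

table = (
    LazyTable({
        "units": [3, 1, 2],
        "unit_price": [10.0, 20.0, 10.0],
        "weight_kg": [1.5, 0.2, 0.9],
    })
    .derive("revenue", ["units", "unit_price"], lambda u, p: u * p)
    .derive("revenue_with_tax", ["revenue"], lambda r: r * 1.1)
    .derive("weight_lb", ["weight_kg"], lambda w: w * 2.20462)
)

revenue = table.collect("revenue")
print(revenue)         # [30.0, 20.0, 20.0]
print(table.computed)  # ['revenue']
```

Only `revenue` is evaluated here; `revenue_with_tax` and `weight_lb` are registered but never computed because nothing requested them.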

  • High compatibility

As mentioned, another feature of this software is its high compatibility with “pandas”: no changes to the code are necessary other than replacing the “pandas” library with FireDucks. Although some libraries achieve faster processing speeds than “pandas”, adopting them takes several steps, including rewriting the program. FireDucks, by contrast, is easy to adopt because only one line of the program needs to be rewritten, after which analysis and coding proceed exactly as they would with pandas.

Real Results

The following results were obtained when FireDucks was used in real operations by Toyota Technical Development Corporation (*5) (TTDC).

  • 60% reduction in time spent analyzing data using an in-house AI framework (Spicy MINT)
  • 76% decrease in analysis PC operating time

An interview in which TTDC employees who used FireDucks give the development team feedback on the newly developed software is available at: https://www.nec.com/en/global/rd/technologies/202312/index.html.

Future plans

By making the beta version of FireDucks available for free and allowing data scientists to use it in practice, NEC will work to improve its functionality and verify its effectiveness, with the goal of commercializing it by fiscal year 2024.

Notes:

(*1) This software was developed with support from the New Energy and Industrial Technology Development Organization (NEDO) in Japan.

(*2) According to NEC test results based on the TPCx-BB benchmark.

(*3) State of Data Science 2020 https://www.anaconda.com/resources/whitepapers/state-of-data-science-2020

(*4) Based on calculations carried out internally by NEC.

(*5) About Toyota Technical Development Corporation (TTDC): focused on building ideal environments for product development through comprehensive solutions driven by cutting-edge information and technology.
