In September, IBM, the University of São Paulo (USP) and FAPESP (Foundation for Research Support of the State of São Paulo) celebrated one year since work began at the Center for Research in Artificial Intelligence of Brazil (C4AI), marked by academic publications and cutting-edge AI research aimed at problems of great social and economic impact. In this first year of activities, the C4AI reports important advances on the Natural Language Processing (NLP), health and environment fronts: research on improving NLP for Portuguese, work on the automatic characterization of strokes (cerebrovascular accidents, CVAs), and the development of an interactive, intelligent knowledge base on the Brazilian coast, known as Amazônia Azul (Blue Amazon).

“We live in a global moment in which we need to implement scientific thinking in all layers of society. Initiatives such as the C4AI, which bring together the public and private sectors, researchers and students, represent a great collaboration for the innovation ecosystem and encourage collaborative work in Artificial Intelligence research in order to, over the next few years, accelerate discoveries and scientific progress and positively impact everyone's lives”, says Claudio Pinhanez, Research Manager in Conversational Intelligence at IBM Research Brazil and vice-director of C4AI.

Machine learning and knowledge representation with a focus on the Blue Amazon

The AI Center has been working to build a conversational agent that harnesses existing knowledge about the Blue Amazon, the vast region of the Atlantic Ocean off the Brazilian coast, rich in biodiversity and energy resources. Within this initiative, the Center announces Pirá, the first large-scale question-and-answer dataset in Portuguese and English. It contains more than 160 thousand question-and-answer pairs in English about the Brazilian ocean coast, created from scientific texts, and eight thousand pairs of questions in Portuguese created by hand. The dataset is expected to contribute substantially to the evolution of conversational technologies in Brazil, including virtual assistants, and is intended to support answering a wide range of questions about the marine ecosystem.

Stroke Diagnosis and Recovery to Support Physicians

In the research project focused on modeling strokes (cerebrovascular accidents, CVAs) with AI techniques, electroencephalogram (EEG) data were collected with the help of the Neuromodulation Laboratory of the Physical Medicine and Rehabilitation Institute of the Hospital das Clínicas of the Faculty of Medicine at USP. From these data, an initial stroke classification system was developed using complex networks, machine learning techniques and multimodal data. An AI-based data filtering system and a platform for manipulating, visualizing and analyzing EEGs were also developed. Machine learning applications in medicine often need to deal with large-scale, heterogeneous and dynamic datasets such as texts, images and genetic biomarkers. Integrating this information is essential to treating health problems correctly: it allows physicians and other professionals in the field to select and understand which attributes are most relevant for classifying a stroke, providing important information for decision-making.

Natural language processing in Portuguese

In the grand challenge related to the Portuguese language, the C4AI is making available three fundamental datasets for advancing the computational processing of the language. These datasets contain texts from different sources, meticulously annotated by linguistics students, and recordings of Portuguese from different regions of Brazil. All this work aims to produce and collect data and tools that allow a high level of performance in Natural Language Processing for Portuguese, as already exists for other languages, and to develop computational solutions that support the language, enabling the creation of next-generation applications. The research covers both written and spoken Portuguese.

  • One of them comprises the largest set of syntactic data available in Brazil, containing texts from various sources such as news, tweets and consumer comments. The data complies with the privacy rules of the General Data Protection Law (LGPD) and was meticulously annotated, sentence by sentence, by a team of dozens of linguistics students at USP.
  • CORAA contains more than 260 hours of recordings in Portuguese, from different regions of Brazil, drawn from four pre-existing datasets but now audited by university students. The diversity of the content provided by CORAA offers, for example, greater regional coverage for future conversational applications, respecting accents, cultures and local customs. The goal is to reach 600 hours of recordings in the next version.
  • Carolina contains information on more than 120 billion words and terms in Portuguese, annotated by typology and origin, offering a wide range of details about the etymology.

These three datasets significantly enhance the work of Natural Language Processing in Portuguese and will enable, among other things, the development of next-generation AI applications that understand the language better and, consequently, offer a better experience to users.

In addition, the Center created a network of researchers interested in the link between AI techniques and the food production chain, in view of the economic and social importance of agribusiness in Brazil. It also created a network of researchers from various fields of the humanities, from the social sciences to law, who investigate topics such as the relationship between AI, education and work; the relationship between AI, ethics and law; violence, bias and the social impacts of AI; and public policies and governance in the face of AI.

"The mission of the Artificial Intelligence Center is to develop cutting-edge research in this area in Brazil, seeking to improve human life through the results of this research, as well as to disseminate results and foster social debate about this technology", says Fábio Cozman, director of the Artificial Intelligence Center at the University of São Paulo.

Committees in action

Another milestone in this first year of AI Center activities was the entry of 17 large organizations into the industry and society committee, which reinforces the relevance of the theme for the country's economy. They include B3, Banco do Brasil, Banco Original, BRF, Cubo Itaú, Energisa, FAPESP, Gerdau, IBM, Magalu, Motorola, Petrobras, Raízen, Vale and WEG, among others. This committee aims to understand the challenges of the sector and to find ways of bringing new technologies, scientific advances and qualified professionals to industry.

A diversity and inclusion committee was also created. Its function is to promote and increase the participation of women, people of African descent and other underrepresented members of society, making the AI sector more inclusive. The committee is already operating and so far has 10 members, professors and students from different USP faculties. Currently, the work is focused on increasing the participation of women, PPI (Black, Brown and Indigenous people) and PCD (people with disabilities) in the center's activities and in AI projects at USP, and on promoting education and discussion, in the market and in academia, about underrepresented groups in the field of AI.

“The C4AI is establishing itself in a way that is perfectly in line with the principles of the FAPESP Engineering Research Centers program: a research center of international excellence with strong work in the areas of innovation and dissemination to society. The fruits that are already beginning to be produced will benefit the AI research and innovation ecosystem in São Paulo and Brazil, as can be seen in the datasets and research results in Natural Language Processing, for example”, says Roberto Marcondes, member of the coordination of FAPESP's Research, Innovation and Diffusion Centers (CEPIDs) program.

Currently, C4AI has 41 fellows supervised by more than 80 professors. In 2022, the goal is to reach 120 professors and 130 fellows. In its first year of activities, more than 50 articles were published in scientific journals and at medical and AI conferences. The Center also promoted two series of online seminars that debated, for thousands of participants, the perspectives and advances of AI in Brazil and the world, and fostered discussions on public policies to support research and innovation in AI.
