Understanding the value of data in order to improve Machine Learning

The value of data is significantly increasing as we move towards a more autonomous society.

Data can reveal patterns and offer insights into our everyday behaviour. For instance, healthcare workers can use data to report the rate of flu infections in a particular state, and manufacturers can use data to better understand average production times. Both examples highlight the value of data in today’s society.

Additionally, data plays a huge role in Artificial Intelligence (AI) decision-making. Understanding how individual data sources contribute to technology-based decision-making processes can give AI users a more effective experience.

Measuring the value of data enables us to eliminate inputs that might contribute to biased models. Furthermore, understanding the value of data allows us to assign appropriate pricing to data sources, thereby facilitating data sharing. This is particularly important to industries where specific data is difficult to obtain or for small businesses grappling with limited data access.

Increasing knowledge around the value of data

Assistant Professor Ruoxi Jia in the Bradley Department of Electrical and Computer Engineering at Virginia Tech has received a National Science Foundation (NSF) Faculty Early Career Development (CAREER) award to investigate fundamental theories and computational tools needed to measure the value of data.

The $500,000 grant, awarded over five years, will allow Jia and her team to develop scalable and reliable data valuation techniques that support strategic data acquisition and improve Machine Learning-based data analytics.

“Right now, there is much excitement about Machine Learning and AI, especially after the emergence of ChatGPT,” Jia said. “But what’s under the hood is a lot of data. That’s what enables this kind of machine, which is why we aim to raise awareness around the value of data.”

Making quality-based data tools more accessible

ChatGPT, an AI chatbot launched in November 2022, allows users to ask for help with things such as writing essays, drafting business plans, generating code, and even composing music. As of December 2022, ChatGPT already had over one million users.

Jia noted the importance of data quality and how it can impact Machine Learning results. She explained: “If bad data feeds into Machine Learning, you will get bad results. We want to get an understanding, especially a quantitative understanding, of the value of data for data selection.”
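One simple way to put a number on the value of a data point is leave-one-out valuation: train a model with and without the point and compare its accuracy on held-out data. The sketch below is only an illustration of that general idea, not the technique Jia's team is developing. The toy data set, the nearest-neighbour model, and all function names are assumptions made for the example.

```python
# Minimal leave-one-out data valuation sketch (illustrative assumptions only).
# Each training point is a (feature, label) pair; the model is a
# 1-nearest-neighbour classifier on a single numeric feature.

def predict(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, validation):
    """Fraction of validation points the model labels correctly."""
    correct = sum(1 for x, y in validation if predict(train, x) == y)
    return correct / len(validation)

def leave_one_out_values(train, validation):
    """Value of a point = accuracy with it minus accuracy without it."""
    full = accuracy(train, validation)
    values = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        values.append(full - accuracy(reduced, validation))
    return values

# Hypothetical toy data: the point (3.0, 1) is deliberately mislabeled.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
validation = [(1.4, 0), (2.8, 0), (3.2, 0), (7.4, 1), (8.6, 1)]

values = leave_one_out_values(train, validation)
for point, value in zip(train, values):
    print(point, round(value, 2))
```

In this toy run the mislabeled point receives a negative value, because removing it improves accuracy, while the other points score zero because their neighbours can stand in for them. That second effect is a known weakness of leave-one-out scoring and one reason more refined valuation methods are an active research topic.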

ChatGPT’s developers have also recognised the importance of higher-quality data, having recently announced the release of GPT-4. The latest technology is multimodal, meaning it can generate content in response to both image and text prompts.

How can we acquire data that is currently private?

A large amount of data is required to develop this type of machine intelligence, but not all data is open-sourced or public. Private entities own some data sets, and privacy concerns can limit how that data is shared.

Jia hopes that in the future, monetary incentives can be introduced to help acquire these types of data sets and improve the Machine Learning algorithms needed in all industries.

Jia, a graduate of the University of California, Berkeley, has had conversations with Google Research and Sony AI Research, among others, who are interested in the value of data and its research benefits.

Sharing data and adopting improved Machine Learning algorithms will significantly benefit both industries and individual consumers.
