Published on January 5, 2024


How to Use Structured and Unstructured Data for ML Insights: 5 Examples

Craig Wisneski
Co-Founder & Head of G&A, Akkio

Did you know that 90% of the world's data is unstructured? That's a lot of data to leave on the table if you're only focusing on structured data. In order to get the most out of machine learning, you need to consider both types of data.

In this post, we'll explore how you can use both structured and unstructured data for better machine learning insights that can help you make better business decisions.

What is structured and unstructured data?

Structured data refers to clean, organized data sets that are easy to analyze. This type of data typically comes from databases or spreadsheets, where each piece of information has a specific place and format.

Relational databases are commonly used to store structured data, and SQL (Structured Query Language) is a popular query language for accessing and manipulating structured data sets. JSON (JavaScript Object Notation) and XML (Extensible Markup Language) are two popular formats for storing and exchanging data. Because they label every field but also allow nesting and optional elements, they are often classed as semi-structured.
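To make the difference concrete, here is a minimal sketch using only Python's standard library. The table, column names, and sample values are hypothetical and chosen for illustration. It stores a few rows in an in-memory SQLite database, queries them with SQL, and then parses a JSON record that carries a nested list.

```python
import json
import sqlite3

# Hypothetical customer rows; the table and column names are illustrative.
rows = [
    ("Acme", "enterprise", 12000.0),
    ("Bolt", "smb", 900.0),
    ("Core", "smb", 1500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, segment TEXT, revenue REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Every row shares one schema, so an aggregate is a single SQL query.
revenue_by_segment = dict(
    conn.execute(
        "SELECT segment, SUM(revenue) FROM customers GROUP BY segment"
    ).fetchall()
)
conn.close()

# A similar record exchanged as JSON: fields are named, but the nested
# "contacts" list has no fixed length or column layout.
record = json.loads(
    '{"name": "Acme", "segment": "enterprise",'
    ' "contacts": [{"email": "a@example.com"}]}'
)
contact_count = len(record["contacts"])
```

The SQL query works because every row has the same columns, while the JSON record has to be walked key by key before it can be analyzed in the same way.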

Big data is a term used to describe extremely large data sets that may be difficult to process using traditional data processing techniques. Big data sets often include semi-structured data, which contains some elements of structure, such as tags or key-value pairs, but not as much as fully structured data. Data lakes and data warehouses are two common storage solutions for big data sets.

Tools like Salesforce, HubSpot, and Google Analytics all contain structured data. Salespeople use CRMs, or customer relationship management software, to track their interactions with leads and customers. Marketers use marketing automation software to track the progress of their marketing campaigns. And analysts use business intelligence software to visualize data and identify trends.

Similarly, HR professionals use human resource information systems (HRIS) to track employee data. This includes everything from contact information and performance reviews to benefits and vacation days.

The advantage of structured data is that it's easy to understand and process. The disadvantage is that it only represents a small part of the total available data.

Unstructured data is data that doesn't have a specific structure or format, which makes it difficult for computers to understand and process. It can contain text or images that cannot be easily categorized or indexed. Examples of unstructured data sources include emails, social media posts, surveillance data, invoices, and resumes.

What should you use?

The old adage, "if all you have is a hammer, everything looks like a nail," is relevant here. If all you have is structured data, you're only getting a small part of the story. The same is true if all you have is unstructured data.

Gaining the most value from data analytics means looking at a broad set of data structures, including qualitative data and quantitative data. Examples of qualitative data (unstructured data) include customer surveys, social media posts, and emails. Examples of quantitative data (structured data) include revenue numbers, company size, and customer demographics.

The best way to use machine learning is to consider both types of data. While qualitative data may not be usable in its native format, data mining techniques like text analysis can be used to glean insights from it. Modern analytics tools like Akkio can help make sense of both structured and unstructured data.

Data management and data storage are important considerations when working with either data type. Where will you store all this data? How will you organize it? What is the best way to structure your data so that it can be easily analyzed?

Together, structured and unstructured data give you a more complete picture of your business and help you make better decisions.

5 examples of using structured and unstructured data for ML insights

The main limits to using machine learning are the quality and quantity of data. Here are 5 examples of where you can find both types of data, along with use cases and functionality for each.

1. Social media posts

Before social media, businesses could only track customer sentiment by conducting surveys or manually reading customer reviews. Now, social media platforms generate petabytes of data every day, which has made it possible for businesses to track sentiment in real time.

While social media posts are unstructured data, potentially including both text and video, their text can be converted into structured data using sentiment analysis. This process uses natural language processing (NLP), a branch of AI, to classify the sentiment of a text as positive, negative, or neutral.
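As a rough illustration of that conversion, the sketch below turns free-text posts into rows with a sentiment label. It is a toy, lexicon-based classifier with made-up word lists and sample posts, not a trained NLP model. A production system would typically use a learned classifier instead.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"love", "great", "good", "happy", "excellent"}
NEGATIVE = {"hate", "bad", "terrible", "slow", "broken"}


def classify_sentiment(text: str) -> str:
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical posts standing in for data pulled from a social platform.
posts = [
    "I love the new update, great work!",
    "Checkout is slow and the app feels broken.",
    "Just installed version 2.",
]

# Each unstructured post becomes a structured row with a fixed set of fields.
structured = [{"text": p, "sentiment": classify_sentiment(p)} for p in posts]
```

Once each post has a label, the results can be counted, charted, or joined to other tables like any other structured column.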

Using social media data for machine learning insights requires careful consideration of how the data is collected and processed. For example, Facebook offers a tool called the Graph API, which allows developers to access data that has been shared publicly on the platform. However, the data collected through the Graph API may not be representative of the entire population of Facebook users.

Data collected from social media platforms can also be biased, for example toward the most active or most vocal users. Be aware of these biases and limitations when using social media data for machine learning. Used correctly, however, social media data can be a powerful tool for generating insights.

2. Customer feedback forms

Typeform, Jotform, Google Forms... the list of customer feedback form tools goes on. And while most businesses see customer feedback forms as a necessary evil, they can actually be a goldmine of machine learning data.

Customer feedback forms are semi-structured, since they usually consist of a mix of multiple-choice and open-ended questions. The multiple-choice answers are already structured, and the open-ended answers can be converted into structured data using text analysis.

Text analysis is a process of extracting insights from text data. This can be done using a variety of methods, such as topic modeling, sentiment analysis, and named entity recognition.

For instance, consider an open-response question that asks "What was your biggest challenge during your recent project?" The answers to this question can be analyzed using topic modeling, which would identify the main themes in the responses.

Sentiment analysis can also be used to analyze customer feedback. This can be used to identify which aspects of the product or service are most likely to generate positive or negative sentiment.

Named entity recognition (NER) is another text analysis technique that can be used on customer feedback forms. NER locates and labels mentions of specific entities in the text, such as product names, locations, and people.
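The sketch below shows the general shape of this kind of analysis on open-ended responses. It uses deliberately simple stand-ins: word-frequency counting instead of real topic modeling, and matching against a hypothetical product catalog instead of a trained NER model. The responses, stopword list, and product names are all invented for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list and a hypothetical product catalog.
STOPWORDS = {"the", "a", "to", "was", "and", "of", "in", "is", "for", "too"}
KNOWN_PRODUCTS = {"dashboard", "mobile app"}


def top_terms(responses, n=3):
    """Return the n most frequent non-stopword terms across all responses."""
    counts = Counter()
    for response in responses:
        words = re.findall(r"[a-z]+", response.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]


def find_products(response):
    """Return the known product names mentioned in one response."""
    lowered = response.lower()
    return sorted(p for p in KNOWN_PRODUCTS if p in lowered)


responses = [
    "Onboarding took too long and the Dashboard was confusing.",
    "Onboarding docs were missing for the mobile app.",
    "Pricing was unclear during onboarding.",
]

main_theme = top_terms(responses, n=1)
mentions = [find_products(r) for r in responses]
```

Here the most frequent term points to onboarding as the recurring theme, and the product matches show which parts of the product each response is about.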

3. Photos

Trending social media apps and smartphones come and go, but one thing that remains constant is that people love taking photos. In fact, it is estimated that 1.72 trillion photos were taken in 2022. 

While a photo itself is unstructured data, it can contain a wealth of structured data. For example, most photos today are taken with smartphones, which embed EXIF data in the image file.

EXIF data is a type of metadata that contains information about the photo, such as the date and time it was taken, the camera settings used, and the GPS coordinates of where it was taken. This information can be very useful for machine learning applications.

For example, consider a retail store that wants to use machine learning to improve customer service. The store could use EXIF data from customer photos to identify patterns in when and where customers are most likely to take photos. This information could be used to optimize the store layout or deploy staff members to areas where they are most likely to be needed.
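As a sketch of how EXIF metadata becomes structured data, the code below assumes the tags have already been read from the image file into a dictionary (for instance with an image library, which is not shown here). EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS" and GPS positions as degrees, minutes, and seconds with a hemisphere reference. The sample values and the function names are hypothetical.

```python
from datetime import datetime


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if ref in ("S", "W") else value


def exif_to_row(exif):
    """Turn a dictionary of EXIF tags into one analysis-ready row."""
    return {
        "taken_at": datetime.strptime(exif["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S"),
        "latitude": dms_to_decimal(exif["GPSLatitude"], exif["GPSLatitudeRef"]),
        "longitude": dms_to_decimal(exif["GPSLongitude"], exif["GPSLongitudeRef"]),
    }


# Hypothetical tags as they might look after extraction from a photo.
sample = {
    "DateTimeOriginal": "2022:07:04 15:30:00",
    "GPSLatitude": (40, 30, 0),
    "GPSLatitudeRef": "N",
    "GPSLongitude": (74, 0, 0),
    "GPSLongitudeRef": "W",
}
row = exif_to_row(sample)
```

Rows like this can be grouped by hour or by location to look for the kind of time and place patterns described above.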

4. Customer profiles

Most businesses have some type of customer profile, which is a collection of information about their target audience. This information can include demographics, psychographics, and behavioral data.

While customer profiles are usually created using structured data, they can be enhanced with unstructured data. For example, a customer profile could be enhanced with data from social media posts, customer feedback forms, and photos.
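One simple way to picture this enrichment is to add a few features derived from free text to each structured profile. The sketch below assumes profiles keyed by a customer ID and feedback stored as (customer ID, text) pairs. The IDs, field names, and keyword list are all hypothetical.

```python
# Hypothetical structured profiles keyed by customer ID.
profiles = {
    "c1": {"plan": "pro", "monthly_spend": 99.0},
    "c2": {"plan": "free", "monthly_spend": 0.0},
}

# Hypothetical unstructured feedback as (customer ID, free text) pairs.
feedback = [
    ("c1", "The export feature keeps failing."),
    ("c1", "Please add dark mode."),
    ("c3", "Feedback from a customer with no profile."),
]

# Illustrative keywords that suggest the customer hit a problem.
ISSUE_KEYWORDS = ("failing", "error", "bug")


def enrich(profiles, feedback, keywords):
    """Add text-derived features to each profile without mutating the input."""
    enriched = {
        cid: {**profile, "feedback_count": 0, "mentions_issue": False}
        for cid, profile in profiles.items()
    }
    for cid, text in feedback:
        if cid not in enriched:
            continue  # skip feedback that cannot be matched to a profile
        enriched[cid]["feedback_count"] += 1
        if any(k in text.lower() for k in keywords):
            enriched[cid]["mentions_issue"] = True
    return enriched


enriched = enrich(profiles, feedback, ISSUE_KEYWORDS)
```

The result is still a table of customers, but each row now carries signals that only existed in the unstructured text.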

5. Email

Spam detection is a classic example of machine learning on unstructured data. Beyond spam detection, there are many other ways machine learning can be used on emails. For example, NLP can be used to identify the topics of an email, which can be used to prioritize emails or automatically route them to the appropriate team member.

Emails also contain a lot of structured data, such as the sender and recipient information, and the date and time of the email. An organization could, for instance, combine these fields with the content of the messages and use machine learning to predict which customers are most likely to need help.
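The sketch below separates the two kinds of data in a single message using Python's standard `email` package. The headers give structured fields, and the body is routed with a simple keyword rule that stands in for a trained topic classifier. The message, team names, and keywords are invented for illustration.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical raw message; headers are structured, the body is free text.
RAW = """\
From: Dana Lee <dana@example.com>
To: support@example.com
Subject: Invoice question
Date: Fri, 05 Jan 2024 09:15:00 +0000

Hi, I was charged twice on my last invoice. Can I get a refund?
"""

# Illustrative routing rules; a real system would learn these from data.
ROUTES = {
    "billing": ("invoice", "refund", "charged"),
    "technical": ("error", "crash", "login"),
}


def route(body):
    """Pick the first team whose keywords appear in the body text."""
    lowered = body.lower()
    for team, keywords in ROUTES.items():
        if any(k in lowered for k in keywords):
            return team
    return "general"


msg = message_from_string(RAW)
features = {
    "sender": parseaddr(msg["From"])[1],           # structured header field
    "sent_at": parsedate_to_datetime(msg["Date"]),  # structured header field
    "team": route(msg.get_payload()),               # derived from free text
}
```

Keeping both kinds of fields in one record lets a model use the sender and timing alongside whatever is extracted from the text.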

Across these examples, combining structured and unstructured data can provide a more complete picture that can be used to generate insights that would not be possible with either type of data alone. The key is to carefully consider the sources of data and how the data will be processed. With the right approach, machine learning can be a powerful tool for any business.

How do I use both structured and unstructured data?

Structured and unstructured data live in different corners of the business world. Structured data is well-organized information that fits into a predefined schema, such as customer transaction records, product inventories and financial data. 

Unstructured data is information that doesn't fit neatly into a predefined category, including social media posts, webpages, images, videos and sensor data.

Organizations use different tools to collect and store these different types of data. Some tools are built specifically for collecting unstructured data, such as forms, surveys, and customer feedback. Other tools are built specifically for storing structured data, such as databases and spreadsheets.

Google Forms and SurveyMonkey are two popular tools for collecting unstructured data. This data can be stored in a variety of formats, including JSON, XML, and text files. NoSQL databases like MongoDB and Cassandra are well-suited for storing this type of data due to their flexibility and scalability.

Structured data is typically stored in a tabular format in relational database management systems (RDBMS) like MySQL, Oracle, and Microsoft SQL Server. This data can also be stored in spreadsheet applications like Microsoft Excel and Google Sheets. The main advantage of using an RDBMS is that it enforces a predefined data model, which can make data analysis easier. 
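To illustrate what enforcing a predefined data model means in practice, the sketch below uses SQLite from Python's standard library as a small stand-in for a full RDBMS. The `orders` table and its constraints are hypothetical. Rows that break the declared rules are rejected at insert time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema declares which columns exist and what values they may hold.
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    )
    """
)
conn.execute(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)", ("Acme", 250.0)
)

# Rows that violate the schema are refused instead of being stored.
rejected = []
for bad_row in [(None, 10.0), ("Bolt", -5.0)]:
    try:
        conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)", bad_row
        )
    except sqlite3.IntegrityError:
        rejected.append(bad_row)

stored = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

Because invalid rows never reach the table, later analysis can rely on every order having a customer and a non-negative amount.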

There are also many tools that enable organizations to use machine learning to analyze both structured and unstructured data. Akkio is one such tool. Akkio enables users with no coding skills or knowledge of machine learning to leverage machine learning algorithms, such as classification models and regression models, to analyze all types of data and make predictions and business decisions.

For example, Akkio can be used for fraud detection or identifying customer lifetime value. Akkio makes it easy to work with machine learning by providing a user-friendly interface and pre-built models that require no coding or data science expertise.

Akkio is just one example of a tool that makes it easy to use machine learning to analyze both structured and unstructured data. There are many other similar tools available, and the number of options is growing rapidly as the field of machine learning evolves.

Conclusion

Most businesses are only taking advantage of a small number of data formats, which is limiting their ability to glean insights from data. In order to get the most out of artificial intelligence, businesses need to take advantage of both structured and unstructured data.

Traditionally, businesses would need data scientists to use tools like Python to analyze unstructured data. However, there are now many tools that make it easy to use machine learning to analyze both structured and unstructured data. 

Akkio is one such tool. It lets users with no coding skills or machine learning knowledge apply classification and regression models to all types of data. Sign up for a free trial to get started.
