The data sets that are commonly labelled (to my knowledge) as unstructured are:
1. Text
2. Images
3. Audio
4. Video
I have been part of Business Intelligence projects in the past that used the first two and, to some extent, the third (audio).
TEXT
I am going to subcategorize the "text" data set into two.
- Machine Generated. This is data generated by machines such as a digital signage player (which keeps playing content) or a wireless sensor (which captures temperature). Once we know the metadata the machine uses to generate the data, the captured data can be given a structure. The metadata then provides the context for the new data element.
- Interaction Generated. These are the data streams generated by people when they interact with social networking sites, feedback surveys, and product/service feedback forms. This is text that people enter in open fields, and it brings out their likes, dislikes, sentiments, behavior, etc. This data can be brought into a structured form by the use of sentiment dimensions, which provide the ability to score the text stream.
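To make the two sub-categories concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the pipe-separated sensor format, the `SENSOR_SCHEMA` metadata, and the tiny `SENTIMENT_LEXICON` are hypothetical, and a real sentiment dimension would be far richer than a word list.

```python
import re

# Hypothetical metadata for a temperature sensor feed: field names and types,
# in the order the machine writes them. Knowing this is what adds the structure.
SENSOR_SCHEMA = [("device_id", str), ("timestamp", str), ("temperature_c", float)]


def parse_sensor_line(line, schema=SENSOR_SCHEMA, sep="|"):
    """Turn one raw machine-generated line into a named, typed record."""
    parts = line.strip().split(sep)
    if len(parts) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(parts)}")
    return {name: cast(value) for (name, cast), value in zip(schema, parts)}


# Hypothetical sentiment dimension: each known word carries a score.
SENTIMENT_LEXICON = {
    "love": 2, "great": 2, "good": 1,
    "slow": -1, "bad": -1, "terrible": -2,
}


def score_text(text, lexicon=SENTIMENT_LEXICON):
    """Score free text by summing the scores of the lexicon words it contains."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return {"score": sum(hits), "matched_words": len(hits)}


record = parse_sensor_line("S-17|2013-05-01T10:15:00|21.5")
feedback = score_text("Love the coffee, but the service was slow.")
```

Both functions return plain dictionaries, so the output can be loaded into a conventional fact table next to the structured data.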
IMAGES
There is no dearth of images being captured, especially when mobile devices are part of the business process. Images by themselves do not provide any business context; ways and means have to be found to bring structure to the data element. We have used images to calculate the visual appeal of product packaging. In our case, a separate algorithm was run against each image to score it for visual appeal, and this score provided the structure and the context for the image.
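As an illustration of how a score can turn an image into a structured record, the sketch below computes a toy colourfulness statistic from opponent-colour channels. This is a stand-in chosen for brevity and is not the visual appeal algorithm we used; the function names, the 0.3 weighting, and the idea of passing pixels as `(r, g, b)` tuples are all assumptions for this example.

```python
from math import hypot
from statistics import mean, pstdev


def colourfulness(pixels):
    """Toy colourfulness statistic for an iterable of (r, g, b) tuples (0-255).

    Stand-in for a real visual appeal model: combines the spread and the mean
    offset of two opponent-colour channels (red-green and yellow-blue).
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    rg = [r - g for r, g, b in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    spread = hypot(pstdev(rg), pstdev(yb))
    offset = hypot(mean(rg), mean(yb))
    return spread + 0.3 * offset  # 0.3 is an assumed weighting


def score_image(image_id, pixels):
    """Wrap the score in a record that can sit next to structured data."""
    return {"image_id": image_id, "visual_appeal_score": round(colourfulness(pixels), 2)}


grey_pack = [(128, 128, 128)] * 4
bright_pack = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]
rows = [score_image("pack-grey", grey_pack), score_image("pack-bright", bright_pack)]
```

Whatever algorithm produces the number, the point is the same: once each image has an identifier and a score, it can be joined to product and sales data like any other measure.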
AUDIO
Audio is increasingly produced with the advent of smart devices that support business processes. Audio recordings by themselves do not provide any business context; they have to be processed to extract attributes in support of the business. Though we have used audio to time stamp the "greet time" to measure service levels in QSR (quick service restaurant) solutions, there are many more possibilities for using audio data. One solution we considered is to use voice-to-text conversion and then apply the interaction text analysis. But if we have to capture the mood of the speaker (e.g. in a contact center), then we can apply a separate set of interaction voice analysis.
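One simple way to extract a time stamp such as the greet time is to look for the first moment the recording stops being silent. The sketch below does this with a basic energy threshold. It assumes the audio is already decoded into a list of samples between -1.0 and 1.0, and that greet time means the delay between a known customer arrival time and the first speech. The function names, the 20 ms frame size, and the 0.1 threshold are hypothetical choices, not the settings from our solution.

```python
from math import sqrt


def first_speech_time(samples, sample_rate, frame_ms=20, threshold=0.1):
    """Return the offset in seconds of the first frame whose RMS energy
    exceeds the threshold, or None if the recording stays quiet."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = sqrt(sum(s * s for s in frame) / frame_len)
        if rms > threshold:
            return start / sample_rate
    return None


def greet_time(samples, sample_rate, customer_arrival_s=0.0, **kwargs):
    """Seconds between the (assumed known) customer arrival and first speech."""
    t = first_speech_time(samples, sample_rate, **kwargs)
    return None if t is None else t - customer_arrival_s


# Synthetic recording: 2 seconds of silence, then 1 second of sound, at 1 kHz.
recording = [0.0] * 2000 + [0.5] * 1000
delay = greet_time(recording, sample_rate=1000, customer_arrival_s=0.5)
```

A plain energy threshold cannot tell a greeting from background noise, so a production version would need a proper voice activity detector, but the resulting attribute (a number of seconds per visit) is the kind of structure a service-level report needs.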
To conclude, it is obvious from the above that there are new forms of data, beyond the traditional structured data, which can be leveraged in a Business Intelligence or Big Data solution. There is a need to create new data transformation techniques to add the business context and extract the value out of these new data types.