Voice recording and speech-to-text mining

Written by Sandra Baethge & Eleonora Paul

Searching for storytelling at the POS

Over the years, the spoken word has become hard currency in market research. In the past, it was usual to finish a quantitative survey with a single open-ended question asking whether the respondents had any further remarks. Today, more and more qualitative questions appear even in standardised surveys.

The need to constantly generate more qualitative insights even with large sample sizes means the respondents' open-ended statements are becoming ever more important, especially in combination with a closed-ended rating.

The challenge: people at the POS

The POS is another world: people there inherently have less time. As personal surveys are becoming ever more challenging, those surveyed also have to be convinced of how meaningful their participation is. The interviewer is also under time pressure but is expected to conduct ever more complex surveys at the POS. When respondents give lengthy open-ended answers, the interviewer then often (unknowingly) records a shortened version.

So how can more meaningful insights be obtained through more qualitative answers at the POS?

IWD deals with this question on a daily basis in the more than 3 million personal surveys conducted every year at the POS in 25 European countries.

The solution can only be: the spoken word shall prevail.

More qualitative insights through audio recording at the POS

To gather complete qualitative statements in a short time, IWD relies on audio recordings. For this, IWD uses the recording capability of smartphones, the indispensable device for computer-assisted personal surveys. The respondent's statements are captured via audio recording during the survey. The interviewer needs neither to type nor to interrupt the survey; they only need to start the recording at the correct point. The statements are recorded for the entire interview and uploaded directly to the server.

To reduce the surrounding noise at the POS, the smartphones were equipped with small external microphones after the first trials. This adjustment enables better processing of the audio files.

High acceptance among respondents and interviewers

The first trials have already shown a high willingness among respondents to participate and provide information. 85% of the respondents allowed their statements to be recorded. Only a few had reservations about the technology.

From interviewer to moderator

Positive effects were also seen on the interviewer’s side. Recording the open answers was, above all, a relief for the interviewer. They can follow the content of the conversation much better and are able to enter into the dialogue. Therefore, the interviewer is now able to focus on delving deeper into the statements to obtain more meaningful answers.

This change in role, from interviewer to moderator, motivates the interviewer, which culminates in more comprehensive statements. The audio statements contain significantly more context information and thus generate more insights.

The better the interviewer knows how to use the tool, the stronger the motivational effect becomes. It is the interviewer's responsibility to brief the respondent precisely and to decide when to explain the purpose of the tool and how the conversation will be approached, without endangering the clarity of the recording.

AI-supported evaluation of the audio files

To evaluate the large number of audio files, they are transcribed with the help of a speech-to-text algorithm. Thanks to the automatic transcription, we get complete text statements in the shortest possible time. Manual transcription within scientifically acceptable timeframes would not be possible in this instance.

As the speech-to-text algorithm has not yet mastered all facets of human language (something we know all too well from the voice assistants on our smartphones), the statements are checked and corrected when necessary. Through machine learning, the transcription becomes better every time, and human correction becomes less and less necessary.

After transcription, the texts are translated and coded when needed. Here too, IWD employs an AI algorithm, which is trained as a virtual coder and monitored. This ensures ever more consistent coding, independent of environmental influences and human fatigue. The coded statements can subsequently be used with a higher significance, for example for frequency and driver analyses.
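To illustrate what a frequency analysis of coded statements can look like, here is a minimal Python sketch. It is not IWD's implementation: the data structure, the code labels and the function name `code_frequencies` are assumptions invented for this example. It simply computes the share of respondents whose statement was assigned each code.

```python
from collections import Counter

# Hypothetical coded statements: each transcript has been assigned one or
# more codes by a coder. Labels and data are invented for illustration.
coded_statements = [
    {"respondent": 1, "codes": ["friendly staff", "long queues"]},
    {"respondent": 2, "codes": ["long queues"]},
    {"respondent": 3, "codes": ["fresh produce", "friendly staff"]},
    {"respondent": 4, "codes": ["long queues", "fresh produce"]},
]


def code_frequencies(statements):
    """Return the share of respondents who mentioned each code."""
    total = len(statements)
    if total == 0:
        return {}
    counts = Counter()
    for statement in statements:
        # Count each code at most once per respondent.
        counts.update(set(statement["codes"]))
    return {code: count / total for code, count in counts.items()}


frequencies = code_frequencies(coded_statements)

# Print the codes from most to least frequently mentioned.
for code, share in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f"{code}: {share:.0%}")
```

Counting each code only once per respondent keeps a single talkative respondent from inflating a code's share.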

From the mouth of the customer to analysis – Qualitative insights & authentic storytelling

The advantages of the audio statements are obvious: the client receives comprehensive insights and can therefore validate their quantitative assessments once again via the statements. The same holds true for the analysts, who can now formulate more valid interpretations and guidance on the basis of these qualitative insights.

In addition, the audio recording enables immersion into the world of the customer at the POS.

Results of a study can be presented more authentically and vividly and offer the possibility to look directly onto the shop floor.

Live the customer experience

Finally, the recordings yield more than very detailed insights and reasons for certain ratings (because each statement is recorded word for word). Recorded statements with background noise also take the listener mentally to the shop floor. You hear the scanning at the tills, the customers' voices, the wheels of the shopping trolley… and now the customer can tell you what they find good and bad about your shop. The spoken word is becoming an ever more necessary link between the real world on the shop floor and important everyday decisions that must be made in the interest of the end customer.

First published: Research & Results, Issue 6/2019, Page 46
