Imagine sitting in front of the television watching a movie. We have just heard an awesome line of dialogue, say a romantic or sarcastic one-liner, and we want to share it with friends or family. We open the app and tap a button to sync to the current segment. The app magically pulls out the last one minute of audio from that segment. As a user, I can bookmark the segment for later use, or just share it right then.
We also allow users to share only a part of the last sixty seconds. Such trimming allows users to share the specific part they care about.
And all this happens magically in less than ten seconds (while watching the movie sitting on the couch).
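The trimming step can be sketched in a few lines. This is only an illustration, assuming the sixty-second clip is available as a flat sequence of audio samples; `trim_clip`, `sample_rate`, `start_s` and `end_s` are hypothetical names, not part of the app.

```python
def trim_clip(samples, sample_rate, start_s, end_s):
    """Return the part of a clip between start_s and end_s (in seconds).

    Hypothetical helper: `samples` is any sliceable sequence of audio
    samples and `sample_rate` is the number of samples per second.
    """
    clip_length_s = len(samples) / sample_rate
    if not 0 <= start_s < end_s <= clip_length_s:
        raise ValueError("trim range must lie inside the clip")
    first = int(start_s * sample_rate)
    last = int(end_s * sample_rate)
    return samples[first:last]


# A fake 60-second clip at 10 samples per second, trimmed to seconds 20..35.
clip = list(range(60 * 10))
shared_part = trim_clip(clip, sample_rate=10, start_s=20, end_s=35)
```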
What is the maximum duration of audio recorded on mobile? Ten seconds. Our backend is accurate enough to work within this limit.
Do we stop the audio recording before ten seconds? Yes. Sometimes we get a very high-confidence match after just three seconds. In such a case we stop the recorder and show the result to the user.
What happens if we're unable to locate the recorded clip? In case we're unable to locate it in the movie even with a ten-second recording, we stop recording automatically and ask the user to "go near the sound source".
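The recording rules from the last three answers can be sketched as a small loop. This is a simplified illustration, not the app's actual code: `match_at` stands in for a hypothetical backend call, the one-second polling step is an assumption, and the `HIGH_CONFIDENCE` threshold is made up because the real value is not stated.

```python
MAX_RECORDING_S = 10    # upper limit on the recording, as stated above
HIGH_CONFIDENCE = 0.9   # assumed threshold; the real value is not given


def run_recording(match_at, step_s=1):
    """Record until a high-confidence match or the ten-second limit.

    `match_at(elapsed_s)` is a hypothetical backend call returning
    (confidence, segment_id), with segment_id None when nothing matched.
    """
    elapsed = 0
    while elapsed < MAX_RECORDING_S:
        elapsed += step_s
        confidence, segment_id = match_at(elapsed)
        if segment_id is not None and confidence >= HIGH_CONFIDENCE:
            # Stop early and show the result to the user.
            return {"status": "matched", "segment_id": segment_id,
                    "stopped_at_s": elapsed}
    # No match within the limit: stop and prompt the user.
    return {"status": "no_match", "stopped_at_s": elapsed,
            "prompt": "go near the sound source"}


# Fake backend: becomes confident once three seconds have been recorded.
def fake_backend(elapsed_s):
    return (0.95, "seg-1") if elapsed_s >= 3 else (0.2, None)


result = run_recording(fake_backend)
```

With the fake backend above, the loop stops at three seconds; a backend that never matches runs to the ten-second limit and returns the prompt instead.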
Along with the sixty-second audio, do we have any additional information relating to the segment? We have a poster corresponding to the segment of the clip. This is sent from the server once a match is found. Optionally, we also have a transcript of the segment (this lets the user skim through the segment without having to hear it).
Why are we fetching the audio from original source stored on servers (instead of letting users directly share the recording)?
1) Users only realise that they want to share a particular piece of audio after having heard it, not before. So we fetch the last sixty seconds of audio to let them share any part of it.
2) Even in cases where the user is aware of what's coming next, the audio recorded on mobile is of very poor quality, because the recording doesn't happen in a silent, noise-free zone. Also, people tend to be far from the source of audio. For example, we tend to be at least 4 feet away from the television while watching. All of this, combined with environmental noise, reduces the quality.
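To make the first point concrete, here is a minimal sketch of how the server might pick the sixty-second window once it knows where in the movie the recording matched. The function name and the idea of a `match_position_s` offset are assumptions for illustration; the actual server logic is not described here.

```python
CLIP_LENGTH_S = 60  # the "last sixty seconds" described above


def last_minute_window(match_position_s):
    """Return (start_s, end_s) of the audio to fetch from the original source.

    `match_position_s` is the assumed offset, in seconds from the start of
    the movie, at which the user's recording was matched.
    """
    end_s = match_position_s
    # Clamp at zero so a match early in the movie still gives a valid window.
    start_s = max(0.0, end_s - CLIP_LENGTH_S)
    return start_s, end_s


window = last_minute_window(125.0)      # match two minutes into the movie
early_window = last_minute_window(20.0)  # match before the one-minute mark
```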