http://australianetwork.com/nexus/stories/s2010595.htm
18 December 2007
BERNADETTE NUNN: Meet Alice.
ALICE: I'm proud of the ability of token-based methods of identity verification.
BERNADETTE NUNN: She may not look or sound quite life-like, but she's an important part of the latest in biometric security - a new system that merges face and voice information to identify a person.
ALICE: ..difficulty in remembering several PINs and passwords.
GIRIJA CHETTY: I'm trying to check whether the liveness of the person is there, or whether it is a still photo or a pre-recorded audio. To test that sort of thing, I need to create something like an Alice.
ALICE: All biometric traits do not enjoy the same level of user acceptance.
BERNADETTE NUNN: If someone played my video, how can the system tell that it's a video and not really me?
GIRIJA CHETTY: First thing is the synchrony between your lip movement and voice. If it is a video, it's different from a live person. Video is flat. It's 2-D. And I use 3-D information. And then it also codes the head movement, eye blinks, which is not going to be similar when your video is played.
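The first cue Chetty mentions, synchrony between lip movement and voice, can be sketched as a simple correlation test. This is a toy illustration under assumed inputs (a per-frame mouth-opening measurement and an audio energy envelope sampled at the same rate), not the Canberra team's actual algorithm; the function names and threshold are invented for the example:

```python
import numpy as np

def synchrony_score(mouth_opening, audio_energy):
    """Pearson correlation between a lip-opening signal and the audio
    energy envelope, both sampled at the same frame rate.

    A live speaker's mouth opens and closes in step with the loudness
    of their voice; a still photo (no movement at all) or a replayed,
    out-of-sync video yields a low score."""
    m = mouth_opening - mouth_opening.mean()
    a = audio_energy - audio_energy.mean()
    denom = m.std() * a.std()
    if denom == 0:  # e.g. a still photo: the lips never move
        return 0.0
    return float(np.mean(m * a) / denom)

def is_live(mouth_opening, audio_energy, threshold=0.5):
    # The threshold is an arbitrary illustration value, not a tuned one.
    return synchrony_score(mouth_opening, audio_energy) >= threshold
```

A live recording scores near 1 because the two signals rise and fall together, while a static image scores 0, which is the intuition behind rejecting replayed photos.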
BERNADETTE NUNN: It sounds straightforward, because we don't have any trouble telling a person from a video recording. It's much harder to teach a computer to know the difference. First, it has to learn what makes a face. It's an incredibly complicated calculation that starts with a databank of 10,000 images of different faces to build a three-dimensional picture of the average face.
PROFESSOR MICHAEL WAGNER: Humans, when they see a face, they will actually only notice differences between the average face that they know and someone having their eyes just a tiny bit further apart or their nose having a slightly different shape. So we're trying to program our computers to actually work according to the same principle.
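The principle Wagner describes, encoding a face only by its deviation from the average face, can be sketched in a few lines. This is a toy illustration in the spirit of mean-face approaches, using made-up synthetic "faces", not the team's actual system:

```python
import numpy as np

def average_face(faces):
    # faces: shape (n_faces, n_pixels), each row a flattened face image.
    return faces.mean(axis=0)

def encode(face, mean_face):
    # Describe a face purely by how it differs from the average face:
    # only the departures from the norm carry identity information.
    return face - mean_face

def closest_identity(probe, gallery, mean_face):
    # Match a probe face to the gallery face with the most similar deviation.
    p = encode(probe, mean_face)
    dists = [np.linalg.norm(p - encode(g, mean_face)) for g in gallery]
    return int(np.argmin(dists))
```

Even a slightly noisy image of a known face stays closest to its own gallery entry in deviation space, which is all the matcher needs.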
BERNADETTE NUNN: What is average about both our faces when you and I look so different?
GIRIJA CHETTY: Well, the skin colour we are quite similar. Even though you look white, I don't look white, the skin colour, when you take off the luminance, we are quite close. African, Asian, American, you know, Indian - whatever you take - Australian. We are not much different as far as a computer is concerned, and it cannot distinguish your colour from my colour.
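Chetty's point about taking off the luminance can be illustrated with a simple chromaticity normalisation. The skin-tone values below are invented for the example, not measured data:

```python
import numpy as np

def chromaticity(rgb):
    # Divide out overall brightness, keeping only the *proportions*
    # of red, green and blue -- i.e. colour with luminance removed.
    rgb = np.asarray(rgb, dtype=float)
    return rgb / rgb.sum()

# Two made-up skin tones: the same hue proportions at very
# different brightness levels.
lighter = np.array([220.0, 180.0, 150.0])
darker = 0.4 * lighter
```

Once the luminance is divided out, the two tones are numerically identical, which is the sense in which the computer "cannot distinguish your colour from my colour".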
BERNADETTE NUNN: More algorithms teach the computer to locate the lips and track their movement to prove the speaker is alive, not pre-recorded.
GIRIJA CHETTY: It has to come from the algorithm side, technical side, how we do it. How to detect liveness. It hasn't been done before.
MAN 1: Hello.
MAN 2: How are you?
WOMAN 1: My name is Yuko.
PROFESSOR MICHAEL WAGNER: One of our new projects is to combine face information and voice information to, for example, find out who the person is, what their linguistic background is...
WOMAN 2: My first language is Cantonese.
MAN 3: Farsi.
WOMAN 3: Mandarin.
MAN 4: Vietnamese.
MAN 5: English.
MAN 2: Sinhalese.
PROFESSOR MICHAEL WAGNER: Well, one application is to make the actual speech recognition better. When the system needs to understand what you are saying, it is clearly an advantage if the system knows that you have a German accent, a Vietnamese accent, or a Cantonese accent.
BERNADETTE NUNN: The Canberra University team is also combining face and voice recognition systems to train computers to detect human emotion.
PROFESSOR MICHAEL WAGNER: Human beings are incredibly capable of detecting minute differences in emotion. So when someone is angry, it might be important for the computer or the system to know and react accordingly. So we are finding out what makes a voice sound angry or happy or sad, in much the same way as human beings can detect those differences. This is an exciting new project, which we are just beginning, to make our systems more intelligent.
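Two of the simplest cues to what makes a voice sound angry rather than calm are loudness and pitch. The sketch below computes crude stand-ins for both from a raw audio signal; it is a toy illustration of the idea, not the features the Canberra team uses:

```python
import numpy as np

def prosody_features(signal, frame=256):
    """Two crude arousal cues from a mono audio signal: loudness
    (per-frame RMS energy) and pitch-like activity (zero-crossing
    rate). High-arousal speech such as anger tends to be louder and
    higher-pitched than calm speech."""
    frames = [signal[i:i + frame]
              for i in range(0, len(signal) - frame + 1, frame)]
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f)))) / 2
                    for f in frames])
    return {"mean_energy": float(rms.mean()),
            "mean_zcr": float(zcr.mean())}
```

A classifier trained on features like these (real systems use far richer ones) is what lets software guess whether a caller sounds angry or calm.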
BERNADETTE NUNN: Why would we want a computer or a robot to know all those things about us?
PROFESSOR MICHAEL WAGNER: If you're running a call centre business, it would be very important for you to know if this person ringing up is getting very frustrated and angry with your system, and take that as a cue to pass that person on to a human operator.
We want to make computers relate to human beings on the terms that we communicate on, rather than us having to learn how computers communicate. So that's part of the strategy of making computers more intelligent, recognise things that we as humans recognise very easily.