VoIP Eavesdropping: Countermeasures

As we saw in the last two posts, SIP (Session Initiation Protocol) is easy to sniff because it is transmitted unencrypted over the network. There are some solutions to this problem, but none of them is definitive. The next picture shows a very basic diagram of a VoIP infrastructure that I will use throughout this post. At this point we should understand that SIP is used for creating, modifying and terminating sessions, that these sessions consist of one or several media streams, and that the streams flow directly between clients, leaving the SIP proxy aside.

Figure: Basic VoIP network infrastructure
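
To make the "easily sniffable" point concrete, here is a minimal sketch of what anyone capturing an unencrypted SIP INVITE can read from it. The message, the addresses and the `summarize_invite` helper are made up for illustration; they are not taken from the tools used in the previous posts.

```python
# Hypothetical plaintext INVITE, as it would appear in a capture.
SIP_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060\r\n"
    "From: <sip:alice@example.com>;tag=1928\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10\r\n"
    "s=-\r\n"
    "c=IN IP4 192.0.2.10\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)


def summarize_invite(message: str) -> dict:
    """Extract who is calling whom and where the media will flow."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)

    # SIP headers are plain "Name: value" text lines.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    info = {
        "method": method,
        "callee": uri,
        "caller": headers.get("from"),
        "call_id": headers.get("call-id"),
    }

    # The SDP body announces the address and port of the RTP stream.
    for line in body.splitlines():
        if line.startswith("c="):
            info["media_ip"] = line.split()[-1]
        elif line.startswith("m="):
            kind, port = line[2:].split()[:2]
            info["media"] = kind
            info["media_port"] = int(port)
    return info
```

Calling `summarize_invite(SIP_INVITE)` returns the caller, the callee and the IP address and port where the audio will be sent, which is exactly what a sniffer needs in order to pick out the RTP stream afterwards.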

Mainly, we have two options to avoid eavesdropping attacks: encryption or network separation.

Network separation

It is usually too difficult to have the resources needed to physically separate the VoIP network from the organization's data network. The common solution is to use managed switches and set up different VLANs (Virtual Local Area Networks).

But this is only applicable inside your LAN, and there are a lot of techniques for evading this kind of switch control that allow an attacker to hop between VLANs; we can find them with a simple search on Google.
In fact, the software used in previous posts supports it for some Cisco routers, as shown in the picture:

Figure: UCSniff VLAN hop
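
To see why VLAN separation is only a logical barrier, it helps to look at how a frame is marked as belonging to a VLAN: the switch relies on a 4-byte 802.1Q tag inside the Ethernet header. The following is a minimal sketch of building and reading that tag; the function names, the VLAN ID 100 and the priority 5 are my own illustrative choices, not values from any real deployment.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that announces an 802.1Q tag


def build_tagged_header(dst: bytes, src: bytes, vlan_id: int,
                        priority: int, ethertype: int) -> bytes:
    """Return an Ethernet header carrying an 802.1Q VLAN tag."""
    if not 0 <= vlan_id <= 0x0FFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    # TCI layout: 3 bits priority, 1 bit DEI (left at 0), 12 bits VLAN ID.
    tci = (priority << 13) | vlan_id
    return dst + src + struct.pack("!HHH", TPID_8021Q, tci, ethertype)


def read_vlan(frame: bytes):
    """Return the VLAN ID and priority of a frame, or None if untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return {"vlan_id": tci & 0x0FFF, "priority": tci >> 13}
```

For example, `read_vlan(build_tagged_header(b"\xff" * 6, b"\x02" + b"\x00" * 5, 100, 5, 0x0800))` gives back VLAN 100 with priority 5. The tag is just a field in the frame, so the separation holds only as long as the switch ports are configured to reject tags that a host should not be sending.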


Encryption

In this case we have several options too:

- VPN (Virtual Private Network): As you can see in the figure, it is possible to encrypt communications between the different VoIP terminals of your system using a VPN; if all traffic is encrypted, both SIP and RTP are protected. This solution defends us from sniffers on the Internet but not from those inside the organization, which is why a dedicated VLAN is also recommended in order to minimize data exposure.

Figure: VPN example

- Built-in encryption: Some proprietary software, such as Skype, uses its own encryption protocol, which only Skype clients understand. Traffic is encrypted and the protocol relies on a P2P network formed by clients and nodes, but this architecture is too complex to summarize in a few words, so I recommend reading these papers:
Anyway, I wouldn’t use it if I wanted really secure communication, because I can’t be sure that my conversation is not being relayed through another Skype user’s computer (maybe a bad guy’s).

- “Standards” SRTP & ZRTP: SRTP (Secure Real-time Transport Protocol) encrypts RTP traffic and provides confidentiality, message authentication and integrity, and replay protection. It depends on an external key management protocol to set up the initial master key, and there are several protocols for this task: MIKEY, ZRTP (Media Path Key Agreement for Unicast Secure RTP) and SDES, which seems to be becoming the de facto standard, mainly because it is an extremely simple technique. Basically, in this method the keys are transported in a SIP message (in the SDP attachment), and the SIP signaling is encrypted using TLS (Transport Layer Security); you can picture it by thinking of HTTPS. It is also possible to use other methods for this last function, such as S/MIME, but they are not very widespread.

Figure: TLS example
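
To show how simple SDES really is, here is a minimal sketch of the `a=crypto` attribute that carries the SRTP key inside the SDP body. The helper names are mine, and the code only covers the common `AES_CM_128_HMAC_SHA1_80` suite (16-byte master key plus 14-byte master salt); treat it as an illustration of the format, not as a complete implementation.

```python
import base64
import os

KEY_LEN = 16   # AES-128 master key, in bytes
SALT_LEN = 14  # 112-bit master salt, in bytes


def make_crypto_line(tag: int = 1,
                     suite: str = "AES_CM_128_HMAC_SHA1_80"):
    """Generate fresh key material and the SDP line that transports it."""
    master_key = os.urandom(KEY_LEN)
    master_salt = os.urandom(SALT_LEN)
    inline = base64.b64encode(master_key + master_salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}", master_key, master_salt


def parse_crypto_line(line: str):
    """Recover (tag, suite, master_key, master_salt) from an a=crypto line."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    tag, suite, key_params = line[len(prefix):].split(" ", 2)
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    # Ignore optional "|lifetime|MKI" fields and trailing session parameters.
    encoded = key_info.split()[0].split("|")[0]
    raw = base64.b64decode(encoded)
    return int(tag), suite, raw[:KEY_LEN], raw[KEY_LEN:KEY_LEN + SALT_LEN]
```

Anyone who can read the SDP can run the equivalent of `parse_crypto_line` and obtain the master key, which is why SDES is only as strong as the TLS protecting the SIP signaling, and why every proxy that terminates that TLS connection sees the key too.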

On the other hand, ZRTP was developed as part of the Zfone Project, and its most important advantage is that it is the only one of these able to provide end-to-end encryption. Even SIP/TLS does not provide it, because the IP PBX is a trusted third party that could eavesdrop on the conversation. Other benefits of this protocol:
- It uses a public key algorithm (Diffie-Hellman key agreement), avoiding PKI (Public Key Infrastructure) complexity.
- It allows the detection of man-in-the-middle (MitM) attacks, as commented before.
- It supports opportunistic encryption, asking the other VoIP client whether it supports ZRTP before starting a call.

Figure: Detailed SRTP generic communication

NOTE: Eavesdropping on ZRTP seems extremely difficult, but not impossible. To do it, an attacker would have to be present from the first call, be able to fake the verbal SAS (Short Authentication String) in real time and, preferably, imitate voices. (Detailed explanation here)
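
The idea behind the SAS can be sketched in a few lines: both ends derive a short string from the Diffie-Hellman shared secret and read it aloud, and a man in the middle ends up with a different secret on each leg of the call. This is a deliberately simplified toy, not the real ZRTP key derivation: the small prime, the generator, the SHA-256 truncation and all function names are assumptions of mine, and the parameters are far too weak for real use.

```python
import hashlib
import secrets

# Toy parameters for illustration only; real ZRTP uses much larger groups.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair(private=None):
    """Return (private, public); a random private value is drawn if none is given."""
    if private is None:
        private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(private: int, peer_public: int) -> int:
    """Classic Diffie-Hellman: both ends reach the same number."""
    return pow(peer_public, private, P)


def short_auth_string(secret: int) -> str:
    """Derive a 4-character string that the two users compare by voice."""
    digest = hashlib.sha256(secret.to_bytes(16, "big")).digest()
    return digest[:2].hex()
```

In a direct call, Alice computes `shared_secret(a_priv, b_pub)` and Bob computes `shared_secret(b_priv, a_pub)`, so both read out the same string. If Mallory substitutes her own public value towards each of them, Alice and Bob hold different secrets, and unless the truncated hashes happen to collide, the strings they read to each other will not match. This is why the attacker in the note above has to fake the spoken SAS in real time.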

They are not exactly standards, but they are the most widely used options; in fact, SRTP (RFC 3711) and MIKEY (RFC 3830) are “Proposed Standard” documents and ZRTP is an “Informational” one. ZRTP was developed by Phil Zimmermann (among others) and published recently by the IETF as RFC 6189.

OK, this is a real mess of protocols, but now, which hardware and software solution should I get? You should decide what level of risk you are willing to assume and then select software that supports it. I think this comparative list can help you:

Figure: Ekiga client

To sum up, I should say that I know this was a boring (sorry for that) theoretical post, but I found a lot of confusion on too many sites and forums about this group of protocols and what they can do for us, so I decided to dig in and document it. From now on I will go back to working on proofs of concept, which are much more fun to test, write and read :)

Jesús Pérez