Intercepting MS Teams Communication
tl;dr
I was curious how the communication protocol of MS Teams works and why nobody has yet developed a nice Python client for it. So I looked into it and didn’t build a Python client. This article briefly explains how to intercept MS Teams communication on your local system, how to review the protocol and how to pipe the extracted data into Elasticsearch.
Introduction
There are several reasons why I wanted to look into MS Teams communication. First, I would love to have a simple Python client for MS Teams to automate communication and send direct messages to users. Second, I wanted to open up the encrypted communication in order to analyse the metadata of messenger services. I believe that analysing metadata on messenger services can lead to something similar to this, which is super thrilling.
To test my ideas and to play a little with the data, I decided to man-in-the-middle the communication and analyse the data using Kibana. Throughout, I tried to keep my analysis setup as simple as possible, since I just wanted to get a glimpse of the potential opportunities.
First Step — Application Reconnaissance
For this analysis, I was running Ubuntu 18.04 LTS with MS Teams installed from here. The first thing I wanted to know after starting MS Teams was how many communication streams are established and to which ports the client connects. This can be checked with the lsof (“list open files”) command, which revealed more than one established connection. They all connect to port 443, but at various addresses.
$ # list all connections made by teams
$ lsof -i -P -n | grep teams
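To get a quick overview of the distinct endpoints instead of reading the raw listing, the lsof output can be condensed with a few lines of Python. The following is only a minimal sketch: the function name remote_endpoints and the sample lines are made up for illustration, and the addresses are placeholders from documentation ranges, not real MS Teams servers.

```python
import re

# Matches the remote side of the "local->remote" column that lsof prints for
# TCP connections, e.g. "192.168.1.5:51234->203.0.113.10:443".
# IPv6 addresses are printed in square brackets.
CONNECTION = re.compile(r"->(\[[0-9a-fA-F:]+\]|[^\s:]+):(\d+)")


def remote_endpoints(lsof_output: str) -> set[tuple[str, int]]:
    """Return the distinct (address, port) pairs found in lsof output."""
    endpoints = set()
    for line in lsof_output.splitlines():
        match = CONNECTION.search(line)
        if match:
            endpoints.add((match.group(1), int(match.group(2))))
    return endpoints


# Made-up sample lines in the layout lsof uses; the addresses are placeholders.
SAMPLE = """\
teams 4242 user 45u IPv4 111111 0t0 TCP 192.168.1.5:51234->203.0.113.10:443 (ESTABLISHED)
teams 4242 user 46u IPv4 111112 0t0 TCP 192.168.1.5:51236->203.0.113.10:443 (ESTABLISHED)
teams 4242 user 51u IPv4 111113 0t0 TCP 192.168.1.5:51240->198.51.100.7:443 (ESTABLISHED)
"""

if __name__ == "__main__":
    for address, port in sorted(remote_endpoints(SAMPLE)):
        print(address, port)
```

In practice, the text piped in would be the output of the lsof command above rather than the sample string.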
Port 443 is commonly used for TLS-encrypted web traffic (HTTPS), as generated by web browsers and other web-based communication. From previous analyses, I already knew that many messenger clients today are actually hidden browsers that only display the messenger’s web interface. Franz and Rambox are good examples of applications built on this idea.
MS Teams likewise runs as a web application embedded in Electron, which is built on Chromium. Working with a known browser instead of a custom application comes with advantages, such as being able to look up the browser’s available command-line switches. Switches like --proxy-server or --ignore-certificate-errors are available and make it possible to redirect the browser’s communication streams or bypass certificate checks. These switches are perfect for opening up the encrypted communication on a local machine.
Interception Approaches
Decrypting communication depends on a multitude of preconditions, such as the actual interception point, the protocol used, the selected cipher and the physical medium the data is transmitted on.
In this case, the application communicates with remote endpoints, and the data is generated, formatted and encrypted in an application running on a system I control. The payload is transferred over TCP. Modifications in transit are prevented by integrity checks, and the encryption used is standardized and generally considered secure. The addressed endpoints, however, can be replaced, and as long as the right command-line switches are used, the application will accept the replacement.
Being able to fully control one of the involved communication systems is a huge advantage and allows accessing, controlling and modifying the application environment. Transport encryption like TLS is made to protect data in transit. It is not made to protect it on the system itself. Therefore, TLS has to assume that both endpoints run in secure and trustworthy environments. Otherwise, the encryption is doomed.
Having said this, I would like to examine possible approaches before going into practice.
Approach A
One way to access the data in an encrypted communication stream is to extract the TLS session keys used by the application while simultaneously recording the communication with network tools like tcpdump or Wireshark. A nice feature provided by Wireshark is the ability to load the extracted session keys and decrypt the recorded communication. Doing this is surprisingly simple and only requires the environment variable SSLKEYLOGFILE to be set before starting the MS Teams client. The commands to do so are listed below (Wireshark and tcpdump are alternatives, only one of them is needed):
$ mkdir -p ~/ms_teams
$ # starting Wireshark - listening on all interfaces
$ sudo wireshark -k -i 'any' -w output.pcap
$ # starting tcpdump - listening on all interfaces
$ sudo tcpdump -i 'any' -w output.pcap
$ # starting Teams
$ SSLKEYLOGFILE=~/ms_teams/sslkeylog.log teams
After setting up the environment variable and launching the application, the sslkeylog.log file needs to be loaded. The key file can be set by selecting the following option path in Wireshark:
“ Edit ➔ Preferences ➔ Protocols ➔ TLS ➔ (Pre)-Master-Secret log filename”
Now it is possible to analyse the decrypted traffic in Wireshark.
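If Wireshark still shows only encrypted records, it can help to check that the key log file was actually written. The file uses the NSS key log format, where each entry consists of a label, the client random and the secret, separated by spaces. The sketch below simply counts the entries per label; the helper name count_keylog_labels is my own, and the path assumes the directory used in this article.

```python
from collections import Counter
from pathlib import Path


def count_keylog_labels(text: str) -> Counter:
    """Count key log entries per label, skipping comments and malformed lines."""
    labels = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Each entry has the form: <label> <client random, hex> <secret, hex>
        if len(parts) == 3:
            labels[parts[0]] += 1
    return labels


if __name__ == "__main__":
    # Path as used in this article; adjust if the file lives elsewhere.
    keylog = Path.home() / "ms_teams" / "sslkeylog.log"
    if keylog.exists():
        print(count_keylog_labels(keylog.read_text()))
    else:
        print(f"{keylog} not found - was SSLKEYLOGFILE set before starting Teams?")
```

An empty result suggests that the variable was not set in the environment the client was started from.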
Approach B
Another approach to intercepting communication is based on PolarProxy. This software adds a transparent SSL/TLS proxy to the system, which re-encrypts the communication stream with self-signed certificates. To use it, all outgoing connections of the system with destination port 443 must be redirected through the proxy, which then forwards the stream to its original destination and, in parallel, exports a decrypted version of this stream to an evaluation interface such as Arkime. An article about this approach can be found here.
Even though Arkime is the network analysis tool of my choice, I did not want to redirect all TLS traffic of my system. I was concerned about the overwhelming and unnecessary amount of data to analyse, as unrelated network traffic from other applications would also be intercepted. Also, in case of sharing the recorded network data (via file or screencast), I wanted to be sure that I only share the intended data and nothing more. Sure, Wireshark and Arkime can filter streams based on IP, port, source and other categories, but adding all the filters every time before sharing is not the kind of ease of use I generally aim for. Besides, filtering network traffic by application is, as far as I know, not possible in either tool.
Approach C
The approach I finally favoured is to man-in-the-middle the communication with mitmproxy. The proxy is easy to set up, auto-generates self-signed certificates for incoming connections and cleanly displays the transmitted payload. Another feature I very much welcome is the extensibility of mitmproxy through addons. This enabled me to pipe the extracted data to Elasticsearch and to analyse it with Kibana. Also, the data stream from MS Teams can easily be redirected to the corresponding proxy port with the browser command-line switches mentioned above. This approach additionally makes it possible to record only MS Teams traffic with tcpdump or Wireshark by filtering on the proxy port.
Decrypt Communication
For this analysis, as in general, I first set up a project folder and a virtual environment, and then install mitmproxy into it.
$ mkdir -p ~/ms_teams
$ cd ~/ms_teams
$ python -m venv virtualenv
$ source ./virtualenv/bin/activate
$ pip install mitmproxy
In the given scenario, the biggest advantage gained from using a forwarding proxy is the separation of network communication on a per-application basis. In this article, all communication related to port 3333 belongs to MS Teams. Therefore, the proxy needs to be started with the command shown below:
$ mitmproxy --listen-port 3333
After that, the MS Teams client can be started and its traffic will be redirected. As previously discovered, the client is actually an Electron application and can be started with additional command-line switches. Setting the switch --proxy-server=http://localhost:3333 will forward all traffic to the previously started proxy.
Since the proxy replaces the TLS certificate with its own, the switch --ignore-certificate-errors must also be set to bypass the certificate check. The MS Teams client can now be started as shown below:
$ /usr/share/teams/teams \
--proxy-server=http://localhost:3333 \
--ignore-certificate-errors
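When the client has to be restarted often during an analysis, the same launch can be scripted. The following sketch only assembles the command line from the binary path and the two switches used in this article; the helper name build_teams_command is hypothetical.

```python
import shlex

# Binary path as used in this article; it may differ on other installations.
TEAMS_BINARY = "/usr/share/teams/teams"


def build_teams_command(proxy_port: int, binary: str = TEAMS_BINARY) -> list[str]:
    """Assemble the Teams command line that routes traffic through the local proxy."""
    return [
        binary,
        f"--proxy-server=http://localhost:{proxy_port}",
        "--ignore-certificate-errors",
    ]


if __name__ == "__main__":
    # Print the command in a shell-safe form instead of launching it directly.
    print(shlex.join(build_teams_command(3333)))
```

The returned list can be passed to subprocess.run() to actually start the client.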
Immediately after the start, the proxy displays the first intercepted communication streams, which proves that the approach works.
The stream can now be reviewed in depth. The image below displays the response to a request, containing user information in JSON format. Furthermore, the destination URL and the content of the cookies are accessible.
Extracting Data From Mitmproxy with Addons
Accessing the pure data stream is very useful and already enables us to analyse the protocol. Another worthwhile capability would now be to extract and store the data and separate it in different ways.
This goal can be achieved by extending mitmproxy with custom Python addons. Writing these addons is very simple and is well documented in the mitmproxy documentation. Referring to these examples as an orientation, it is just a small step to develop customized addons.
The code below provides a simple example which extracts all occurring URLs and writes them to the file teams_urls.txt.
# example_urls.py
from pathlib import Path

from mitmproxy import http

urlFile = Path("teams_urls.txt")


def request(flow: http.HTTPFlow) -> None:
    # Append the URL of every intercepted request to the output file.
    with urlFile.open("a") as f:
        f.write(flow.request.pretty_url + "\n")
The addon can be used by simply adding the switch --scripts SCRIPT to the command.
$ mitmproxy --listen-port 3333 --scripts example_urls.py
A cleaned up look at the harvested URLs generated by this addon is listed below. The result nicely lists some API endpoints and their format.
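One way to produce such a cleaned-up view from teams_urls.txt is to strip the query strings and count how often each host and path occurs. This is only a sketch under the assumption that the file contains one URL per line, as written by the addon above; the function name summarise_urls and the example URLs in the comments are hypothetical.

```python
from collections import Counter
from pathlib import Path
from urllib.parse import urlsplit


def summarise_urls(urls) -> Counter:
    """Count requests per (host, path), ignoring query strings and fragments."""
    counts = Counter()
    for url in urls:
        url = url.strip()
        if not url:
            continue
        parts = urlsplit(url)
        # e.g. "https://example.invalid/api/users?id=1"
        #   -> ("example.invalid", "/api/users")
        counts[(parts.hostname, parts.path)] += 1
    return counts


if __name__ == "__main__":
    url_file = Path("teams_urls.txt")  # file written by example_urls.py
    if url_file.exists():
        with url_file.open() as f:
            for (host, path), n in summarise_urls(f).most_common(10):
                print(n, host, path)
```

Grouping by host and path makes recurring API endpoints stand out, while parameters that change with every request no longer clutter the list.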
Evaluating the Data in Kibana
The last part of this approach makes the harvested payload available for evaluation in Kibana. For this, the data intercepted by mitmproxy needs to be exported to Elasticsearch, which of course requires a running ELK Stack first. Since I often use the ELK Stack to take a quick look at different kinds of data, I of course have my own docker-compose file for it. Setting up this “i-just-want-to-try-something” ELK stack can be done as shown below:
$ git clone git@github.com:botlabsDev/basic-elkg.git
$ cd basic-elkg
$ docker-compose up -d
The stack is accessible under the following URLs:
- http://localhost:9200 (Elasticsearch)
- http://localhost:5601 (Kibana)
In order to send data from Python to Elasticsearch, the Elasticsearch Python library needs to be installed. Therefore, the virtual environment mentioned before needs to be extended as shown below.
$ pip install elasticsearch
With this done, it is now possible to create a mitmproxy addon which pipes MS Teams communication data to Elasticsearch. An example addon that exports the intercepted data stream to Elasticsearch is listed below and named example_toElasticSearch.py.
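## example_toElasticSearch.py
import datetime

from elasticsearch import Elasticsearch
from mitmproxy import http

# Elasticsearch from the local docker-compose stack. The URL form is
# accepted by both the 7.x and 8.x versions of the Python client.
es = Elasticsearch("http://localhost:9200")


def request(flow: http.HTTPFlow) -> None:
    # Called by mitmproxy for every client request.
    sendDataToEs(index="msteams_request", flow=flow, message=flow.request)


def response(flow: http.HTTPFlow) -> None:
    # Called by mitmproxy for every server response.
    sendDataToEs(index="msteams_response", flow=flow, message=flow.response)


def sendDataToEs(index: str, flow: http.HTTPFlow, message) -> None:
    data = {"url": flow.request.pretty_url,
            # get_text(strict=False) does not raise on binary payloads
            "content": message.get_text(strict=False),
            "timestamp": datetime.datetime.utcnow()}
    try:
        # flow.id is the unique identifier mitmproxy assigns to each flow
        es.create(index=index, id=flow.id, body=data)
        print("sent data to es")
    except Exception as e:
        print(e)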
The addon can be executed in the same way as the first one.
$ mitmproxy --listen-port 3333 --scripts example_toElasticSearch.py
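The example addon stores only the URL, the body and a timestamp. For later analysis it may be useful to index a few more fields. The helper below is a sketch of how such a document could be assembled independently of mitmproxy; the function name build_document and the field names method and status_code are my own choices, not something prescribed by Elasticsearch or mitmproxy.

```python
import datetime
from typing import Optional


def build_document(url: str, method: str, body: Optional[bytes],
                   status_code: Optional[int] = None,
                   now: Optional[datetime.datetime] = None) -> dict:
    """Build the dictionary that would be sent to Elasticsearch for one message."""
    doc = {
        "url": url,
        "method": method,
        # Undecodable bytes are replaced instead of raising an exception.
        "content": (body or b"").decode("utf-8", errors="replace"),
        "timestamp": now or datetime.datetime.now(datetime.timezone.utc),
    }
    # Only responses carry a status code, so the field is optional.
    if status_code is not None:
        doc["status_code"] = status_code
    return doc


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_document("https://example.invalid/api", "GET", b"{}", status_code=200))
```

Inside the addon, the arguments could be taken from flow.request.pretty_url, flow.request.method, flow.response.content and flow.response.status_code.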
Shortly after starting all systems, Kibana will list results similar to the screenshot below. Note that an index pattern must be created in Kibana before the data can be evaluated. A missing index pattern is usually the cause when no data is visible in the Kibana dashboard. The documentation that describes how to create one can be found here.
Finally, the data collection grows the longer the setup runs, and data interpretation and analysis can begin. By adapting the sample addons to specific requirements, even better results can be achieved.
Summary
Summarizing all introduced steps results in the following compact list of commands. The list combines two recording options: first, logging the TLS session keys and capturing all traffic on port 3333 in a pcap file; second, intercepting the traffic from the MS Teams client with mitmproxy and storing the decrypted payload in Elasticsearch. This way, a full communication dump is available if required at any later point, as well as access to an evaluation interface for the (no longer encrypted) payload.
$ ##### TERMINAL 1
$ mkdir ~/ms_teams
$ cd ~/ms_teams
$ python -m venv venv
$ source ./venv/bin/activate
$ pip install mitmproxy elasticsearch
$ git clone git@github.com:botlabsDev/basic-elkg.git
$ cd basic-elkg
$ docker-compose up -d
$ ##### TERMINAL 2
$ cd ~/ms_teams && source ./venv/bin/activate
$ mitmproxy \
    --listen-port 3333 \
    --scripts example_toElasticSearch.py
$ ##### TERMINAL 3
$ sudo tcpdump port 3333 -i 'any' -w ~/ms_teams/output.pcap
$ ##### TERMINAL 4
$ SSLKEYLOGFILE=~/ms_teams/sslkeylog.log \
    /usr/share/teams/teams \
    --proxy-server=http://localhost:3333 \
    --ignore-certificate-errors
Afterwards, the data can be analysed using Kibana or directly with Wireshark. When using Wireshark, remember to load the key log file (sslkeylog.log) as described above.
$ firefox http://localhost:5601 # Kibana
$ wireshark ~/ms_teams/output.pcap
Conclusion
In this article, I have illustrated how to intercept web browser-based communication on a local system and how to load it into an evaluation environment. Building upon this, it should be possible to go further and analyse, evaluate or manipulate the communication stream. In general, this workflow should also fit the web-based clients of other messengers like WhatsApp, Telegram or Signal.
Future work
With the demonstrated level of access, the following steps can now be considered:
- Reversing the communication protocol and creating documentation.
- Developing an open source MS Teams client.
- Applying metadata analysis to the communication streams and evaluating user behaviour (e.g. users are online/offline, in calls, …)
- Probably more …
I currently plan a second article about how to analyse metadata harvested from MS Teams.