Identify systems with suspiciously high or low metrics in different areas. You can then dig into these outliers further with other types of analysis.
All of the scenarios here use Zeek logs, so be sure to process your pcap with Zeek before starting.
More advanced techniques look at how long connections are held open or how regularly connections are made between two IP addresses. Sometimes, though, it is useful to simply see how many total connections were made. A high connection count shows that two systems were communicating a lot, and this type of analysis can help you decide where to dig further.
cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p proto | awk 'BEGIN{ FS="\t" } { arr[$1 FS $2 FS $3 FS $4] += 1 } END{ for (key in arr) printf "%s%s%s\n", key, FS, arr[key] }' | sort -nrk 5 | head
192.168.88.2 165.227.88.15 53 udp 108856
10.55.200.10 172.16.200.11 53 udp 64285
10.55.100.111 165.227.216.194 443 tcp 20054
10.55.182.100 10.233.233.5 80 tcp 4190
10.55.200.10 216.239.34.10 53 udp 3856
10.55.200.11 193.108.88.128 53 udp 3660
10.55.200.11 88.221.81.192 53 udp 2742
10.55.200.11 205.251.195.166 53 udp 2289
10.55.200.11 216.239.34.10 53 udp 2265
10.55.200.10 205.251.195.166 53 udp 1931
cat conn.log | zeek-cut id.orig_h id.resp_h id.resp_p proto
- We’re taking Zeek’s conn.log and only keeping the source IP, destination IP, destination port, and transport protocol.
awk
- The following explains the pieces of the awk script.
BEGIN{ FS="\t" }
- Set the FS (field separator) variable to a tab character. This is what separates columns in our Zeek logs, and it is also what we want to use in our output. BEGIN means this instruction is only executed one time, before any data is processed.
{ arr[$1 FS $2 FS $3 FS $4] += 1 }
- Creates an array (named arr) with the number of connections. The important part here is that we use the concatenation of the first four fields ($1 through $4) as our array key. This means that as long as the source IP, destination IP, destination port, and protocol remain the same, the connections are counted under the same key. awk executes this instruction once for every line of data.
END{ for (key in arr) printf "%s%s%s\n", key, FS, arr[key] }
- Here we loop through all the elements in the array and print the results. END signifies that awk only executes this instruction one time, after processing all the data.
sort -nrk 5 | head
- The number of connections is the 5th and final column of the output. Here we sort on this column in descending numeric order and keep the top results (head shows the first 10 lines by default).
From the results we can see that 192.168.88.2 was communicating with 165.227.88.15 a large number of times. The dataset covers a period of 24 hours, so 108,856 connections divided by the 86,400 seconds in a day works out to about 1.26 connections per second on average. Furthermore, these connections were made over UDP port 53, which is normally DNS. Typically, DNS results are cached in a number of places, so a single host should rarely need to repeat lookups this often.
It is suspicious that DNS requests were being made so frequently. This could be the result of a misconfiguration or misbehaving software, or it could indicate malicious software. Assuming we are familiar with the network configuration, we should be able to quickly tell that 165.227.88.15 is not the IP of a DNS server we recognize, which makes this traffic even more suspicious.
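The per-second arithmetic above assumes the capture spans exactly 24 hours. As a minimal sketch, the hypothetical shell function conn_rate below repeats the awk aggregation from the first command, but also reads Zeek's ts column and divides each count by the time between the earliest and latest connection in the input. The column order it expects is an assumption spelled out in the comments, and the window is measured across the whole input, not per IP pair.

```shell
# conn_rate: hypothetical helper, a sketch rather than a tested tool.
# Input (stdin): tab-separated ts, source IP, destination IP, destination port,
# protocol, for example from:
#   zeek-cut ts id.orig_h id.resp_h id.resp_p proto < conn.log
# Output: source, destination, port, protocol, connection count, and average
# connections per second, sorted by count (highest first).
conn_rate() {
  awk 'BEGIN { FS = "\t" }
    {
      t = $1 + 0                       # Zeek ts is epoch seconds
      if (NR == 1 || t < first) first = t
      if (NR == 1 || t > last) last = t
      count[$2 FS $3 FS $4 FS $5]++    # same key as the awk in the first command
    }
    END {
      window = last - first            # span of the whole input, not per pair
      if (window <= 0) window = 1      # avoid dividing by zero on tiny inputs
      for (k in count) printf "%s\t%d\t%.2f\n", k, count[k], count[k] / window
    }' | sort -t "$(printf '\t')" -k5,5nr
}
```

Measuring the window from the log's own timestamps avoids assuming a 24-hour capture; on a log that does span a full day, a pair's rate should come out close to a hand calculation like the one above.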
Large data transfers out of a network can indicate data exfiltration. Databases containing sensitive information, intellectual property in the form of images, videos, PDFs, or other binary formats are often targets for attackers. Each of these can consist of large amounts of information that can then be detected when transferred out of your network.
The following command displays the total number of bytes sent from the IP address in column 1 to the IP address in column 2. When the IP address in column 1 is an internal address and column 2 is an external address, this data left your network, and a large total may indicate exfiltration.
cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes | awk 'BEGIN{ FS="\t" } { arr[$1 FS $2] += $3 } END{ for (key in arr) printf "%s%s%s\n", key, FS, arr[key] }' | sort -nrk 3 | head
192.168.88.2 165.227.88.15 6723739
10.55.100.111 23.38.115.36 981527
10.55.100.111 34.233.92.30 958540
10.55.100.111 24.220.113.56 778452
10.55.100.111 24.220.113.58 775648
10.55.100.111 23.52.163.40 734881
10.55.100.100 23.38.115.36 705408
10.55.100.103 134.170.58.189 637453
10.55.100.111 23.63.220.157 618329
10.55.100.105 23.38.115.36 615374
cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes
- We’re taking Zeek’s conn.log and only keeping the source IP, destination IP, and the number of payload bytes sent by the source IP.
awk
- The following explains the pieces of the awk script.
BEGIN{ FS="\t" }
- Set the FS (field separator) variable to a tab character. This is what separates columns in our Zeek logs, and it is also what we want to use in our output. BEGIN means this instruction is only executed one time, before any data is processed.
{ arr[$1 FS $2] += $3 }
- Creates an array (named arr). The important part here is that we use the concatenation of the source and destination IPs as our array key. This means that the command adds up the bytes for each pair of IPs. awk executes this instruction once for every line of data.
END{ for (key in arr) printf "%s%s%s\n", key, FS, arr[key] }
- Here we loop through all the elements in the array and print the results. END signifies that awk only executes this instruction one time, after processing all the data.
sort -nrk 3 | head
- The total number of bytes is the 3rd and final column of the output. Here we sort on this column in descending numeric order and keep the top results.
You can also modify the command to get the total amount of data sent in both directions by adding resp_bytes (the bytes sent back by the destination) to the sum.
cat conn.log | zeek-cut id.orig_h id.resp_h orig_bytes resp_bytes | awk 'BEGIN{ FS="\t" } { arr[$1 FS $2] += $3+$4 } END{ for (key in arr) printf "%s%s%s\n", key, FS, arr[key] }' | sort -nrk 3 | head
10.55.100.111 162.252.74.5 2027554933
10.55.100.103 13.107.4.50 91904287
10.55.100.111 24.220.113.59 29008351
10.55.100.110 72.21.81.240 23388664
10.55.100.106 40.77.228.30 21117996
10.55.100.111 23.38.115.36 19471375
10.55.100.107 13.107.4.50 17989779
10.55.100.110 23.38.115.36 17960372
10.55.100.106 13.107.4.50 17653174
10.55.100.100 13.107.4.50 17528299
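The byte totals above include every pair, internal-to-internal traffic included. As a minimal sketch of the filter described earlier (internal source, external destination), the hypothetical function outbound_bytes below keeps only pairs whose source is an RFC 1918 private IPv4 address and whose destination is not. Treating RFC 1918 as "internal" is an assumption; replace the pattern with your own address plan.

```shell
# outbound_bytes: hypothetical helper, a sketch rather than a tested tool.
# Assumes "internal" means an RFC 1918 private IPv4 address; adjust the
# is_private() pattern to match your own address plan.
# Input (stdin): tab-separated source IP, destination IP, bytes, for example:
#   zeek-cut id.orig_h id.resp_h orig_bytes < conn.log
# Output: source, destination, and total bytes for internal-to-external pairs,
# largest first.
outbound_bytes() {
  awk 'BEGIN { FS = OFS = "\t" }
    function is_private(ip) {
      return ip ~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/
    }
    is_private($1) && !is_private($2) { bytes[$1 FS $2] += $3 }  # Zeek writes - for unset byte counts; awk treats it as 0
    END { for (k in bytes) print k, bytes[k] }' |
  sort -t "$(printf '\t')" -k3,3nr
}
```

Because the pattern only recognizes IPv4 private ranges, IPv6 connections are skipped entirely by this sketch.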
Attacker tools that send HTTP traffic will often include a User-Agent (UA) header in the HTTP request. Many tools (e.g. nikto) use a custom UA that identifies the tool. If an attacker forgets to change this default, such a UA appearing on your network should be a red flag. Even tools whose default value mimics a common web browser will sometimes contain typos, such as an extra space, that likewise appear as anomalies. One final reason to look at UAs is that if your network is patched consistently, you should see relatively few unique UA strings, as each system will have identical browsers and versions. A unique value in this case could indicate a system with missing patches, a user installing unauthorized software, or an attacker attempting to blend in by choosing a common UA that doesn’t match what’s on your network.
This command uses Zeek’s http.log file to find all UA strings and count how many HTTP requests used each one. Sorting in ascending order puts the rarest UAs at the top.
cat http.log | zeek-cut user_agent | sort | uniq -c | sort -n | head
1 client connection
1 Windows-Update-Agent/7.9.9600.18756 Client-Protocol/1.21
2 Microsoft-CryptoAPI/6.3
9 Windows-Update-Agent/10.0.10011.16384 Client-Protocol/1.40
12 OfficeClickToRun
25 Microsoft BITS/7.8
40 MICROSOFT_DEVICE_METADATA_RETRIEVAL_CLIENT
44 Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; rv:11.0) like Gecko
48 Mozilla/4.0 (compatible; FCT 5.6.0; Windows NT 5.1)
374 Microsoft-Delivery-Optimization/10.0
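Request counts can be inflated by a single chatty host. A complementary view, sketched below as the hypothetical function ua_host_counts, counts how many distinct source IPs sent each UA; on a consistently patched network, a UA used by only one host stands out even if that host sent it many times. The input columns are an assumption noted in the comments.

```shell
# ua_host_counts: hypothetical helper, a sketch rather than a tested tool.
# Input (stdin): tab-separated source IP and User-Agent, for example:
#   zeek-cut id.orig_h user_agent < http.log
# Output: number of distinct source IPs that sent each UA, then the UA,
# rarest first.
ua_host_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
    !seen[$1 FS $2]++ { hosts[$2]++ }   # count each host once per UA
    END { for (ua in hosts) print hosts[ua], ua }' |
  sort -t "$(printf '\t')" -k1,1n -k2,2
}
```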
Next, we can investigate the unique UA strings to find out who was making the request and where the request was going. In both of the cases below, we pull the source and destination IPs from the Zeek log, along with the HTTP Host header and the requested URI. Finally, we include the UA string so that we can filter for only the UAs we found above.
cat http.log | zeek-cut id.orig_h id.resp_h host uri user_agent | grep 'client connection'
10.55.200.10 191.239.52.100 tele.trafficmanager.net /{AA35F099-DF2E-4104-8F15-FCB887FB32F3} client connection
This appears to be a tracking server of some kind. Some quick reconnaissance on the domain shows that it belongs to Microsoft. We can tentatively assume that this is benign metrics tracking.
cat http.log | zeek-cut id.orig_h id.resp_h host uri user_agent | grep 'Windows-Update-Agent/7.9.9600.18756 Client-Protocol/1.21'
10.55.200.10 52.183.118.171 statsfe2.update.microsoft.com /ReportingWebService/ReportingWebService.asmx Windows-Update-Agent/7.9.9600.18756 Client-Protocol/1.21
In this case, the domain is also pretty clearly a Microsoft property, and we can verify that Microsoft also owns the destination IP. This appears to be part of the normal update process, but since it is a unique UA string the next step might be to investigate the system 10.55.200.10 to see why it is not running the same version of software as all the other systems.
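One way to start that investigation is to list every UA a single host sent, which can show whether the older Windows-Update-Agent string sits alongside newer software versions on the same machine. The function name uas_for_host is hypothetical, and this is only a sketch that assumes the two input columns noted in the comments.

```shell
# uas_for_host: hypothetical helper, a sketch rather than a tested tool.
# Usage: zeek-cut id.orig_h user_agent < http.log | uas_for_host 10.55.200.10
# Output: request count and User-Agent for every UA the given host sent,
# most frequent first.
uas_for_host() {
  awk -v host="$1" 'BEGIN { FS = OFS = "\t" }
    $1 == host { count[$2]++ }
    END { for (ua in count) print count[ua], ua }' |
  sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```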
Note that the Host header can be spoofed in HTTP requests, so you should not rely on it alone. Verify in the DNS logs that the domain listed there actually resolved to the IP address in the HTTP log. Also keep in mind that, with cloud hosting, the destination IP address may change and later stop being associated with that domain; it is quite common for services to rotate or recycle IP addresses for a given domain name.
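The DNS cross-check described above can be sketched as a hypothetical function check_host_ip that reads the query and answers columns of dns.log and reports, through its exit status, whether any response for a domain listed a given IP. It assumes Zeek's default TSV output, where vector fields such as answers are joined with commas, and it does not check which client made the lookup or when.

```shell
# check_host_ip: hypothetical helper, a sketch rather than a tested tool.
# Assumes Zeek's default TSV output, where the answers vector in dns.log is
# joined with commas.
# Usage: zeek-cut query answers < dns.log | check_host_ip DOMAIN IP
# Exit status: 0 if any response for DOMAIN listed IP among its answers, else 1.
check_host_ip() {
  awk -v domain="$1" -v ip="$2" 'BEGIN { FS = "\t"; found = 0 }
    tolower($1) == tolower(domain) {
      n = split($2, answers, ",")
      for (i = 1; i <= n; i++)
        if (answers[i] == ip) found = 1
    }
    END { exit !found }'
}
```

For example, zeek-cut query answers < dns.log | check_host_ip statsfe2.update.microsoft.com 52.183.118.171 && echo resolved would print "resolved" only if the capture contains a matching DNS answer.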