Using VirusTotal API v3 Data to Detect Malicious Activity — Part 1

Aug 29, 2020

Part One — Tracking down possible malicious web activity with the help of Splunk and VirusTotal API v3 WHOIS data.

Background

If you’re unfamiliar with VirusTotal, it is a free service that allows security researchers to submit and search for malicious files, URLs, domains, IP addresses, etc., and their associated antivirus detections. As the service has matured, VirusTotal has added some premium features for businesses and security researchers. A full list of their features and offerings can be found HERE, but for the sake of this article I’ll focus only on their Premium API v3 offering.

Combining VirusTotal’s immense, community-supported database of malware samples with Splunk is a match made in infosec heaven. This article aims to demonstrate how combining real-time threat intelligence with Splunk’s correlation capabilities gives defenders an edge over adversaries when tracking down possibly compromised endpoints.

What We Need

Splunk
Splunk App — URL Toolbox
VirusTotal API Key (Premium API v3 Key recommended)

The URL Toolbox app for Splunk is not strictly necessary, but it’s a useful tool that helps parse domains, URIs, and other relevant information from URLs and email addresses. Splunk previously wrote a very detailed and helpful article about it as part of their “Hunting with Splunk: The Basics” series.

Configure the VirusTotal External Lookup in Splunk

This entire approach relies on being able to call the VirusTotal API from within Splunk. Luckily, Splunk gives us a convenient and simple way to do this: External Lookups, which can execute a Python script or a binary executable. I’m not going to walk through how to create the Python script in this article; however, a good starting point can be found HERE. You can choose how to configure your environment, but for my purposes I ended up creating three different lookups: domain, IP, and hash.
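
For orientation, here is a minimal sketch of the shape such an external lookup script could take. Splunk hands an external lookup its rows as CSV on stdin and reads the enriched CSV back from stdout. The function names (`fetch_domain_report`, `run_lookup`), the `VT_API_KEY` environment variable, and the two output columns are placeholders of mine, not the script used in this article. The endpoint, `x-apikey` header, and attribute names reflect my reading of the VirusTotal API v3 documentation and should be verified there.

```python
import csv
import io
import json
import os
import urllib.request

VT_DOMAIN_URL = "https://www.virustotal.com/api/v3/domains/{}"
OUTPUT_FIELDS = ["whois_creation_date", "last_analysis_positive"]  # assumed names


def fetch_domain_report(domain):
    """Call VirusTotal for one domain (needs network access and an API key)."""
    request = urllib.request.Request(
        VT_DOMAIN_URL.format(domain),
        headers={"x-apikey": os.environ["VT_API_KEY"]},  # assumed env var
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def run_lookup(infile, outfile, fetch=fetch_domain_report, key="domain"):
    """Read lookup rows as CSV, enrich each row, and write CSV back out."""
    reader = csv.DictReader(infile)
    header = list(reader.fieldnames or [])
    header += [name for name in OUTPUT_FIELDS if name not in header]
    writer = csv.DictWriter(outfile, fieldnames=header)
    writer.writeheader()
    for row in reader:
        try:
            attrs = fetch(row[key])["data"]["attributes"]
        except Exception:
            attrs = {}  # leave the output columns blank on any failure
        stats = attrs.get("last_analysis_stats", {})
        row["whois_creation_date"] = attrs.get("creation_date", "")
        row["last_analysis_positive"] = stats.get("malicious", "")
        writer.writerow(row)


# In a real lookup script this would be: run_lookup(sys.stdin, sys.stdout)
```

Passing the fetch function in as a parameter keeps the CSV plumbing testable without touching the network or spending API quota.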

Once the external lookup is set up, it should be a simple matter to pipe results to the lookup and get back whatever data you configured your script to retrieve. Here is an example of the output from my domain lookup:

Now, you can choose which fields you are most interested in to be returned when you perform the VirusTotal query. For the purpose of finding malicious traffic I am focused on a few different fields: WHOIS (whois_creation_date), last_analysis_positive, cert_issuer, and categories.

WHOIS: the creation date extracted from the domain’s WHOIS information (UTC timestamp). NOTE: VirusTotal passively collects DNS information, so this field may be blank if it hasn’t collected that information yet.
Last Analysis Positive: the number of AV engines that consider the domain malicious.
Certificate Issuer: the certificate authority that issued a certificate for the domain. Only available if the site has an SSL/TLS certificate associated with it.
Categories: the categories that web proxy services have assigned to the domain.
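
As a rough illustration of how these four fields could be derived from the `attributes` object of a VirusTotal domain report, here is a small sketch. The output names mirror the field names used in this article. The input attribute names (`creation_date`, `last_analysis_stats`, `last_https_certificate`, `categories`) are what I understand the v3 domain object to contain, so treat them as assumptions. The function name and the sample data are made up.

```python
from datetime import datetime, timezone


def flatten_domain_attributes(attrs):
    """Map a VirusTotal domain 'attributes' dict onto the four lookup fields."""
    created = attrs.get("creation_date")  # epoch seconds, may be missing
    stats = attrs.get("last_analysis_stats", {})
    issuer = attrs.get("last_https_certificate", {}).get("issuer", {})
    categories = attrs.get("categories", {})  # vendor name -> category label
    return {
        "whois_creation_date": (
            datetime.fromtimestamp(created, tz=timezone.utc).strftime("%Y-%m-%d")
            if created is not None
            else ""
        ),
        "last_analysis_positive": stats.get("malicious", 0),
        "cert_issuer": issuer.get("O", ""),
        "categories": "|".join(sorted(set(categories.values()))),
    }


# Made-up sample data, shaped like the assumed API response.
sample = {
    "creation_date": 1596240000,
    "last_analysis_stats": {"malicious": 2, "harmless": 70},
    "last_https_certificate": {"issuer": {"C": "US", "O": "Let's Encrypt"}},
    "categories": {"VendorA": "phishing", "VendorB": "phishing", "VendorC": "malware"},
}
print(flatten_domain_attributes(sample))
```

Missing attributes fall back to blank values, which matches the note above that the WHOIS and certificate fields are not always populated.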

Each of these fields can be used in a different way to detect malicious activity. For example, you may want to alert on domains whose category is malicious or suspicious. Or you may want to know if any domains your users are visiting have been identified as malicious by other vendors. For this article, we’re focused on the WHOIS date and the Certificate Issuer.

The Detection

The theory: Recently registered domains that use a certificate from the Let’s Encrypt certificate authority have a higher likelihood of hosting malicious content. If you’re not familiar with Let’s Encrypt, it’s a non-profit service that offers free SSL/TLS certificates for use on websites. Why focus on new domains? They are more likely to be hosting malicious content, either because insecure configurations led to compromise or because they were intentionally purchased and set up for this purpose. Why focus on Let’s Encrypt? Threat actors attempt to circumvent security controls, and SSL/TLS encryption is one of the easiest ways to accomplish this, since most companies are not performing MITM or SSL inspection at their edge network devices. This means that if a threat actor can get a user to visit a website that has been categorized as non-malicious, other security technologies are unlikely to detect malicious activity because they cannot see inside the encrypted tunnel that SSL/TLS provides.

Here is the full detection:

Methodology: provide the first time a process that is not a browser reached out to a domain and provide every host that reached out to that same domain.

| tstats summariesonly=t count last(event_time) as event_time values(dest) as dest where index=main sourcetype=sysmon
NOT process_name IN (chrome.exe,iexplore.exe,firefox.exe,opera.exe,microsoftedge.exe,safari)
by http_host

Methodology: Do NOT include null http_host fields and filter out IP addresses

| eval http_host = lower(http_host)
| eval is_ip = if(match(http_host, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"),1,0)
| where isnotnull(http_host) AND is_ip=0
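
To make the filtering step concrete outside of SPL, here is a rough Python equivalent. It is a sketch, not a translation: it uses the standard `ipaddress` module instead of a regular expression, so a hostname that merely contains an IP-like string (for example `10.0.0.1.example.com`, which an unanchored regex would match) is still treated as a hostname. The function names are hypothetical.

```python
import ipaddress


def is_ip_literal(host):
    """Return True if host is an IP address, optionally with an IPv4-style port."""
    if not host:
        return False
    candidate = host.strip()
    # Drop a trailing ":port" from hosts such as "10.0.0.5:8080".
    if candidate.count(":") == 1:
        candidate = candidate.split(":", 1)[0]
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def keep_hostnames(hosts):
    """Lowercase hosts and drop nulls, empty strings, and IP literals."""
    return [h.lower() for h in hosts if h and not is_ip_literal(h)]


print(keep_hostnames(["Example.COM", None, "8.8.8.8", "10.0.0.1.example.com"]))
```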

Methodology: use the URL Toolbox app to parse domain names to ensure we’re only pulling in the parent domain.

| eval list="mozilla"
| `ut_parse_extended(http_host,list)`
| stats count dc(dest) as dc_dest last(event_time) as event_time by ut_domain

Methodology: I have the Alexa Top 1 Million Sites loaded into Splunk as a static lookup table, and I use it to filter out known-good domains. Yes, these can be compromised, but they are well-known brands and will not be hosting malware for very long, so the risk is very low. Additionally, filter out any domains where the distinct count of hosts visiting that domain within the search window is greater than 10. This helps reduce the number of domains we’re going to search for in VirusTotal data. Furthermore, the goal is to catch compromises early, and hopefully 10 or more systems were not compromised within the search window (the search runs every 15 minutes).

| lookup alexa_by_str.csv domain as ut_domain OUTPUTNEW domain
| eval on_alexa_list = if(match(domain,ut_domain),1,0)
| where on_alexa_list = 0
| where dc_dest <= 10
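
The same two filters can be sketched in a few lines of Python, which may help when reasoning about the threshold. The function name, the input shape (a mapping from domain to distinct host count), and the sample domains are all hypothetical. The default of 10 mirrors the `dc_dest <= 10` condition above.

```python
def filter_candidates(domain_host_counts, known_good, max_hosts=10):
    """Keep domains that are not allowlisted and were seen on few hosts."""
    return sorted(
        domain
        for domain, host_count in domain_host_counts.items()
        if domain not in known_good and host_count <= max_hosts
    )


# Made-up counts: domain -> number of distinct hosts that contacted it.
counts = {"popular.example": 400, "rare.example": 3, "busy.example": 25}
allowlist = {"popular.example"}
print(filter_candidates(counts, allowlist))
```

Only domains that survive both checks need to be sent to VirusTotal, which is what keeps the API usage manageable.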

Methodology: Here I wrapped the vt_domain external lookup in a macro to reduce the number of SPL lines needed to manipulate some of the fields. Effectively, we’re performing the lookup, using our previously created external VirusTotal API lookup, for each domain that survived the earlier filters.

`virustotal_domain_enrich(ut_domain)`
| rename ut_domain as http_host
| fields - ut_* domain

Methodology: To determine the difference between the earliest event we capture and the WHOIS creation date/time stamp we need to convert that field value (which is just a normal text string) into UNIX time.

| eval epoch_whois = strptime(whois_creation_date,"%Y-%m-%d")
| eval epoch_etime = strptime(event_time,"%Y-%m-%dT%H:%M:%S.%NZ")

Next we’re going to determine the difference between the event_time and the WHOIS creation time by subtracting the two UNIX timestamps, then dividing by 60 twice and by 24 to convert seconds into days (rounded to 2 decimal places).

| eval diff_time = round((epoch_etime - epoch_whois)/60/60/24,2)
| eval cert_avail = if(isnotnull(cert_issuer),"yes","no")
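
The arithmetic is easy to check with a short Python sketch. It assumes the same two string formats as the SPL above and treats both values as UTC. The function name is mine.

```python
from datetime import datetime, timezone


def days_between(whois_creation_date, event_time):
    """Days from the WHOIS creation date to the event, rounded to 2 places."""
    created = datetime.strptime(whois_creation_date, "%Y-%m-%d")
    seen = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    # Both inputs are assumed to be UTC, so attach the timezone explicitly.
    created = created.replace(tzinfo=timezone.utc)
    seen = seen.replace(tzinfo=timezone.utc)
    seconds = (seen - created).total_seconds()
    return round(seconds / 60 / 60 / 24, 2)


print(days_between("2020-08-01", "2020-08-13T12:00:00.000Z"))  # 12.5
```

A domain created on August 1 and first contacted at noon on August 13 gives 12.5 days, well inside the 45-day window used later.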

The following are totally optional, but I found them to be useful additional filters. Essentially, we’re ONLY going to look at domains that have a WHOIS creation timestamp in VirusTotal, do not have an Alexa score (provided by VirusTotal), and have a value in the Certificate Authority field.

| eval whois_avail = if(isnotnull(whois_creation_date),"yes","no")
| eval cat_avail = if(isnotnull(category),"yes","no")
| eval alexa_avail = if(isnotnull(popularity_alexa),"yes","no")

Methodology: The only domains we’re interested in for this purpose are those that use the Let’s Encrypt certificate authority and were registered no more than 45 days before we first saw them.

| search (alexa_avail=no cert_avail=yes cert_issuer = "*Let's Encrypt Authority*" diff_time <= 45) OR (alexa_avail=no whois_avail=yes diff_time <= 45) OR (alexa_avail=no last_analysis_positive >= 2)
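
Because the final search combines three OR'ed clauses, it can help to read it as a single predicate. The Python sketch below mirrors that logic under a few assumptions of mine: each result row is a dict, missing fields are `None`, and `diff_time` is already a number of days. The function name and sample rows are made up.

```python
def is_suspicious(row, max_age_days=45, min_positives=2):
    """Mirror the three OR'ed clauses of the final search."""
    # Every clause requires that VirusTotal has no Alexa rank for the domain.
    if row.get("popularity_alexa") is not None:
        return False
    age = row.get("diff_time")
    young = age is not None and age <= max_age_days
    issuer = (row.get("cert_issuer") or "").lower()
    lets_encrypt_young = "let's encrypt authority" in issuer and young
    whois_young = row.get("whois_creation_date") is not None and young
    flagged = (row.get("last_analysis_positive") or 0) >= min_positives
    return lets_encrypt_young or whois_young or flagged


# Made-up rows for illustration only.
new_domain = {
    "whois_creation_date": "2020-08-01",
    "diff_time": 12.5,
    "cert_issuer": "Let's Encrypt Authority X3",
    "last_analysis_positive": 0,
}
old_clean_domain = {"whois_creation_date": "2015-01-01", "diff_time": 2000.0}
print(is_suspicious(new_domain), is_suspicious(old_clean_domain))
```

Written this way, it is easier to see that any domain with two or more detections is kept regardless of its age or certificate issuer.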

Conclusion

By combining our Splunk data with real-time threat intelligence from VirusTotal, we’re able to determine that at least 3 hosts have communicated with a domain that was registered less than 14 days before those hosts first started communicating with it.

In Part 2 of this series we’ll investigate the hosts and determine what may be going on — stay tuned!