Logging is one of the most crucial areas when a security or operational incident needs to be investigated. At least, that is what most technology practitioners would say.
But monitoring logs for malicious or unusual activity and carrying out threat hunting are also proactive techniques that help organisations catch infiltrations or compromise early.
This is KEY to reducing the impact of a cyber attack or production issue and hence minimising the cost associated with its aftermath.
From a security standpoint, there’s a famous quote by John Chambers, Former CEO – Cisco:
“There are only two types of organisations: those that have been hacked and those that don’t know it yet!”
This statement holds true for operational or production issues as well.
Now that we’ve put things in perspective, let’s carve out a log management strategy for a public cloud setup. We will deep dive into five AWS logging best practices that will help answer these questions – what to log, how to log, and how to scale.
Though we’ll mostly be calling out AWS terminology, one can safely assume a parallel component in a different public cloud setup as well.
Before we start figuring out what to log and what not to, it’s important to get our heads around the criticality or sensitivity of various components within your public cloud deployment, such as compute, storage, data systems etc.
Why is this important?
Well, for the simple reason that too little logging on credit card systems can lead to non-compliance, while too much logging on, say, internal non-critical HTTP services can lead to added costs and noise from a cyber security perspective.
When you’re working with public cloud deployments, infrastructure is extremely dynamic and it is nearly impossible to maintain static configurations like hosted deployments or network devices. For example, IP addresses change every time an application scales up or scales down.
This is where tagging your resources based on criticality helps, as the tags can feed into the logging configuration as a crucial input. Thankfully, most public cloud providers support tags!
Let’s take an example:
Name - prod-int-customerapi
Classification - sensitive
Environment - production
Just by looking at these tags, one can tell that the resource contains sensitive data, is a production system and is an internal application. More on tagging best practices can be found here.
This strategy can eventually stitch itself into your CI/CD or infra-as-code pipelines. Sounds great, doesn’t it? There’s a lot more!
A consistent naming convention also helps you automate your classification approach. Take a resource name built from the parts Prod, Int and Payments; here's what we can infer:
Prod -> production
Int -> internal service
Payments -> application name
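To make this concrete, here is a minimal sketch of how such an inference could be automated. It assumes names follow an `<environment>-<exposure>-<application>` pattern separated by hyphens, as in the `prod-int-customerapi` tag shown earlier. The function name and the lookup tables are hypothetical, and only the two prefixes mentioned above are mapped.

```python
# Prefix mappings taken from the example above; extend them to match
# your own naming scheme (anything else is reported as "unknown").
ENVIRONMENTS = {"prod": "production"}
EXPOSURES = {"int": "internal service"}


def classify_name(resource_name: str) -> dict:
    """Split an '<env>-<exposure>-<application>' name into its parts."""
    parts = resource_name.lower().split("-", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected resource name format: {resource_name!r}")
    env, exposure, application = parts
    return {
        "environment": ENVIRONMENTS.get(env, "unknown"),
        "exposure": EXPOSURES.get(exposure, "unknown"),
        "application": application,
    }


if __name__ == "__main__":
    print(classify_name("prod-int-customerapi"))
```

The same dictionary could then be written back as tags or fed into your logging configuration.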
Cy5’s cloud security products and their Contextual Intelligence capabilities can inspect tags and resource names to automate classification for you!
What to log?
Once you’ve got your asset classification strategy sorted, establishing configurations for logging levels comes next.
Let’s use a simple 3×3 matrix that will help us clearly map out how log levels can be defined, on the basis of data classification and network reachability.
Take for instance, a publicly available server that hosts a web application – you would want to log every kind of traffic that lands on it. On the other hand, take an internal server that holds credit card data, again – you would want to log everything you possibly can. However, an internal server that hosts data that isn’t really sensitive, well, you might want to log only writes or updates to it.
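The three examples above can be turned into a small lookup, sketched below. Note the assumptions: the axis labels (`sensitive` / `internal` / `non-sensitive` and `public` / `partner` / `internal`) and the level names (`all`, `writes-only`) are our own placeholders, and the cells the examples do not cover simply fall through to the least verbose level. Treat it as a starting point, not as the definitive matrix.

```python
# Hypothetical axis labels for a 3x3 matrix; rename to match your own policy.
CLASSIFICATIONS = ("sensitive", "internal", "non-sensitive")
REACHABILITY = ("public", "partner", "internal")


def log_level(classification: str, reachability: str) -> str:
    """Return how much to log for one cell of the matrix."""
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification: {classification!r}")
    if reachability not in REACHABILITY:
        raise ValueError(f"unknown reachability: {reachability!r}")
    # Publicly reachable or sensitive systems: log everything.
    if reachability == "public" or classification == "sensitive":
        return "all"
    # Everything else: log only writes and updates (an assumption for
    # the cells the three examples do not spell out).
    return "writes-only"


# The full matrix, keyed by (classification, reachability).
MATRIX = {(c, r): log_level(c, r) for c in CLASSIFICATIONS for r in REACHABILITY}

if __name__ == "__main__":
    for cell, level in MATRIX.items():
        print(cell, "->", level)
```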
The devil is in the details.
Let’s break this down a little now. We would recommend you consider logging against the following services in your public cloud environment.
CloudTrail records AWS API activity in your account as events, and it is a MUST-have for any production AWS account for visibility, analysis and incident response. A CloudTrail log consists of the following key elements:
It enables you to query logs based on a few of the above, plus some additional fields.
Users can use the CloudTrail Event History section to quickly investigate operational or security issues.
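The same Event History can also be queried programmatically. The sketch below uses the `lookup_events` call from boto3 (a third-party library you would need to install, along with AWS credentials). The helper names and the choice of summary fields are our own; `sourceIPAddress` and `awsRegion` are fields commonly found in the CloudTrail event record, but check the records in your own account before relying on them.

```python
import json
from datetime import datetime, timedelta, timezone


def build_lookup_request(event_name, hours=24, now=None):
    """Build the arguments for CloudTrail's lookup_events call."""
    now = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        "StartTime": now - timedelta(hours=hours),
        "EndTime": now,
        "MaxResults": 50,  # upper limit per call; paginate for more
    }


def summarise(event):
    """Pick a few fields out of one event returned by lookup_events."""
    # The full record is delivered as a JSON string in 'CloudTrailEvent'.
    detail = json.loads(event.get("CloudTrailEvent", "{}"))
    return {
        "time": event.get("EventTime"),
        "name": event.get("EventName"),
        "user": event.get("Username"),
        "source_ip": detail.get("sourceIPAddress"),
        "region": detail.get("awsRegion"),
    }


def recent_events(event_name, hours=24):
    """Fetch and summarise recent events (requires boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("cloudtrail")
    response = client.lookup_events(**build_lookup_request(event_name, hours))
    return [summarise(event) for event in response["Events"]]
```

For example, `recent_events("ConsoleLogin", hours=6)` would list console sign-ins from the last six hours.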
CloudWatch is a logging service by AWS that helps its customers collect, store and monitor logs from various sources. It is also the de facto logging service for AWS services. Apart from log storage, customers can create metrics around attributes and generate alerts when those attributes change or cross certain thresholds.
AWS customers can integrate CloudTrail with CloudWatch to analyse CloudTrail logs quickly and at scale. A word of caution, however: CloudWatch can turn out to be pretty expensive when used with large amounts of data. For larger volumes, consider using an Athena table instead.
ALB / ELB Access Logs
Not all load balancers are public; in fact, most that you create will end up being internal-facing. Consider enabling ALB or ELB access logging for your public-facing load balancers, as the logs give you meaningful insights into user (or attacker) access patterns and are crucial when investigating an attack on a public-facing application.
Check out this article for more on enabling ALB / ELB access logs.
For reasons similar to public facing ALB / ELB, CloudFront CDN deployments should log public requests.
You can have real-time logs that contain detailed information about every request made to a CloudFront distribution.
In addition, standard logging is an optional CloudFront feature and is free of cost, although you'd have to pay for log retention at the destination (S3).
VPC Flow Logs
Your VPC is loaded with network activity all the time, but not all network traffic is equally critical for security purposes. For example, internet-facing components such as public EC2 instances, NAT gateways and the DMZ, or cardholder data environments (PCI-DSS CDE), should always be monitored for network activity. Internal subnets and EC2 instances, on the other hand, might not be as critical, so depending on the organisation's risk appetite, one might choose to exclude them from VPC flow logging.
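Enabling VPC flow logs is straightforward: it takes a single API call (or a few clicks in the console) per VPC, subnet or network interface.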
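As a minimal sketch, this is roughly what that looks like with boto3's `create_flow_logs`, delivering to S3. The helper names are hypothetical, and the VPC ID and bucket ARN in the usage note are placeholders. The request is built separately so you can review it before anything is sent to AWS.

```python
VALID_TRAFFIC_TYPES = {"ACCEPT", "REJECT", "ALL"}


def flow_log_request(vpc_id, bucket_arn, traffic_type="ALL"):
    """Build the arguments for EC2's create_flow_logs call (S3 destination)."""
    if traffic_type not in VALID_TRAFFIC_TYPES:
        raise ValueError(f"traffic_type must be one of {sorted(VALID_TRAFFIC_TYPES)}")
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": traffic_type,
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
    }


def enable_flow_logs(vpc_id, bucket_arn, traffic_type="ALL"):
    """Create the flow log (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without it

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**flow_log_request(vpc_id, bucket_arn, traffic_type))
```

A call such as `enable_flow_logs("vpc-0123456789abcdef0", "arn:aws:s3:::example-flow-logs")` would capture all traffic for that VPC. Passing `"REJECT"` instead is one way to keep volume down on less critical networks.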
S3 Access Logs
Going back to our data classification discussion, not all S3 buckets might require logging, but it is crucial to enable access logging for public and sensitive buckets. Our article on S3 security best practices goes deep into other S3 security aspects as well.
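Here is a rough sketch of how that decision could be automated with boto3's `put_bucket_logging`. It assumes buckets carry the `Classification` tag from the tagging example earlier. The helper names, the `is_public` flag and the default prefix are our own choices, not something S3 prescribes.

```python
def needs_access_logging(tags, is_public):
    """Log access for public buckets and for buckets tagged as sensitive."""
    return is_public or tags.get("Classification") == "sensitive"


def bucket_logging_request(bucket, target_bucket, prefix=None):
    """Build the arguments for S3's put_bucket_logging call."""
    return {
        "Bucket": bucket,
        "BucketLoggingStatus": {
            "LoggingEnabled": {
                "TargetBucket": target_bucket,
                # Default to one prefix per source bucket (an assumption).
                "TargetPrefix": prefix or f"{bucket}/",
            }
        },
    }


def enable_access_logging(bucket, target_bucket, prefix=None):
    """Turn on server access logging (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    s3 = boto3.client("s3")
    return s3.put_bucket_logging(**bucket_logging_request(bucket, target_bucket, prefix))
```

Keep in mind that the target bucket must allow the S3 log delivery service to write to it, otherwise no logs will arrive.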
API Gateway Trace Logging
Where necessary, enable trace logs on your sensitive and public-facing API gateways (in case they aren't associated with CloudFront, where logging might already be enabled).
Apart from these, consider enabling logs for your data systems such as ElasticSearch, RDS, DynamoDB etc., keeping the 3×3 matrix as a guide.
[Figure: CloudTrail vs CloudWatch; panel labels: "Low Latency, Moderate to High Cost" and "Moderate Latency, Low Cost"]
Security Data Lake
- Parquet is a columnar file format and works better than JSON (smaller size and more efficient querying)
- Try to keep your logging infrastructure as cloud native as possible by using S3, Athena and QuickSight
- Hadoop or big data infrastructure works great, but one needs to manage it