AWS Interview Question-3

Which distribution style would you use for a Redshift table that is frequently used in join queries?

Answer: KEY. With KEY distribution, rows are distributed according to the values in one column, the distribution key (DISTKEY). Rows with matching key values are stored on the same node slice, so tables that are joined on that column can be joined without moving data between nodes. The distribution key is defined on a table in the CREATE TABLE statement.
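
The effect of KEY distribution on joins can be illustrated with a small simulation. This is a sketch in Python, not Redshift code: the slice count, the `slice_for` hash, and the sample tables are assumptions made for illustration, and Redshift's real hash function is internal.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # assumed slice count, for illustration only


def slice_for(key: str) -> int:
    """Map a distribution-key value to a slice (stand-in for Redshift's internal hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SLICES


def distribute(rows, key_column):
    """Group rows by the slice that their distribution-key value hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_column])].append(row)
    return slices


# Hypothetical tables that share the join column customer_id.
orders = [{"customer_id": f"c{i % 5}", "order_id": i} for i in range(10)]
customers = [{"customer_id": f"c{i}", "name": f"Customer {i}"} for i in range(5)]

order_slices = distribute(orders, "customer_id")
customer_slices = distribute(customers, "customer_id")

# Join each slice using only the rows stored on that slice.
local_join_rows = sum(
    1
    for slice_id, slice_orders in order_slices.items()
    for order in slice_orders
    for customer in customer_slices.get(slice_id, [])
    if order["customer_id"] == customer["customer_id"]
)

# Every order finds its customer locally, so no rows have to move between slices.
print(local_join_rows == len(orders))
```

Because both tables are distributed on the join column, each slice can complete its part of the join on its own.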

Which method can be used to disable automated snapshots in Redshift?

Answer: Set the automated snapshot retention period to 0.

What is the default retention period for a Kinesis stream?

Answer: 24 hours (1 day)

What is DynamoDB?

DynamoDB is a non-relational database for applications that need performance at any scale.

  • NoSQL managed database service
  • Supports both key-value and document data model
  • It’s really fast
    • Consistent responsiveness
    • Single-digit millisecond
  • Unlimited throughput and storage
  • Automatic scaling up or down
  • Handles trillions of requests per day
  • ACID transaction support
  • On-demand backups and point-in-time recovery
  • Encryption at rest
  • Data is replicated across multiple Availability Zones
  • Service-level agreement (SLA) up to 99.999%

What are non-relational databases?

Non-relational databases are also known as NoSQL databases.
They are categorized into four groups:

  • Key-value stores
  • Graph stores
  • Column stores
  • Document stores

List the data types supported by DynamoDB.

DynamoDB supports four scalar data types:

  • Number
  • String
  • Binary
  • Boolean

DynamoDB supports collection data types such as:

  • Number Set
  • String Set
  • Binary Set
  • Heterogeneous List
  • Heterogeneous Map

DynamoDB also supports Null values.
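
In DynamoDB's low-level JSON format, each attribute value is wrapped in a one-key object whose key is a type descriptor: `S`, `N`, `B`, `BOOL`, `NULL`, `SS`, `NS`, `BS`, `L`, or `M`. The sketch below shows how the types listed above map onto those descriptors. `to_attribute_value` is a hypothetical helper written for this example, not part of any AWS SDK.

```python
import base64


def _set_to_attribute_value(values):
    """Encode a homogeneous, non-empty Python set as SS, NS, or BS."""
    if not values:
        raise ValueError("DynamoDB sets cannot be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": [base64.b64encode(v).decode("ascii") for v in sorted(values)]}
    raise TypeError("a set must contain only strings, only numbers, or only bytes")


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON form (hypothetical helper)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}  # base64 in JSON
    if isinstance(value, (set, frozenset)):
        return _set_to_attribute_value(value)
    if isinstance(value, list):  # lists may mix types
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):  # maps may mix types
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


example = to_attribute_value(
    {"id": "u1", "age": 30, "active": True, "tags": {"a", "b"}, "scores": [1, "x", None]}
)
print(example)
```

Sets must hold a single type, while lists and maps can mix types, which is why the helper checks set members but not list members.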

List the APIs provided by Amazon DynamoDB?
  • CreateTable
  • UpdateTable
  • DeleteTable
  • DescribeTable
  • ListTables
  • PutItem
  • BatchWriteItem
  • UpdateItem
  • DeleteItem
  • GetItem
  • BatchGetItem
  • Query
  • Scan

What are global secondary indexes?

A global secondary index is an index with a partition key or a partition-and-sort key that is different from those on the table.

List the types of secondary indexes supported by Amazon DynamoDB.
  • Global secondary index – An index with a partition key or a partition-and-sort key that is different from those on the table. It is considered “global” because queries on the index can span all the items in a table, across all partitions.
  • Local secondary index – An index that has the same partition key as the table but a different sort key. It is considered “local” because every partition of the index is scoped to a table partition that has the same partition key.
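
The difference between the two index types shows up in how their key schemas are declared. The following is a sketch of a CreateTable request body written as a Python dictionary. The table, attribute, and index names are hypothetical, and `invalid_local_indexes` is a helper written for this example to check the rule that a local secondary index shares the table's partition key.

```python
table_definition = {
    "TableName": "Orders",  # hypothetical table
    "BillingMode": "PAY_PER_REQUEST",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "order_total", "AttributeType": "N"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    "LocalSecondaryIndexes": [
        {
            # Same partition key as the table, different sort key.
            "IndexName": "by_total",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_total", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "GlobalSecondaryIndexes": [
        {
            # Different partition key, so queries can span all table partitions.
            "IndexName": "by_status",
            "KeySchema": [
                {"AttributeName": "status", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
}


def partition_key(key_schema):
    """Return the HASH (partition key) attribute name from a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def invalid_local_indexes(definition):
    """List local secondary indexes whose partition key differs from the table's."""
    table_pk = partition_key(definition["KeySchema"])
    return [
        index["IndexName"]
        for index in definition.get("LocalSecondaryIndexes", [])
        if partition_key(index["KeySchema"]) != table_pk
    ]


print(invalid_local_indexes(table_definition))
```

With an SDK such as boto3, a dictionary like this would typically be passed as keyword arguments to the client's `create_table` call.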

How many global secondary indexes can you create per table?

By default, you can create up to 20 global secondary indexes per table.

Where Does DynamoDB Fit In?

Amazon Relational Database Service (RDS)

Support for Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server

Amazon DynamoDB

Key-value and document database

Amazon ElastiCache

Managed, Redis- or Memcached-compatible in-memory data store

Amazon Neptune

Graph database for applications that work with highly connected data sets

Amazon Redshift

Petabyte-scale data warehouse service

Amazon QLDB

Ledger database providing a cryptographically verifiable transaction log

Amazon DocumentDB

MongoDB-compatible database service

Explain Partitions and Data Distribution.

DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.

To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values. Applications should request values fairly uniformly and as randomly as possible.

Table: Collection of data. DynamoDB tables must contain a name, a primary key, and the required read and write throughput values. Unlimited size.

Partition Key: A simple primary key, composed of one attribute known as the partition key. This is also called the hash attribute.

Partition and Sort Key: Also known as a composite primary key, this type of key comprises two attributes. The first attribute is the partition key, and the second attribute is the sort key, also called the range attribute.
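
The advice about partition keys with many distinct values can be checked with a quick simulation. This is a sketch under stated assumptions: the partition count and the SHA-256 based `partition_for` function are stand-ins, since DynamoDB's real hash function and partition layout are internal.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_for(partition_key: str) -> int:
    """Hash a partition key value to a partition (stand-in for DynamoDB's hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def hottest_partition_share(keys):
    """Fraction of all requests that land on the busiest partition."""
    load = Counter(partition_for(k) for k in keys)
    return max(load.values()) / len(keys)


# High-cardinality key: one distinct value per user.
uniform_keys = [f"user-{i}" for i in range(10_000)]

# Low-cardinality key: only two distinct values, one of them dominant.
skewed_keys = ["active"] * 9_000 + ["inactive"] * 1_000

uniform_share = hottest_partition_share(uniform_keys)
skewed_share = hottest_partition_share(skewed_keys)

print(f"uniform keys: busiest partition gets {uniform_share:.0%} of requests")
print(f"skewed keys:  busiest partition gets {skewed_share:.0%} of requests")
```

With many distinct key values the load spreads close to evenly, while the two-value key sends at least 90% of requests to a single partition.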

Explain DynamoDB Performance?

On-Demand Capacity: The database scales according to demand.

  • Good option for
    • New tables with unknown workloads
    • Applications with unpredictable traffic
    • Teams that prefer to pay as they go

Provisioned Capacity

  • Allows us to have consistent and predictable performance
  • Specify expected read and write throughput requirements
  • Read Capacity Units (RCU)
  • Write Capacity Units (WCU)
  • Price is determined by provisioned capacity
  • Cheaper per request than On-Demand mode
  • Good option for
    • Applications with predictable traffic
    • Applications whose traffic is consistent or ramps gradually
    • Workloads whose capacity requirements can be forecasted, helping to control costs

Both capacity modes have a default limit of 40,000 RCUs and 40,000 WCUs per table.

You can switch between modes only once per 24 hours.

Amazon CloudWatch can be used to monitor DynamoDB:

  • Track metrics (data points over time)
  • Create dashboards
  • Create alarms
  • Create rules for events
  • View logs

DynamoDB Metrics

  • ConsumedReadCapacityUnits
  • ConsumedWriteCapacityUnits
  • ProvisionedReadCapacityUnits
  • ProvisionedWriteCapacityUnits
  • ReadThrottleEvents
  • SuccessfulRequestLatency
  • SystemErrors
  • ThrottledRequests
  • UserErrors
  • WriteThrottleEvents

Alarms can be created on metrics, taking an action if the alarm is triggered.

Alarms have three states:

  • INSUFFICIENT_DATA: Not enough data to judge the state; alarms often start in this state.
  • ALARM: The alarm threshold has been breached (e.g., > 90% CPU).
  • OK: The threshold has not been breached.

Alarms have a number of key components:

  • Metric: The data points over time being measured
  • Threshold: The value the metric is compared against (static or anomaly detection)
  • Period: How long the threshold must be breached before an alarm is generated
  • Action: What to do when an alarm triggers
    • SNS
    • Auto Scaling
    • EC2
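
The way metric, threshold, and period combine into one of the three alarm states can be sketched as follows. This is a simplified model written for illustration: `alarm_state` is a hypothetical function, and a real CloudWatch alarm has further settings (such as how missing data is treated) that are not modeled here.

```python
from typing import Optional, Sequence


def alarm_state(datapoints: Sequence[Optional[float]], threshold: float, periods: int) -> str:
    """Evaluate the most recent `periods` data points; None marks a missing point."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    recent = [d for d in datapoints[-periods:] if d is not None]
    if len(recent) < periods:
        return "INSUFFICIENT_DATA"  # not enough data to judge
    if all(d > threshold for d in recent):
        return "ALARM"  # threshold breached for every evaluated period
    return "OK"


# Hypothetical metric values, compared against a threshold of 90.
print(alarm_state([], threshold=90, periods=3))
print(alarm_state([50, 95, 96, 97], threshold=90, periods=3))
print(alarm_state([95, 80, 97], threshold=90, periods=3))
```

A new alarm with no data points starts in INSUFFICIENT_DATA, which matches the note above that alarms often begin in that state.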

Explain the terminology below.

Provisioned Throughput

The maximum amount of capacity an application can consume from a table or index. Requests beyond this limit are throttled and fail with ProvisionedThroughputExceededException.

Eventually vs. Strongly Consistent Read

Eventually consistent reads might include stale data.

Strongly consistent reads are always up to date but are subject to network delays.

Read Capacity Units (RCUs)

One RCU represents one strongly consistent read request per second, or two eventually consistent read requests, for an item up to 4 KB in size.

Filtered query or scan results consume full read capacity.

For an 8 KB item size:

  • 2 RCUs for one strongly consistent read
  • 1 RCU for an eventually consistent read
  • 4 RCUs for a transactional read
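
The read capacity arithmetic above can be written as a short function. This is a minimal sketch: `rcus_per_read` and the mode names are made up for this example, and the 4 KB block size and multipliers are the ones stated in the text.

```python
import math

RCU_BLOCK_KB = 4  # one RCU covers a strongly consistent read of up to 4 KB

# Cost per 4 KB block, relative to a strongly consistent read.
READ_MULTIPLIER = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}


def rcus_per_read(item_size_kb: float, mode: str = "eventual") -> float:
    """RCUs consumed by one read per second of an item of the given size."""
    blocks = max(1, math.ceil(item_size_kb / RCU_BLOCK_KB))  # sizes round up to 4 KB
    return blocks * READ_MULTIPLIER[mode]


for mode in ("strong", "eventual", "transactional"):
    print(mode, rcus_per_read(8, mode))
```

Item sizes round up to the next 4 KB boundary, so a 4.5 KB item costs the same as an 8 KB item.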

Write vs. Transactional Write

Writes are eventually consistent within one second or less.

One WCU represents one write per second for an item up to 1 KB in size. Transactional write requests require 2 WCUs for items up to 1 KB.

For a 3 KB item size:

  • 3 WCUs for a standard write
  • 6 WCUs for a transactional write
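
The write capacity calculation follows the same pattern with a 1 KB block size. In this sketch, `wcus_per_write` and `required_wcus` are hypothetical helper names. A 3 KB item reproduces the 3 WCU and 6 WCU figures above.

```python
import math

WCU_BLOCK_KB = 1  # one WCU covers a standard write of up to 1 KB


def wcus_per_write(item_size_kb: float, transactional: bool = False) -> int:
    """WCUs consumed by one write per second of an item of the given size."""
    blocks = max(1, math.ceil(item_size_kb / WCU_BLOCK_KB))  # sizes round up to 1 KB
    return blocks * (2 if transactional else 1)


def required_wcus(writes_per_second: int, item_size_kb: float, transactional: bool = False) -> int:
    """Write capacity to provision for a steady write rate."""
    return writes_per_second * wcus_per_write(item_size_kb, transactional)


print(wcus_per_write(3))                      # standard write of a 3 KB item
print(wcus_per_write(3, transactional=True))  # transactional write of a 3 KB item
print(required_wcus(100, 3))                  # 100 standard writes per second
```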

Explain Scan
  • Returns all items and attributes for a given table
  • Filtering results do not reduce RCU consumption; they simply discard data
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent scans
  • Limit the number of items returned
  • A single scan returns results that fit within 1 MB
  • Pagination can be used to retrieve more than 1 MB
  • Parallel scans can be used to improve performance
  • Prefer query over scan when possible; occasional real-world use is okay
  • If you are repeatedly using scans to filter on the same non-PK/SK attribute, consider creating a secondary index
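
Pagination works by passing the `LastEvaluatedKey` from one response as the `ExclusiveStartKey` of the next request, until a response comes back without a `LastEvaluatedKey`. The loop below is a sketch of that pattern. `scan_all` is a hypothetical helper, and `make_fake_scan` is an in-memory stand-in for a real client so the example runs without AWS access.

```python
from typing import Any, Callable, Dict, Iterator, List


def scan_all(scan_page: Callable[..., Dict[str, Any]], **params) -> Iterator[dict]:
    """Yield every item by following LastEvaluatedKey across pages."""
    start_key = None
    while True:
        kwargs = dict(params)
        if start_key is not None:
            kwargs["ExclusiveStartKey"] = start_key
        response = scan_page(**kwargs)
        yield from response.get("Items", [])
        start_key = response.get("LastEvaluatedKey")
        if start_key is None:  # no more pages
            break


def make_fake_scan(items: List[dict], page_size: int):
    """Build an in-memory stand-in for a scan call (for illustration only)."""
    calls = {"count": 0}

    def fake_scan(ExclusiveStartKey=None, **_ignored):
        calls["count"] += 1
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"] + 1
        page = items[start:start + page_size]
        response = {"Items": page}
        last = start + len(page) - 1
        if last < len(items) - 1:  # more items remain after this page
            response["LastEvaluatedKey"] = {"index": last}
        return response

    return fake_scan, calls


table_items = [{"id": {"N": str(i)}} for i in range(10)]
fake_scan, calls = make_fake_scan(table_items, page_size=4)

scanned = list(scan_all(fake_scan, TableName="Orders"))
print(len(scanned), "items in", calls["count"], "pages")
```

The same loop should work with a real SDK call that follows this request and response shape, for example by passing boto3's `client.scan` as `scan_page`. That usage is an assumption and is not tested here.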

Explain Query
  • Find items based on primary key values
  • Query limited to PK, PK+SK, or secondary indexes
  • Requires PK attribute
  • Returns all items with that PK value
  • Optional SK attribute and comparison operator to refine results
  • Filtering results do not reduce RCU consumption; they simply discard data
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent queries
  • Querying a partition only scans that one partition
  • Limit the number of items returned
  • A single query returns results that fit within 1 MB
  • Pagination can be used to retrieve more than 1 MB
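
A Query request combines a required partition key condition with an optional sort key condition. The sketch below builds the request parameters as a plain dictionary. `build_query_params` is a hypothetical helper that assumes string keys and uses `begins_with` as the sort key comparison. The table and attribute names are made up.

```python
from typing import Any, Dict, Optional


def build_query_params(
    table_name: str,
    pk_name: str,
    pk_value: str,
    sk_name: Optional[str] = None,
    sk_prefix: Optional[str] = None,
    limit: Optional[int] = None,
    consistent_read: bool = False,
) -> Dict[str, Any]:
    """Build Query parameters: PK equality plus an optional SK prefix match."""
    names = {"#pk": pk_name}
    values = {":pk": {"S": pk_value}}
    condition = "#pk = :pk"  # the partition key condition is always required
    if sk_name is not None and sk_prefix is not None:
        names["#sk"] = sk_name
        values[":sk"] = {"S": sk_prefix}
        condition += " AND begins_with(#sk, :sk)"  # optional sort key refinement
    params: Dict[str, Any] = {
        "TableName": table_name,
        "KeyConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
        "ConsistentRead": consistent_read,  # False means eventually consistent
    }
    if limit is not None:
        params["Limit"] = limit
    return params


# All orders for customer c1 placed in 2024, at most 10 items per page.
params = build_query_params(
    "Orders", "customer_id", "c1", sk_name="order_date", sk_prefix="2024-", limit=10
)
print(params["KeyConditionExpression"])
```

A filter expression could be added to the same request, but as noted above it would only discard items after they are read and would not reduce the RCUs consumed.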

Explain BatchGetItem
  • Returns attributes for multiple items from multiple tables
  • Request using primary key
  • Returns up to 16 MB of data, up to 100 items
  • Get unprocessed items exceeding limits via UnprocessedKeys
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent reads
  • Retrieves items in parallel to minimize latency
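
Because one BatchGetItem call accepts at most 100 keys, a longer key list has to be split across several requests. The following sketch builds those request bodies. `build_batch_get_requests` is a hypothetical helper, and the table and key names are made up.

```python
from typing import Any, Dict, Iterator, List

MAX_KEYS_PER_BATCH_GET = 100  # BatchGetItem limit per call


def chunked(sequence: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(sequence), size):
        yield sequence[start:start + size]


def build_batch_get_requests(
    table_name: str, keys: List[Dict[str, Any]], consistent_read: bool = False
) -> List[Dict[str, Any]]:
    """Split keys into BatchGetItem request bodies of at most 100 keys each."""
    return [
        {"RequestItems": {table_name: {"Keys": chunk, "ConsistentRead": consistent_read}}}
        for chunk in chunked(keys, MAX_KEYS_PER_BATCH_GET)
    ]


keys = [{"customer_id": {"S": f"c{i}"}} for i in range(250)]
requests = build_batch_get_requests("Orders", keys)
print([len(r["RequestItems"]["Orders"]["Keys"]) for r in requests])
```

Each response may still carry `UnprocessedKeys`, for example when the 16 MB limit is reached, and those keys need to be requested again.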

Explain BatchWriteItem
  • Puts or deletes multiple items in multiple tables
  • Writes up to 16 MB of data, up to 25 put or delete requests
  • Get unprocessed items exceeding limits via UnprocessedItems
  • Conditions are not supported for performance reasons
  • Threading may be used to write items in parallel
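
A typical write loop sends at most 25 requests per call and resubmits whatever comes back in `UnprocessedItems`. The sketch below shows that loop. `batch_put_all` is a hypothetical helper, the backoff values are arbitrary, and `make_fake_batch_write` is an in-memory stand-in for a real client so the example runs without AWS access.

```python
import time

MAX_REQUESTS_PER_BATCH = 25  # BatchWriteItem limit per call


def batch_put_all(batch_write, table_name, items, max_retries=5, sleep=time.sleep):
    """Put items in chunks of 25, resubmitting UnprocessedItems; return the call count."""
    calls = 0
    for start in range(0, len(items), MAX_REQUESTS_PER_BATCH):
        chunk = items[start:start + MAX_REQUESTS_PER_BATCH]
        pending = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        attempt = 0
        while pending:
            if attempt > max_retries:
                raise RuntimeError("items still unprocessed after retries")
            if attempt > 0:
                sleep(min(0.05 * 2 ** attempt, 1.0))  # simple exponential backoff
            response = batch_write(RequestItems=pending)
            calls += 1
            pending = response.get("UnprocessedItems") or {}
            attempt += 1
    return calls


def make_fake_batch_write(store, accept_per_call):
    """Stand-in client call that accepts only some requests per call."""
    def fake_batch_write(RequestItems):
        unprocessed = {}
        for table, requests in RequestItems.items():
            accepted = requests[:accept_per_call]
            rejected = requests[accept_per_call:]
            store.extend(r["PutRequest"]["Item"] for r in accepted)
            if rejected:
                unprocessed[table] = rejected  # caller must resubmit these
        return {"UnprocessedItems": unprocessed}
    return fake_batch_write


store = []
fake_batch_write = make_fake_batch_write(store, accept_per_call=10)
items = [{"id": {"S": f"item-{i}"}} for i in range(30)]

# Skip real sleeping in this demonstration.
total_calls = batch_put_all(fake_batch_write, "Orders", items, sleep=lambda seconds: None)
print(total_calls, len(store))
```

Since conditions are not supported in batch writes, any write that needs a condition check has to go through an individual PutItem, UpdateItem, or DeleteItem call instead.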

Explain Provisioned Capacity
  • Minimum capacity required
  • Able to set a budget (maximum capacity)
  • Subject to throttling
  • Auto scaling available
  • Risk of underprovisioning — monitor your metrics
  • Lower price per API call
  • $0.00065 per WCU-hour (us-east-1)
  • $0.00013 per RCU-hour (us-east-1)
  • $0.25 per GB-month (first 25 GB is free)

Explain On-Demand Capacity
  • No minimum capacity: pay more per request than provisioned capacity
  • Idle tables not charged for read/write, but only for storage and backups
  • No capacity planning required — just make API calls
  • Eliminates the tradeoffs of over- or under-provisioning
  • Use on-demand for new product launches
  • Switch to provisioned once a steady state is reached
  • $1.25 per million write request units (us-east-1)
  • $0.25 per million read request units (us-east-1)
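
The choice between the two modes comes down to how fully provisioned capacity would be used. The sketch below compares monthly costs using the us-east-1 prices quoted above, which may be out of date. The 730-hour month and the function names are assumptions made for this example. Storage, backups, and free-tier allowances are ignored.

```python
HOURS_PER_MONTH = 730  # assumed average month length

# Prices as quoted in these notes (us-east-1); treat them as illustrative.
PROVISIONED_WCU_HOUR = 0.00065
PROVISIONED_RCU_HOUR = 0.00013
ON_DEMAND_PER_MILLION_WRITES = 1.25
ON_DEMAND_PER_MILLION_READS = 0.25


def provisioned_monthly_cost(wcus: int, rcus: int) -> float:
    """Monthly cost of keeping the given capacity provisioned all month."""
    hourly = wcus * PROVISIONED_WCU_HOUR + rcus * PROVISIONED_RCU_HOUR
    return hourly * HOURS_PER_MONTH


def on_demand_monthly_cost(writes: int, reads: int) -> float:
    """Monthly cost of the given number of on-demand write and read requests."""
    return (
        writes / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES
        + reads / 1_000_000 * ON_DEMAND_PER_MILLION_READS
    )


def break_even_write_utilization() -> float:
    """Average WCU utilization above which provisioned writes become cheaper."""
    # One fully used WCU performs 3,600 one-KB writes per hour.
    on_demand_cost_of_full_wcu_hour = 3600 / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES
    return PROVISIONED_WCU_HOUR / on_demand_cost_of_full_wcu_hour


print(f"provisioned, 100 WCU + 100 RCU: ${provisioned_monthly_cost(100, 100):.2f}")
print(f"on-demand, 10M writes + 10M reads: ${on_demand_monthly_cost(10_000_000, 10_000_000):.2f}")
print(f"break-even write utilization: {break_even_write_utilization():.1%}")
```

At these prices, provisioned write capacity becomes the cheaper option once average utilization exceeds roughly 14%, which is consistent with the advice to switch to provisioned mode once a steady state is reached.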