DynamoDB Interview Questions

What is DynamoDB?
  • DynamoDB is a non-relational database for applications that need performance at any scale.
  • NoSQL managed database service
  • Supports both key-value and document data models
  • It’s really fast
    • Consistent responsiveness
    • Single-digit-millisecond latency
  • Unlimited throughput and storage
  • Automatic scaling up or down
  • Handles trillions of requests per day
  • ACID transaction support
  • On-demand backups and point-in-time recovery
  • Encryption at rest
  • Data is replicated across multiple Availability Zones
  • Service-level agreement (SLA) of up to 99.999%
What are non-relational databases?
  • Non-relational databases are NoSQL databases. They are categorized into four groups:
    • Key-value stores
    • Graph stores
    • Column stores
    • Document stores
List the Data Types supported by DynamoDB?

DynamoDB supports four scalar data types, and they are:

  • Number
  • String
  • Binary
  • Boolean

DynamoDB supports collection data types such as:

  • Number Set
  • String Set
  • Binary Set
  • Heterogeneous List
  • Heterogeneous Map

DynamoDB also supports Null values.

List the APIs provided by Amazon DynamoDB?
  • CreateTable
  • UpdateTable
  • DeleteTable
  • DescribeTable
  • ListTables
  • PutItem
  • BatchWriteItem
  • UpdateItem
  • DeleteItem
  • GetItem
  • BatchGetItem
  • Query
  • Scan
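
For illustration, here is a minimal boto3 sketch (the 'employees' table and attribute names are assumptions) showing two of these operations through the low-level client, whose methods map almost one-to-one onto the API names above:

import boto3

# The low-level client exposes the API operations listed above almost verbatim.
dynamodb = boto3.client('dynamodb')

# PutItem: attribute values are typed ('S' = string, 'N' = number).
dynamodb.put_item(
    TableName='employees',   # table name is an assumption
    Item={'emp_id': {'S': '1'}, 'name': {'S': 'asha'}, 'salary': {'N': '1500'}}
)

# GetItem: fetch the item back by its primary key.
resp = dynamodb.get_item(TableName='employees', Key={'emp_id': {'S': '1'}})
print(resp.get('Item'))
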
What are global secondary indexes?

An index with a partition key or a partition-and-sort key that is different from the one on the base table is called a global secondary index.

List types of secondary indexes supported by Amazon DynamoDB?
  • Global secondary index – An index with a partition key or a partition-and-sort key that is different from those on the table. It is considered “global” because queries on the index can span all the items in the table, across all partitions.
  • Local secondary index – An index that has the same partition key as the table but a different sort key. It is considered “local” because every partition of the index is scoped to a table partition that has the same partition key value.
How many global secondary indexes can you create per table?

The default quota is 20 global secondary indexes per table (the original default limit was 5).

In your project, you have data in DynamoDB tables and you have to perform complex data analysis queries on that data. How will you do this?

Answer: Copy the data to Amazon Redshift and then perform the complex queries there.

Where Does DynamoDB Fit In?
  • Amazon Relational Database Service (RDS):
    • Support for Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server
  • Amazon DynamoDB:
    • Key-value and document database
  • Amazon ElastiCache:
    • Managed, Redis- or Memcached-compatible in-memory data store
  • Amazon Neptune:
    • Graph database for applications that work with highly connected data sets
  • Amazon Redshift:
    • Petabyte-scale data warehouse service
  • Amazon QLDB:
    • Ledger database providing a cryptographically verifiable transaction log
  • Amazon DocumentDB:
    • MongoDB-compatible document database service
Which AWS service filters and transforms messages (coming from sensors) and stores them as time-series data in DynamoDB?

Answer: The IoT Rules Engine. The Rules Engine is a component of AWS IoT Core. It evaluates inbound messages published into AWS IoT Core and transforms and delivers them to another device or a cloud service, based on business rules you define.

Explain Partitions and Data Distribution.
  • DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid-state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region.
  • To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values. Applications should request values fairly uniformly and as randomly as possible.
  • Table: Collection of data. DynamoDB tables must contain a name, a primary key, and the required read and write throughput values. Unlimited size.
  • Partition Key: A simple primary key, composed of one attribute known as the partition key. This is also called the hash attribute.
  • Partition and Sort Key: Also known as a composite primary key, this type of key comprises two attributes. The first attribute is the partition key, and the second attribute is the sort key, also called the range attribute.
Your application is writing a large number of records to a DynamoDB table in one region. A secondary application needs to pick up the changes to the table every 4 hours and process the updates accordingly. How will you handle this?

Answer: Use DynamoDB Streams to monitor the changes to the table. Once you enable DynamoDB Streams, it captures a time-ordered sequence of item-level modifications in the table and stores the information for up to 24 hours. Applications can read the stream records, each of which contains an item change, in near real time, so the secondary application can consume the stream on its 4-hour schedule.
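
As a rough sketch of consuming such a stream directly with boto3 (the table name 'Test' is a placeholder, and streams are assumed to already be enabled on it), the stream is read through the separate DynamoDB Streams endpoint:

import boto3

dynamodb = boto3.client('dynamodb')
streams = boto3.client('dynamodbstreams')

# Find the stream ARN for the table (streams must already be enabled on it).
stream_arn = dynamodb.describe_table(TableName='Test')['Table']['LatestStreamArn']

# Read from the first shard, starting at the oldest record still retained.
shard_id = streams.describe_stream(StreamArn=stream_arn)['StreamDescription']['Shards'][0]['ShardId']
iterator = streams.get_shard_iterator(
    StreamArn=stream_arn,
    ShardId=shard_id,
    ShardIteratorType='TRIM_HORIZON'
)['ShardIterator']

records = streams.get_records(ShardIterator=iterator)['Records']
for record in records:
    print(record['eventName'], record['dynamodb'].get('Keys'))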

Explain DynamoDB Performance?
  • On Demand Capacity:
    • Database scales according to demand
    • Good for new tables with unknown workloads
    • Applications with unpredictable traffic
    • Prefer to pay as you go
  • Provisioned Capacity
    • Allows us to have consistent and predictable performance
    • Specify expected read and write throughput requirements
    • Read Capacity Units (RCU)
    • Write Capacity Units (WCU)
    • Price is determined by provisioned capacity
    • Cheaper per request than On-Demand mode
    • Good option for Applications with predictable traffic
    • Applications whose traffic is consistent or ramps gradually
    • Capacity requirements can be forecasted, helping to control costs
    • Both capacity modes have a default limit of 40,000 RCUs and 40,000 WCUs.
    • You can switch between modes only once per 24 hours (see the sketch below for doing this programmatically).
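
A minimal boto3 sketch of switching a table between the capacity modes described above (the table name 'Test' and the throughput numbers are assumptions):

import boto3

dynamodb = boto3.client('dynamodb')

# Option 1: switch an existing table to on-demand mode.
dynamodb.update_table(TableName='Test', BillingMode='PAY_PER_REQUEST')

# Option 2: stay in provisioned mode and adjust the expected throughput instead.
# dynamodb.update_table(
#     TableName='Test',
#     BillingMode='PROVISIONED',
#     ProvisionedThroughput={'ReadCapacityUnits': 10, 'WriteCapacityUnits': 5}
# )
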
You need to ensure in your project that each user can only access their own data in a particular DynamoDB table. Many users already have accounts with a third-party identity provider, such as Facebook, Google, or Login with Amazon. How would you implement this requirement?

Answer: Use web identity federation and register your application with a third-party identity provider such as Google, Amazon, or Facebook. Create a DynamoDB table and call it “Test.”

  1. Create a partition key and a sort key. Complete creation of the table Test.
  2. Navigate to “Access control” and select “Facebook” (or another provider, as per your requirement) as the identity provider.
  3. Select the “Actions” that you want to allow your users to perform.
  4. Select the “Attributes” that you want your users to have access to.
  5. Select Create policy and copy the code generated in the policy panel. 
Explain DynamoDB Items?
  • Item: A table may contain multiple items. An item is a unique group of attributes. Items are similar to rows or records in a traditional relational database. Items are limited to 400 KB.
  • Attribute: Fundamental data element. Similar to fields or columns in an RDBMS.
Explain Data Types.
  • Data Types
    • Scalar: Exactly one value — number, string, binary, boolean, and null. Applications must encode binary values in base64 format before sending them to DynamoDB.
    • Document: Complex structure with nested attributes (e.g., JSON) — list and map.
  • Document Types
    • List: Ordered collection of values
      • FavoriteThings: [“Cookies”, “Coffee”, 3.14159]
    • Map: Unordered collection of name-value pairs (similar to JSON)
      • {
          "Day": "Monday",
          "UnreadEmails": 42,
          "ItemsOnMyDesk": [
            "Coffee Cup",
            "Telephone",
            {
              "Pens": { "Quantity": 3 },
              "Pencils": { "Quantity": 2 },
              "Erasers": { "Quantity": 1 }
            }
          ]
        }
    • Set: Multiple scalar values of the same type — string set, number set, binary set.
      • [“Black”, “Green”, “Red”]
      • [42.2, -19, 7.5, 3.14]
      • [“U3Vubnk=”, “UmFpbnk=”, “U25vd3k=”]
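
To tie the scalar, document, and set types above to code, here is a minimal boto3 sketch (the 'employees' table and attribute names are assumptions; boto3's resource interface expects Decimal for non-integer numbers and Python set objects for DynamoDB sets):

import boto3
from decimal import Decimal

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('employees')   # table name is an assumption

table.put_item(
    Item={
        'emp_id': '10',
        'Day': 'Monday',                                               # string scalar
        'UnreadEmails': 42,                                            # number scalar
        'FavoriteThings': ['Cookies', 'Coffee', Decimal('3.14159')],   # heterogeneous list
        'ItemsOnMyDesk': {                                             # map with nested maps
            'Pens': {'Quantity': 3},
            'Pencils': {'Quantity': 2},
        },
        'Colors': {'Black', 'Green', 'Red'},                           # Python set -> string set
    }
)
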
Your team is planning on using the AWS IoT Rules service to allow IoT-enabled devices to write information to DynamoDB. What action must be done to ensure that the rules will work as intended?

Answer: Ensure that the right IAM permissions to DynamoDB are given. An IAM user is an identity within your AWS account that has specific custom permissions (for example, permissions to create a table in DynamoDB).

You have an application that uses DynamoDB to store JSON data (the table uses provisioned read and write capacity). You have deployed your application and are unsure of the amount of traffic it will receive at deployment time. How can you ensure that DynamoDB is not heavily throttled and does not become a bottleneck for the application?

Answer: DynamoDB’s auto scaling feature will make sure that no read/write throttling happens due to heavy traffic. To configure auto scaling in DynamoDB, you set the minimum and maximum levels of read and write capacity in addition to the target utilization percentage. Auto scaling uses Amazon CloudWatch to monitor a table’s read and write capacity metrics.
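
As a rough sketch of how this could be wired up with boto3 (the table name 'Test', the capacity bounds, and the 70% target utilization are assumptions), DynamoDB auto scaling is configured through the Application Auto Scaling service:

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's read capacity as a scalable target (table name is an assumption).
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/Test',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=5,
    MaxCapacity=500
)

# Scale to keep consumed read capacity near 70% of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName='TestTableReadScaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/Test',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        }
    }
)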

Explain DynamoDB Table.

Creating a Table

  • Table names must be unique per AWS account and region.
  • Between 3 and 255 characters long
  • UTF-8 encoded
  • Case-sensitive
  • Contain a-z, A-Z, 0-9, _ (underscore), - (dash), and . (dot)
  • Primary key must consist of a partition key or a partition key and sort key.
  • Only string, binary, and number data types are allowed for partition or sort keys
  • Provisioned capacity mode is the default (free tier).
  • For provisioned capacity mode, read/write throughput settings are required
  • Secondary indexes (optional): adding one here creates a local secondary index.
  • Must be created at the time of table creation
  • Same partition key as the table, but a different sort key
  • Provisioned capacity is set at the table level.
  • Adjust at any time or enable auto scaling to modify them automatically
  • On-demand mode has a default upper limit of 40,000 RCU/WCU — unlike auto scaling, which can be capped manually

Create DynamoDB table

DynamoDB is a schemaless database that only requires a table name and primary key. The table’s primary key is made up of one or two attributes that uniquely identify items, partition the data, and sort data within each partition.
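
A minimal boto3 sketch of creating such a table (the 'orders' table name, key attribute names, and throughput values are illustrative assumptions):

import boto3

dynamodb = boto3.client('dynamodb')

# Create a table with a composite primary key in provisioned capacity mode.
dynamodb.create_table(
    TableName='orders',
    AttributeDefinitions=[
        {'AttributeName': 'customer_id', 'AttributeType': 'S'},
        {'AttributeName': 'order_date', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'customer_id', 'KeyType': 'HASH'},   # partition key
        {'AttributeName': 'order_date', 'KeyType': 'RANGE'},   # sort key
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)

# Block until the table is ACTIVE before writing to it.
dynamodb.get_waiter('table_exists').wait(TableName='orders')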

How will you analyze a large set of data that updates from Kinesis and DynamoDB?

Answer: Elasticsearch. Elasticsearch allows you to store, search, and analyze huge volumes of data quickly and in near real time, giving back answers in milliseconds. It achieves fast search responses because, instead of searching the text directly, it searches an index.

Explain DynamoDB Console Menu Items.

DynamoDB Console Menu Items

  • Dashboard
  • Tables: Storage size and item count are not updated in real time.
  • Items: Manage items and perform queries and scans.
  • Metrics: Monitor CloudWatch metrics.
  • Alarms: Manage CloudWatch alarms.
  • Capacity: Modify a table’s provisioned capacity.
  • Free tier allows 25 RCU, 25 WCU, and 25 GB for 12 months
  • Cloud Sandbox within the Cloud Playground
  • Indexes: Manage global secondary indexes.
  • Global Tables: Multi-region, multi-master replicas
  • Backups: On-demand backups and point-in-time recovery
  • Triggers: Manage triggers to connect DynamoDB streams to Lambda functions.
  • Access control: Set up fine-grained access control with web identity federation.

Tags: Apply tags to your resources to help organize and identify them.

  • Backups
  • Reserved capacity
  • Preferences
  • DynamoDB Accelerator (DAX)
How can you use the AWS CLI with DynamoDB?

Installing the AWS CLI

  • Preinstalled on Amazon Linux and Amazon Linux 2
  • Cloud Sandbox within the Cloud Playground

Obtaining IAM Credentials

  • Option 1 : Create IAM access keys in your own AWS account.
  • Option 2: Use Cloud Sandbox credentials.
  • Note the access key ID and secret access key.

Configuring the AWS CLI

  • aws configure
  • aws sts get-caller-identity
  • aws dynamodb help

Using DynamoDB with the AWS CLI

  • aws dynamodb create-table
  • aws dynamodb describe-table
  • aws dynamodb put-item
  • aws dynamodb scan

Object Persistence Interface

  • Do not directly perform data plane operations
  • Map complex data types to items in a DynamoDB table
  • Create objects that represent tables and indexes
  • Define the relationships between objects in your program and the tables that store those objects
  • Call simple object methods, such as save, load, or delete
  • Available in the AWS SDKs for Java and .NET
How can we use CloudWatch with DynamoDB?
  • CloudWatch monitors your AWS resources in real time, providing visibility into resource utilization, application performance, and operational health.
    • Track metrics (data points over time)
    • Create dashboards
    • Create alarms
    • Create rules for events
    • View logs
  • DynamoDB Metrics
    • ConsumedReadCapacityUnits
    • ConsumedWriteCapacityUnits
    • ProvisionedReadCapacityUnits
    • ProvisionedWriteCapacityUnits
    • ReadThrottleEvents
    • SuccessfulRequestLatency
    • SystemErrors
    • ThrottledRequests
    • UserErrors
    • WriteThrottleEvents
  • Alarms can be created on metrics, taking an action if the alarm is triggered.
  • Alarms have three states:
    • INSUFFICIENT_DATA: Not enough data to judge the state — alarms often start in this state.
    • ALARM: The alarm threshold has been breached (e.g., > 90% CPU).
    • OK: The threshold has not been breached.
  • Alarms have a number of key components:
    • Metric: The data points over time being measured
    • Threshold: Exceeding this is bad (static or anomaly)
    • Period: How long the threshold should be bad before an alarm is generated
    • Action: What to do when an alarm triggers
    • SNS
    • Auto Scaling
    • EC2
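
As an illustrative boto3 sketch (the table name, threshold, and SNS topic ARN are placeholders), an alarm on one of the DynamoDB metrics listed above could be created like this:

import boto3

cloudwatch = boto3.client('cloudwatch')

# Alarm when write throttling occurs on the 'employees' table (names/ARN are assumptions).
cloudwatch.put_metric_alarm(
    AlarmName='employees-write-throttles',
    Namespace='AWS/DynamoDB',
    MetricName='WriteThrottleEvents',
    Dimensions=[{'Name': 'TableName', 'Value': 'employees'}],
    Statistic='Sum',
    Period=60,                 # evaluate one-minute windows
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts']   # placeholder topic
)
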

Explain Below terminology

  • Provisioned Throughput
    • The maximum amount of capacity an application can consume from a table or index. Requests that exceed it are throttled with a ProvisionedThroughputExceededException.
  • Eventually vs. Strongly Consistent Read
    • Eventually consistent reads might include stale data.
    • Strongly consistent reads are always up to date but are subject to network delays.
  • Read Capacity Units (RCUs)
    • One RCU represents one strongly consistent read request per second, or two eventually consistent read requests, for an item up to 4 KB in size.
    • Filtered query or scan results consume full read capacity.
    • For an 8 KB item size:
      • 2 RCUs for one strongly consistent read
      • 1 RCU for an eventually consistent read
      • 4 RCUs for a transactional read
  • Write vs. Transactional Write
    • Writes are eventually consistent within one second or less.
    • One WCU represents one write per second for an item up to 1 KB in size. Transactional write requests require 2 WCUs for items up to 1 KB.
    • For a 3 KB item size:
      • Standard write: 3 WCUs
      • Transactional write: 6 WCUs
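
To make the arithmetic above concrete, here is a small illustrative Python helper (not an AWS API, just the rounding rules described above) that estimates capacity units per request:

import math

def rcu_per_read(item_size_kb, consistency='eventual'):
    """Estimate read capacity units consumed by one read of an item."""
    units = math.ceil(item_size_kb / 4)          # reads are billed in 4 KB increments
    if consistency == 'eventual':
        return units * 0.5                        # eventually consistent reads cost half
    if consistency == 'strong':
        return units
    if consistency == 'transactional':
        return units * 2
    raise ValueError(consistency)

def wcu_per_write(item_size_kb, transactional=False):
    """Estimate write capacity units consumed by one write of an item."""
    units = math.ceil(item_size_kb)               # writes are billed in 1 KB increments
    return units * 2 if transactional else units

print(rcu_per_read(8, 'strong'))              # 2 RCUs
print(rcu_per_read(8, 'eventual'))            # 1 RCU
print(wcu_per_write(3, transactional=True))   # 6 WCUs
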
Explain Scan
  • Returns all items and attributes for a given table
  • Filtering results does not reduce RCU consumption; it simply discards data
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent scans
  • Limit the number of items returned
  • A single scan returns results that fit within 1 MB
  • Pagination can be used to retrieve more than 1 MB
  • Parallel scans can be used to improve performance
  • Prefer query over scan when possible; occasional real-world use is okay
  • If you are repeatedly using scans to filter on the same non-PK/SK attribute, consider creating a secondary index
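
A minimal boto3 sketch of a paginated, filtered scan (the 'employees' table and attribute names are assumptions); note the filter is applied after the read, so the full RCUs are still consumed:

import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('employees')   # table name is an assumption

items = []
scan_kwargs = {
    # The filter discards items after they are read, so RCUs are consumed for all of them.
    'FilterExpression': Attr('salary').gte(1500)
}

while True:
    resp = table.scan(**scan_kwargs)
    items.extend(resp['Items'])
    # Each response holds at most 1 MB; keep paginating until no key is returned.
    if 'LastEvaluatedKey' not in resp:
        break
    scan_kwargs['ExclusiveStartKey'] = resp['LastEvaluatedKey']

print(len(items))
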
Explain Query
  • Find items based on primary key values
  • Query limited to PK, PK+SK, or secondary indexes
  • Requires PK attribute
  • Returns all items with that PK value
  • Optional SK attribute and comparison operator to refine results
  • Filtering results does not reduce RCU consumption; it simply discards data
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent queries
  • Querying a partition only scans that one partition
  • Limit the number of items returned
  • A single query returns results that fit within 1 MB
  • Pagination can be used to retrieve more than 1 MB
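
A minimal boto3 sketch of a query against the illustrative 'orders' table from the create-table sketch earlier (table and key names are assumptions), using a partition key value plus an optional sort key condition:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('orders')   # assumes PK 'customer_id' and SK 'order_date'

resp = table.query(
    KeyConditionExpression=Key('customer_id').eq('C-100') & Key('order_date').begins_with('2024'),
    ConsistentRead=True,   # opt in to a strongly consistent read
    Limit=10               # cap the number of items returned
)

for item in resp['Items']:
    print(item)
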
Explain BatchGetItem.
  • Returns attributes for multiple items from multiple tables
  • Request using primary key
  • Returns up to 16 MB of data, up to 100 items
  • Get unprocessed items exceeding limits via UnprocessedKeys
  • Eventually consistent by default, but the ConsistentRead parameter can enable strongly consistent reads
  • Retrieves items in parallel to minimize latency
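
A rough boto3 sketch of BatchGetItem (the 'employees' table and key values are assumptions) that resubmits any keys returned as unprocessed:

import boto3

dynamodb = boto3.client('dynamodb')

request = {
    'employees': {   # table name is an assumption
        'Keys': [
            {'emp_id': {'S': '1'}},
            {'emp_id': {'S': '2'}},
            {'emp_id': {'S': '3'}},
        ]
    }
}

items = []
while request:
    resp = dynamodb.batch_get_item(RequestItems=request)
    items.extend(resp['Responses'].get('employees', []))
    # Keys that exceeded limits come back here; resubmit them until none remain.
    request = resp.get('UnprocessedKeys') or {}

print(items)
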
Explain BatchWriteItem
  • Puts or deletes multiple items in multiple tables
  • Writes up to 16 MB of data, up to 25 put or delete requests
  • Get unprocessed items exceeding limits via UnprocessedItems
  • Conditions are not supported for performance reasons
  • Threading may be used to write items in parallel
Explain Provisioned Capacity
  • Minimum capacity required
  • Able to set a budget (maximum capacity)
  • Subject to throttling
  • Auto scaling available
  • Risk of underprovisioning — monitor your metrics
  • Lower price per API call
  • $0.00065 per WCU-hour (us-east-1)
  • $0.00013 per RCU-hour (us-east-1)
  • $0.25 per GB-month (first 25 GB is free)
Explain On-Demand Capacity
  • No minimum capacity: pay more per request than provisioned capacity
  • Idle tables not charged for read/write, but only for storage and backups
  • No capacity planning required — just make API calls
  • Eliminates the tradeoffs of over- or under-provisioning
  • Use on-demand for new product launches
  • Switch to provisioned once a steady state is reached
  • $1.25 per million write request units (us-east-1)
  • $0.25 per million read request units (us-east-1)
Explain Point-in-Time Recovery (PITR)

Helps protect your DynamoDB tables from accidental writes or deletes. You can restore your data to any point in time in the last 35 days.

  • DynamoDB maintains incremental backups of your data.
  • Point-in-time recovery is not enabled by default.
  • The latest restorable timestamp is typically five minutes in the past.

After restoring a table, you must manually set up the following on the restored table:

  • Auto scaling policies
  • AWS Identity and Access Management (IAM) policies
  • Amazon CloudWatch metrics and alarms
  • Tags
  • Stream settings
  • Time to Live (TTL) settings
  • Point-in-time recovery settings
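
Since PITR is off by default, here is a minimal boto3 sketch of enabling it (the table name is an assumption):

import boto3

dynamodb = boto3.client('dynamodb')

# Enable PITR on an existing table (name is an assumption).
dynamodb.update_continuous_backups(
    TableName='employees',
    PointInTimeRecoverySpecification={'PointInTimeRecoveryEnabled': True}
)

# Confirm the earliest and latest restorable times.
desc = dynamodb.describe_continuous_backups(TableName='employees')
print(desc['ContinuousBackupsDescription']['PointInTimeRecoveryDescription'])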

What are partitions?
  • They are the underlying storage and processing nodes of Dynamo DB
  • Initially, one table equates to one partition
  • Initially, all the data for that table is stored by that one partition
  • We don’t directly control the number of partitions
  • A partition can store 10 GB
  • A partition can handle 3000 RCU and 1000 WCU
  • So there is a capacity and performance relationship to the number of partitions; this is a key concept
  • Design tables and applications to avoid I/O “hot spots”/”hotkeys”
  • When more than 10 GB of storage, 3,000 RCU, or 1,000 WCU is required, a new partition is added and the data is spread between the partitions over time.
  • Partitions will automatically increase
  • While there is an automatic split of data across partitions, there is no automatic decrease when load/performance reduces
  • Allocated WCU and RCU is split between partitions
  • Each partition key is…
    • Limited to 10 GB of data
    • Limited to 3,000 RCU and 1,000 WCU
  • Key concepts:
    • Be aware of the underlying storage infrastructure – partitions
    • Be aware of what influences the number of partitions:
      • Capacity
      • Performance (WCU/RCU)
    • Be aware that partitions increase, but they don’t decrease
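
As a back-of-the-envelope illustration only (a commonly used planning approximation, not an exact AWS formula), the partition count can be estimated from the per-partition limits above:

import math

def estimate_partitions(storage_gb, rcu, wcu):
    """Rough partition estimate from the per-partition limits above."""
    by_capacity = math.ceil(storage_gb / 10)             # 10 GB per partition
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)   # 3,000 RCU / 1,000 WCU per partition
    return max(by_capacity, by_throughput)

# A 24 GB table provisioned with 4,500 RCU and 1,500 WCU:
print(estimate_partitions(24, 4500, 1500))   # max(3, 3) = 3 partitions
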
Explain Indexes in DynamoDB?

Without indexes, DynamoDB offers two main data retrieval operations: Scan and Query.
Indexes allow secondary representations of the data in a table.
They enable efficient queries on those representations.
Indexes come in two forms – Global Secondary and Local Secondary.

Explain Local Secondary Indexes (LSI)

• LSIs contain the partition key, the sort key, and a new sort key + optional projected values
• Any data written to the table is copied asynchronously to any LSIs
• Shares RCU and WCU with the table
• An LSI is a sparse index. The index will only have an ITEM if the index sort key attribute is contained in the table item (row)

• Storage and performance considerations with LSI’s
• Any non-key values by default are not stored in an LSI
• If you query an attribute that is NOT projected, you are charged for the entire ITEM cost from pulling it from the main table
• Take care with planning your LSIs and item projections – it’s important

Explain Global Secondary Indexes

• It shares many of the same concepts as a Local secondary index, BUT, with a GSI we can have an alternative Partition & sort key
• Options for attribute projection
• KEYS_ONLY – New partition and sort keys, old partition key and, if applicable, old sort key
• INCLUDE – Specify custom projection values
• ALL – Projects all attributes
• Unlike LSI’s where the performance is shared with the table, RCU and WCU are defined on the GSI – in the same way as the table
• As with LSI, changes are written to the GSI asynchronously
• GSIs ONLY support eventually consistent reads
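
A minimal boto3 sketch (building on the same illustrative 'orders' table, with index and attribute names as assumptions) creating a table with both a local and a global secondary index; the LSI can only be defined here, at table creation time:

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='orders',
    AttributeDefinitions=[
        {'AttributeName': 'customer_id', 'AttributeType': 'S'},
        {'AttributeName': 'order_date', 'AttributeType': 'S'},
        {'AttributeName': 'order_total', 'AttributeType': 'N'},
        {'AttributeName': 'product_id', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
        {'AttributeName': 'order_date', 'KeyType': 'RANGE'},
    ],
    # LSI: same partition key as the table, alternative sort key; shares table throughput.
    LocalSecondaryIndexes=[{
        'IndexName': 'ByOrderTotal',
        'KeySchema': [
            {'AttributeName': 'customer_id', 'KeyType': 'HASH'},
            {'AttributeName': 'order_total', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'KEYS_ONLY'},
    }],
    # GSI: completely different partition (and sort) key, with its own throughput.
    GlobalSecondaryIndexes=[{
        'IndexName': 'ByProduct',
        'KeySchema': [
            {'AttributeName': 'product_id', 'KeyType': 'HASH'},
            {'AttributeName': 'order_date', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'ALL'},
        'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
    }],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)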

What is a DynamoDB stream?

• When a stream is enabled on a table, it records changes to a table and stores those values for 24 hours
• A stream can be enabled on a table from the console or API
• But can only be read or processed via the streams endpoint and API requests
• streams.dynamodb.us-west-2.amazonaws.com

• AWS guarantees that each change to a DynamoDB table occurs in the stream once and only once, AND…
• That ALL changes to the table occur in the stream in near real time

• Example: a Lambda function triggered when items are added to a DynamoDB stream, performing analytics on the data
• Example: a Lambda function triggered when a new user signup happens on your web app and data is entered into a users table
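
As a minimal sketch of the Lambda use cases just mentioned (the processing is illustrative only), a handler for a DynamoDB-stream-triggered function iterates over the batched records it receives:

# Minimal sketch of a Lambda handler wired to a DynamoDB stream trigger.
def handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']                  # INSERT, MODIFY, or REMOVE
        keys = record['dynamodb']['Keys']                 # primary key of the changed item
        new_image = record['dynamodb'].get('NewImage')    # present for INSERT/MODIFY

        # Illustrative processing only: log the change.
        print(event_name, keys, new_image)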

How will you Put Item in DynamoDB through Boto3?
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('employees')
table.put_item(
    Item={
        'emp_id': '3',
        'name': 'vikas',
        'salary': 2000
    }
)

How will you get and delete item from DynamoDB through Boto3?
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('employees')
resp = table.get_item(
    # Key is a dictionary
    Key={
        'emp_id': '3'
    }
)

print(resp['Item'])

table.delete_item(
    Key={
        'emp_id': '3'
    }
)
How will you insert batch records into Dynamodb through Boto3?
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('employees')

with table.batch_writer() as batch:
    for x in range(100):
        batch.put_item(
            Item={
                'emp_id': str(x),
                'name': 'Name-{}'.format(x)
            }
        )