You have set up a Redshift cluster in your AWS development account in us-east-1. Now your manager has decided to move the cluster to the production account in us-west-1. What should be your first step?
Answer: Create a manual snapshot of the Redshift cluster. A snapshot contains data from any databases that are running on your cluster. It also contains information about your cluster, including the number of nodes, node type, and master user name. If you restore your cluster from a snapshot, Amazon Redshift uses the cluster information to create a new cluster.
What is AWS Glue ?
- AWS Glue is a cloud service that prepares data for analysis through automated extract, transform and load processes.
- It supports MySQL, Oracle, Microsoft SQL Server and PostgreSQL databases that run on Amazon Elastic Compute Cloud instances.
- It is a fully-managed, pay-as-you-go, extract, transform, and load (ETL) service that automates the steps of data preparation for analytics.
- It includes a flexible scheduler that handles dependency resolution and job monitoring.
- AWS Glue is serverless, which means there's no infrastructure to set up or manage.
- AWS Glue consists of:
- Central metadata repository (the AWS Glue Data Catalog)
- ETL engine
- Flexible scheduler
John is new to the AWS Glue service and has been given a new task to process data through Glue. What would be his action items?
- Step 1: He has to define a crawler to populate the AWS Glue Data Catalog with metadata table definitions.
- Step 2: He has to point the crawler at a data store; the crawler creates table definitions in the Data Catalog.
- Step 3: He can write a script to transform the data, or provide the script (for example, a Spark script) in the AWS Glue console.
- Step 4: He can run the job on demand, or set it up to start when a specified trigger occurs.
- Step 5: Once the job runs, a script extracts data from the data source, transforms the data, and loads it to the data target.
- Step 6: This script runs in an Apache Spark environment in AWS Glue.
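As a rough boto3 sketch of this workflow (the crawler name, IAM role, database, S3 path, and job name are placeholders, not values from an actual setup), the crawler and job steps could look like this:

import boto3

glue = boto3.client('glue')

# Steps 1-2: define a crawler and point it at an S3 data store
glue.create_crawler(
    Name='sales-data-crawler',
    Role='AWSGlueServiceRole-demo',          # IAM role with Glue permissions (placeholder)
    DatabaseName='sales_catalog_db',
    Targets={'S3Targets': [{'Path': 's3://example-bucket/raw/'}]}
)
glue.start_crawler(Name='sales-data-crawler')   # populates the Data Catalog with table definitions

# Steps 4-6: run an existing ETL job; the job script runs on Apache Spark inside Glue
run = glue.start_job_run(JobName='sales-etl-job')
print(run['JobRunId'])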
What happens when a crawler runs?
- It classifies data to determine the format, schema, and associated properties of the raw data.
- It groups data into tables or partitions; data is grouped based on crawler heuristics.
- It writes metadata to the Data Catalog.
- A crawler can crawl multiple data stores in a single run.
When the crawler finishes, it creates or updates one or more tables in the Data Catalog.
What is Data Catalog in Glue?
- It is a central repository and persistent metadata store to store structural and operational metadata.
- It stores table definitions, job definitions, and other control information needed to manage your AWS Glue environment.
- Table definitions are available for ETL and also available for querying in Athena, EMR, and Redshift Spectrum to provide a view of the data between these services.
- Each AWS account has one AWS Glue Data Catalog per region.
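As a small illustrative boto3 sketch (the database and table names come back from whatever is in your own catalog; nothing here is specific to this document), you can browse the Data Catalog programmatically:

import boto3

glue = boto3.client('glue')

# Walk the account's Data Catalog (one catalog per region) and print its tables
for database in glue.get_databases()['DatabaseList']:
    print('Database:', database['Name'])
    for table in glue.get_tables(DatabaseName=database['Name'])['TableList']:
        print('  Table:', table['Name'])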
What are the two types of nodes in a Redshift cluster?
Answer: Leader Node and Compute Node.
Redshift Architecture and Its Components
- Leader Node.
- Compute Node.
- Node Slices.
- Massively Parallel Processing.
- Columnar Data Storage.
- Data Compression.
- Query Optimizer.
You are trying to use a SQL client tool from an EC2 instance, but you are not able to connect to the Redshift cluster. What must you do?
Answer: Modify the VPC security groups.
Open the Amazon VPC console at https://console.aws.amazon.com/vpc/.
- In the navigation pane, choose Security Groups.
- Select the security group to update.
- Choose Actions, Edit inbound rules or Actions, Edit outbound rules.
- Modify the rule entry as required.
- Choose Save rules.
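The same change can be made programmatically. A minimal boto3 sketch, assuming the default Redshift port 5439 and placeholder values for the security group ID and CIDR range:

import boto3

ec2 = boto3.client('ec2')

# Allow inbound traffic to the Redshift port from the SQL client's network
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',           # placeholder security group ID
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 5439,                     # default Redshift port
        'ToPort': 5439,
        'IpRanges': [{'CidrIp': '10.0.0.0/16', 'Description': 'SQL client access'}]
    }]
)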
Which method should you use for publishing and analyzing logs (logs from the EC2 instances need to be published and analyzed for a new application feature)?
Answer: Use consumers to analyze the logs.
Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
- In the left navigation pane, choose Instances, and select the instance.
- Choose Actions, Monitor and troubleshoot, Get system log.
What is Virtual Machine?
A virtual machine (VM) is a part of a physical machine that’s isolated by software from other VMs on the same physical machine.
It consists of CPUs, memory, networking interfaces, and storage.
The physical machine is called the host machine, and the VMs running on it are called guests.
A hypervisor is responsible for isolating the guests from each other and for scheduling requests to the hardware, by providing a virtual hardware platform to the guest system.
Question: Which term is the most closely related to a pilot light DR strategy?
Ans – Active/passive
Question: Which term describes OpsWorks?
Ans – Configuration management
How will you launch a virtual machine?
1. Open the AWS Management Console at https://console.aws.amazon.com.
2. Make sure you’re in the N. Virgina (US East) region
3. Find the EC2 service in the navigation bar under Services, and click it.
4. Click Launch Instance to start the wizard for launching a virtual machine.
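The same launch can be scripted with the SDK. A minimal boto3 sketch (the AMI ID and key pair name are placeholders; use an AMI that exists in your region):

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Launch a single t2.micro instance
response = ec2.run_instances(
    ImageId='ami-0123456789abcdef0',   # placeholder AMI ID
    InstanceType='t2.micro',
    MinCount=1,
    MaxCount=1,
    KeyName='mykey'                    # existing key pair name
)
print(response['Instances'][0]['InstanceId'])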
Examples of instance families and instance types:
| Instance type | Virtual CPUs | Memory | Description | Typical use case | Hourly cost (USD) |
| --- | --- | --- | --- | --- | --- |
| t2.nano | 1 | 0.5 GB | Smallest and cheapest instance type, with moderate baseline performance and the ability to burst CPU performance above the baseline | Testing and development environments, and applications with very low traffic | 0.0059 |
| m4.large | 2 | 8 GB | Has a balanced ratio of CPU, memory, and networking performance | All kinds of applications, such as medium databases, web servers, and enterprise applications | 0.1 |
| r4.large | 2 | 15.25 GB | Optimized for memory-intensive applications with extra memory | In-memory caches and enterprise application servers | 0.133 |
There are instance families optimized for different kinds of use cases.
- T family—Cheap, moderate baseline performance with the ability to burst to higher performance for short periods of time
- M family—General purpose, with a balanced ratio of CPU and memory
- C family—Computing optimized, high CPU performance
- R family—Memory optimized, with more memory than CPU power compared to M family
- D family—Storage optimized, offering huge HDD capacity
- I family—Storage optimized, offering huge SSD capacity
- X family—Extensive capacity with a focus on memory, up to 1952 GB memory and 128 virtual cores
- F family—Accelerated computing based on FPGAs (field programmable gate arrays)
- P, G, and CG family—Accelerated computing based on GPUs (graphics processing units)
Instance types and families
The names for different instance types are all structured in the same way. The instance family groups instance types with similar characteristics. AWS releases new instance types and families from time to time; the different versions are called generations. The instance size defines the capacity of CPU, memory, storage, and networking.
The instance type t2.micro tells you the following:
- The instance family is called t. It groups small, cheap virtual machines with low baseline CPU performance but the ability to burst significantly over baseline CPU performance for a short time.
- You’re using generation 2 of this instance family.
- The size is micro, indicating that the EC2 instance is very small.
Connecting to Your Virtual Machine
Installing additional software and running commands on your virtual machine can be done remotely. To log in to the virtual machine, you have to figure out its public domain name:
1. Click the EC2 service in the navigation bar under Services, and click Instances in the submenu at left to jump to an overview of your virtual machine.
Select the virtual machine from the table by clicking it. The console shows an overview of your virtual machine and the available actions.
2. Click Connect to open the instructions for connecting to the virtual machine.
Shutting down a Virtual Machine
To avoid incurring charges, you should always turn off virtual machines you're not using. You can use the following four actions to control a virtual machine's state:
- Start—You can always start a stopped virtual machine. If you want to create a completely new machine, you’ll need to launch a virtual machine.
- Stop—You can always stop a running virtual machine. A stopped virtual machine doesn't incur charges, except for attached resources like network-attached storage. A stopped virtual machine can be started again, but likely on a different host. If you're using network-attached storage, your data persists.
- Reboot—Have you tried turning it off and on again? If you need to reboot your virtual machine, this action is what you want. You won’t lose any persistent data when rebooting a virtual machine because it stays on the same host.
- Terminate—Terminating a virtual machine means deleting it. You can’t start a virtual machine that you’ve already terminated. The virtual machine is deleted, usually together with dependencies like network-attached storage and public and private IP addresses. A terminated virtual machine doesn’t incur charges.
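For reference, these four state changes map onto boto3 calls. A sketch with a placeholder instance ID (each call is independent; in practice you would use only the one you need):

import boto3

ec2 = boto3.client('ec2')
instance_id = 'i-0123456789abcdef0'    # placeholder

ec2.stop_instances(InstanceIds=[instance_id])        # Stop: no compute charges while stopped
ec2.start_instances(InstanceIds=[instance_id])       # Start: may come up on a different host
ec2.reboot_instances(InstanceIds=[instance_id])      # Reboot: stays on the same host
ec2.terminate_instances(InstanceIds=[instance_id])   # Terminate: deletes the virtual machine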
Warning
The difference between stopping and terminating a virtual machine is important. You can start a stopped virtual machine. This isn’t possible with a terminated virtual machine. If you terminate a virtual machine, you delete it.
Stopping or terminating unused virtual machines saves costs and prevents you from being surprised by an unexpected bill from AWS. You may want to stop or terminate unused virtual machines when:
- You have launched virtual machines to implement a proof-of-concept. After finishing the project, the virtual machines are no longer needed. Therefore, you can terminate them.
- You are using a virtual machine to test a web application. As no one else uses the virtual machine, you can stop it before you knock off work, and start it back up again the following day.
- One of your customers canceled their contract. After backing up relevant data, you can terminate the virtual machines that had been used for your former customer.
After you terminate a virtual machine, it’s no longer available and eventually disappears from the list of virtual machines.
Cleaning up
Terminate the virtual machine named mymachine that you started at the beginning:
1. Open the EC2 service from the main navigation, and select Instances from the submenu.
2. Select the running virtual machine by clicking the row in the table.
3. In the Actions menu, choose Instance State > Terminate.
Changing the Size of a Virtual Machine
It is always possible to change the size of a virtual machine. This is one of the advantages of using the cloud, and it gives you the ability to scale vertically. If you need more computing power, increase the size of the EC2 instance.
In this section, you’ll learn how to change the size of a running virtual machine. To begin, follow these steps to start a small virtual machine:
1. Open the AWS Management Console, and choose the EC2 service.
2. Start the wizard to launch a new virtual machine by clicking Launch Instance.
3. Select Ubuntu Server 16.04 LTS (HVM) as the AMI for your virtual machine.
4. Choose the instance type t2.micro.
5. Click Review and Launch to start the virtual machine.
6. Click Edit Security Groups to configure the firewall. Choose Select an Existing Security Group and select the security group named ssh-only.
7. Click Review and Launch to start the virtual machine.
8. Check the summary for the new virtual machine, and click Launch.
9. Choose the option Choose an Existing Key Pair, select the key pair mykey, and click Launch Instances.
10. Switch to the overview of EC2 instances, and wait for the new virtual machine’s state to switch to Running.
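Resizing can also be scripted. A minimal boto3 sketch (the instance ID and target type are placeholders); the instance has to be stopped before its type can be changed:

import boto3

ec2 = boto3.client('ec2')
instance_id = 'i-0123456789abcdef0'    # placeholder

# Stop the instance and wait until it is fully stopped
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])

# Change the instance type, then start it again
ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={'Value': 'm4.large'})
ec2.start_instances(InstanceIds=[instance_id])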
Starting a Virtual Machine in Another Data Center
AWS offers data centers all over the world. Take the following criteria into account when deciding which region to choose for your cloud infrastructure:
- Latency—Which region offers the shortest distance between your users and your infrastructure?
- Compliance—Are you allowed to store and process data in that country?
- Service availability—AWS does not offer all services in all regions. Are the services you are planning to use available in the region?
- Costs—Service costs vary by region. Which region is the most cost-effective region for your infrastructure?
Question: What must be done after creating an EBS volume before it can be used?
Ans – It must be attached to an instance
Question: How are custom AMIs most commonly used after they are created?
Ans – Launch new instances
What is IAM?
- AWS Identity and Access Management (IAM) enables us to manage access to AWS services and resources securely.
- Using IAM, we can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources.
Why IAM is Used?
- The purpose of AWS IAM is to help AWS admins to manage AWS user identities and their varying levels of access to AWS resources. For example, we can create multiple AWS users and provide them individual security credentials to connect their AWS resources.
- Doing so, organizations gain granular control over who has permission to access their AWS resources.
- IAM allows you to manage users and their level of access to the AWS Console. Its key features are listed below:
- Centralized control of your AWS account
- Shared Access to your AWS account
- Granular Permissions
- Identity Federation (including Active Directory, Facebook etc)
- Multifactor Authentication
- Provide temporary access for users/devices and services where necessary
- Allows you to set up your own password rotation policy
- Integrates with many different AWS services
- Supports PCI DSS Compliance
What is the Key Terminology For IAM?
- Users: End Users such as people, employees of an organization etc.
- Groups: A collection of users. Each user in the group will get the permissions of the group.
- Policies: Policies are made up of documents, called policy documents. These documents are in a format called JSON, and they give permissions as to what a User/Group/Role can do.
- Roles: You create roles and then assign them to AWS Resources.
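These building blocks map onto the IAM API. A hedged boto3 sketch (the user, group, and role names are made up for illustration; the policy ARNs are AWS managed policies):

import boto3
import json

iam = boto3.client('iam')

# User and group: the user inherits the group's permissions
iam.create_user(UserName='cloudvikas')
iam.create_group(GroupName='admins')
iam.attach_group_policy(GroupName='admins',
                        PolicyArn='arn:aws:iam::aws:policy/AdministratorAccess')
iam.add_user_to_group(GroupName='admins', UserName='cloudvikas')

# Role: created with a trust policy describing who may assume it, then assigned to AWS resources
trust_policy = {
    'Version': '2012-10-17',
    'Statement': [{'Effect': 'Allow',
                   'Principal': {'Service': 'ec2.amazonaws.com'},
                   'Action': 'sts:AssumeRole'}]
}
iam.create_role(RoleName='s3-read-role', AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.attach_role_policy(RoleName='s3-read-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess')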
How will you create IAM role and perform its actions?
- Step 1: Log in to the AWS Console. Navigate to the IAM service.


- Here we can see the IAM user sign-in link. If we want to change it, we can do so through the Customize link. Let's click on Customize and follow the steps.

- After this, the URL is changed.

Step 2: Let's go to the next step: activate MFA on your root account. Click on Manage MFA. After that click, we will get another page:

Click on “Continue to Security Credentials”. Now Activate MFA.

Select Virtual MFA device and continue.

If you forget the credentials, you can use the QR code to reset them.
Enter the details as per the QR code:

And click on Assign MFA.

Now it's successfully assigned.
Step 3: Navigate to the home page and check the security status:

The "Activate MFA on your root account" item is now complete, so two of the security status items are green. Now we will work on the third one.
Now click on Create individual IAM users and create a user:
Enter a username, mark the checkbox, and proceed to set permissions.


Step 4: Create Group: Provide Group Name and select AdministratorAccess. Click on Create Group.


Group is created.
Step 5: Click Next and the user is created.

Here we can find the Access Key ID and Secret Access Key; please download the CSV file. Now we can see the IAM security status:

Now four of the security status items are green. We will work on the last one.
Navigate to Policy and see AdministratorAccess:

Step 6: Now Navigate to Apply an IAM password policy and click on Manage Password Policy.

And select conditions as per your business need:

Apply password policy.
And navigate to Dashboard:
We can see that all five conditions are completed. Now the security status is green.

If we open the CSV file, we can see the password and other details.
Now that we have seen IAM users and groups, let's create a role.
Navigate to the Roles section and click on Create Role.

Select a policy; for example, select an S3-related policy.

Role is created.


In this way we can create IAM users, groups, and roles, and assign permissions.
Do we use IAM only for 1 region?
- IAM is universal; it is not tied to a specific region.
- The "root account" is simply the identity created when we first set up our AWS account. It has complete admin access.
- New users have NO permissions when first created.
- New users are assigned an Access Key ID and Secret Access Key when first created. These are not the same as a password; we cannot use them to log in to the console, but we can use them to access AWS via the APIs and the command line. We only get to view them once, and if we lose them, we must regenerate them. So we should keep them in a secure location.
- It is always recommended to set up multi-factor authentication on our root account.
- We can create and customize our own password rotation policies.
Explain AWS GLUE Crawler.
- It is a program that connects to a data store (source or target), progresses through a prioritized list of classifiers to determine the schema for your data, and then creates metadata tables in the Data Catalog.
- It scans various data stores to infer schema and partition structure and to populate the Glue Data Catalog with corresponding table definitions and statistics.
- It can be scheduled to run periodically. Doing so, the metadata is always up-to-date and in-sync with the underlying data.
- It automatically adds new tables, new partitions to existing tables, and new versions of table definitions.
- It can determine the schema of complex unstructured or semi-structured data, which can save a ton of time.
Question: On the Identity and Access Management dashboard, which tasks are listed under Security Status?
Ans – Activate MFA on your root account
Deleting your root access keys
Create individual IAM users
Question: You are configuring multi-factor authentication (MFA) for your root account on AWS. Which options are available on the Manage MFA Device dialog?
Ans – Other hardware MFA device
U2F security key
Virtual MFA device
What is AWS S3?
Amazon Simple Storage Service (S3) is an important service where individuals, applications, and a long list of AWS services keep their data.
It maintains backup archives, log files, and disaster recovery images.
It is also used for running analytics on big data at rest and hosting static websites.
What’s the difference between object and block storage?
With block-level storage, data on a raw physical storage device is divided into individual blocks whose use is managed by a file system. The file system is responsible for allocating space for the files and data that are saved to the underlying device and for providing access whenever the OS needs to read some data.
An object storage system like S3 provides a flat surface to store data. This simple design avoids some of the OS-related complications of block storage and allows anyone easy access to any amount of storage capacity.
When you write files to S3, they’re stored along with up to 2 KB of metadata. The metadata is made up of keys that establish system details like data permissions and the appearance of a file system location within nested buckets.
You are configuring an S3 lifecycle rule transition for current S3 object versions that will archive data using Glacier. What should be a valid configuration?
Days after creation
As per AWS Documentation,
When you add a Lifecycle configuration to a bucket, the configuration rules apply to both existing objects and objects that you add later. For example, if you add a Lifecycle configuration rule today with an expiration action that causes objects with a specific prefix to expire 30 days after creation, Amazon S3 will queue for removal any existing objects that are more than 30 days old.
What is AWS Backup?
- We can use the AWS Backup service to store backups in the AWS Cloud. It simply acts as a gateway or a proxy to store backups in the cloud.
- We can back up a number of AWS items using AWS Backup, including EBS volumes (used by EC2 instances, or virtual machines). We can also back up RDS databases, DynamoDB tables, and even Amazon Elastic File System (EFS).
- In order to configure this, you need to create a backup plan that has scheduling, retention, and even options to add tags to the recovery points that are stored as backups.
- AWS Backup includes scheduling, which relates to the recovery point objective. The RPO, or recovery point objective, is a disaster recovery term that reflects the maximum tolerable amount of data loss measured in time. We have backup retention rules within the backup plan, and lifecycle rules to change the storage class of items that are backed up.
- We need a backup vault to store the recovery points. We can specify the AWS resources to be assigned to the backup plan, or we can simply select items based on their AWS resource ID.
- We can monitor backup activity using the centralized AWS Backup console. We can also perform on-demand backups without waiting for the schedule, and we can perform restorations from previously taken backups.
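A rough boto3 sketch of that configuration (the vault name, plan name, and schedule are examples only, not a recommended setup):

import boto3

backup = boto3.client('backup')

# Backup vault that will hold the recovery points
backup.create_backup_vault(BackupVaultName='demo-vault')

# Backup plan: a daily schedule (relates to the RPO) with a 35-day retention lifecycle
plan = backup.create_backup_plan(BackupPlan={
    'BackupPlanName': 'daily-plan',
    'Rules': [{
        'RuleName': 'daily',
        'TargetBackupVaultName': 'demo-vault',
        'ScheduleExpression': 'cron(0 5 * * ? *)',   # every day at 05:00 UTC
        'Lifecycle': {'DeleteAfterDays': 35}
    }]
})
print(plan['BackupPlanId'])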
Explain S3 Service Architecture.
- S3 files are organized into buckets. By default, each AWS account is allowed to create as many as 100 buckets. The name you choose for your bucket must be globally unique within the entire S3 system.
- Example: Here is the URL you would use to access a file called filename that's in a bucket called bucketname over HTTP: https://s3.amazonaws.com/bucketname/filename
- Through the AWS CLI, the same file is addressed with the S3 URI:
- s3://bucketname/filename
How to configure AWS Backup?
- Sign in to the AWS Management Console and open the IAM console .
- In the IAM console, choose Roles in the navigation pane, and choose Create role.
- Choose AWS Service Roles, and then choose Select for AWS Backup.
- Choose Proceed.
Which PowerShell cmdlet is used to create a new S3 bucket?
New-S3Bucket
Explain S3 Block Public Access.
- By default, new buckets, access points, and objects don’t allow public access. However, users can modify bucket policies, access point policies, or object permissions to allow public access.
- In the S3 Management Console, we get a listing of buckets.
- For example, click on an existing bucket.
- Navigate to the Permissions tab. The page contains four tabs: Overview, Properties, Permissions, and Management; the Overview tab is open by default. When we go to the Permissions tab, we have a number of options: Block public access, Access Control List, Bucket Policy, and CORS configuration.
- Click Block public access (the Access Control List and Bucket Policy options are also available here), then click Edit, block all public access, and save that setting.
- We can block public access to buckets and objects granted through any access control list.
- We can block public access to buckets and objects granted through new public bucket policies.
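The same setting can be applied with the SDK. A minimal boto3 sketch (the bucket name reuses the example from elsewhere in these notes):

import boto3

s3 = boto3.client('s3')

# Turn on all four Block Public Access settings for one bucket
s3.put_public_access_block(
    Bucket='javahomecloud123',
    PublicAccessBlockConfiguration={
        'BlockPublicAcls': True,
        'IgnorePublicAcls': True,
        'BlockPublicPolicy': True,
        'RestrictPublicBuckets': True
    }
)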
How will you create an S3 bucket through Boto3?
import boto3

client = boto3.client('s3')

# Create a private bucket in the ap-south-1 region
response = client.create_bucket(
    ACL='private',
    Bucket='javahomecloud123',
    CreateBucketConfiguration={
        'LocationConstraint': 'ap-south-1'
    }
)
You plan to use the AWS Management Console to upload large files to an S3 bucket. What is the maximum file upload size when using the GUI?
160 GB
S3 Bucket Encryption and the GUI
- Encryption provides data confidentiality and you can enable default encryption on an S3 bucket, so that items uploaded to S3 will automatically be encrypted.
- We can also pick individual items within a bucket and determine whether they are encrypted individually.
- In the S3 Management Console, click on bucket to open up the settings for it.
- Click the bucket and the corresponding page opens.
- The page contains four tabs: Overview, Properties, Permissions, and Management. The Overview tab is open; it contains the Upload, Create folder, Download, and Actions options.
- Click the Properties tab.
- By default, encryption is disabled.
- Click on that panel. Currently, it's set to None.
- We can choose either AES-256 (Advanced Encryption Standard, 256-bit server-side encryption with Amazon-managed keys), or Key Management Service (AWS-KMS managed keys), where we can choose the keys that get used for encryption.
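Default bucket encryption can also be set programmatically. A hedged boto3 sketch using SSE-S3 (AES-256); switching to KMS would mean SSEAlgorithm 'aws:kms' plus a key ID:

import boto3

s3 = boto3.client('s3')

# Enable default encryption with Amazon S3 managed keys (SSE-S3 / AES-256)
s3.put_bucket_encryption(
    Bucket='javahomecloud123',
    ServerSideEncryptionConfiguration={
        'Rules': [{
            'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'}
        }]
    }
)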
Which configuration item defines the backup frequency and retention?
Backup plan
Question: If you would like to configure S3 bucket event notification. You have clicked on the S3 bucket name in console. What should you click next?
Ans – Properties
Question: You are viewing S3 bucket metrics through the S3 management console. Which metrics are shown by default?
Ans – BucketSizeBytes
NumberofObjects
Question: Is S3 bucket server access logging enabled by default?
Ans – Logging is disabled by default
Question: Which is a valid option for S3 inventory report file formats?
Ans- CSV
Question: How will you connect a Windows instance securely to an S3 bucket over the AWS global infrastructure?
Ans – Through Gateway endpoint
How will you upload a file to an S3 bucket?
import boto3

client = boto3.client('s3')

# Read the local file and upload it as a private object
file_reader = open('create_bucket.py').read()
response = client.put_object(
    ACL='private',
    Body=file_reader,
    Bucket='javahomecloud123',
    Key='create_bucket.py'
)
You need to archive a file named File1. Which AWS CLI command should you use?
aws s3 cp s3://bucket1/folder1/file1.doc s3://bucket1/folder1/file1.doc --storage-class GLACIER
Question: What is a hybrid cloud storage service that gives us on-premises access to virtually unlimited cloud storage and storage tier management through a virtual appliance?
Ans – Storage Gateway
Question: What is the preferred method, written in JSON, to provide access to the objects stored in an S3 bucket?
Ans – Bucket policy
Question: Which of these statements is NOT true concerning elastic block store volume security?
Ans – In the management console, you can encrypt all volumes in the region by default
You are using the AWS Backup console. You need to restore a backup. Where should you click?
Ans – Protected resources
Question: What is the default S3 bucket encryption setting?
None
Question: You need to encrypt an S3 bucket folder. Which AWS CLI command should you use?
aws s3 encrypt
aws s3 cp
Ans – aws s3 cp
Question: Which PowerShell cmdlet is used to enable bucket encryption?
Enable-S3BucketEncryption
Set-S3BucketEncryption
Ans – Set-S3BucketEncryption
How will you delete an object from S3?
import boto3

client = boto3.client('s3')

# Delete a single object from the bucket
response = client.delete_object(
    Bucket='javahomecloud123',
    Key='create_bucket.py'
)
What is the AWS CLI command used to create a new S3 bucket?
aws s3api create-bucket
Question: Which of the following is the best reason to enable S3 bucket versioning?
Accidental object deletion
Auditing
Ans – Accidental object deletion
Question: What must be configured before S3 bucket cross-region replication can be enabled?
Auditing
Versioning
Ans – Versioning
How will you list objects from S3?
import boto3

client = boto3.client('s3')

# List the objects in the bucket and print each key
response = client.list_objects(
    Bucket='javahomecloud123'
)
for content in response['Contents']:
    print(content['Key'])
What must be configured for data archiving to Glacier?
Vault
Question: What is a benefit of using CloudFront?
Encryption of data at rest
Reduced network latency
Ans – Reduced network latency
Question: Which of the following are valid CloudFront distribution types?
Web
REST
RTMP
Ans – Web
RTMP
You are using the AWS management console to create a new S3 bucket. What is the default permission for the new bucket?
Block all public access
Which protocol does Elastic File System use?
NFS
How will you list buckets through the AWS CLI?
aws s3api list-buckets --query "Buckets[].Name"
How will you create an S3 bucket through PowerShell?
New-S3Bucket -BucketName bucket1234 -Region ca-central-1
How will you upload files in S3 Using the GUI?
- In Amazon Web Services, S3 buckets serve as cloud storage solutions.
- Open S3 Management Console. It contains a list of buckets.
- Even if the bucket is empty, we can organize files much as we would on a storage drive, by creating folders.
- Create folders. A table with four columns is displayed. The column headers are Name, Last modified, Size, and Storage class.
- For the folder's encryption setting, leave the default and click Save.
- So we got a folder that we created within our S3 bucket.
- We can create a subordinate folder within the Overview tab.
- We can upload files through The Upload dialog and Select files.
- We can drag and drop files from other parts of our screen to this location to upload them, or we can click Add files.
- When you’ve selected some files, you’ll see the number of files listed at the top along with the Size, which can be important in giving you an indication of how long it might take to upload these depending on your Internet connection speed.
- We can see the Target path where it’s going to be uploaded to the Projects folder in our bucket.
- If we forgot to add files, we can just click Add more files.
- We can also click the x to remove items if we don't want to upload them.
- If we have both read and write permissions, we can add other AWS accounts that have permissions to these uploaded items.
- For encryption, we can select a file, go to the Actions menu, and choose Change encryption.
How will you upload files through the CLI?
aws s3 cp d:\samplefile s3://bucket44 --exclude "*" --include "*.txt"
How will you upload a file through PowerShell?
PS C:\> Write-S3Object -BucketName bucket17 -File d:\licensekey.txt -Key Projects/licensekey.txt -CannedACLName Private
Explain S3 Object classes?

AWS Elastic File System Overview
- Amazon Elastic File System is a cloud storage service provided by Amazon Web Services, designed to provide scalable, elastic, concurrent (with some restrictions), and encrypted file storage for use with both AWS cloud services and on-premises resources.
- Web serving & content management: Amazon EFS provides a durable, high-throughput file system for content management systems and web serving applications that store and serve information for a range of applications like websites, online publications, and archives.
- AWS Elastic File System is an NFS version 4 mountable file system that we define centrally in the AWS cloud. You can configure either on-premises devices, such as virtual machines, or EC2 instances defined in AWS to mount that file system. Essentially, it's an NFS shared folder in the cloud.
Configure Elastic File System
- Open the EFS Management Console, click the Create file system button, and specify the VPC affiliation.
- A Create file system page opens. It is divided into three parts: a toolbar, a navigation pane, and a content pane. A page titled Configure file system access is open in the content pane.
- Down below, for each subnet and Availability Zone, we have mount targets enabled. A mount target is what you actually connect to when you mount the EFS file system into your local file system. An IP address is automatically assigned to each mount target, and security groups automatically control network traffic for those mount targets.
- Configure optional settings is selected in the navigation pane and the corresponding page is open in the content pane. Click Next Step and add some metadata, such as tag key and value pairs.
Explain Amazon S3 Glacier
- Amazon S3 Glacier is a service that's all about cloud archiving. The retrieval option you choose determines how long it takes to retrieve or restore data from the archive; it can be anywhere from minutes up to hours. S3 Glacier is inexpensive data storage for data that's infrequently accessed and archived for the long term.
- We can configure policies for regulatory compliance. For example, we can set permissions on the vault, such as who is allowed to upload archives to the vault, who is allowed to get the results of archive job output, and who is allowed to delete the vault. We can even initiate a vault lock, which makes the vault immutable, in other words, read-only; that might be required for regulatory compliance in some cases. Immutable means that archives cannot be altered. We can also configure Amazon S3 Glacier as the target for an S3 bucket's lifecycle management settings.
Sometimes we try to fetch metadata for a file from an S3 bucket and get a 404 Not Found error. What could be the reasons?
If we make a HEAD or GET request for the S3 key name before creating the object, S3 only provides eventual consistency for read-after-write, so a subsequent read may return 404 for a short time. We also get a 404 Not Found error if the file was not uploaded correctly to S3.
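A small boto3 sketch of fetching object metadata and handling the 404 case (the bucket and key reuse the earlier example names):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

try:
    # HEAD request: returns metadata only, no object body
    meta = s3.head_object(Bucket='javahomecloud123', Key='create_bucket.py')
    print(meta['ContentLength'], meta['LastModified'])
except ClientError as err:
    if err.response['Error']['Code'] == '404':
        print('Object not found: it may not be uploaded yet, or the upload failed')
    else:
        raise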
Explain S3 Cross Region Replication
- Amazon S3 now supports cross-region replication, a new feature that automatically replicates data across AWS regions.
- With cross-region replication, every object uploaded to an S3 bucket is automatically replicated to a destination bucket in a different AWS region that you choose.
- You can enable AWS S3 bucket Cross-Region Replication to increase the availability of data stored in the S3 bucket.
- As the name implies, Cross-Region Replication allows you to essentially create a replica or copy of the contents of an entire bucket.
- You can also specify only a subset of content from the bucket to be replicated to an alternate geographical location.
- The benefit is that if we experience some kind of regional outage or disaster, due to weather or anything of that nature, we can still get a copy of the data elsewhere.
If you click on any bucket, you can find four tabs: Overview, Properties, Permissions, and Management.
- Consider that this bucket's versioning is already enabled; versioning stores multiple versions of objects stored in S3.
- You need to have versioning enabled if you want to enable Cross-Region Replication, and the console will remind you as you enable it if you haven't.
- We can click on the Management tab. It contains options called: Lifecycle, Replication, Analytics, Metrics, and Inventory.
- Click the option called Replication. We can click the Add rule button to add a replication rule.
- A dialog box called Replication rule opens. It contains four steps: Set source, Set destination, Configuration rule options, and Review.
- The options for Set source step are displayed.
Now, we can replicate the entire bucket contents to an alternate location, to a bucket in a different region.
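A hedged boto3 sketch of the same configuration (the bucket names and the replication role ARN are placeholders; the IAM role must already allow S3 to replicate on your behalf):

import boto3

s3 = boto3.client('s3')

# Versioning must be enabled on the source (and destination) bucket first
s3.put_bucket_versioning(Bucket='source-bucket',
                         VersioningConfiguration={'Status': 'Enabled'})

# Replicate the whole bucket to a bucket in another region
s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',   # placeholder
        'Rules': [{
            'Status': 'Enabled',
            'Prefix': '',                                               # empty prefix = entire bucket
            'Destination': {'Bucket': 'arn:aws:s3:::destination-bucket'}
        }]
    }
)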
Explain CloudFront.
- CloudFront is Amazon Web Services content delivery network solution. You can serve up content stored in an S3 bucket. You can even specify other external web apps or websites using a URL, so you can serve up HTTP web server content through CloudFront.
- You can configure CloudFront to cache dynamic content. You can enable streaming directly from S3 buckets. So, if you want to stream video or audio, you can do that. Now, CloudFront is designed to reduce network latency.
Why do we need AWS CLI?
Suppose you have multiple services in AWS and you want to manage them from a terminal session. You can configure the AWS CLI and manage your AWS services from a terminal session, controlling and automating them as per business requirements.
What is AWS CLI?
The AWS Command Line Interface (CLI) is a unified tool to manage your AWS services. With just one tool to download and configure, you can control multiple AWS services from the command line and automate them through scripts.
Now let's understand this practically through a lab. We will configure the AWS CLI and use services like S3, EC2, etc.
Step 1: Log in to the AWS Console. Navigate to the IAM service.

Step 2: Navigate to User and create user.
Click on the Users link in the left-side pane. Enter a username, mark the checkbox, and proceed to set permissions.

Step 3: Create Group: Provide Group Name and select AdministratorAccess. Click on Create Group.


Group is created.
Step 4: Click Next and the user is created.

Here we can find the Access Key ID and Secret Access Key; please download the CSV file.
Step 5: Create an EC2 instance and connect to it from the command line (follow Chapter 1, EC2 Instance Creation).
How do you log in to an EC2 instance through the command line?
At the end, we will get the following EC2 prompt:
[ec2-user@ip-172-11-1-111 ~]$
- Type the command aws s3 ls to see the S3 buckets.

If an error is returned, it means the CLI is not configured yet. So now let's configure the AWS CLI.
Type the command:
aws configure

After this command, it will prompt for the following details:
Access Key ID
Secret Access Key
Default region name
Default output format
Now type the command:
aws s3 ls
It will list all S3 buckets present in the AWS region.
Now let's navigate to the home directory and move to the .aws directory.
List its contents with the ls command; it will show the config and credentials files.
We can see the credential details with the following command:
nano credentials

Q) Is Roles more secure?
Ans: Roles are more secure compared to storing your access key and secret access key on individual EC2 instances, and they are easier to manage. A role can be assigned to an EC2 instance after it is created, using both the console and the command line. Roles are also universal; you can use them in any region.
Does AWS provide any API to control AWS services?
VMs are software; if you want to start them remotely, you need hardware that can handle and fulfill your request. AWS provides an application programming interface (API) that can control every part of AWS over HTTP. Calling the HTTP API is very low-level and requires a lot of repetitive work, like authentication and data (de)serialization. AWS offers tools on top of the HTTP API that are easier to use. Those tools are:
- Command-line interface (CLI)—With one of the CLIs, you can make calls to the AWS API from your terminal.
- Software development kit (SDK)—SDKs, available for most programming languages, make it easy to call the AWS API from your programming language of choice.
- AWS CloudFormation—Templates are used to describe the state of the infrastructure. AWS CloudFormation translates these templates into API calls.
Everything is available through the API. You can start a virtual machine with a single API call, create 1 TB of storage, or start a Hadoop cluster over the API.
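For example, with the Python SDK (boto3) a single call hits the EC2 API; this minimal sketch just lists the available regions:

import boto3

# One SDK call = one signed HTTP request to the AWS API
ec2 = boto3.client('ec2')
for region in ec2.describe_regions()['Regions']:
    print(region['RegionName'])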
What is Resource Tagging and how to do this?
- Tagging AWS resources means adding additional metadata, such as tying it to a project, a department, or a cost center.
- A tag is a label that you assign to an AWS resource. Each tag consists of a key and an optional value, both of which you define.
- Tags enable you to categorize your AWS resources in different ways, for example, by purpose, owner, or environment.
- Tagging can facilitate billing or searching and filtering for certain types of resources in the AWS cloud.
- In the AWS Management Console, we can work with tagging when we create a new resource.
- For example, click the Create bucket button in S3 console.
- A dialog box labeled: Create bucket opens. It contains the following steps: Name and region, Configure options, Set permissions, and Review.
- The Name and region step is selected and the corresponding page is open. It includes a Bucket name text box and a Region drop-down list box.
- Provide any Bucket name and navigate to a section labeled: Tags. It includes Key and Value text boxes.
- For example, specify details for a project.
- In the Key text box, type Project. In the Value text box, type XYZ.
- Click Next and accept all of the defaults to create the bucket.
- So, now we’ve created a new bucket that’s been tagged with a specific project XYZ.
- Now, we can also modify tags for an existing item. So, it’s not only during creation that we can tag resources.
- Click on an existing bucket and navigate to Tags.
- Click on Tags and provide a new key and value.
- We can now see there are two tags for this specific resource.
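Tags can also be managed through the API. A minimal boto3 sketch for the bucket example above (note that put_bucket_tagging replaces the whole tag set):

import boto3

s3 = boto3.client('s3')

# Replace the bucket's tag set with these tags
s3.put_bucket_tagging(
    Bucket='javahomecloud123',
    Tagging={'TagSet': [
        {'Key': 'Project', 'Value': 'XYZ'},
        {'Key': 'CostCenter', 'Value': '1111'}
    ]}
)
print(s3.get_bucket_tagging(Bucket='javahomecloud123')['TagSet'])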
How do I tag an existing resource in AWS?
- Find AWS resources to tag
- Sign in to the AWS Management Console, choose Resource Groups, and then choose Tag Editor.
- Choose at least one resource type from the Resource type drop-down list.
Which resources cannot be tagged in AWS?
AWS spending that can’t be tagged
How do you automatically tag Amazon EC2 resources in response to API events?
- Clone the solution repo for that AWS resource.
- Select a CloudTrail trail service.
- Store your required AWS resource tags.
- Create the resource-auto-tagger Lambda function.
- Create a rule in CloudWatch Events.
What are the parts required for a tag?
- Each tag has two parts:
- A tag key (for example, CostCenter , Environment , or Project ). Tag keys are case sensitive.
- An optional field known as a tag value (for example, 1111 or Production ). Like tag keys, tag values are case sensitive.
Why is tag used?
We use tags to aid classification, mark ownership, note boundaries, and indicate online identity.
How do you tag Lambda?
- Open the Functions page on the Lambda console.
- Choose a function.
- Choose Configuration and then choose Tags.
- Under Tags, choose Manage tags.
- Enter a key and value. To add additional tags, choose Add new tag.
- Choose Save.
Question: Which CLI command is used to import a VM disk file?
Ans – aws ec2 import-image
Question: Which CLI command is used to import a VM disk image to S3?
Ans- aws ec2 import-image
When do I use a Glue Classifier in a project?
- It reads the data in a data store.
- If it identifies the format of the data then it generates a schema.
- It provides classifiers for common file types, such as CSV, JSON, AVRO, XML, and others.
- AWS Glue provides a set of built-in classifiers, but you can also create custom classifiers.
- You can set up your crawler with an ordered set of classifiers.
- When the crawler invokes a classifier, the classifier determines whether the data is recognized or not.
What is a trigger in AWS Glue?
A trigger starts an ETL job; we can define triggers based on a scheduled time or an event.
John joined a new company where he is working on a migration project. His project is moving its ETL workloads to a serverless Apache Spark-based platform.
Which service is recommended for streaming in this case?
AWS Glue is recommended for streaming when your use cases are primarily ETL and when you want to run jobs on a serverless Apache Spark-based platform.
How will you import data from Hive Metastore to the AWS Glue Data Catalog?
Migration through Amazon S3:
Step 1: Run an ETL job to read data from your Hive Metastore; it exports the data (database, table, and partition objects) to an intermediate format in Amazon S3.
Step 2: Import that data from S3 into the AWS Glue Data Catalog through an AWS Glue ETL job.
Direct Migration:
You can set up an AWS Glue ETL job which extracts metadata from your Hive metastore and loads it into your AWS Glue Data Catalog through an AWS Glue connection.
Databases broadly fall into two categories: Relational Database Management Systems (RDBMS) and NoSQL (or non-relational) databases.
Relational Databases
Relational databases provide a common interface that lets users read and write from the database using commands or queries written using SQL.
A relational database can be categorized as either an Online Transaction Processing (OLTP) or Online Analytical Processing (OLAP) database system.
Amazon Relational Database Service (Amazon RDS) significantly simplifies the setup and maintenance of OLTP and OLAP databases. Amazon RDS provides support for six popular relational database engines:
MySQL,
Oracle,
PostgreSQL,
Microsoft SQL Server,
MariaDB,
and Amazon Aurora.
Let’s understand about DWH and NoSQL DB.
Data Warehouses:
Many companies split their relational databases into two different databases: one database for OLTP transactions, and the other database as their data warehouse for OLAP. OLTP transactions occur frequently and are relatively simple. OLAP transactions occur much less frequently but are much more complex.
Amazon RDS is used for OLTP, but it can also be used for OLAP. Amazon Redshift is a high-performance data warehouse designed specifically for OLAP use cases. Sometimes Amazon RDS and Amazon Redshift are used in the same application, periodically extracting recent transactions and loading them into a reporting database.
NoSQL Databases
Many applications use HBase, MongoDB, Cassandra, CouchDB, Riak, and Amazon DynamoDB to store large volumes of data with high transaction rates.
We can run any type of NoSQL database on AWS using Amazon EC2, or we can choose Amazon DynamoDB to deal with the heavy lifting involved with building a distributed cluster spanning multiple data centers.
What Is ElastiCache?
ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases. ElastiCache supports two open-source in-memory caching engines: Memcached and Redis.
Amazon Relational Database Service (Amazon RDS)
Q: What is Amazon RDS?
Amazon RDS is a managed service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity, while managing time-consuming database administration tasks.
It supports Amazon Aurora, MySQL, MariaDB, Oracle, SQL Server, and PostgreSQL database engines. We can run RDS on premises using Amazon RDS on Outposts and Amazon RDS on VMware.
It gives us access to the capabilities of MySQL, MariaDB, Oracle, SQL Server, or PostgreSQL database. This means that the code, applications, and tools we already use today with our existing databases should work with Amazon RDS.
RDS has two key features:
Multi-AZ – For Disaster Recovery
Read Replicas – For Performance
Data warehousing databases use a different type of architecture, both from a database perspective and at the infrastructure layer. Amazon's data warehouse solution is called Redshift.
Let's understand the RDS concepts through a lab.
Step 1: Log in to the AWS Console and navigate to RDS:

Step 2: Click on Create Database

Step 3: Select MySQL and navigate to next field:

Step 4: Select the Free tier template and provide the database name cloudvikas.

Step 5: Next enter Credentials details:

Step 6: Fill details for remaining fields as below:

Next

After this, fill Connectivity section details:

After this, navigate to VPC security group:
Fill details as per below screenshot:




Next, we must fill details as below:

Now click on Create Database and the database is created.


Database cloudvikas is created.


This completes the RDS lab; we have learned the database creation process.
Now please delete the database, otherwise it will start incurring costs after a few days.

Now we will learn about Multi-AZ, RDS backups, and Read Replicas.
Backup and Recovery
Amazon RDS provides an operational model for backup and recovery procedures. There are two different types of backups for RDS:
Automated Backups
Database Snapshots
By using a combination of both techniques, we can design a backup recovery model to protect application data.
Automated Backups:
An automated backup is an Amazon RDS feature that continuously tracks changes and backs up our database. Automated backups are enabled by default. The retention period can be between 1 and 35 days, and within that period we can recover our database to any point in time.
Automated backups take a full daily snapshot and store transaction logs. When we do a recovery, AWS first chooses the most recent daily backup and then applies the transaction logs.
The backup data is stored in S3, and we get free storage space equal to the size of the database. Be aware that when you delete a DB instance, all automated backup snapshots are deleted and cannot be recovered. Manual snapshots, however, are not deleted.
Automated backups occur daily during a configurable 30-minute window, called the backup window. They are kept for a configurable number of days, called the backup retention period.
Manual DB Snapshots
We can take manual DB snapshots at any time. A DB snapshot is user-initiated and can be created as frequently as we want; we can then restore a DB instance from the DB snapshot. DB snapshots can also be created with the Amazon RDS console.
Recovery
Amazon RDS lets you recover your database quickly, whether you are using automated backups or manual DB snapshots. A new DB instance is created when we restore. As soon as the restore is complete, we should associate any custom DB parameter groups or security groups used by the instance from which we restored.
High Availability with Multi-AZ
One of the most powerful features of Amazon RDS is Multi-AZ deployments, which allows you to create a database cluster across multiple Availability Zones. Multi-AZ allows you to place a secondary copy of your database in another Availability Zone for disaster recovery purposes. When you create a Multi-AZ DB Instance, a primary instance is created in one Availability Zone and a secondary instance is created in another Availability Zone.
Amazon RDS automatically performs a failover in the event of any of the following:
Loss of availability in primary Availability Zone
Loss of network connectivity to primary database
Compute unit failure on primary database
Storage failure on primary database
Now let's do the practical and understand the lab session.
Step 1: Navigate to the database and click on the Actions link. We can see multiple operations to perform on the created database.

Step 2: Click on Modify and turn on Multi-AZ deployment

Next, click Continue. Here we can see the potential performance impact message; select Apply immediately. Now click the Modify DB instance button.

Step 3: Now navigate to the database and click on the Configuration tab:
We can see the Multi-AZ option is set to Yes.

Key points to remember:
RDS runs on virtual machines
You cannot log in to these operating systems however.
Patching of the RDS Operating System and DB is Amazon’s responsibility
RDS is NOT Serverless
Aurora Serverless IS Serverless
Read Replicas
Can be Multi-AZ.
Used to increase performance.
Must have backups turned on.
Can be in different regions.
Can be Aurora or MySQL.
Can be promoted to master; this will break the Read Replica.
How can you Migrate data from AWS Glue to Hive Metastore through Amazon S3?
We can use two AWS Glue jobs here.
The first job extracts metadata from databases in AWS Glue Data Catalog and loads them into S3. The first job is run on AWS Glue Console.
The second job loads data from S3 into the Hive Metastore. The second can be run either on the AWS Glue Console or on a cluster with Spark installed.
How can you Migrate data from AWS Glue to AWS Glue?
We can use two AWS Glue jobs here.
The first extracts metadata from specified databases in an AWS Glue Data Catalog and loads them into S3.
The second loads data from S3 into an AWS Glue Data Catalog.
What is Time-Based Schedules for Jobs and Crawlers ?
We can define a time-based schedule for crawlers and jobs in AWS Glue. When the specified time is reached, the schedule activates and causes the associated jobs to execute.
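A small boto3 sketch of a scheduled trigger (the trigger and job names are placeholders); the schedule uses a cron expression:

import boto3

glue = boto3.client('glue')

# Start an existing job every day at 02:00 UTC
glue.create_trigger(
    Name='nightly-etl-trigger',
    Type='SCHEDULED',
    Schedule='cron(0 2 * * ? *)',
    Actions=[{'JobName': 'sales-etl-job'}],
    StartOnCreation=True
)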
Which chart would you use for comparing measure values over time in Amazon QuickSight?
Answer: Line charts. Use line charts to compare changes in measure values over a period of time, for the following scenarios:
- One measure over a period of time.
- Multiple measures over a period of time.
- One measure for a dimension over a period of time.
Your application is writing a large number of records to a DynamoDB table in one region. There is a requirement for a secondary application to take in the changes to the DynamoDB table every 4 hours and process the updates accordingly. How will you handle this?
Answer: Use DynamoDB Streams to monitor the changes in the DynamoDB table. Once you enable DynamoDB Streams, it captures a time-ordered sequence of item-level modifications in a DynamoDB table and stores the information for up to 24 hours. Applications can access a series of stream records, which contain an item change, from a DynamoDB stream in near real time. So we can use DynamoDB Streams to monitor the changes in the DynamoDB table.
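Enabling the stream is a one-call change; a minimal boto3 sketch with a placeholder table name:

import boto3

dynamodb = boto3.client('dynamodb')

# Enable a stream that records both old and new images of changed items
dynamodb.update_table(
    TableName='Orders',                                 # placeholder table name
    StreamSpecification={'StreamEnabled': True,
                         'StreamViewType': 'NEW_AND_OLD_IMAGES'}
)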
Your project has a Redshift cluster for petabyte-scale data warehousing and your project manager wants to reduce the overall cost of running the Redshift cluster. How will you meet the needs of the running cluster while still reducing total overall cost?
Answer: Disable automated and manual snapshots on the cluster. To disable automated snapshots, set the retention period to zero. If you disable automated snapshots, Amazon Redshift stops taking snapshots and deletes any existing automated snapshots for the cluster.
You are working with a Kinesis stream. What is used to group data by shard within a stream?
Answer: Partition key. A partition key is used to group data by shard within a stream. Kinesis Data Streams segregates the data records belonging to a stream into multiple shards.
You need to ensure in your project that each user can only access their own data in a particular DynamoDB table. Many users already have accounts with a third-party identity provider, such as Facebook, Google, or Login with Amazon. How would you implement this requirement?
Answer: Use web identity federation and register your application with a third-party identity provider such as Google, Amazon, or Facebook. Create a DynamoDB table and call it "Test."
- Create Partition key and a Sort key. Complete creation of the table Test.
- Navigate to “Access control” and select ‘Facebook’ as the identity provider or any other as per your requirement.
- Select the “Actions” that you want to allow your users to perform.
- Select the “Attributes” that you want your users to have access to.
- Select Create policy and copy the code generated in the policy panel.
When troubleshooting slowness on an EMR cluster, which of the following node types does not need to be investigated for issues?
Answer : Core Nodes
- Step 1: Gather Data About the slowness Issue
- Step 2: Check the Environment
- Step 3: Examine the Log Files
- Step 4: Check Cluster and Instance Health
- Step 5: Check for Suspended Groups
- Step 6: Review Configuration Settings
- Step 7: Examine Input Data