Big Data Objective 1

Question: What is the purpose of the Hadoop Distributed File System (HDFS)?

Abstract away the complexity of distributing large data files across multiple data nodes

Every data file has multiple copies and HDFS tracks all the copies

Every file is cached on all distributed HDFS data nodes for faster file system I/O

Provides local resource access to distributed file shares

Ans:-

Abstract away the complexity of distributing large data files across multiple data nodes

Every data file has multiple copies and HDFS tracks all the copies

Question: What are the two functional parts that make up Hadoop?

NTFS

YARN

HDFS

SPARK

Ans:-

YARN

HDFS

Question: What are the design principles of Hadoop?

Move processing, not data

Embrace failure

Dumb hardware and smart software

Share nothing

Performance, not stability

Build applications, not infrastructure

Ans:-

Move processing, not data

Embrace failure

Dumb hardware and smart software

Share nothing

Build applications, not infrastructure

Question: What are the two primary parts of MapReduce?

Mapper – executes initial data crunch on data nodes

Reducer – aggregates data into final output

Mapmaker – creates a set of mapper files for processing

Reduction – handles the summation of data before post-processing

Ans:-

Mapper – executes initial data crunch on data nodes

Reducer – aggregates data into final output
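
The mapper/reducer split above can be sketched in a few lines of plain Python (a toy word count, not Hadoop's actual Java API):

```python
from collections import defaultdict

def mapper(line):
    # Mapper: initial data crunch on each data node -- emit (word, 1) pairs
    return [(word, 1) for word in line.split()]

def reducer(pairs):
    # Reducer: aggregate the mapper output into the final counts
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

lines = ["big data", "big hadoop"]
pairs = [p for line in lines for p in mapper(line)]
print(reducer(pairs))  # {'big': 2, 'data': 1, 'hadoop': 1}
```

In real MapReduce the framework shuffles and sorts the mapper output before it reaches the reducers across the cluster; here both phases run in one process purely to show the division of labor.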

Question: What are some features of YARN that enable it to scale massively?

Data locality

Trivial to write compute jobs

Share nothing

Move code to data

Ans:-

Data locality

Share nothing

Move code to data

Question: What are some of the key responsibilities for the master server in a Hadoop cluster?

Processing – execute instructions and perform work on data

Health Checks – monitor the health of all worker nodes

Scheduling – schedule and reschedule jobs

Coordinate – manage cluster of worker nodes

Error Handling – handle all failures and errors

Data Location and Movement – map the direction for data movement

Ans:-

Health Checks – monitor the health of all worker nodes

Scheduling – schedule and reschedule jobs

Coordinate – manage cluster of worker nodes

Error Handling – handle all failures and errors

Data Location and Movement – map the direction for data movement

Question: What are some key attributes for choosing Spark for processing big data?

Can use cached datasets in memory to increase performance

Can be scaled beyond the HDFS limitations

Integrates higher-level libraries, for example, streaming and machine learning

Can store data without writing it to disk

Ans:-

Can use cached datasets in memory to increase performance

Integrates higher-level libraries, for example, streaming and machine learning

Question: Which statements about installing HBase in local mode are correct?

You can edit the hbase-site.xml configuration file before running HBase

Java must be installed before you run HBase

The JAVA_DATA environment variable must be set before you run HBase

Standalone run mode is the default

Ans:-

You can edit the hbase-site.xml configuration file before running HBase

Java must be installed before you run HBase

Standalone run mode is the default

Question: Select the method used to monitor the health of a Hadoop cluster.

DataNode heartbeat notification

NameNode heartbeat notification

Ans:- DataNode heartbeat notification

Question: Select the goal of the HDFS block storage mechanism.

High throughput of data access

Low latency file access

Ans:- High throughput of data access

Question: Match the node type with its appropriate description.

Answer Options:
A:NameNode
B:DataNode

Master node

A

B
Ans:- A

Worker node

A

B
Ans:- B

Question: Select the POSIX filesystem feature from the list that is missing from HDFS.

Access permissions

Symbolic linking

Ans:- Symbolic linking

Question: Select two of the fault tolerance factors which can be configured.

Replication factor

Block size

File size

Ans:-

Replication factor

Block size

Question: How does the NameNode know when a DataNode goes offline?

The DataNode stops sending heartbeat notifications to the NameNode

The NameNode stops receiving responses to ping requests

Ans:- The DataNode stops sending heartbeat notifications to the NameNode
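
A minimal sketch of that detection logic, with a hypothetical 30-second timeout (the real HDFS heartbeat and stale intervals are configurable):

```python
def offline_nodes(last_heartbeat, now, timeout=30):
    # The NameNode never pings DataNodes; it only notices that a
    # node's periodic heartbeat has stopped arriving.
    return [node for node, t in last_heartbeat.items() if now - t > timeout]

beats = {"dn1": 100, "dn2": 55}          # last heartbeat times (seconds)
print(offline_nodes(beats, now=120))     # dn2 missed its window
```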

Question: Select the underlying protocol used by HDFS to communicate between nodes.

UDP

TCP

Ans:- TCP

Question: Select the answer which describes HDFS data block sizes.

Fixed size

Guaranteed to be 128 MB in size

Ans:- Fixed size
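
Fixed-size blocks mean a file's layout is simple arithmetic; only the final block may be smaller than the configured size, which is why "guaranteed to be 128 MB" is the wrong answer. A sketch, assuming the common 128 MB default:

```python
import math

def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    # Every block is the fixed configured size except possibly the last.
    n = math.ceil(file_size / block_size)
    last = file_size - (n - 1) * block_size
    return n, last

blocks, last = split_into_blocks(300 * 1024 * 1024)   # a 300 MB file
print(blocks, last / (1024 * 1024))                   # 3 blocks, last one 44 MB
```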

Question: Match the accessibility feature with its description.

Answer Options:
A:FS Shell
B:DFSAdmin
C:Browser Interface

Provides a Graphical User Interface

A

B

C
Ans:- C

Provides common reporting commands

A

B

C
Ans:- B

Provides common file commands

A

B

C
Ans:- A

Question: How quickly does the NameNode free space when the replication factor is decreased?

All DataNodes report to the NameNode immediately for instruction

Replication reduction is propagated on DataNode heartbeats

Ans:- Replication reduction is propagated on DataNode heartbeats

Question: Which client requires the least configuration to access HBase?

Thrift

Java-based API

Ans:- Java-based API

Question: What are features of the HBase schema?

Column families can be stored across multiple records

Tables are stored within regions

A cell is a placeholder for a column value

Regions are stored together in RegionServers

Records are stored in HFiles as key-value pairs

Regions group a continuous range of rows

Ans:-

Tables are stored within regions

A cell is a placeholder for a column value

Regions are stored together in RegionServers

Records are stored in HFiles as key-value pairs

Regions group a continuous range of rows

Question: Match the characteristics to the corresponding types of compaction. More than one characteristic may match to each type.

Answer Options:
A:Adjacent HFiles are merged into one larger HFile
B:The original HFiles are not removed or deleted
C:A maximum of ten files are merged
D:All HFiles are merged into one larger HFile
E:Files that are compacted are removed
F:It could affect system performance

Major compaction

A

B

C

D

E

F
Ans:- D,E,F

Minor compaction

A

B

C

D

E

F
Ans:- A,B,C


Question: Select the aspects of Hadoop covered by Ranger.

Access Roles

Log aggregation

Authorization

Data security

Ans:-

Access Roles

Authorization

Data security

Question: Which folder are the binary tarballs put into after compilation of Ranger?

target

dist

Ans:- target

Question: What other Apache project is typically installed with Ranger for its search capabilities?

Apache Flume

Apache Solr

Ans:- Apache Solr

Question: Select the location of the users and groups section in Apache Ranger.

Settings -> Users/Groups

Audit -> Users/Groups

Ans:- Settings -> Users/Groups

Question: What dashboard is used to manage Hadoop components in the Hortonworks Sandbox?

Ambari

EMR

Ganglia

Ans:- Ambari

Question: Select the methods used by Ranger to synchronize users.

SASL

UNIX

LDAP

Ans:-

UNIX

LDAP

Question: Select the effect that disabling ‘Hive global tables allow’ in Ranger will have on Hive.

Global tables will be disabled

Querying tables will be disabled

Ans:- Querying tables will be disabled

Question: What argument is passed to the request header of the REST API to receive the response in JSON format?

Return:json

Accept:application/json

Ans:- Accept:application/json
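
A sketch of setting that header with Python's standard library (the Ranger URL below is a placeholder, not a real endpoint):

```python
from urllib.request import Request

# Placeholder host; only the Accept header matters for this example.
req = Request("http://ranger.example.com/service/public/v2/api/policy")
req.add_header("Accept", "application/json")
print(req.get_header("Accept"))  # application/json
```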

Question: Which statements about the filesystem used with HBase are correct?

Replication must be provided through the filesystem

Amazon’s Simple Storage Service (S3) cannot be used without Amazon Elastic Compute Cloud (EC2)

The local file system bypasses Hadoop

HDFS is the default filesystem in a fully distributed cluster

Ans:-

Replication must be provided through the filesystem

The local file system bypasses Hadoop

HDFS is the default filesystem in a fully distributed cluster

Question: Which use cases would be suitable for use of HBase?

You have less than a terabyte of data

Data includes a lot of dynamic rows

Versions of data need to be kept

Data contains a variable number of columns

You expect thousands of reads and writes

You don’t have a lot of hardware

You want built-in compression

Ans:-

Data includes a lot of dynamic rows

Versions of data need to be kept

Data contains a variable number of columns

You expect thousands of reads and writes

You want built-in compression

Question: What should you consider when designing the schema to support versions, different datatypes, and joins?

Versions are stored in descending order

It’s best to choose large datatypes

A tuple consists of a row, column, value, and version

Joins have to be coded

HBase stores all data as bytes

Ans:-

Versions are stored in descending order

Joins have to be coded

HBase stores all data as bytes
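
Those three answers can be illustrated with a toy, in-memory stand-in for an HBase table; none of these class or method names come from the real HBase API:

```python
class TinyCellStore:
    # Everything is stored as bytes; versions are kept newest-first,
    # so a plain get returns the latest value.
    def __init__(self):
        self.cells = {}

    def put(self, row, column, value, version):
        self.cells.setdefault((row, column), []).append((version, value.encode()))
        self.cells[(row, column)].sort(reverse=True)  # descending versions

    def get(self, row, column):
        return self.cells[(row, column)][0][1]        # newest version

store = TinyCellStore()
store.put("row1", "cf:make", "Ford", version=1)
store.put("row1", "cf:make", "Tesla", version=2)
print(store.get("row1", "cf:make"))  # b'Tesla'
```

Joins are absent from this sketch for the same reason they are absent from HBase: if you need one, you have to code it yourself on top of gets and scans.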

Question: Which elements are contained within an HFile data block?

A value length

A key length

A row value

File info

Magic header

A column qualifier

Ans:-

A value length

A key length

A row value

Magic header

A column qualifier

Question: Match the minimum recommended memory required for each functional element in an HBase hardware deployment.

Answer Options:
A:8 GB
B:2 GB
C:4 GB
D:1 GB
E:12 GB

HBase Master

A

B

C

D

E
Ans:- C

RegionServer

A

B

C

D

E
Ans:- E

NameNode

A

B

C

D

E
Ans:- A

JobTracker

A

B

C

D

E
Ans:- B

DataNode

A

B

C

D

E
Ans:- D

SecondaryNameNode

A

B

C

D

E
Ans:- A

Question: In which ways can you help ensure a secure HBase environment?

Set the authorization property value to true

Specify authentication, for example using Kerberos

Specify the port as 8080

Set the client-side protection property to privacy

Enable SSL

Ans:-

Set the authorization property value to true

Specify authentication, for example using Kerberos

Set the client-side protection property to privacy

Enable SSL

Question: Which statements about the use of data replication in HBase are correct?

Log shipping is done asynchronously

Cyclic replication requires a minimum of three servers

A master-push pattern keeps track of what’s being replicated using the WAL on each RegionServer

Ans:-

Log shipping is done asynchronously

A master-push pattern keeps track of what’s being replicated using the WAL on each RegionServer

Question: Which library can you use in the Ruby programming language to deploy a symmetric cipher?

opencypher

openssl

Ans:- openssl

Question: Which type of privacy is described as a statistical technique whose goal is to maximize the accuracy of queries from statistical databases while minimizing the privacy impact on individuals who may be identified by information in the database?

Divisive

Differential

Ans:- Differential
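
A common way to achieve differential privacy for a counting query is the Laplace mechanism; the toy sketch below draws Laplace noise via inverse-CDF sampling (function and parameter names are illustrative):

```python
import math
import random

def private_count(true_count, epsilon, rng):
    # Laplace mechanism: noise with scale sensitivity/epsilon (sensitivity
    # is 1 for a counting query) masks any single individual's contribution.
    u = rng.random() - 0.5                       # uniform on [-0.5, 0.5)
    noise = -(1 / epsilon) * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_count + noise

print(private_count(1000, epsilon=0.5, rng=random.Random(42)))
```

Smaller epsilon means larger noise: stronger privacy for individuals, less accurate query answers.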

Question: Which of these are the two types of data encryption?

Logarithmic

Block

Stream

Deep

Ans:-

Block

Stream
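
The difference can be illustrated with a toy XOR keystream, the core idea behind stream ciphers; this is not a secure cipher, and block ciphers such as AES instead transform fixed-size blocks (e.g. 16 bytes) rather than one byte at a time:

```python
def xor_stream(data, keystream):
    # Stream cipher idea: combine each plaintext byte with one keystream byte.
    # XOR is its own inverse, so the same function decrypts.
    return bytes(b ^ k for b, k in zip(data, keystream))

keystream = bytes(range(1, 100))   # toy keystream, NOT cryptographically random
ciphertext = xor_stream(b"secret data", keystream)
assert xor_stream(ciphertext, keystream) == b"secret data"
```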

Question: Which of these represent the two common approaches to the input validation issue?

Encrypting the data being supplied to the adversary

Detecting and filtering malicious input after an adversary is successful

Preventing an adversary from generating and sending malicious input

Designing secure software in the software development lifecycle

Ans:-

Detecting and filtering malicious input after an adversary is successful

Preventing an adversary from generating and sending malicious input

Question: Which of these are the two approaches that help ensure a mapper’s trustworthiness?

Mandatory trust

Worker establishment

Mandatory Access Control

Trust establishment

Ans:-

Mandatory Access Control

Trust establishment

Question: Which of these statements accurately describe granular access control?

More precise with access privileges than coarse-grained access control

Can sometimes force otherwise shareable data into a restrictive category

Using granular access control does not compromise security

Ans:-

More precise with access privileges than coarse-grained access control

Using granular access control does not compromise security

Question: Which statements about the functionality of the WAL and MemStore in the HBase architecture are correct?

Once a threshold is reached, the MemStore is flushed to an HFile

Data is first written to the WAL, then to the MemStore

If a client doesn’t find data in MemStore, it reads from the RegionServer

Data from the WAL is restored if the RegionServer crashes before it’s written to an HFile

If things fail, the RegionServer could restore from the WAL

Ans:-

Once a threshold is reached, the MemStore is flushed to an HFile

Data is first written to the WAL, then to the MemStore

Data from the WAL is restored if the RegionServer crashes before it’s written to an HFile

If things fail, the RegionServer could restore from the WAL
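
That write path (WAL first, then MemStore, flush at a threshold, replay on crash) can be sketched with a toy class; the names and the threshold are illustrative, not the real HBase API:

```python
class MiniRegionServer:
    # Writes hit the WAL first, then the MemStore; when the MemStore
    # reaches its threshold it is flushed to an HFile (a dict stands in here).
    def __init__(self, flush_threshold=3):
        self.wal, self.memstore, self.hfiles = [], {}, []
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))       # 1. WAL first, for crash recovery
        self.memstore[key] = value          # 2. then the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.hfiles.append(dict(self.memstore))  # 3. flush to an "HFile"
            self.memstore.clear()

    def recover(self):
        # After a crash, unflushed data is replayed from the WAL.
        replayed = {}
        for key, value in self.wal:
            replayed[key] = value
        return replayed

rs = MiniRegionServer()
for i in range(3):
    rs.put(f"k{i}", f"v{i}")
print(len(rs.hfiles), rs.memstore)  # 1 {}
```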

Question: You have installed HBase in fully distributed mode. What runs on nodes in the cluster?

Replication

Passwordless SSH

A ZooKeeper quorum

Backup masters

RegionServers

Ans:-

A ZooKeeper quorum

Backup masters

RegionServers

Question: Which of the following are characteristics of big data management?

Variety

Viewability

Vulnerability

Velocity

Volume

Ans:-

Variety

Velocity

Volume

Question: Which statements about adopting an operational big data system are true?

Can create 100% customer satisfaction

Can create a competitive advantage

Can help build new smart applications

Can reduce cost

Can only be implemented by large organizations

Ans:-

Can create a competitive advantage

Can help build new smart applications

Can reduce cost

Question: Which of the following do data lakes enable?

Ingestion of unstructured data from an application using ETL

Non-stop, real-time access, and processing of data and events

Ans:- Non-stop, real-time access, and processing of data and events

Question: Which component in an optimal big data monitoring system is not required?

Data lakes

Performance metrics

Ans:- Data lakes

Question: Which is an important performance indicator for a big data system?

Having a standard ETL process and a relational database

Data being accessible to business teams

Ans:- Data being accessible to business teams

Question: Which statement best describes the insight that monitoring business transactions provides?

Real-time performance that users are experiencing

System’s memory issues

Ans:- Real-time performance that users are experiencing

Question: In a big data network, how does a network packet broker function?

Collect data from switched port analyzers

Only filter outbound traffic

Ans:- Collect data from switched port analyzers

Question: Which statement best describes big data orchestration?

The process of creating automated processes in a big data system involving both operational and analytical big data technologies

The process of integrating two or more applications and/or services together to automate a process in real-time

Ans:- The process of integrating two or more applications and/or services together to automate a process in real-time

Question: Which consideration is important for a business when adopting a big data technology?

To have a system with unique policies other than the ones applied to other data assets

To have a clear business need and purpose for any big data initiative

Ans:- To have a clear business need and purpose for any big data initiative

Question: Which of the following can be a challenge when using the traditional ETL process for big data systems?

Data latency issues

Resources needed for processing

Excessive load on the data warehouse

Traditional ETL cannot be done in a data lake

Ans:-

Data latency issues

Resources needed for processing

Excessive load on the data warehouse

Question: What information does the main page of the web-based management console for HBase provide about an HBase installation?

Region servers

Backup masters

Software attributes

Server metrics

Ans:-

Region servers

Backup masters

Software attributes

Question: What should you consider when designing rowkeys for HBase tables?

You can avoid hotspotting by using salting or hashing

A timestamp with the rowkey will make for faster scanning

You could store the rowkey in binary representation as opposed to string value

You should keep the size of your row key small

Poor design of rowkeys could cause hotspotting

Ans:-

You can avoid hotspotting by using salting or hashing

You could store the rowkey in binary representation as opposed to string value

You should keep the size of your row key small

Poor design of rowkeys could cause hotspotting
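
Salting prepends a deterministic bucket prefix so sequential rowkeys (such as timestamps) spread across regions instead of landing on one RegionServer; a sketch with a hypothetical four-bucket salt:

```python
import hashlib

def salted_rowkey(rowkey, buckets=4):
    # A hash-derived salt prefix keeps sequential keys from hotspotting
    # a single region; the same key always gets the same salt.
    salt = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % buckets
    return f"{salt}-{rowkey}"

for i in range(3):
    print(salted_rowkey(f"2024-01-01T00:00:{i:02d}"))
```

The trade-off is that scans over a key range now need one scan per bucket, since consecutive keys are no longer adjacent on disk.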

Question: What should you consider when designing an HBase table?

A single store on HDFS houses data for a given column family

Columns in a column family are stored together on disk

Rowkeys should be designed based on your update patterns

A single API call should be used to support access patterns

Ans:-

A single store on HDFS houses data for a given column family

Columns in a column family are stored together on disk

A single API call should be used to support access patterns

Question: Match each HBase element with its function.

Answer Options:
A:ZooKeeper
B:Master
C:RegionServer
D:Client
E:Catalog tables

Keeps a list of all regions

A

B

C

D

E
Ans:- E

Fetches data

A

B

C

D

E
Ans:- D

Runs the DataNode in the HDFS cluster

A

B

C

D

E
Ans:- C

Provides high-performance, centralized, distributed synchronization

A

B

C

D

E
Ans:- A

Provides an interface to all HBase metadata for client operations

A

B

C

D

E
Ans:- B

Question: What are features of the different HBase installation modes?

The standalone run mode is the default mode

The pseudo-distributed run mode runs on just two hosts

The pseudo-distributed run mode is used for prototyping HBase

The fully distributed mode runs with multiple instances of HBase daemons on multiple, clustered machines

Ans:-

The standalone run mode is the default mode

The pseudo-distributed run mode is used for prototyping HBase

The fully distributed mode runs with multiple instances of HBase daemons on multiple, clustered machines

Question: Which setting do you alter to integrate MapReduce with HBase?

HADOOP_JARPATH

HADOOP_CLASSPATH

Ans:- HADOOP_CLASSPATH

Question: Which statements about deleting cells in HBase are correct?

HBase will automatically delete rows according to TTL values

Expired rows are deleted during a minor compaction

Get and Scan operations do not see cells that have been marked for deletion

Ans:- HBase will automatically delete rows according to TTL values

Question: Which system features meet the recommended software requirements for HBase?

Java and Hadoop are installed

SSH is enabled

Windows is installed

The system runs the XFS filesystem

Forward and reverse DNS resolving are enabled

4,096 DataNode handlers are installed

Ans:-

SSH is enabled

The system runs the XFS filesystem

Forward and reverse DNS resolving are enabled

4,096 DataNode handlers are installed

Question: Which shell command creates an HBase table, Cars, with a column family named Technical?

create 'Cars','Technical'

create "Cars" "Technical"

create Cars Technical

Ans:- create 'Cars','Technical'

Question: What happens when you disable an HBase table?

Connections that are currently writing data to it can complete

Connections that are currently reading data from it are broken

Clients cannot use it

Ans:-

Connections that are currently writing data to it can complete

Clients cannot use it

Question: Which command will return full contents of a single row of the table scantable?

scan 'scantable',{LIMIT => 1}

scan "scantable",{"LIMIT" = 1}

Ans:- scan 'scantable',{LIMIT => 1}

Question: Which statements about altering the properties of an HBase table are correct?

Some attributes have column-family scope

You use the alter command to set or reset attribute values

KEEP_DELETED_CELLS is a table scope attribute

Ans:-

Some attributes have column-family scope

You use the alter command to set or reset attribute values

Question: Which command do you enter in the HBase shell to create a new record in an existing database?

put

create

Ans:- put

Question: Which command will add 10 to the value of the counter Ticker in row3 of column family CF in the table TableA?

incr 'TableA', 'row3', 'CF:Ticker', 10

incr 'TableA', 'row3', 'CF:Ticker', =>10

Ans:- incr 'TableA', 'row3', 'CF:Ticker', 10

Question: Which command will delete all data from the HBase table Obsolete?

delete 'Obsolete', all

truncate 'Obsolete'

Ans:- truncate 'Obsolete'