Question: What is the purpose of the Hadoop Distributed File System (HDFS)?
Abstract away the complexity of distributing large data files across multiple data nodes
Every data file has multiple copies and HDFS tracks all the copies
Every file is cached on all distributed HDFS data nodes for faster file system I/O
Provides local resource access to distributed file shares
Ans:-
Abstract away the complexity of distributing large data files across multiple data nodes
Every data file has multiple copies and HDFS tracks all the copies
Question: What are the two functional parts that make up Hadoop?
NTFS
YARN
HDFS
SPARK
Ans:-
YARN
HDFS
Question: What are the design principles of Hadoop?
Move processing, not data
Embrace failure
Dumb hardware and smart software
Share nothing
Performance, not stability
Build applications, not infrastructure
Ans:-
Move processing, not data
Embrace failure
Dumb hardware and smart software
Share nothing
Build applications, not infrastructure
Question: What are the two primary parts of MapReduce?
Mapper – executes initial data crunch on data nodes
Reducer – aggregates data into final output
Mapmaker – creates a set of mapper files for processing
Reduction – handles the summation of data before post-processing
Ans:-
Mapper – executes initial data crunch on data nodes
Reducer – aggregates data into final output
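The mapper/reducer split above can be sketched locally in Python. This is an illustrative word-count simulation, not the Hadoop API; the function names map_phase and reduce_phase are made up for the example.

```python
# Minimal local simulation of the MapReduce flow: mappers do the
# initial data crunch, the reducer aggregates into final output.
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit (word, 1) for every word in the input split
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Reducer: aggregate the intermediate (key, value) pairs
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

data = ["the quick brown fox", "the lazy dog"]
result = reduce_phase(map_phase(data))
print(result["the"])  # 2
```

In a real cluster the map output is also partitioned and shuffled to reducers on other nodes; that step is omitted here.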
Question: What are some features of YARN that enable it to scale massively?
Data locality
Trivial to write compute jobs
Share nothing
Move code to data
Ans:-
Data locality
Share nothing
Move code to data
Question: What are some of the key responsibilities for the master server in a Hadoop cluster?
Processing – execute instructions and perform work on data
Health Checks – monitor the health of all worker nodes
Scheduling – schedule and reschedule jobs
Coordinate – manage cluster of worker nodes
Error Handling – handle all failures and errors
Data Location and Movement – map the direction for data movement
Ans:-
Health Checks – monitor the health of all worker nodes
Scheduling – schedule and reschedule jobs
Coordinate – manage cluster of worker nodes
Error Handling – handle all failures and errors
Data Location and Movement – map the direction for data movement
Question: What are some key attributes for choosing Spark for processing Big data?
Can use cached datasets in memory to increase performance
Can be scaled beyond the HDFS limitations
Integrates higher-level libraries, for example, streaming and machine learning
Can store data without writing it to disk
Ans:-
Can use cached datasets in memory to increase performance
Integrates higher-level libraries, for example, streaming and machine learning
Question: Which statements about installing HBase in local mode are correct?
You can edit the hbase-site.xml configuration file before running HBase
Java must be installed before you run HBase
The JAVA_DATA environment variable must be set before you run HBase
Standalone run mode is the default
Ans:-
You can edit the hbase-site.xml configuration file before running HBase
Java must be installed before you run HBase
Standalone run mode is the default
Question: Select the method used to monitor the health of a Hadoop cluster.
DataNode heartbeat notification
NameNode heartbeat notification
Ans:- DataNode heartbeat notification
Question: Select the goal of the HDFS block storage mechanism.
High throughput of data access
Low latency file access
Ans:- High throughput of data access
Question: Match the node type with its appropriate description.
Answer Options:
A:NameNode
B:DataNode
Master node
A
B
Ans:- A
Worker node
A
B
Ans:- B
Question: Select the POSIX filesystem feature from the list that is missing from HDFS.
Access permissions
Symbolic linking
Ans:- Symbolic linking
Question: Select the two fault tolerance factors that can be configured.
Replication factor
Block size
File size
Ans:-
Replication factor
Block size
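The two configurable factors above determine how much raw cluster storage a file consumes. A small arithmetic sketch (assuming the common 128 MB block size and default replication factor of 3):

```python
import math

def hdfs_storage(file_mb, block_mb=128, replication=3):
    """Return (num_blocks, raw_storage_mb) for a file in HDFS.
    The last block may be smaller than block_mb; HDFS only
    consumes the space the final block actually needs."""
    num_blocks = math.ceil(file_mb / block_mb)
    raw_storage = file_mb * replication  # each byte is stored `replication` times
    return num_blocks, raw_storage

blocks, raw = hdfs_storage(500)  # a 500 MB file
print(blocks, raw)  # 4 blocks, 1500 MB of raw cluster storage
```

Raising the replication factor improves fault tolerance at the cost of raw storage; lowering the block size increases the number of blocks the NameNode must track.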
Question: How does the NameNode know when a DataNode goes offline?
The DataNode stops sending heartbeat notifications to the NameNode
The NameNode stops receiving responses to ping requests
Ans:- The DataNode stops sending heartbeat notifications to the NameNode
Question: Select the underlying protocol used by HDFS to communicate between nodes.
UDP
TCP
Ans:- TCP
Question: Select the answer which describes HDFS data block sizes.
Fixed size
Guaranteed to be 128 MB in size
Ans:- Fixed size
Question: Match the accessibility feature with its description.
Answer Options:
A:FS Shell
B:DFSAdmin
C:Browser Interface
Provides a Graphical User Interface
A
B
C
Ans:- C
Provides common reporting commands
A
B
C
Ans:- B
Provides common file commands
A
B
C
Ans:- A
Question: How quickly does the NameNode free space when the replication factor is decreased?
All DataNodes report to the NameNode immediately for instruction
Replication reduction is propagated on DataNode heartbeats
Ans:- Replication reduction is propagated on DataNode heartbeats
Question: Which client requires the least configuration to access HBase?
Thrift
Java-based API
Ans:- Java-based API
Question: What are features of the HBase schema?
Column families can be stored across multiple records
Tables are stored within regions
A cell is a placeholder for a column value
Regions are stored together in RegionServers
Records are stored in HFiles as key-value pairs
Regions group a continuous range of rows
Ans:-
Tables are stored within regions
A cell is a placeholder for a column value
Regions are stored together in RegionServers
Records are stored in HFiles as key-value pairs
Regions group a continuous range of rows
Question: Match the characteristics to the corresponding types of compaction. More than one characteristic may match to each type.
Answer Options:
A:Adjacent HFiles are merged into one larger HFile
B:The original HFiles are not removed or deleted
C:A maximum of ten files are merged
D:All HFiles are merged into one larger HFile
E:Files that are compacted are removed
F:It could affect system performance
Major compaction
A
B
C
D
E
F
Ans:- D,E,F
Minor compaction
A
B
C
D
E
F
Ans:- A,B,C
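The minor/major distinction above can be modeled with HFiles as sorted lists of (key, value) pairs. This is a conceptual sketch only; real HBase compaction also handles versions, tombstones, and file-selection policies.

```python
# Minor compaction: merge up to a maximum number of adjacent HFiles
# into one larger HFile; originals are kept until cleanup.
# Major compaction: merge ALL HFiles into a single HFile; compacted
# files are removed, and the heavy I/O can affect system performance.

def minor_compaction(hfiles, max_files=10):
    batch, rest = hfiles[:max_files], hfiles[max_files:]
    merged = sorted(kv for f in batch for kv in f)
    return [merged] + rest

def major_compaction(hfiles):
    return [sorted(kv for f in hfiles for kv in f)]

files = [[("a", 1)], [("c", 3)], [("b", 2)]]
print(len(major_compaction(files)))  # 1
```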
Question: Select the aspects of Hadoop covered by Ranger.
Access Roles
Log aggregation
Authorization
Data security
Ans:-
Access Roles
Authorization
Data security
Question: Into which folder are the binary tarballs placed after compiling Ranger?
target
dist
Ans:- target
Question: What other Apache project is typically installed with Ranger for its search capabilities?
Apache Flume
Apache Solr
Ans:- Apache Solr
Question: Select the location of the users and groups section in Apache Ranger.
Settings -> Users/Groups
Audit -> Users/Groups
Ans:- Settings -> Users/Groups
Question: What dashboard is used to manage hadoop components in the Hortonworks Sandbox?
Ambari
EMR
Ganglia
Ans:- Ambari
Question: Select the methods used by Ranger to synchronize users.
SASL
UNIX
LDAP
Ans:-
UNIX
LDAP
Question: Select the effect that disabling ‘Hive global tables allow’ in Ranger will have on Hive.
Global tables will be disabled
Querying tables will be disabled
Ans:- Querying tables will be disabled
Question: Which request header is passed to the REST API to receive the response in JSON format?
Return:json
Accept:application/json
Ans:- Accept:application/json
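Setting that header with only the standard library looks like this. The Ranger policy URL below is a placeholder; adjust host, port, path, and credentials for a real installation.

```python
import urllib.request

# Hypothetical Ranger endpoint -- substitute your own server
url = "http://localhost:6080/service/public/v2/api/policy"

# Attach the Accept header so the response comes back as JSON
req = urllib.request.Request(url, headers={"Accept": "application/json"})
print(req.get_header("Accept"))  # application/json
```

The request object is only constructed here, not sent; in practice you would pass it to urllib.request.urlopen along with authentication.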
Question: Which statements about the filesystem used with HBase are correct?
Replication must be provided through the filesystem
Amazon’s Simple Storage Service (S3) cannot be used without Amazon Elastic Compute Cloud (EC2)
The local file system bypasses Hadoop
HDFS is the default filesystem in a fully distributed cluster
Ans:-
Replication must be provided through the filesystem
The local file system bypasses Hadoop
HDFS is the default filesystem in a fully distributed cluster
Question: Which use cases would be suitable for use of HBase?
You have less than a terabyte of data
Data includes a lot of dynamic rows
Versions of data need to be kept
Data contains a variable number of columns
You expect thousands of reads and writes
You don’t have a lot of hardware
You want built-in compression
Ans:-
Data includes a lot of dynamic rows
Versions of data need to be kept
Data contains a variable number of columns
You expect thousands of reads and writes
You want built-in compression
Question: What should you consider when designing the schema to support versions, different datatypes, and joins?
Versions are stored in descending order
It’s best to choose large datatypes
A tuple consists of a row, column, value, and version
Joins have to be coded
HBase stores all data as bytes
Ans:-
Versions are stored in descending order
Joins have to be coded
HBase stores all data as bytes
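The points above (byte storage, descending version order) can be illustrated with a toy cell store. This is purely conceptual and not the HBase client API; the class and method names are invented for the example.

```python
# Toy model of HBase cell storage: every write keeps a new version,
# all values are stored as bytes, and reads return versions
# newest-first, mirroring HBase's descending version order.

class MiniStore:
    def __init__(self):
        self.cells = {}  # (row, column) -> list of (timestamp, value)

    def put(self, row, column, value, ts):
        versions = self.cells.setdefault((row, column), [])
        versions.append((ts, value.encode()))            # bytes, not strings
        versions.sort(key=lambda v: v[0], reverse=True)  # descending order

    def get(self, row, column):
        return self.cells.get((row, column), [])

s = MiniStore()
s.put("row1", "cf:make", "Ford", ts=1)
s.put("row1", "cf:make", "Tesla", ts=2)
print(s.get("row1", "cf:make")[0])  # (2, b'Tesla') -- latest version first
```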
Question: Which elements are contained within an HFile data block?
A value length
A key length
A row value
File info
Magic header
A column qualifier
Ans:-
A value length
A key length
A row value
Magic header
A column qualifier
Question: Match the minimum recommended memory required for each functional element in an HBase hardware deployment.
Answer Options:
A:8 GB
B:2 GB
C:4 GB
D:1 GB
E:12 GB
HBase Master
A
B
C
D
E
Ans:- C
RegionServer
A
B
C
D
E
Ans:- E
NameNode
A
B
C
D
E
Ans:- A
JobTracker
A
B
C
D
E
Ans:- B
DataNode
A
B
C
D
E
Ans:- D
SecondaryNameNode
A
B
C
D
E
Ans:- A
Question: In which ways can you help ensure a secure HBase environment?
Set the authorization property value to true
Specify authentication, for example using Kerberos
Specify the port as 8080
Set the client-side protection property to privacy
Enable SSL
Ans:-
Set the authorization property value to true
Specify authentication, for example using Kerberos
Set the client-side protection property to privacy
Enable SSL
Question: Which statements about the use of data replication in HBase are correct?
Log shipping is done asynchronously
Cyclic replication requires a minimum of three servers
A master-push pattern keeps track of what’s being replicated using the WAL on each RegionServer
Ans:-
Log shipping is done asynchronously
A master-push pattern keeps track of what’s being replicated using the WAL on each RegionServer
Question: Which library can you use in the Ruby programming language to deploy a symmetric cipher?
opencypher
openssl
Ans:- openssl
Question: Which type of privacy is described as a statistical technique whose goal is to maximize the accuracy of queries from statistical databases while minimizing the privacy impact on individuals who may be identified by information in the database?
Divisive
Differential
Ans:- Differential
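The classic way to achieve differential privacy is the Laplace mechanism: add noise drawn from a Laplace distribution, scaled to the query's sensitivity, to a statistical query result. A minimal stdlib-only sketch (inverse-CDF sampling; real deployments use audited libraries):

```python
import math
import random

def laplace_noise(scale):
    # Sample a Laplace(0, scale) variate via the inverse CDF
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_count(true_count, epsilon=1.0, sensitivity=1.0):
    # Smaller epsilon => more noise => stronger privacy, less accuracy
    return true_count + laplace_noise(sensitivity / epsilon)

random.seed(42)
print(private_count(1000))  # close to 1000, but perturbed
```

The perturbation means the database owner can answer aggregate queries while limiting what any single individual's record contributes to the output.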
Question: Which of these are the two types of data encryption?
Logarithmic
Block
Stream
Deep
Ans:-
Block
Stream
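The distinction above can be demonstrated with a toy XOR construction: a stream cipher encrypts byte by byte against a keystream, while a block cipher pads data to fixed-size blocks and processes each block. This is for illustration only; use a vetted cipher (e.g. AES from a real crypto library) in production.

```python
import hashlib

def keystream(key, length):
    # Derive a pseudo-random keystream from the key with SHA-256 counters
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def stream_encrypt(key, data):
    # Stream style: XOR each byte with the keystream as it arrives
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def block_encrypt(key, data, block_size=16):
    # Block style: pad to whole blocks (PKCS#7-like), process per block
    pad = block_size - len(data) % block_size
    data += bytes([pad]) * pad
    return b"".join(stream_encrypt(key + i.to_bytes(4, "big"),
                                   data[i:i + block_size])
                    for i in range(0, len(data), block_size))

msg = b"secret message"
ct = stream_encrypt(b"key", msg)
assert stream_encrypt(b"key", ct) == msg  # XOR is its own inverse
```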
Question: Which of these represent the two common approaches to the input validation issue?
Encrypting the data being supplied to the adversary
Detecting and filtering malicious input after an adversary is successful
Preventing an adversary from generating and sending malicious input
Designing secure software in the software development lifecycle
Ans:-
Detecting and filtering malicious input after an adversary is successful
Preventing an adversary from generating and sending malicious input
Question: Which of these are the two approaches that help ensure a mapper’s trustworthiness?
Mandatory trust
Worker establishment
Mandatory Access Control
Trust establishment
Ans:-
Mandatory Access Control
Trust establishment
Question: Which of these statements accurately describe granular access control?
More precise with access privileges than coarse-grained access control
Can sometimes force otherwise shareable data into a restrictive category
Using granular access control does not compromise security
Ans:-
More precise with access privileges than coarse-grained access control
Using granular access control does not compromise security
Question: Which statements about the functionality of the WAL and MemStore in the HBase architecture are correct?
Once a threshold is reached, the MemStore is flushed to an HFile
Data is first written to the WAL, then to the MemStore
If a client doesn’t find data in MemStore, it reads from the RegionServer
Data from the WAL is restored if the RegionServer crashes before it’s written to an HFile
If things fail, the RegionServer could restore from the WAL
Ans:-
Once a threshold is reached, the MemStore is flushed to an HFile
Data is first written to the WAL, then to the MemStore
Data from the WAL is restored if the RegionServer crashes before it’s written to an HFile
If things fail, the RegionServer could restore from the WAL
Question: You have installed HBase in fully distributed mode. What runs on nodes in the cluster?
Replication
Passwordless SSH
A ZooKeeper quorum
Backup masters
RegionServers
Ans:-
A ZooKeeper quorum
Backup masters
RegionServers
Question: Which of the following are characteristics of big data management?
Variety
Viewability
Vulnerability
Velocity
Volume
Ans:-
Variety
Velocity
Volume
Question: Which statements about adapting an operational big data system are true?
Can create 100% customer satisfaction
Can create a competitive advantage
Can help build new smart applications
Can reduce cost
Can only be implemented by large organizations
Ans:-
Can create a competitive advantage
Can help build new smart applications
Can reduce cost
Question: Which of the following do data lakes enable?
Ingestion of unstructured data from an application using ETL
Non-stop, real-time access, and processing of data and events
Ans:- Non-stop, real-time access, and processing of data and events
Question: Which component in an optimal big data monitoring system is not required?
Data lakes
Performance metrics
Ans:- Data lakes
Question: Which is an important performance indicator for a big data system?
Having a standard ETL process and a relational database
Data being accessible to business teams
Ans:- Data being accessible to business teams
Question: Which statement best describes the insight which monitoring business transactions provide?
Real-time performance that users are experiencing
System’s memory issues
Ans:- Real-time performance that users are experiencing
Question: In a big data network, how does a network packet broker function?
Collect data from switched port analyzers
Only filter outbound traffic
Ans:- Collect data from switched port analyzers
Question: Which statement best describes big data orchestration?
The process of creating automated processes in a big data system involving both operational and analytical big data technologies
The process of integrating two or more applications and/or services together to automate a process in real-time
Ans:- The process of integrating two or more applications and/or services together to automate a process in real-time
Question: Which consideration is important for a business when adapting a big data technology?
To have a system with unique policies other than the ones applied to other data assets
To have a clear business need and purpose for any big data initiative
Ans:- To have a clear business need and purpose for any big data initiative
Question: Which of the following can be a challenge when using the traditional ETL process for big data systems?
Data latency issues
Resources needed for processing
Excessive load on the data warehouse
Traditional ETL cannot be done in a data lake
Ans:-
Data latency issues
Resources needed for processing
Excessive load on the data warehouse
Question: What information does the main page of the web-based management console for HBase provide about an HBase installation?
Region servers
Backup masters
Software attributes
Server metrics
Ans:-
Region servers
Backup masters
Software attributes
Question: What should you consider when designing rowkeys for HBase tables?
You can avoid hotspotting by using salting or hashing
A timestamp with the rowkey will make for faster scanning
You could store the rowkey in binary representation as opposed to string value
You should keep the size of your row key small
Poor design of rowkeys could cause hotspotting
Ans:-
You can avoid hotspotting by using salting or hashing
You could store the rowkey in binary representation as opposed to string value
You should keep the size of your row key small
Poor design of rowkeys could cause hotspotting
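The salting/hashing technique mentioned above is easy to sketch: prefixing a monotonically increasing rowkey (such as a timestamp) with a short, stable hash bucket spreads sequential writes across regions instead of hotspotting one of them. The helper name and bucket count below are illustrative choices.

```python
import hashlib

def salted_rowkey(rowkey, buckets=8):
    # Derive a stable salt bucket from the key itself, so the same
    # logical key always lands in the same bucket and can be read back
    salt = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % buckets
    return f"{salt}-{rowkey}"

# Sequential timestamps now scatter across salt buckets 0..7
keys = [salted_rowkey(f"2024-01-01T00:00:{i:02d}") for i in range(4)]
print(keys)
```

The trade-off is that range scans over the original key order now require one scan per salt bucket.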
Question: What should you consider when designing an HBase table?
A single store on HDFS houses data for a given column family
Columns in a column family are stored together on disk
Rowkeys should be designed based on your update patterns
A single API call should be used to support access patterns
Ans:-
A single store on HDFS houses data for a given column family
Columns in a column family are stored together on disk
A single API call should be used to support access patterns
Question: Match each HBase element with its function.
Answer Options:
A:ZooKeeper
B:Master
C:RegionServer
D:Client
E:Catalog tables
Keeps a list of all regions
A
B
C
D
E
Ans:- E
Fetches data
A
B
C
D
E
Ans:- D
Runs the DataNode in the HDFS cluster
A
B
C
D
E
Ans:- C
Provides high-performance, centralized, distributed synchronization
A
B
C
D
E
Ans:- A
Provides an interface to all HBase metadata for client operations
A
B
C
D
E
Ans:- B
Question: What are features of the different HBase installation modes?
The standalone run mode is the default mode
The pseudo-distributed run mode runs on just two hosts
The pseudo-distributed run mode is used for prototyping HBase
The fully distributed mode runs with multiple instances of HBase daemons on multiple, clustered machines
Ans:-
The standalone run mode is the default mode
The pseudo-distributed run mode is used for prototyping HBase
The fully distributed mode runs with multiple instances of HBase daemons on multiple, clustered machines
Question: Which setting do you alter to integrate MapReduce with HBase?
HADOOP_JARPATH
HADOOP_CLASSPATH
Ans:- HADOOP_CLASSPATH
Question: Which statements about deleting cells in HBase are correct?
HBase will automatically delete rows according to TTL values
Expired rows are deleted during a minor compaction
Get and Scan operations do not see cells that have been marked for deletion
Ans:- HBase will automatically delete rows according to TTL values
Question: Which system features meet the recommended software requirements for HBase?
Java and Hadoop are installed
SSH is enabled
Windows is installed
The system runs the XFS filesystem
Forward and reverse DNS resolving are enabled
4,096 DataNode handlers are installed
Ans:-
SSH is enabled
The system runs the XFS filesystem
Forward and reverse DNS resolving are enabled
4,096 DataNode handlers are installed
Question: Which shell command creates an HBase table, Cars, with a column family named Technical?
create 'Cars','Technical'
create "Cars" "Technical"
create Cars Technical
Ans:- create 'Cars','Technical'
Question: What happens when you disable an HBase table?
Connections that are currently writing data to it can complete
Connections that are currently reading data from it are broken
Clients cannot use it
Ans:-
Connections that are currently writing data to it can complete
Clients cannot use it
Question: Which command will return full contents of a single row of the table scantable?
scan 'scantable',{LIMIT => 1}
scan "scantable",{"LIMIT" = 1}
Ans:- scan 'scantable',{LIMIT => 1}
Question: Which statements about altering the properties of an HBase table are correct?
Some attributes have column-family scope
You use the alter command to set or reset attribute values
KEEP_DELETED_CELLS is a table scope attribute
Ans:-
Some attributes have column-family scope
You use the alter command to set or reset attribute values
Question: Which command do you enter in the HBase shell to create a new record in an existing database?
put
create
Ans:- put
Question: Which command will add 10 to the value of the counter Ticker in row3 of column family CF in the table TableA?
incr 'TableA', 'row3', 'CF:Ticker', 10
incr 'TableA', 'row3', 'CF:Ticker', =>10
Ans:- incr 'TableA', 'row3', 'CF:Ticker', 10
Question: Which command will delete all data from the HBase table Obsolete?
delete 'Obsolete', all
truncate 'Obsolete'
Ans:- truncate 'Obsolete'