Which of the following is not considered a cause of confusion about the precise meaning of the data science buzzwords?
The constant evolution of the data science industry and in turn the meaning of the data science buzzwords
Ans:-The speed with which new data science terms are appearing
Which of the following is related to the meaning of the term analytics?
Analytics is about separating a dataset into easy-to-digest chunks, studying them individually, and examining how they relate to other parts
Ans:-Analytics is the application of logical and computational reasoning to the component parts obtained in an analysis
Which of the terms relates to the field of business analytics only?
Creating dashboards
Reporting with visuals
Ans:- Qualitative analytics
Which of the following is not considered a data analytics activity?
Ans:-Business case studies
Preliminary data reporting
Optimization of drilling operations
Which of the following is considered data science?
Business case studies
Qualitative analytics
Digital signal processing
Sales forecasting
Given that all activities can be done with ML and all can be done without ML, choose the best answer. Which of the following is considered Data science but not Machine learning?
Creating real-time dashboards
Ans:-Sales forecasting
Fraud prevention
Which of the following is not an example of where Machine Learning is being applied today?
Ans:-Symbolic reasoning
Client retention
Image recognition
From a data scientist’s perspective, the solution of every task begins:
by suggesting a few hypothetical and theoretical solutions to your boss
by gathering your team and deciding on what approach to follow to solve the task
Ans:-with a proper dataset
According to our infographic, which of the following is not considered data science?
Ans:-Big data
Business intelligence
Traditional data science methods
Which of the following is related to the pre-processing of a traditional data set?
Class labelling
Data cleansing
Dealing with missing values
Ans:-All of the above
Which of the following do you encounter when working with big data?
Text data
Integer
Digital image data
Ans:-All of the above
The process of representing observations as numbers is called:
Collecting observations
Ans:-Quantification
A measure that has a business meaning attached is called:
an observation
a quantification
Ans:-a metric
A KPI (Key Performance Indicator) can be best defined as:
the accumulation of observations to show some information
Ans:-a metric that is tightly aligned with your business objectives
a quantification that has a business meaning attached
an observation that can potentially be related to the business goals of a company
The job of a business intelligence analyst always involves the creation of:
reports
dashboards
KPIs
Ans:-All of the above
Which of the following columns from our infographic contain activities that are said to belong to the field of ‘predictive analytics’ and do not aim at explaining past behaviour?
Traditional data
Big data
Business intelligence
Ans:-Traditional methods
In business and statistics, which is the general term that refers to using a model for quantifying causal relationships?
Ans:-regression analysis
factor analysis
cluster analysis
time-series analysis
Which technique can be implemented if you want to reduce the dimensionality of a certain statistical problem?
Ans:-factor analysis
cluster analysis
time-series analysis
all of the above
Which technique is associated with plotting values against time, with time always shown on the horizontal axis?
regression analysis
Ans:-time-series analysis
factor analysis
cluster analysis
When the data is divided into a few groups, you should apply:
factor analysis
Ans:-cluster analysis
time-series analysis
Which of the following statements is true?
The core of machine learning is creating an algorithm, which a computer then uses to find a model that fits the data as best as possible
In machine learning, one does not give the machine instructions on how to find a model. Rather, one provides it with algorithms which give the machine directions on how to learn on its own
A machine learning algorithm is like a trial-and-error process, but the special thing about it is that each consecutive trial is at least as good as the previous one
Ans:-All of the above
Which line represents the four ingredients of any machine learning algorithm?
Model, data, reward system, objective function
Ans:-Data, model, objective function, optimization algorithm
Model, labelled data, unlabelled data, optimization algorithm
Choose the best answer.
In which type of machine learning is one always working with unlabelled data?
Supervised learning
Ans:-Unsupervised learning
Reinforcement learning
In reinforcement learning, a reward system is used to improve the machine learning model at hand. The idea of using this reward system is:
to minimize the error of the model
to minimize the objective function
Ans:-to maximize the objective function
to improve the optimization algorithm
Which of the following is an example where big data techniques are being applied?
Basic customer data
Ans:-Social media
What is NOT a type of machine learning?
supervised
unsupervised
Ans:-reinforced
Which of the following is a typical real-life example where BI techniques are being applied?
Historical stock price data
Ans:-Inventory management
In geometrical terms, a scalar can be represented as:
a line
a square
Ans:-a point
Which of the following is a typical real-life example where machine learning techniques are being applied?
Ans:-Client retention
Basic customer data
Which of the following is a typical real-life example where traditional data techniques are being applied?
Ans:-Basic customer data
Inventory management
Which of the following is a TYPICAL real-life example where traditional data science techniques are being applied?
Social media
Financial trading data
Inventory management
Ans:-Sales forecasting
You have financial data for 100 countries. You feed them to the algorithm and ask it to classify them in as many groups as it sees fit. It starts with 100 groups as each country represents a separate group. You decide to tell it to spit out 5 major groups, i.e. cluster them in 5 clusters. This is an instance of:
supervised learning
Ans:-unsupervised learning
Which software tool is frequently used when working with traditional data or when doing a BI analysis?
Ans:-Excel
Hadoop
R
Knowledge of which programming language is a huge advantage if you will be working with big data and/or machine learning?
SQL
Ans:-Java
Data which can be classified using a linear model is called:
clusterable
regressable
linear
Ans:-linearly separable
Econometric time-series analysis is the domain of which software tool?
Excel
Ans:- EViews
Why do we express probabilities numerically?
To determine whether they are likely or unlikely.
Ans:-To compute which event is relatively more likely.
How do we graphically express two mutually exclusive sets?
Their circles are tangent to one another.
Ans:-Their circles do not touch.
What are the intersection and union of two mutually exclusive sets?
The intersection is their sum and the union is the empty set.
The intersection is the smaller set and the union is the larger set.
The union is the empty set and their sum is the intersection.
Ans:-The union is their sum and the intersection is the empty set.
If the probability of an event remains unaffected by another event, the two are….
Dependent.
Ans:-Independent.
What do we call the probability we use to distinguish dependent from independent events?
The dependent probability.
The independent probability.
Ans:The conditional probability.
What can you conclude about events A and B, given that P(A) = P(A|B)?
Ans:The two are independent.
The two are dependent.
What is the difference between P(A|B) and P(B|A)?
The former suggests the two events are dependent, while the latter suggests they are independent.
The former suggests event A is more likely than event B, while the latter suggests B is the likelier of the two.
Ans:One indicates the probability of getting A, given B has occurred, while the other indicates the likelihood of getting B, given A has occurred.
What is the value of P(A|B), knowing P(B|A) = 0.6, P(A) = 0.4 and P(B) = 0.3?
0.24
0.2
Ans:0.8
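For reference, a minimal Python check of that Bayes' theorem arithmetic, using the probabilities given in the question:

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_b_given_a = 0.6
p_a = 0.4
p_b = 0.3
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.7999999999999999, i.e. 0.8 up to floating-point error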
Which of these characteristics of relational databases are not typical of data warehouses?
Ability to store and query data
Quick retrieval of individual records
Use of structured data
Transaction processing
Answer:
Quick retrieval of individual records.
Use of structured data.
Transaction processing
Question:
Which of these is NOT one of the ACID properties of transactions?
Atomicity
Coordinated
Durability
Answer:- Coordinated
Question:
Match the following use cases to the type of processing system which is best suited for the task.
OLAP
OLTP
- A:Money transfer from one account to another
- B:Generate monthly sales reports
- C:Spot trends in TV viewership over the week
- D:Update a student’s test score in a final exam
Answer:- OLAP (B,C)
OLTP(A,D)
Question:
Which of these is NOT a data warehouse?
Amazon Redshift
Hive
Jenkins
Teradata
Answer:- Jenkins
Question:
Which of these is NOT a feature of Amazon Redshift?
Ability to back up data using snapshots
Interfaces to query and analyze data
Fully managed data warehouse on the cloud
Highly optimized transaction processing
Answer:- Highly optimized transaction processing
Question:
Which of the following are characteristics of distributed systems?
Fault tolerance
Software to coordinate tasks
Multiple machines working together
Difficulty in scaling
ans:-
Fault tolerance
Software to coordinate tasks
Multiple machines working together
Question:
What does an AWS Policy represent?
A collection of permissions on AWS resources
A licensing agreement between the user and Amazon Inc.
A collection of users who can create resources on AWS
A list of suggested use cases for each AWS service
ANS- A collection of permissions on AWS resources
Question:
When provisioning an AWS user account that will be used to access AWS programmatically, what is generated in order to authenticate users?
An AWS Secret Access Key
An AWS Access Key ID
A Kubernetes secret password
An @aws.com email address
Ans-
An AWS Secret Access Key
An AWS Access Key ID
Question: Which statements about the Amazon Redshift Query Editor are TRUE?
It can be used to load and query data, but not create tables
It can be used to query system tables
It can be used to create tables and to load and query data
It can be used to query existing data but not load data into tables
answer:-
It can be used to query system tables
It can be used to create tables and to load and query data
Question:
What AWS CLI command will list all the Redshift clusters which a user has access to?
aws redshift describe-clusters
aws redshift list-clusters
aws redshift view-all-clusters
aws redshift list-clusters --all
ans:- aws redshift describe-clusters
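For those working from Python rather than the CLI, boto3's Redshift client exposes the same call; a sketch, assuming boto3 is installed and AWS credentials are already configured:

import boto3

# Equivalent of `aws redshift describe-clusters`: lists the clusters visible to the caller
redshift = boto3.client("redshift")
response = redshift.describe_clusters()
for cluster in response.get("Clusters", []):
    print(cluster["ClusterIdentifier"], cluster["ClusterStatus"])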
Question:
Which of the following is not true of Redshift clusters?
Users can define the VPC in which they will be provisioned
The cluster can be scaled
It supports the encryption of data on the cluster
The minimum cluster size is four nodes
ans:- The minimum cluster size is four nodes
Question:
Which of these features and metrics are offered by Amazon Redshift?
Cluster snapshots
Optimized transaction processing
Saved queries
Resource usage metrics
ans:-
Cluster snapshots
Saved queries
Resource usage metrics
Question:
Which of these details need to be supplied when provisioning a Redshift cluster using the Quick launch feature?
Cluster identifier
Cluster credentials
Node type
Frequency of automated snapshots
Question:
How is a SMOTE oversampler initialized from a ModelFrame object named mf?
mf.over_sampling.SMOTE()
mf.imbalance.SMOTE()
mf.imbalance.over_sampling.SMOTE()
mf.SMOTE()
Ans:- mf.imbalance.over_sampling.SMOTE()
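The ModelFrame accessor above wraps imbalanced-learn, so an equivalent hedged sketch using imbalanced-learn directly (on a synthetic, made-up dataset) looks like this:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build an imbalanced toy dataset, then oversample the minority class with SMOTE
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_resampled))  # both classes now have the same count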
Question:
Which statements are true concerning disabling automated snapshots and excluding tables from snapshots?
To exclude a table from a snapshot, add the NO BACKUP clause to the table definition
To disable an automated snapshot, set the retention period to 1 day
To disable an automated snapshot, change the retention period to 0 days
To exclude a table from a snapshot, use the BACKUP NO clause following the table definition
Ans:-
To disable an automated snapshot, change the retention period to 0 days
To exclude a table from a snapshot, use the BACKUP NO clause following the table definition
Question
Given the following values in the body_style column of a Pandas dataframe, what would they look like after label encoding?
hatchback, sedan, suv
(1, 0, 0), (0, 1, 0), (0, 0, 1)
1, 2, 3
(0, 1, 1), (1, 0, 1), (1, 1, 0)
0, 1, 2
Ans:- 0, 1, 2
Question
Which of these statements about the application of the RandomOversampler are correct?
Samples from the minority class are oversampled until the dataset is balanced
Samples from the majority class are dropped until the dataset is balanced
Samples for the minority class are generated to balance the data
The resultant balanced dataset contains duplicates
Ans:- Samples from the minority class are oversampled until the dataset is balanced
The resultant balanced dataset contains duplicates
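A minimal imbalanced-learn sketch (synthetic data) showing that RandomOverSampler repeats minority-class rows, so the balanced result contains duplicates:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler

X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
ros = RandomOverSampler(random_state=0)
X_resampled, y_resampled = ros.fit_resample(X, y)
print(Counter(y))            # imbalanced class counts
print(Counter(y_resampled))  # balanced counts; minority rows were duplicated, not synthesized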
Question
Match the following sampling techniques to the type of sampling they perform.
Oversampling
Undersampling
- A:Near Miss
- B:SMOTE
- C:Tomek Links
- D:Neighborhood Cleaning Rule
Answer :- Oversampling B
Undersampling ACD
Question
How do you obtain the explained variance ratios for the principal components in your data once you have fitted a PCA object to it?
pca.explained_variance_ratios
pca.get_variance_ratios()
pca.variance_ratios()
pca.explained_variance
ANS:- pca.explained_variance_ratios
Question
How do you initialize a principal component analyzer to extract five principal components from a ModelFrame named mf_data?
decomposition.PCA(data = mf_data, n_components = 5)
mf_data.decomposition.PCA(n_components = 5)
dimensionality_reduction.PCA(data = mf_data, n_components = 5)
mf_data.dimensionality_reduction.PCA(n_components = 5)
ANS:- mf_data.decomposition.PCA(n_components = 5)
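For comparison, the same steps with plain scikit-learn (random placeholder data), including reading the explained variance ratios from the fitted analyzer:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 20)           # 100 samples, 20 features (placeholder data)
pca = PCA(n_components=5)             # extract five principal components
pca.fit(X)
print(pca.explained_variance_ratio_)  # note: scikit-learn's attribute name ends with an underscore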
Question
What property of a ModelFrame contains the name of the target column?
target_column
target
target_header
target_name
Ans:- target_name
Question
Given you have your test data in a ModelFrame called mf_test, and have a trained estimator called trained_model, how would you get the predictions from the model on your test data?
mf_test.predict(trained_model)
trained_model.predict(mf_test)
trained_model.make_predictions(mf_test)
trained_model.get_predictions(mf_test)
ANs:-mf_test.predict(trained_model)
Question
Which of these are characteristics of the EasyEnsembleClassifier and the BalancedRandomForestClassifier?
They use oversampling to balance the data
They use undersampling to balance the data
They use point-based sampling to balance the data
They use an ensemble of learners where each individual learner is weak
The number of learners can be configured
Ans:-They use undersampling to balance the data
They use an ensemble of learners where each individual learner is weak
The number of learners can be configured
Question:
Recall the essential types of cloud migration.
Re-platforming
Shift to SaaS
Shift to PaaS
Retooling
Ans:-
Re-platforming
Shift to SaaS
Question:
Identify the essential benefits of implementing big data analytics in the cloud.
Scalability and flexibility
Improved analytical outcomes
Security and privacy
Improved analysis
Ans:-
Security and privacy
Improved analysis
Question:
Which are essential characteristics of Kubernetes?
Kubernetes servers can run within a Docker container
Kubernetes is a powerful cloud container management and orchestration tool
Kubernetes servers can run within a Docker virtual machine
Kubernetes is a powerful cloud application management and orchestration tool
Ans:-
Kubernetes servers can run within a Docker container
Kubernetes is a powerful cloud container management and orchestration tool
Question: Select the IP addresses that are used in Kubernetes.
Process EndpointIP
InternalIP
ExternalIP
DataIP
Ans:-
InternalIP
ExternalIP
Question: Name some of the prominent tools provided by GCP that are used to ingest data.
Kubernetes Engine
Cloud Pub/Sub
BigQuery
Cloud Firestore
Ans:-
Kubernetes Engine
Cloud Pub/Sub
Question: Choose the big data management services provided by AWS.
RedShift
DataScale
ENR
EMR
Ans:-
RedShift
EMR
Question: Which of the following storages can we use to back up big data in AWS?
Amazon Glacier
Amazon RedShift
Amazon DataScale
Amazon S3
Ans:-
Amazon Glacier
Amazon S3
Question: Identify which of the following services and implementations can help facilitate disaster recovery planning in the cloud?
S3 configuration
Data migration
Backup and Restore planning
Multi-region deployments
Ans:-
Backup and Restore planning
Multi-region deployments
Question:
Identify some of the critical perspectives of the Cloud Adoption Framework that can help facilitate the implementation or adoption of cloud computing.
Service perspective
People perspective
Application perspective
Business perspective
Ans:-
People perspective
Business perspective
Question: What are some of the critical layers involved in a typical blockchain-based cloud framework?
Cloud
Defining resources and parameters that are required for deployment
Cloud automation
Blockchain management
Ans:-
Cloud automation
Blockchain management
Question: What are the prominent blockchain implementations that we can use to manage distributed ledgers?
Datachain
Livecoin
Tron
Ethereum
Ans:-
Tron
Ethereum
Question: To which data lifecycle phase does importing data apply?
Archive
Share
Use
Create
Ans:- Create
Question: Which statement regarding ERDs in Visual Paradigm is true?
Table relationship diagrams cannot be established
Only Microsoft SQL Server database structures are supported
The design can be used to generate a database structure
Only MySQL database structures are supported
Ans:-The design can be used to generate a database structure
Question: Which item is used to control resource access?
Hashing
Software updates
ACL
Firmware updates
Ans:- ACL
Question: Which database statement is correct?
NoSQL commonly scales vertically
SQL commonly scales horizontally
NoSQL uses a structured schema
SQL uses a structured schema
Ans:- SQL uses a structured schema
Question: Which security standard protects cardholder data?
HIPAA
PIPEDA
PCI DSS
GDPR
Ans:-PCI DSS
Question: What is the default listening port for Microsoft SQL server?
1433
631
1389
110
Ans:- 1433
Question: Which statements regarding DynamoDB items are correct?
DynamoDB is a SQL solution
All items must store the same type of data
DynamoDB is a NoSQL solution
Each item can store different types of data
Ans:-
DynamoDB is a NoSQL solution
Each item can store different types of data
Question: Which of the following is an IT data architecture framework?
PIPEDA
GDPR
HIPAA
TOGAF
Ans:-TOGAF
Question: Which definition is correct?
Data stems from information
Information is organized data
Information stems from data
Data is organized information
Ans:-
Information is organized data
Information stems from data
Question: Which factor could undermine the legitimacy of Big Data summaries?
Pace of data creation
Value derivation
Data source accuracy
Amount of data
Ans:- Data source accuracy
Question: Which of the following best describes Apache Hadoop?
SQL database replication
NoSQL database replication
Clustered database analytical engine
Vertically scaled databases analytical engine
Ans:-Clustered database analytical engine
Question: Identify the approaches of data architecture that we can use to implement a hybrid data architecture.
MapReduce
Data warehouse
Hadoop
HDFS
Ans:-
Data warehouse
Hadoop
Question: Recall some of the essential characteristics of stream processing.
Query continuous data velocity
Stream processing facilitates faster data ingestion
Query continuous data streams
Stream processing facilitates faster reports and insights
Ans:-
Query continuous data streams
Stream processing facilitates faster reports and insights
Question: What are some of the essential benefits provided with the implementation of data partitioning?
Auto-scalability
Enhanced security
Increased performance
Ans:-
Enhanced security
Increased performance
Question: Choose the prominent data complexity contributors.
Velocity
Category
Format
Transformation
Ans:-
Velocity
Category
Transformation
Question: Identify the essential elements of the CAP theorem.
Consistency
Integration
Partition tolerance
Scalability
Ans:-
Consistency
Partition tolerance
Question: Select the prominent distributed data management models.
File format-oriented services
Record-oriented files
Relational database service
Stream-oriented files
Ans:-
Record-oriented files
Relational database service
Question: Which of the following method calls can we use to create and update data in Elasticsearch?
Get
Put
Option
Post
Ans:-
Put
Post
Question: Identify the essential Read preferences that we can configure in MongoDB for Read optimizations.
Elementary preferred
Farthest preferred
Primary preferred
Secondary preferred
Ans:-
Primary preferred
Secondary preferred
Question: Specify the prominent data modelling methodologies that we can use to model data.
Bottom-up modelling
Top-down logical data modelling
Stripping features model
Dimension extraction
Ans:-
Bottom-up modelling
Top-down logical data modelling
Question: Identify some of the valid and essential MongoDB services.
mongos
mongop
mongoq
mongod
Ans:-
mongos
mongod
Question: Which languages can we use to implement serverless architecture with Lambda?
Swift
C
Node.js
Python
Ans:-
Node.js
Python
Question: Which steps are involved in data discovery?
Sharing data
Data de-identification
Cleansing and preparing data
Converting unstructured data
Ans:-
Sharing data
Cleansing and preparing data
Question: Which of the following steps are involved in deriving a successful data POC?
Identifying data transformation requirements
Evaluating analytical requirements
Cleansing and preparing data
Connecting and blending data
Ans:-
Identifying data transformation requirements
Evaluating analytical requirements
Question: Specify some of the essential data management patterns that we can use for microservices.
Data per service
Event generation
Event sourcing
Database per service
Ans:-
Event sourcing
Database per service
Question: Which are types of data architectures that we can implement?
Non-relational data store architecture
Stream processing architecture
Real-time processing architecture
Relational data store architecture
Ans:-
Non-relational data store architecture
Real-time processing architecture
Question: Recall some of the essential benefits with the implementation of clusters.
Simplified management
Increased performance
Auto-scalability
Faster data ingestion
Ans:-
Simplified management
Increased performance
Question: What are some of the important data design principles that are recommended for data design?
Scalability
Integration
Availability
Extensibility
Ans:-
Availability
Extensibility
Question: Identify some of the important characteristics of the serverless architecture.
Serverless architecture eliminates the server management dependency
With a serverless architecture the applications are hosted using third-party services
Serverless architecture eliminates the data management dependency
With a serverless architecture the applications are hosted using in-built services
Ans:-
Serverless architecture eliminates the server management dependency
With a serverless architecture the applications are hosted using third-party services
Question: Match the following statements with the correct type of data that it is a feature of.
Instruction: Match each option with its correct target. Each category may have more than one match.
Answer Options:
A:Order of information is not important
B:Its full content is known before processing begins
C:Its state is dynamic
D:It is an infinite dataset that is never-ending
Ans:- Streaming Data C, D
Batch Data A,B
Question: Which of the following are qualities that are desirable in a stream processing system?
Fault tolerant
Idempotent
Low throughput
High Latency
Ans:-
Fault tolerant
Idempotent
Question: The following types of data sinks are used to store the results of transformations applied to streaming data. Which ones are used mainly for debugging purposes?
Memory sink
Kafka sink
Console sink
File sink
Ans:-
Memory sink
Console sink
Question: Which of the following statements about Structured Streaming in Spark 2.0 are true?
Any data input to Structured Streaming APIs is processed exactly once
The streaming data processed by Structured Streaming APIs is fault tolerant
The API used to process batch data is different from the API used to process streaming data
Data that arrives late cannot be processed by Structured Streaming APIs
Ans:-
Any data input to Structured Streaming APIs is processed exactly once
The streaming data processed by Structured Streaming APIs is fault tolerant
Question: Among the following technologies, which offer support for prefix integrity?
Apache Flink
Apache Spark
Apache Storm
Apache Kafka
Ans:-Apache Spark
Question: Match the following statements related to RDDs with their correct Boolean values
Answer Options:
A:Transformations on RDDs will update them
B:They contain a collection of logical groupings of fields
C:RDDs are stored across multiple nodes in the Spark cluster
D:If one of the nodes where an RDD resides crashes, the data on that node is lost
True:- B,C
False:- A, D
Question: Match the following statements with the corresponding output mode that it describes.
Answer Options:
A:Only the rows which were updated since the previous trigger will be written out
B:Rows which existed previously and get updated in the current trigger won’t be written out
C:The entire contents of the result table are written out to the output
D:The connector to the storage here will determine exactly how that output is written out
Ans:-
Update Mode: A
Complete Mode: C, D
Append Mode: B
Question: What two kinds of data are at issue when considering global standards?
Expensive data
Analyzed data
Impractical data
Personal data
Transactional data
Ans:-
Personal data
Transactional data
Question: Which is not a benefit of a data compliance program?
Makes data cheaper
Complying with the law
Risk reduction
Customer retention
Ans:- Makes data cheaper
Question: If you’re collecting personal data, what should you know?
How big it is
Why you need it
The budget
The cost
Ans:- Why you need it
Question: What should you define when creating reporting and response procedures?
Testing procedures
External reporting only
Doing nothing
Audit avoidance
Ans:- Testing procedures
Question: Which regulation protects credit card information?
HIPAA
GDPR
DMCA
PCI DSS
Ans:- PCI DSS
Question: When building a data compliance strategy, you should do what?
Avoid post-breach procedures
Create an internal reporting structure
Avoid business in countries where reporting is mandatory
Find ways to avoid notifying regulators
Ans:- Create an internal reporting structure
Question: Which is not a feature of Big Data?
Cost-effectiveness
Expensive to store
Behavioral analysis
Long-term storage
Ans:- Expensive to store
Question: What are the two key areas of data protection that companies should focus on?
Policies
Abuses
Regulations
Frameworks
Capturing
Ans:-
Policies
Regulations
Question: Managerial training should focus on what?
Data
Responsibility
Technology
Complexity
Ans:- Responsibility
Question: Who should attend data compliance training for users?
Upper management
Law enforcement
Your customers
Network admins
Ans:- Network admins
Question: Which is not an element of a good data compliance strategy?
State rationale
Focus on privacy
Define boundaries
Reduce costs
Ans:- Reduce costs
Question: Which is not a reason for big data’s need for a governance structure?
Misuse
Tools
Data loss
Abuse
Ans:- Tools
Question: How do big data paradigms differ from traditional data paradigms?
They offer different analytical models
Traditional data paradigms offer more sources of data
They can incorporate automation into data analysis using AI
Access is limited due to volume
Ans:-
They offer different analytical models
They can incorporate automation into data analysis using AI
Question: Data collection is only as useful as what?
Single points-of-failure
The way in which it’s used
Its ability to evolve
Regular backups
Ans:- The way in which it’s used
Question: Which is not a main need for data governance?
Data validation
Protect data
Domain
Provide reliability
Ans:-Domain
Question: Which is not a business benefit of the cloud?
60% up time
Scalability
Analytics
Onboarding data
Ans:-60% up time
Question: What should you align your stakeholder needs to?
Skills
Curve
Data
Skepticism
Ans:-Skills
Question: What should you do when identifying data?
Budget for vision
Start big
Isolate different types
Align big data with isolation
Ans:-Isolate different types
Question: When identifying the players in a data governance team, you should do what?
Expect technical expertise
Expect a mix of technical and business expertise
Avoid technical expertise
Avoid a mix of technical and business expertise
Ans:-Expect a mix of technical and business expertise
Question: Which is not a key to a successful governance structure?
Data
Inform
Discuss
Educate
Ans:- Data
Question: Which is not a pillar of a successful data governance strategy?
Ensure availability
Secure your data
Access your data
Facilitate data changes
ANS:- Access your data
Question: Which is not a key principle of big data governance?
Best practices
People
Behavior
Process
Ans:- Behavior
Question: What must be done prior to designing effective data access governance policies?
ACLs must be modified
Assets, users and groups must be inventoried
Regulatory compliance must be achieved
Apply software updates
Ans:- Assets, users and groups must be inventoried
Question: Which of the following are examples of PII?
Mother’s maiden name
Product documentation
Social security number
MAC address
Ans:-
Mother’s maiden name
Social security number
Question: Which data storage solution does not use a rigid schema?
NTFS
SQL
NoSQL
Microsoft BitLocker
Ans:-NoSQL
Question: Which data loss threat involves user deception resulting in the disclosure of sensitive data?
Ransomware
Collusion
Malware
Social engineering
Ans:-Social engineering
Question: Which mechanism can be used to assign file system permissions?
ACL
Encryption
DLP
Malware scanner
Ans:- ACL
Question: Which behavior accurately describes the result of applying share and NTFS permissions?
Allow permissions apply only to groups
Deny permissions apply only to groups
Deny permissions override Allow permissions
When combining share and NTFS permissions, the most permissive prevails
Ans:- Deny permissions override Allow permissions
Question: Which of the following constitutes multi-factor authentication?
Smart card, keyfob
Username, password, smart card
Username, password
Username, password, PIN
Ans:-Username, password, smart card
Question: Which of the following are settings related to Amazon Web Services (AWS) IAM users?
Programmatic access
Database access
ACL access
AWS console access
Ans:-
Programmatic access
AWS console access
Question: Which action is required to assign permission to an Amazon Web Services (AWS) IAM group?
Edit the AWS ACL
Attach a policy
Deploy device authentication
Enable multi-factor authentication
Ans:- Attach a policy
Question: Which statements regarding vulnerability assessments are correct?
They are less invasive than penetration tests
They exploit weaknesses
They identify weaknesses
They are more invasive than penetration tests
Ans:-
They are less invasive than penetration tests
They identify weaknesses
Question: Which of the following is considered a detective control?
Firewall
Log files
Malware scanning
User training
Ans:- Log files
Question: For which activity does data classification most facilitate data access governance?
Cloud storage
Data backups
Permission assignment
Intrusion detection and prevention
Ans:- Permission assignment
Question: Which Linux solution is used to implement centralized logging?
iptables
rsyslog
Event viewer
SSH
Ans:- rsyslog
Question: You need to capture all network traffic for devices plugged into a network switch. Your machine is plugged into port 1 and you plan to run the Wireshark free packet capturing tool. What is wrong with this configuration?
Nothing is wrong. All traffic will be captured.
You will not see broadcast traffic.
You will not see multicast traffic.
You will only see your own unicast traffic
Ans:- You will only see your own unicast traffic
Question: Which benefit is derived from the use of Microsoft BitLocker?
Network traffic encryption
Disk volume encryption
File encryption
Folder encryption
Ans:- Disk volume encryption
Question: Which of the following can be used as criteria to filter a custom log view?
Event ID
MAC address
Default gateway
User name
Ans:-
Event ID
User name
Question: How can you ensure newly added files are automatically classified?
Enable classification through Group Policy
Create a PowerShell script
Enable compression on folders
Run classification rules on a schedule
Ans:- Run classification rules on a schedule
Question: Which SCCM item is deployed to collections to monitor compliance?
Configuration item
Software updates
Inventory
Baseline
Ans:- Baseline
Question: Which statement regarding asymmetric cryptography is the most correct?
A private key is used
Unrelated public and private keys are used
Mathematically related public and private keys are used
A public key is used
Ans:- Mathematically related public and private keys are used
Question: Which type of VPN is configured by default when created on Windows Server 2016?
PPTP
L2TP
SSL
IKEv2
Ans:-PPTP
Question: You have been tracking database querying over time in the headquarters office and have concluded that performance is unacceptable. Which solution will most likely improve performance the most?
Encryption of data in transit
Encryption of data at rest
Database replicas
In-memory caching
Ans:- In-memory caching
Question: Where do Windows file system audit events get logged?
Audit log
Application log
System log
Security log
Ans:- Security log
Question: What are some of the skills required for practicing data science?
Marketing
Advertising
Domain-specific knowledge
Communication
Statistical analysis
Programming
Ans:-
Domain-specific knowledge
Communication
Statistical analysis
Programming
Question: After collecting a batch of raw data, what should we do next in our data wrangling pipeline?
Integration
Filtering
Gathering
Conversion
Exploration
Ans:- Filtering
Question: What are three major concerns when dealing with large datasets?
Storage
Human resources
Data analysis techniques
Security
Bandwidth
Ans:-
Storage
Data analysis techniques
Security
Question: Match each problem class to its appropriate machine learning method.
Answer Options:
A:Supervised learning
B:Unsupervised learning
C:Reinforcement learning
Training a model based on a reward system
Ans:- C
Detecting patterns in unlabeled data
Ans:- B
Modeling based on input and response variables
Ans:- A
Question: Match each data science term with its description.
Answer Options:
A:Data cleansing
B:Data integrity
C:Data anonymization
Removing personally identifying information from a dataset
Ans:- C
Verifying the accuracy of a dataset
Ans:- B
Removing invalid data from a dataset
Ans:- A
Question: What are some of the ways that we use to make sense of our data science results for others?
Algorithm analysis
Charts
Significant values
Software design
Narratives
Ans:-
Charts
Significant values
Narratives
Question: What step should we perform before statistical analysis and machine learning?
Expert systems design
Data cleaning
Plotting
Graphing
Ans:- Data cleaning
Question: What are the two most common programming languages used in data science?
R
Common Lisp
Perl
Python
Ada
Ans:-
R
Python
Question: What are some strategies we can use to gather the data that accumulates continuously?
incremental synchronization
scheduled tasks
automated scripts
bayesian analysis
constantly recheck it manually
Ans:-
incremental synchronization
scheduled tasks
automated scripts
Question: What are some of the common options that curl provides for downloading web data?
modifying user-agent string
passing form data
parsing cookie information
automatic link following
obeying robots.txt
metadata inspection
Ans:-
modifying user-agent string
passing form data
parsing cookie information
metadata inspection
Question: What is the main advantage of using a command line utility to convert spreadsheets to csv format?
Exception handling
Automation through scripting
Standards compliance
Resolving broken links
Ans:- Automation through scripting
Question: What advantages do libraries like agate provide?
data anonymization
resolving editor disputes
consistent tabular data manipulation
deployment simplicity
general purpose data wrangling
resolving language disputes
Ans:-
consistent tabular data manipulation
general purpose data wrangling
Question: Legacy tabular data formats, such as dbf, are often converted to which of the following formats?
csv
sql
sbt
json
gif
jpg
Ans:-
csv
sql
json
Question: Which of the following features does the python library BeautifulSoup provide?
javascript validation
making url requests
syntax highlighting
HTML tag extraction
HTML DOM manipulation
Ans:-
HTML tag extraction
HTML DOM manipulation
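A short BeautifulSoup sketch of tag extraction; the HTML string here is made up for illustration:

from bs4 import BeautifulSoup

html = "<html><body><a href='https://example.com'>link</a><p>text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")
for tag in soup.find_all("a"):        # extract every anchor tag
    print(tag.get("href"), tag.text)  # https://example.com link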
Question: Select the main difference between data and metadata.
metadata is never encrypted
metadata is always in plain text format
metadata contains only the sender and receiver of a message
metadata describes context not contents
data is always proprietary
Ans:- metadata describes context not contents
Question: What are some examples of HTTP header information?
character encoding
email address
content length
user agent
spam score
Ans:-
character encoding
content length
user agent
Question: Which of the following are examples of information we can get about a client from web server logs?
Phone number
Email address
Geographic location
IP address
Document requested
Ans:-
IP address
Document requested
Question: What are some examples of standard email header information?
Content-type
X-Forwarded-For
X-Spam-Score
Remote-Server
Received
Ans:-
Content-type
Received
Question: If you do not specify a username for your ssh connection, what is the default user set to?
system
local username
root
guest
Ans:- local username
Question: Specifying the remote port number with scp is done with which command line switch?
p
P
r
R
Ans:- P
Question: When running rsync with new options, you should first test with which option?
test
u
del
dry-run
Ans:- dry-run
Question: What are some of the common reasons for filtering data?
Random sampling
Excluding old data
Removing invalid data
Removing duplicates
Finding relations
Creating simple models
Ans:-
Excluding old data
Removing invalid data
Removing duplicates
Question: Choose the date format which represents a date in ISO 8601.
YYYY-MM-DD
MM/DD/YYYY
DD/MM/YY
MM/DD/YY
DD-MM-YY
Ans:- YYYY-MM-DD
Question: What does the content-type header in an HTTP response represent?
Language
Media type
Compression type
Character encoding
Ans:- Media type
Question: Which of the following commands will skip over the first row in a .csv file?
csvcut -r 2
awk 'NR+1'
csvcut -r 1
tail -n +2
tail -n -1
awk 'NR>1'
Ans:-
tail -n +2
awk 'NR>1'
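The same header-skipping step can be done in Python; a sketch where 'data.csv' is a placeholder filename:

import csv

with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader, None)   # skip the first (header) row, like `tail -n +2`
    for row in reader:
        print(row)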
Question: An e-mail message header is separated from the body of the message using which delimiter?
A tag
The 'Sender' header
A blank line
The pipe character ' || '
A null character
Ans:- A blank line
Question: What format must the data be in for uniq to function correctly?
It must be encoded in UTF-8
It must be sorted
It must be delimited by commas
It must be encoded in Latin1
Ans:- It must be sorted
Question: Select the information that may be found in a JPEG EXIF header.
Copyright
Compression algorithm
GPS coordinates
Camera make and model
Encryption algorithm
XML tags
Ans:-
Copyright
GPS coordinates
Camera make and model
Question: How does pdfgrep handle nonsearchable pdf files?
It will perform OCR first
It will prompt the user to perform OCR
It will find text in image data
It will return nothing
Ans:- It will return nothing
Question: Choose the most appropriate method to deal with impossible combinations in our data set.
Leave the data as is
Randomly change a value so that the combination is valid
Set the invalid data to 0
Drop the data point or record
Set the invalid data to N/A
Ans:- Drop the data point or record
Question: Which one of the following does the disallow directive in robots.txt instruct a parser to do?
Do not index the directory or file
Blacklist the web site
Do nothing
Access but do not log
Ans:- Do not index the directory or file
Question: Which response is a valid reason for converting from CSV to JSON?
The CSV file is small
You get to code a parser
The JSON file is compressed
A JSON format may be required by the software tool that you use
Ans:- A JSON format may be required by the software tool that you use
Question: Which response is a valid reason for converting from XML to JSON?
JSON is less verbose
XML is proprietary
XML is out of style
JSON is used on the web
JSON is compressed by default
Ans:- JSON is less verbose
Question: What might be a problem when converting from CSV to SQL?
SQL queries do not work on data that is imported from CSV
CSV data can only be stored in SQLite databases
SQL tables are slower to query than the CSV tables
CSV file may not contain the information about the data type
Ans:- CSV file may not contain the information about the data type
Question: Why should we export data from SQL to CSV?
To import it into a similar table elsewhere
To process the data
To speed up the database
To back up the database
Ans:-
To import it into a similar table elsewhere
To process the data
Question: Why should we transform a file from comma delimited to tab delimited?
To export to UTF-8 encoding
To allow for commas in the fields
To speed up the processing
To work better on the web
To work on a traditional UNIX operating system
Ans:- To allow for commas in the fields
Question: What is another name for the Unix time?
ISO 2281
January 1, 1970
Greenwich Mean Time
Epoch Time
ISO 8601
Ans:- Epoch Time
Question: Which method is used to find the absolute value of a number in Python?
absval
-1 * n
absolute
abs
Ans:- abs
Question: What is the difference between rounding and truncating a floating-point number?
Truncate and round perform the same function
Rounding always goes to the nearest integer value
Truncating removes all decimal points except for two
Rounding is faster
Ans:- Rounding always goes to the nearest integer value
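A tiny Python illustration of rounding versus truncating (and the abs call from the previous question):

import math

x = 3.79
print(round(x))       # 4    -> rounds to the nearest integer
print(math.trunc(x))  # 3    -> simply drops the fractional part
print(abs(-x))        # 3.79 -> absolute value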
Question: Which operation should we perform before carrying out OCR on an image?
Recompress the image to 75% quality
Remove header information from the image
Clean the image for easier OCR processing
Rename the image extension to .tiff
Ans:- Clean the image for easier OCR processing
Question: Which options are true in scenarios where the text is more easily extracted from the PDFs?
The PDF was generated from a digital process
The destination format is UTF-8
The text is scanned from a book
The LaTeX source code is used by the OCR software
The PDF contains searchable text
Ans:-
The PDF was generated from a digital process
The PDF contains searchable text
Question: What advantage does csvgrep give over regular grep?
csvgrep excludes duplicate data
csvgrep uses a faster search algorithm
csvgrep operates on a column-by-column basis
csvgrep operates line-by-line
Ans:-csvgrep operates on a column-by-column basis
Question: What are some of the common basic statistics supported by csvstat?
min
history
max
version
mean
Ans:-
min
max
mean
Question: What is the main limitation to querying CSV data with csvsql?
Queries are slow on large data sets
csvsql does not support ‘order by’
csvsql does not support ‘group by’
Not all SELECT operations are performed
Ans:- Queries are slow on large data sets
Question: Which of the following features does gnuplot support?
Saving plots in
Capture
Install
Plotting in ASCII text
Plotting from the command line
Creating interactive plots in JavaScript
Ans:- Plotting in ASCII text
Question: Which features does the wc utility provide?
word frequency count
character count
word count
paragraph count
line count
Ans:-
character count
word count
line count
Question: Select the tools used to explore subdirectories from the command line.
apt
find
xargs
scp
grep
tree
Ans:-
find
xargs
tree
Question: Put the following pseudocode steps in order to carry out a word frequency count.
count the occurrences of each word
remove duplicate words
split the text into a list of words
Ans:- split the text into a list of words, remove duplicate words, count the occurrences of each word
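The same steps in Python using collections.Counter, with a made-up sample string:

from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
words = text.split()          # split the text into a list of words
counts = Counter(words)       # counts occurrences; the unique words become the keys
print(counts.most_common(3))  # [('the', 3), ('quick', 1), ('brown', 1)]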
Question: What method is used to take systematic samples (instead of random)?
srand()
division
rand()
modulus operator %
Ans:- modulus operator %
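A minimal Python sketch of systematic sampling with the modulus operator, taking every 10th row of a stand-in dataset:

rows = list(range(100))   # stand-in for the rows of a data set
step = 10
sample = [row for i, row in enumerate(rows) if i % step == 0]
print(sample)             # rows 0, 10, 20, ..., 90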
Question: What is the first step in finding the top 10 rows in a tabular data set based on a particular field?
randomly sample 10 rows from the data set
loop through the data set for each of the top 10 rows
sort the data based on the particular field
sort the data based on weighted rank
Ans:- sort the data based on the particular field
Question: Which two SQL functions are used to count unique rows in a table?
min
order by
limit
distinct
count
avg
max
Ans:-
distinct
count
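A small sqlite3 sketch combining the two functions; the table and values are hypothetical:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Steve",), ("Bob",), ("Steve",)])
# COUNT(DISTINCT ...) counts the unique values in the column
unique = conn.execute("SELECT COUNT(DISTINCT name) FROM people").fetchone()[0]
print(unique)  # 2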
Question: What is a good median deviation value used to identify outliers?
1
3
2
4
Ans:- 3
Given the following table definitions, which column would you use to join the second table to the first?
ID,Name,Education,Salary_ID
1,Steve,1,1
2,Bob,2,2
3,Jim,3,3
4,Jill,4,3
5,Jane,5,5

ID,Salary
2,75000
3,100000
4,125000
5,150000
1,50000
Salary -> Salary_ID
ID -> ID
ID -> Salary_ID
ID -> Education
Ans:- ID -> Salary_ID
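A pandas sketch of that join, using the two tables from the question:

import pandas as pd

people = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                       "Name": ["Steve", "Bob", "Jim", "Jill", "Jane"],
                       "Education": [1, 2, 3, 4, 5],
                       "Salary_ID": [1, 2, 3, 3, 5]})
salaries = pd.DataFrame({"ID": [2, 3, 4, 5, 1],
                         "Salary": [75000, 100000, 125000, 150000, 50000]})
# Join the second table's ID column onto the first table's Salary_ID column
merged = people.merge(salaries, left_on="Salary_ID", right_on="ID", suffixes=("", "_salary"))
print(merged[["Name", "Salary"]])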
Select the answer that explains the following command.
cat *.gz > all-logs.gz
unzip all logs with .gz extension
symbolically link all .gz files with all-logs.gz
concatenate all .gz files into the file all-logs.gz
delete all .gz files after storing them in all-logs.gz
zip files using the gzip utility
Ans:- concatenate all .gz files into the file all-logs.gz
What sorting algorithm does the ‘sort’ command line utility use?
Bubble sort
Heapsort
Merge sort
Quicksort
Ans:-Merge sort
Question
Given an HTML table, select the appropriate compatibility criteria for merging two tables into one.
The tables must have the same number of rows
A column cannot have missing elements
The columns in both tables must be in the same order before merging
The tables have the same number of columns
Both tables must have identical header elements
A row cannot have missing elements
Ans:-
The columns in both tables must be in the same order before merging
The tables have the same number of columns
Select the functions that provide summarized or aggregated data.
min
count
distinct
sort
limit
sum
Ans:-
min
count
sum
Select the answer that best describes normalizing tabular data.
compiling tabular data into code
creating connecting tables in normal form
summarizing or aggregating tabular data
transforming data from long format to wide format
reshaping data from column-key format to key-value format
Ans:- reshaping data from column-key format to key-value format
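A pandas sketch of that reshaping, going from a wide (column-key) table to a long (key-value) table with melt; the sample rows are made up:

import pandas as pd

wide = pd.DataFrame({"ID": [1, 2],
                     "name": ["Steve", "Bob"],
                     "salary": [125000, 50000]})
# melt() turns each non-ID column into (ID, Property, Value) rows
normalized = pd.melt(wide, id_vars="ID", var_name="Property", value_name="Value")
print(normalized)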
Given the following table, how many columns would the denormalized table have?
ID,Property,Value
1,name,Steve
2,name,Bob
3,name,Jill
1,birth_year,1919
2,birth_year,1985
3,birth_year,2000
1,salary,125000
2,salary,50000
3,salary,250000
7
9
6
8
4
5
Ans:- 4
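Pivoting the key-value table from the question back to one row per ID yields the four columns (ID, name, birth_year, salary); a pandas sketch:

import pandas as pd

normalized = pd.DataFrame({"ID": [1, 2, 3] * 3,
                           "Property": ["name"] * 3 + ["birth_year"] * 3 + ["salary"] * 3,
                           "Value": ["Steve", "Bob", "Jill", 1919, 1985, 2000,
                                     125000, 50000, 250000]})
denormalized = normalized.pivot(index="ID", columns="Property", values="Value").reset_index()
print(denormalized.columns.tolist())  # ['ID', 'birth_year', 'name', 'salary'] -> 4 columns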
What type of operations or functions are typically used to create a pivot table?
summary
limit
join
distinct
group by or aggregate
Ans:-
summary
group by or aggregate
Considering the date range inclusive of 2000 to 2015 for the given data set, how many rows would be added in the homogenized table?
year,value
2004,2000
2005,3000
2006,9000
2007,10500
2008,8600
2009,9200
2010,14000
2012,15000
8
7
4
5
10
6
9
Ans:- 8
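A pandas sketch of homogenizing that series over 2000-2015 by reindexing; the eight missing years come back as empty (NaN) rows:

import pandas as pd

data = pd.DataFrame({"year": [2004, 2005, 2006, 2007, 2008, 2009, 2010, 2012],
                     "value": [2000, 3000, 9000, 10500, 8600, 9200, 14000, 15000]})
full = data.set_index("year").reindex(range(2000, 2016)).reset_index()
print(len(full) - len(data))  # 8 rows added for the missing years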
Question: Select the best description of the geometric definition of the dot product of two vectors.
Logarithm
Difference
Magnitude
Vector sum
Tangent angle
Ans:- Magnitude
Question: Select the method which describes how to add two matrices together.
The matrices are added component-wise
The matrices are multiplied component-wise
Rows in the left-hand matrix are added to columns in the right-hand matrix
Each component in the right-hand matrix is scaled by a factor of every component in the left-hand matrix
Ans:- The matrices are added component-wise
Question: Given a matrix A, what values make up the diagonal matrix D in A’s factorization via Singular Value Decomposition (SVD)?
The identity matrix
The determinant of A
The diagonal entries from A
The singular values from A
The orthogonal eigenvalues from A
A unitary matrix of eigenvectors from A
Ans:- The singular values from A
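A NumPy sketch of the factorization; the diagonal factor D is built from the singular values of A (the matrix here is an arbitrary example):

import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
U, singular_values, Vt = np.linalg.svd(A)  # A = U @ diag(singular_values) @ Vt
D = np.diag(singular_values)               # the diagonal matrix D from the question
print(np.allclose(A, U @ D @ Vt))          # True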
Question: Select examples of discrete categorical data.
Blood type
Temperature
Precipitation level
True or false
Heads or tails
Eye color
Ans:-
Blood type
Eye color
Question: What is an event?
The set of outcomes in an experiment
The number of ways an event can occur divided by the total possible outcomes
The set of all possible outcomes
The result of a coin flip or dice roll
The random distribution of an experiment
Ans:- The set of outcomes in an experiment
Question: Given events A and B, select the answer that describes P(A and B).
A ∩ B
B ∖ A
A – B
A ∪ B
Ans:- A ∩ B
Given the addition rule for calculating probability, why do we need to subtract P(A and B) away from the sum?
P(A or B) = P(A) + P(B) – P(A and B)
To account for the central limit theorem
To account for independent events
To account for Bayes rule
To account for the intersection of events
Ans:- To account for the intersection of events
Select the formal name of the “Bell curve” distribution.
Normal distribution
Discrete distribution
Continuous uniform distribution
Binomial distribution
Poisson distribution
Ans:- Normal distribution
Select the description of a binomial distribution with a single trial.
Bell curve
Poisson event
Discrete value
Bernoulli trial
Ans:- Bernoulli trial
Select the answer that describes the prior probability in a Bayesian email spam algorithm.
The probability that any message is spam
The probability that a spam rule shows up in a ham message
The probability that any message is ham
The probability that a rule appears in a spam message
Ans:- The probability that any message is spam