Pandas

Theory

What is a Pandas Series?

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). The difference between a NumPy array and a Series is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric location. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.
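A minimal sketch of both points, with illustrative values (the import convention is shown in the next question):

import pandas as pd

# A labelled Series holding arbitrary Python objects (an int, a string, even a function)
s = pd.Series(data=[10, 'hello', sum], index=['a', 'b', 'c'])

print(s['b'])    # 'hello' -- access by label instead of position
print(s.index)   # Index(['a', 'b', 'c'], dtype='object')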

How do you import the libraries?

import numpy as np

import pandas as pd

How do you convert a list, NumPy array, or dictionary to a Series?

Let us first create a list, an array and a dictionary:

labels = ['a','b','c']

my_list = [10,20,30]

my_arr = np.array([10,20,30])

my_dict = {'a':10,'b':20,'c':30}

Now, to convert the list to a Series:

pd.Series(data=my_list)

output:

0    10
1    20
2    30
dtype: int64

A Series can also be indexed by labels:

pd.Series(data=my_list,index=labels)

output:

a    10
b    20
c    30
dtype: int64

Or, equivalently, passing the data and index positionally:

pd.Series(my_list,labels)

output:

a    10
b    20
c    30

Similarly, we can convert a NumPy array to a Series:

pd.Series(my_arr)

output:

0    10
1    20
2    30
dtype: int32

pd.Series(my_arr,labels)

output:

a    10
b    20
c    30

To convert a dictionary:

pd.Series(my_dict)

output:

a    10
b    20
c    30

Create a dataframe with random numbers.

import pandas as pd

import numpy as np

from numpy.random import randn

df = pd.DataFrame(randn(5,4), index=['A','B','C','D','E'], columns=['W','X','Y','Z'])

df

How will one grab data from a particular column or columns in a dataframe?

df['X']

Output:

A    0.018639

B   -1.044210

C   -1.432191

D   -0.689218

E   -1.306992

Name: X, dtype: float64

This returns a Series.

df[['X','Z']]

This returns a dataframe.

DataFrame Columns are just Series.
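A quick check using the df created above:

type(df['X'])     # <class 'pandas.core.series.Series'>

df['X'].mean()    # Series methods therefore work directly on a column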

How to add a new column to existing dataframe?

df['new'] = df['X'] + df['Z']

How to drop a column and a row in pandas?

To drop a column, we must pass axis=1:

df.drop('new', axis=1, inplace=True)

If the inplace argument is not set to True, the column is not permanently removed from df; drop() returns a new DataFrame instead.
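A quick sketch of the difference (assuming the 'new' column has not yet been dropped):

dropped = df.drop('new', axis=1)   # returns a new DataFrame; df itself still has 'new'

# an equivalent, commonly used pattern is to reassign instead of using inplace:
# df = df.drop('new', axis=1)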

Dropping a row

df.drop('E', axis=0, inplace=True)

Explain conditional selection in dataframe using pandas.

Consider a DataFrame with 4 columns and 5 rows, such as the df created above:

To get all the entries that are greater than 0, we can use:

df>0

       W      X      Y      Z
A  False  False   True  False
B   True   True   True  False
C   True   True   True  False
D  False   True   True   True
E   True  False   True   True

df[df>0]

          W         X         Y         Z
A       NaN       NaN  0.070692       NaN
B  0.433249  0.849024  0.816763       NaN
C  0.464947  1.094842  0.010631       NaN
D       NaN  1.819395  0.060382  0.484935
E  0.580525       NaN  0.747503  0.894294

df[df['X']>0]

          W         X         Y         Z
B  0.433249  0.849024  0.816763 -1.495481
C  0.464947  1.094842  0.010631 -0.681866
D -0.259343  1.819395  0.060382  0.484935

How to reset index and set it to something different?

Consider a DataFrame:

          W         X         Y         Z
A -1.131096  1.292825 -0.656096 -0.028158
B -1.559303  0.276486 -0.590994  0.520583
C  3.306751  1.359687 -0.137238  0.521303
D  0.142652 -0.495529 -1.417824  0.750177
E  1.589873 -0.282113  1.042828 -0.862563

df.reset_index()

  index         W         X         Y         Z
0     A -1.131096  1.292825 -0.656096 -0.028158
1     B -1.559303  0.276486 -0.590994  0.520583
2     C  3.306751  1.359687 -0.137238  0.521303
3     D  0.142652 -0.495529 -1.417824  0.750177
4     E  1.589873 -0.282113  1.042828 -0.862563

newind = 'BL DL IX CC CH'.split()

df['States'] = newind

df

          W         X         Y         Z States
A -1.131096  1.292825 -0.656096 -0.028158     BL
B -1.559303  0.276486 -0.590994  0.520583     DL
C  3.306751  1.359687 -0.137238  0.521303     IX
D  0.142652 -0.495529 -1.417824  0.750177     CC
E  1.589873 -0.282113  1.042828 -0.862563     CH

df.set_index('States', inplace=True)

df

               W         X         Y         Z
States
BL     -1.131096  1.292825 -0.656096 -0.028158
DL     -1.559303  0.276486 -0.590994  0.520583
IX      3.306751  1.359687 -0.137238  0.521303
CC      0.142652 -0.495529 -1.417824  0.750177
CH      1.589873 -0.282113  1.042828 -0.862563

How to deal with missing data in Pandas?

Missing data is represented by NaN in pandas.

Let us create a dataframe with some missing values:

df = pd.DataFrame({'A':[1,2,np.nan],
                   'B':[3,np.nan,np.nan],
                   'C':[4,5,6]})

     A    B  C
0  1.0  3.0  4
1  2.0  NaN  5
2  NaN  NaN  6

To drop all rows from df that contain a NaN value:

df.dropna()

     A    B  C
0  1.0  3.0  4

To drop all columns from df that contain a NaN value:

df.dropna(axis=1)

   C
0  4
1  5
2  6

To drop rows that contain NaN values while keeping rows with at least 2 non-NaN values, use the thresh argument:

df.dropna(thresh=2)

     A    B  C
0  1.0  3.0  4
1  2.0  NaN  5

Replacing NaN values with something else:

df.fillna(value='X VAL')

       A      B  C
0      1      3  4
1      2  X VAL  5
2  X VAL  X VAL  6

Explain different types of operations that can be performed on a pandas DataFrame.

Let us take a dataframe and perform various kinds of operations on it.

import pandas as pd

df = pd.DataFrame({'col1':[1,2,3,4],'col2':[11,22,33,11],'col3':['ab','cd','ef','gh']})

df.head()

   col1  col2 col3
0     1    11   ab
1     2    22   cd
2     3    33   ef
3     4    11   gh

unique

To get the unique values in a column:

df['col2'].unique()

Out: array([11, 22, 33], dtype=int64)

nunique

To get the number of unique values in a column:

df['col2'].nunique()

Out: 3

value_counts

To get the count of each value in a column:

df['col2'].value_counts()

Out:

11    2

22    1

33    1

Name: col2, dtype: int64

isnull

To check for null values in the dataframe:

df.isnull()

    col1   col2   col3
0  False  False  False
1  False  False  False
2  False  False  False
3  False  False  False

del

Permanently removing a column:

del df['col1']

df

   col2 col3
0    11   ab
1    22   cd
2    33   ef
3    11   gh

Get column and index names:

df.columns

Out:

Index(['col2', 'col3'], dtype='object')

df.index

Out:

RangeIndex(start=0, stop=4, step=1)

sort_values

df.sort_values(by='col2')

   col2 col3
0    11   ab
3    11   gh
1    22   cd
2    33   ef

How do you read from and store data in CSV, Excel, and HTML files in pandas?

Reading a CSV file into a DataFrame:

df = pd.read_csv('filename')

Storing a DataFrame as a CSV file:

df.to_csv('filename', index=False)

Reading an Excel file into a DataFrame:

pd.read_excel('Excel_filename.xlsx', sheet_name='Sheet1')

Storing a DataFrame as an Excel file:

df.to_excel('Excel_filename.xlsx', sheet_name='Sheet1')

Reading HTML tables into a DataFrame:

df = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')
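Note that read_html returns a list of DataFrames, one per HTML table found on the page, so you typically select the table you need by position. A short sketch:

tables = pd.read_html('http://www.fdic.gov/bank/individual/failed/banklist.html')

banklist = tables[0]   # first table found on the page

banklist.head()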

What is concatenation? Explain with examples.

Concatenation glues together two or more DataFrames. Keep in mind that dimensions should match along the axis you are concatenating on.

Let's create two DataFrames, df1 and df2:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 1, 2, 3])

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index=[4, 5, 6, 7])

df1

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3

df2

    A   B   C   D
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7

Concatenating two dataframes:

pd.concat([df1,df2])

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7

pd.concat([df1,df2],axis=1)

     A    B    C    D    A    B    C    D
0   A0   B0   C0   D0  NaN  NaN  NaN  NaN
1   A1   B1   C1   D1  NaN  NaN  NaN  NaN
2   A2   B2   C2   D2  NaN  NaN  NaN  NaN
3   A3   B3   C3   D3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN   A4   B4   C4   D4
5  NaN  NaN  NaN  NaN   A5   B5   C5   D5
6  NaN  NaN  NaN  NaN   A6   B6   C6   D6
7  NaN  NaN  NaN  NaN   A7   B7   C7   D7

Explain DataFrame joins with an example.

Joining is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                     index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])

left.join(right)

     A   B    C    D
K0  A0  B0   C0   D0
K1  A1  B1  NaN  NaN
K2  A2  B2   C2   D2

left.join(right, how='outer')

      A    B    C    D
K0   A0   B0   C0   D0
K1   A1   B1  NaN  NaN
K2   A2   B2   C2   D2
K3  NaN  NaN   C3   D3

Explain the merge operation.

The merge function merges DataFrames together using logic similar to merging SQL tables. For example:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

Inner merge:

pd.merge(left, right, on=['key1', 'key2'])

    A   B key1 key2   C   D
0  A0  B0   K0   K0  C0  D0
1  A2  B2   K1   K0  C1  D1
2  A2  B2   K1   K0  C2  D2

Outer merge:

pd.merge(left, right, how='outer', on=['key1', 'key2'])

     A    B key1 key2    C    D
0   A0   B0   K0   K0   C0   D0
1   A1   B1   K0   K1  NaN  NaN
2   A2   B2   K1   K0   C1   D1
3   A2   B2   K1   K0   C2   D2
4   A3   B3   K2   K1  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3

Right merge:

pd.merge(left, right, how='right', on=['key1', 'key2'])

     A    B key1 key2   C   D
0   A0   B0   K0   K0  C0  D0
1   A2   B2   K1   K0  C1  D1
2   A2   B2   K1   K0  C2  D2
3  NaN  NaN   K2   K0  C3  D3

Left merge:

pd.merge(left, right, how='left', on=['key1', 'key2'])

    A   B key1 key2    C    D
0  A0  B0   K0   K0   C0   D0
1  A1  B1   K0   K1  NaN  NaN
2  A2  B2   K1   K0   C1   D1
3  A2  B2   K1   K0   C2   D2
4  A3  B3   K2   K1  NaN  NaN

Explain groupby.

import pandas as pd

data = {'Company':['GOOGLE','GOOGLE','MSFT','MSFT','FB','FB'],
        'Person':['OM','JAI','SONA','MONA','RAM','SARA'],
        'Sales':[200,325,229,124,470,350]}

df = pd.DataFrame(data)

df

  Company Person  Sales
0  GOOGLE     OM    200
1  GOOGLE    JAI    325
2    MSFT   SONA    229
3    MSFT   MONA    124
4      FB    RAM    470
5      FB   SARA    350

Using groupby, we can aggregate the sales data by company:

by_comp = df.groupby('Company')

by_comp.mean()

         Sales
Company
FB       410.0
GOOGLE   262.5
MSFT     176.5
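Beyond mean(), the same groupby object supports other aggregations; a short sketch on the 'Sales' column:

by_comp['Sales'].sum()        # total sales per company
by_comp['Sales'].count()      # number of rows per company
by_comp['Sales'].max()        # highest single sale per company
by_comp['Sales'].describe()   # count, mean, std, min, quartiles, max per company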

OBJECTIVE QUIZ

In the following Python code, typing which Python command will give the user the CEO of Facebook?

import pandas as pd

companies_ceo = {
    'Amazon': 'Jeff Bezos',
    'Apple': 'Tim Cook',
    'SpaceX': 'Elon Musk',
    'Facebook': 'Mark Zuckerberg',
    'Netflix': 'Reed Hastings'
}
companies_ceo_series = pd.Series(companies_ceo)

companies_ceo_series['Facebook']

companies_ceo_series[4]

companies_ceo_series[3]

companies_ceo_series.values[4]

Ans:-

companies_ceo_series['Facebook']

companies_ceo_series[3]

In the following Python code, typing what command will create a DataFrame called “companies_ceo” whose first column has all the entries of the ‘companies’ list and whose second column has all the entries of the ‘ceo’ list, with the column names as the names of the respective variables?

import pandas as pd
companies = ['Amazon', 'Apple', 'SpaceX', 'Facebook', 'Netflix']

ceo = ['Jeff Bezos', 'Tim Cook', 'Elon Musk', 'Mark Zuckerberg', 'Reed Hastings']

companies_ceo_tuple = list(zip(companies, ceo)); companies_ceo = pd.DataFrame(companies_ceo_tuple, columns=['companies', 'ceo'])

companies_ceo = list (zip(companies,ceo))

companies_ceo = pd.DataFrame(companies, ceo)

companies_ceo_tuple = list(zip(companies, ceo)); companies_ceo = pd.DataFrame(companies_ceo_tuple)

Ans:- companies_ceo_tuple = list(zip(companies, ceo)); companies_ceo = pd.DataFrame(companies_ceo_tuple, columns=['companies', 'ceo'])

What happens when we call the stack() function on a Pandas DataFrame?

It will create a new DataFrame from two DataFrames by stacking them side by side

It will create a new DataFrame from two DataFrames by stacking one on top of the other

It will create a new DataFrame such that a single row in the original DataFrame is stacked into multiple rows in the new DataFrame depending on the number of columns for each row in the original DataFrame

It will create a new DataFrame such that a single row in the new DataFrame is a combination of multiple rows in the original DataFrame depending on the number of columns for each row in the original DataFrame

Ans:- It will create a new DataFrame such that a single row in the original DataFrame is stacked into multiple rows in the new DataFrame depending on the number of columns for each row in the original DataFrame
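A minimal sketch of stack() on a small, illustrative DataFrame:

import pandas as pd

# each row's column values are pivoted into multiple rows,
# producing a Series with a (row, column) MultiIndex
scores = pd.DataFrame({'math': [90, 80], 'science': [70, 60]}, index=['Tom', 'Ana'])

scores.stack()
# Tom  math       90
#      science    70
# Ana  math       80
#      science    60
# dtype: int64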

Consider the following Python code. What command would you use to iterate through the “companies_ceo” DataFrame and print the list of all the CEOs in this DataFrame?

import pandas as pd

companies = {
    'Company': ['Facebook', 'Apple', 'Amazon', 'Netflix'],
    'CEO': ['Mark Zuckerberg', 'Jeff Bezos', 'Tim Cook', 'Reed Hastings']
}

companies_ceo = pd.DataFrame(companies)

for row in companies_ceo.iterrows(): print(row[1])

for row in companies_ceo.iterfields(): print(row.CEO)

for row in companies_ceo.iterrecords(): print(row.CEO)

for row in companies_ceo.itertuples(): print(row.CEO)

Ans:-

for row in companies_ceo.iterrows(): print(row[1])

for row in companies_ceo.itertuples(): print(row.CEO)

Which of the following formats does Pandas not support natively when exporting the contents of a Dataframe?

JSON

CSV

JPEG

HTML

Ans:- JPEG
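A short sketch of the export formats pandas supports natively (file names are illustrative):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df.to_csv('out.csv', index=False)   # CSV
df.to_json('out.json')              # JSON
df.to_html('out.html')              # HTML table
# There is no to_jpeg(); DataFrames are not exported as images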

Let’s say you have created a Pandas DataFrame called “unsorted” and you want to sort the contents of this DataFrame column wise in alphabetical order of the header name. Then, which function would you call on the “unsorted” DataFrame to do so?

unsorted.sort_index()

unsorted.sort_col(axis=1)

unsorted.index_sort()

unsorted.sort_index(axis=1)

Ans:- unsorted.sort_index(axis=1)
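A minimal sketch (column names are illustrative):

import pandas as pd

unsorted = pd.DataFrame({'b': [1, 2], 'c': [3, 4], 'a': [5, 6]})

unsorted.sort_index(axis=1).columns.tolist()   # ['a', 'b', 'c'] -- columns reordered by header name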

Match the following functions that you can call on a Pandas DataFrame correctly with what they do

Answer Options:
A:.isnull()
B:.notnull()
C:.fillna(0)
D:.dropna()

Returns a Boolean array containing true or false values and returns the value in a cell as true if it contains NaN

A

B

C

D
Ans:- A

All the rows which contain a NaN value in any cell of that row are removed

A

B

C

D
Ans:- D

Returns a Boolean array containing true or false values and returns the value in a cell as true if it does not contain NaN

A

B

C

D
Ans:- B

Every cell in the Dataset which has a NaN value will be replaced with 0

A

B

C

D
Ans:- C

Match the following statements related to the .xs function in Pandas DataFrame with their correct Boolean values.

Answer Options:
A:The .xs function is used when our Pandas DataFrame makes use of a MultiIndex
B:By default, the .xs function only takes a look at values in the first level index
C:The .xs function cannot be used to return a cross section of columns

False

A

B

C
Ans:- C

True

A

B

C
Ans:- A, B
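A short sketch of .xs on a MultiIndexed DataFrame (the index values and column name are illustrative):

import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('G1', 1), ('G1', 2), ('G2', 1), ('G2', 2)], names=['Group', 'Num'])
multi = pd.DataFrame({'A': [10, 20, 30, 40]}, index=idx)

multi.xs('G1')              # cross section of the first (outer) index level
multi.xs(1, level='Num')    # cross section of an inner level, named explicitly
multi.xs('A', axis=1)       # .xs can also return a cross section of columns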

Let’s say you have imported Pandas as pd and have instantiated two DataFrames called “frame_1” and “frame_2” with the exact same schema. What command will you use to combine these two DataFrames into a single DataFrame and make sure that the combined DataFrame has its own unique index?

pd.concat( [frame_2, frame_1,] )

pd.concat( [frame_2, frame_1], ignore_index = True )

pd.concat( [frame_1, frame_2], ignore_index = True )

pd.concat( [frame_1, frame_2] )

Ans:-

pd.concat( [frame_2, frame_1], ignore_index = True )

pd.concat( [frame_1, frame_2], ignore_index = True )

The ‘how’ argument in the Pandas merge function allows us to specify what kind of join operation we want to perform on the given Pandas DataFrames. What are the valid values that we can give for this argument?

inner

left

outer

right

Ans:-

inner

left

outer

right

Some statements related to working with SQL Databases in Python are given below. Match them with their correct Boolean values.

Answer Options:
A:The sqlite3 library in Python allows us to create Databases on our local file system
B:Once we have created a table, we can use sqlite3’s .execute() function to recreate the same table with the same table name so that we have duplicates of a table
C:All the changes that we make to an SQL database on a Jupyter notebook by connecting with it, will be committed to the database only after we execute sqlite3’s .commit() function

False

A

B

C
Ans:- B

True

A

B

C
Ans:- A,C
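A minimal sketch of the workflow these statements describe (the file, table, and column names are illustrative):

import sqlite3
import pandas as pd

conn = sqlite3.connect('example.db')   # creates the database file if it does not exist
conn.execute('CREATE TABLE IF NOT EXISTS sales (company TEXT, amount REAL)')
conn.execute("INSERT INTO sales VALUES ('FB', 350)")
conn.commit()                          # changes persist only after commit()

df = pd.read_sql('SELECT * FROM sales', conn)   # read the table into a DataFrame
conn.close()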

Which of these correctly matches the following libraries in the NumPy ecosystem with what that library is used for? Each answer option may be used only once.

Answer Options:
A:Statsmodels
B:Scikit-learn
C:Bokeh

Used to perform statistical operations

A

B

C
Ans:A

Specifically meant for machine learning, data mining, and data analysis

A

B

C
Ans:- B

Data visualization tool used for large datasets

A

B

C
Ans:- C

Match the following statements related to the iloc indexer in Pandas with the correct boolean values.

Answer Options:
A:The iloc indexer is similar to the loc indexer and can be used to access records located at a particular index in a Pandas DataFrame
B:The column headers can be passed as input arguments in the form of a string to the iloc function without any errors
C:When we pass 2:6 as input argument to the iloc function, we get all details of the records located in the second index all the way up to the 5th index of the DataFrame

True

A

B

C
Ans:- A
False

A

B

C
Ans:- B,C

Let’s say you have saved a dataset in a pandas DataFrame called “dataset” which has tons of records and you only want to access the details of the records in only the 5th, 8th and 14th index. Which of these Python commands can you use to do so?

dataset.loc[5,8,14]

dataset[5,8,14]

dataset.loc[[5,8,14],:]

dataset(5,8,14)

Ans:- dataset.loc[[5,8,14],:]

Question: Let’s say you have a pandas DataFrame called “panda” which has 8 rows in total and you want to remove the last row from this DataFrame. Which of these Python commands would you use to do so?

panda.drop(panda.index[8])

panda.delete(panda.column[8])

panda.drop(panda.column[7])

panda.drop(panda.index[7])

Ans:- panda.drop(panda.index[7])

Question: Which of these statements related to the pivot function in Pandas is true?

The pivot function is used in aggregating data from various columns of a DataFrame to create a new DataFrame

The Pivot function summarizes the details of each column in a DataFrame

The combination of the row index and the column header must be unique in order to generate a pivot table

Ans:-

The combination of the row index and the column header must be unique in order to generate a pivot table
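A minimal sketch of pivot() with illustrative data; note that the (index, columns) pairs are unique:

import pandas as pd

sales = pd.DataFrame({'date': ['Mon', 'Mon', 'Tue', 'Tue'],
                      'city': ['NY', 'LA', 'NY', 'LA'],
                      'units': [10, 20, 30, 40]})

sales.pivot(index='date', columns='city', values='units')
# city  LA  NY
# date
# Mon   20  10
# Tue   40  30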

Question: When using the pyplot library to plot graphs on the data stored in a dataframe, what would you specify the “kind” of your chart to be, if you want to plot a horizontal bar graph?

bar

pie

hist

barh

Ans:- barh
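A quick sketch of a horizontal bar chart from a DataFrame (data is illustrative):

import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame({'sales': [200, 325, 470]}, index=['GOOGLE', 'MSFT', 'FB'])

sales.plot(kind='barh')   # 'barh' = horizontal bars; 'bar' would give vertical bars
plt.show()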

Question: What information about the column of a pandas dataframe is returned when you call the describe() function on it?

Sum of all the values in the column

The number of records

Basic statistical measures

The number of cells with NaN values

Ans:-Basic statistical measures
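A quick sketch of describe() on a numeric column (values are illustrative):

import pandas as pd

s = pd.Series([11, 22, 33, 11])

s.describe()   # count, mean, std, min, 25%, 50%, 75%, max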

Question: Let’s say you have a dataframe called “df” with columns “c1”, “c2” and “c3”. You want to remove the column “c2” from this dataframe. Which of these operations can help you do that?

df.drop(["c2"], axis=0)

remove df["c2"]

del df["c2"]

df.drop(["c2"], axis=1)

Ans:- del df["c2"] and df.drop(["c2"], axis=1)

Question: Match the following statements related to Pandas DataFrames with the correct boolean values.

Answer Options:
A:All the data within a particular column in a Pandas DataFrame must be of the same data type
B:Data in different columns of a Pandas DataFrame cannot be of different data types
C:Once a Pandas DataFrame has been created, it is not possible to add a new column to this DataFrame

False

A

B

C
Ans:- B, C
True

A

B

C
Ans:- A

Question: Match the following statements related to the concept of multiIndex in Pandas with the correct Boolean values.

Answer Options:
A:MultiIndex lets the user effectively store and manipulate higher dimensional data in a 2-dimensional tabular structure
B:The MultiIndex for a row is some composite key made up of exactly one column
C:MultiIndex is useful when we have large datasets where using numeric indexes to refer to each record is unintuitive

False

A

B

C
Ans:- B
True

A

B

C
Ans:- A,C

Question: In Pandas, what does a series object represent?

A single string character

A single integer

A 2-D matrix of data

A vector of data

Ans:- A vector of data

Question: Let’s say you have a dataframe called “stock” which has the price and other details of a particular stock. “stock” has a column called “price”, which has the daily closing price of the stock over the last 5 weeks.
Which of these commands would help extract the returns of the stock price?

stock["price"].change()

stock["price"].returns()

stock["price"].pct_change()

stock["price"].diff()

Ans:- stock["price"].pct_change()
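A short sketch of why pct_change() gives the returns (prices are illustrative):

import pandas as pd

price = pd.Series([100.0, 110.0, 99.0])

price.pct_change()
# 0         NaN
# 1    0.100000   (110/100 - 1)
# 2   -0.100000   (99/110 - 1)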

What would be the output of the following program?

import pandas as pd
pd.Timestamp('01/01/2000') + pd.Timedelta('2days 2hours 1minute 1second')

Timestamp('2000-01-03 02:01:01')

Timestamp('2000-03-01 02:01:01')

Timestamp('2000-01-03 01:01:02')

Timestamp('2000-03-01 01:01:02')

Ans:- Timestamp('2000-01-03 02:01:01')

Let’s say you have a dataframe called df which has NaN values in the first few records of the dataframe which is then followed by valid values. You want to fill those values with the first observed value of that column.

Which of these functions would you call on df to do so?

ffill()

sfill()

efill()

bfill()

Ans:- bfill()
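A minimal sketch of bfill(), which propagates the next valid observation backwards so leading NaNs take the first observed value:

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 5.0, 6.0])

s.bfill()
# 0    5.0
# 1    5.0
# 2    5.0
# 3    6.0
# dtype: float64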

How would you access the element 3 from the “pandas_series” variable?

import pandas as pd
pandas_series = pd.Series([1, 2, 3])
pandas_series.index = ['1', '2', '3']

pandas_series['2']

pandas_series[2]

pandas_series['3']

pandas_series[3]

Ans:-

pandas_series[2]

pandas_series['3']

Let’s say you have two pandas series called “one” and “two” and you want to print all those values in “one” that are also there in “two”.

Then which of these commands would you use to do so?

one[~one.isin(two)]

one.isin(two)

two[two.isin(one)]

one[one.isin(two)]

Ans:-

two[two.isin(one)]

one[one.isin(two)]
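A short sketch with illustrative values:

import pandas as pd

one = pd.Series([1, 2, 3, 4])
two = pd.Series([3, 4, 5, 6])

one[one.isin(two)]    # 3 and 4 -- values of `one` that also appear in `two`
one[~one.isin(two)]   # 1 and 2 -- values of `one` that do not appear in `two`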

What would be the output of the following line of code?

import pandas as pd
pandas_series = pd.Series([1, 2, 2, 3, 3, 99])
pandas_series.unique()

array([1, 2, 3, 99])

4

6

array([2, 3])

Ans:- array([1, 2, 3, 99])

In a Pandas dataframe, what is each column represented by?

Python Dictionary

Python list

Pandas Series

Numpy Array

Ans:- Pandas Series

Let’s say you have a dataframe called df with 6 columns and with 220 records. You want a subset of this dataframe which includes the first 4 columns from the left and the records from index 100 to 110.

Which of these commands can you use to do so?

df.iloc[100:111, 0:4]

df.loc[100:111, 0:4]

df.iloc[100:110, 0:3]

df.loc[100:10, 0:3]

Ans:- df.iloc[100:111, 0:4]

What will be the values in “new_string” variable at the end of this program?

string = pd.Series(['Ace', 'Fault', 'Error'])

new_string = string.str.swapcase()

['ace', 'fault', 'error']

['aCE', 'fAULT', 'eRROR']

['Ace', 'Fault', 'Error']

['ACE', 'FAULT', 'ERROR']

Ans:- ['aCE', 'fAULT', 'eRROR']

Let’s say you have a pandas series called “series” which has the contents: [‘ae’, ‘bf’, ‘cg’].

What function would you call to find the letter “e” in the series and the starting index of the matched substring?

str.findall('e')

str.match('e')

str.get('e')

str.find('e')

Ans:- str.find('e')
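A quick sketch on the series from the question:

import pandas as pd

series = pd.Series(['ae', 'bf', 'cg'])

series.str.find('e')
# 0    1
# 1   -1
# 2   -1
# dtype: int64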

What will be the values in new_string at the end of this program?

string = pd.Series(['top', 'bottom', 'side'])

new_string = string.apply(lambda x:x.upper())

['TOP', 'BOTTOM', 'SIDE']

['top', 'bottom', 'side']

['Top', 'Bottom', 'Side']

['tOP', 'bOTTOM', 'sIDE']

Ans:- ['TOP', 'BOTTOM', 'SIDE']

What function would you use to convert the values in “string” to integer form?

import pandas as pd
string = pd.Series(['1', '2', '3'])

to_numeric(string)

string.to_numeric

string.pd.to_numeric

pd.to_numeric(string)

Ans:- pd.to_numeric(string)

Let’s say you have a dataframe called “df” and you want to find the minimum value in each row of this dataframe.

Which of these commands would you use to do so?

df (min, axis = 1)

df (min, axis = 0)

df.apply (min, axis = 1)

df.apply (min, axis = 0)

Ans:- df.apply (min, axis = 1)
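A minimal sketch (column names and values are illustrative):

import pandas as pd

df = pd.DataFrame({'c1': [3, 8], 'c2': [5, 2], 'c3': [7, 6]})

df.apply(min, axis=1)
# 0    3
# 1    2
# dtype: int64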

Question: Which of these built-in methods of a dataframe would you use to return the number of NA values in each column of your dataframe?

df.isna()

df.na()

df.isna().sum()

df.sum()

Ans:- df.isna().sum()
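A quick sketch of the counting behaviour (data is illustrative):

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, np.nan, 6]})

df.isna().sum()
# A    1
# B    2
# dtype: int64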

In the context of a pandas dataframe, what does the dropna( ) function do?

All the cells which have the value NA are replaced with a 0

Calculates the number of records which have even a single field with an NA

All the cells which have the value NA are deleted

All the records which have even a single field with NA are removed

Ans:-

All the records which have even a single field with NA are removed

In the context of a pandas dataframe, which of these functions returns the median of a column?

quantile(0.75)

mean()

quantile(0.5)

median()

Ans:-

quantile(0.5)

median()
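A quick sketch showing the two agree (values are illustrative):

import pandas as pd

s = pd.Series([1, 3, 5, 7, 100])

print(s.median())        # 5.0
print(s.quantile(0.5))   # 5.0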

Question: Which of these statements related to the Pandas Series object are true?

Since Numpy is built on top of Pandas, it is very straightforward to convert a Pandas Series object into a Numpy array

Pandas Series object is similar to a Python list

Once we create a Pandas Series object, an index representing the positions for each of the data points is automatically created for the list

Ans:-

Pandas Series object is similar to a Python list

Once we create a Pandas Series object, an index representing the positions for each of the data points is automatically created for the list

Question: Let’s say you have a pandas DataFrame called “frame” and you want to export this DataFrame along with its index as a CSV file called “data_frame” located in the datasets folder of our workspace. Which of these commands would you use to do so?

data_frame.to_csv('datasets/frame.csv')

data_frame.csv('datasets/frame.csv')

frame.to_csv('datasets/data_frame.csv')

frame.csv('datasets/data_frame.csv')

Ans:- frame.to_csv('datasets/data_frame.csv')