Pandas Helper Functions
Author: Jimmy Rousseau | Published: 8/20/2023

Some important and useful helper functions in Pandas

Pandas provides many helper functions that can be used to summarize data, transform data, and apply custom logic.

The information in this post is borrowed heavily from the authors' Kaggle Learning article.

Summary functions

  • describe(): This function provides a high-level summary of the attributes of a column, such as the mean, standard deviation, minimum, and maximum values.

  • mean(): This function returns the mean of a column.

  • unique(): This function returns an array of the unique values in a column, in order of first appearance.

  • value_counts(): This function returns a count of the number of times each unique value appears in a column.

Maps

  • map(): This function takes a function as input and applies it to each value in a Series.

  • apply(): This function is similar to map(), but the function you pass receives a whole row or column of a DataFrame as input and returns a transformed version of that row or column.

Here are some examples of how these functions can be used:

Examples

    count    129971.000000
    mean         88.447138
                 ...
    75%          91.000000
    max         100.000000
    Name: points, Length: 8, dtype: float64

The describe() method generates a high-level summary of the attributes of the given column. It is type-aware, meaning that its output changes based on the data type of the input. The output above only makes sense for numerical data; for string data, here is what we get:

    count         103727
    unique            19
    top       Roger Voss
    freq           25514
    Name: taster_name, dtype: object
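Both summaries above can be reproduced in miniature. The sketch below is only an illustration: the DataFrame name `reviews`, its column names, and the sample rows are assumptions standing in for the wine reviews dataset, so the numbers will not match the output shown above.

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews data; the values are made up.
reviews = pd.DataFrame({
    "points": [87, 90, 92, 85, 90],
    "taster_name": ["Roger Voss", "Roger Voss", "Michael Schachner",
                    None, "Roger Voss"],
})

# Numeric column: count, mean, std, min, quartiles, and max.
points_summary = reviews.points.describe()
print(points_summary)

# String column: count (non-missing), unique, top (most frequent), and freq.
taster_summary = reviews.taster_name.describe()
print(taster_summary)
```

Note that the missing taster name is left out of the count for the string column.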

To see a list of unique values and how often they occur in the dataset, we can use the value_counts() method:

    Roger Voss           25514
    Michael Schachner    15134
                         ...
    Fiona Adams             27
    Christina Pickard        6
    Name: taster_name, Length: 19, dtype: int64
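A call of the following shape produces this kind of listing. This is a minimal sketch on a made-up `reviews` DataFrame (the name and the sample rows are assumptions, not the original data):

```python
import pandas as pd

# Hypothetical sample; the real dataset has far more rows and tasters.
reviews = pd.DataFrame({
    "taster_name": ["Roger Voss", "Michael Schachner", "Roger Voss",
                    "Roger Voss", "Michael Schachner", "Fiona Adams"],
})

# Counts per unique value, sorted from most to least frequent.
taster_counts = reviews.taster_name.value_counts()
print(taster_counts)
```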

To calculate the average wine rating, you could use the following code:
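A minimal sketch, assuming a DataFrame named `reviews` with a numeric `points` column (both names and the sample values are illustrative assumptions):

```python
import pandas as pd

# Hypothetical ratings standing in for the wine reviews data.
reviews = pd.DataFrame({"points": [87, 90, 92, 85, 90]})

# mean() returns a single number for a numeric column.
average_points = reviews.points.mean()
print(average_points)
```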

To get a list of all the unique wine regions, you could use the following code:
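One possible version, assuming the regions live in a column called `region_1` (the column name and the sample values are assumptions made for this sketch):

```python
import pandas as pd

# Hypothetical regions; note that "Etna" appears twice.
reviews = pd.DataFrame({
    "region_1": ["Etna", "Napa Valley", "Etna", "Willamette Valley"],
})

# unique() keeps each value once, in order of first appearance.
unique_regions = reviews.region_1.unique()
print(unique_regions)
```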

To count the number of times each wine region appears, you could use the following code:
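The same value_counts() call shown for taster names works here. A short sketch, again assuming a hypothetical `region_1` column with made-up values:

```python
import pandas as pd

# Hypothetical regions standing in for the wine reviews data.
reviews = pd.DataFrame({
    "region_1": ["Etna", "Napa Valley", "Etna", "Willamette Valley"],
})

# Number of rows per region, most frequent first.
region_counts = reviews.region_1.value_counts()
print(region_counts)
```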

To combine the country and region information for each wine, you could use the following code:
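One simple way to do this is plain string concatenation between two columns, which pandas applies element by element. In this sketch the column names `country` and `region_1`, the " - " separator, and the sample rows are all assumptions:

```python
import pandas as pd

# Hypothetical rows standing in for the wine reviews data.
reviews = pd.DataFrame({
    "country": ["Italy", "US"],
    "region_1": ["Etna", "Napa Valley"],
})

# Adding a string to a Series, or two Series together, works row by row.
country_region = reviews.country + " - " + reviews.region_1
print(country_region)
```

Operators like this are usually faster than map() or apply() for simple cases, while map() and apply() remain useful when the logic is more involved.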

To re-center the wine ratings so that their mean is 0, you could use the following code:
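A minimal sketch using map(), assuming a `reviews` DataFrame with a `points` column (the names and the three sample ratings are illustrative assumptions):

```python
import pandas as pd

# Hypothetical ratings; their mean is exactly 90.
reviews = pd.DataFrame({"points": [85, 90, 95]})

# Compute the mean once, outside the mapped function.
review_points_mean = reviews.points.mean()

# The lambda receives one point value at a time and returns the shifted value.
centered_points = reviews.points.map(lambda p: p - review_points_mean)
print(centered_points)
```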

The function you pass to map() should expect a single value from the Series (a point value, in the above example), and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.

apply() is the equivalent method if we want to transform a whole DataFrame by calling a custom function on each row (pass axis='columns' to process rows rather than columns).
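As a rough illustration, the same re-centering can be written with apply(). The helper name `remean_points`, the column names, and the sample rows are assumptions made for this sketch:

```python
import pandas as pd

# Hypothetical rows standing in for the wine reviews data.
reviews = pd.DataFrame({
    "country": ["Italy", "US", "France"],
    "points": [85, 90, 95],
})

review_points_mean = reviews.points.mean()

def remean_points(row):
    # Each row arrives as a Series indexed by column name.
    # Work on a copy so the input row is left untouched.
    row = row.copy()
    row["points"] = row["points"] - review_points_mean
    return row

# axis="columns" calls the function once per row.
centered_reviews = reviews.apply(remean_points, axis="columns")
print(centered_reviews)
```

Both map() and apply() return new objects here, so the original `reviews` DataFrame keeps its ratings.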

These are just a few examples of the many helper functions that Pandas provides. For more information, please see the Pandas documentation: https://pandas.pydata.org/docs/.