When working with data from a pandas DataFrame, we often want to select a range of rows rather than individual ones. To do this, we can use slice objects.
A slice object indicates how an object should be sliced by specifying:
start - where the slice should begin
stop - where the slice should end
step - the interval between selected items (e.g., every second item, every third item, etc.)
To explore ways of slicing a DataFrame with slice objects, let's look at a DataFrame of instruments:
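A minimal sketch of how that DataFrame can be built, using the same ten instruments as the other examples in this section. It assumes pandas assigns its default integer index (0 through 9), which is what the index positions below refer to.

```python
import pandas as pd

# The ten instruments, one row each.
# pandas assigns a default integer index from 0 to 9.
df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

print(df)
```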
One way to generate a slice object is with Python's built-in slice function. It takes up to three positional arguments (start, stop, and step) and can be called as slice(stop), slice(start, stop), or slice(start, stop, step).
If start or step is not specified, it is stored as None, which behaves like a start of 0 and a step of 1. The stop parameter, however, has no default value, so at least a stop value must be passed for the slice function to run:
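A minimal sketch of slicing with only a stop value, assuming df is the instruments DataFrame (rebuilt here so the block runs on its own). It also shows that calling slice() with no arguments fails.

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# Only stop is given; start and step are left as None
first_two_slice = slice(2)
print(first_two_slice)  # slice(None, 2, None)

# Rows from the top of df up to, but not including, index 2
first_two = df[first_two_slice]
print(list(first_two["instrument"]))  # ['harp', 'violin']

# slice() with no arguments raises a TypeError because stop is required
try:
    slice()
except TypeError as err:
    print("TypeError:", err)
```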
Remember that in almost all cases in Python, the stop value is not inclusive. Since we specified our stop value as 2, the output includes rows at index 0 and 1, but not the row at index 2.
Using the slice Function With All Three Parameters
Let's say we are interested in learning how to play an instrument, but have no prior musical experience. To start, we only want to look at every other non-string instrument in our DataFrame of instruments:
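A minimal sketch of that selection with all three slice arguments, assuming the default integer index of the instruments DataFrame (rebuilt here so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# start=3: the first non-string instrument
# stop=10: exclusive, so the row at index 9 is still included
# step=2: every other row
every_other_non_string = df[slice(3, 10, 2)]
print(list(every_other_non_string["instrument"]))
# ['clarinet', 'flute', 'marimba', 'trumpet']
```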
The first non-string instrument in df appears in the row at index 3 (our start value). We want to view every instrument through the row at index 9 (the last row), and since stop is exclusive, our stop value is 10. And, since we want to view every other instrument, our step value is 2.
The more common way to generate a slice object is to use indexing syntax, which follows the format x[start:stop:step] (where x can be a DataFrame, a list, etc.).
Indexing syntax is similar to the slice function. However, one notable difference for this format is that all three parameters are optional.
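A small sketch showing that the bracket form builds a slice object behind the scenes. ShowKey is a hypothetical helper class (not part of pandas or Python) whose __getitem__ simply returns the key it receives; the rest compares the two forms on a list and on the instruments DataFrame.

```python
import pandas as pd


class ShowKey:
    # Hypothetical helper: returns whatever key the brackets pass in
    def __getitem__(self, key):
        return key


print(ShowKey()[3:10:2])  # slice(3, 10, 2)

# On a list, both forms select the same items
numbers = list(range(10))
print(numbers[3:10:2])           # [3, 5, 7, 9]
print(numbers[slice(3, 10, 2)])  # [3, 5, 7, 9]

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# On a DataFrame, both forms return the same rows
bracket_rows = df[3:10:2]
function_rows = df[slice(3, 10, 2)]
print(bracket_rows.equals(function_rows))  # True
```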
Using All Default Values in a DataFrame slice
Unlike with the slice function, we don't have to specify any parameters when using indexing syntax. A slice with no parameters, written df[:], is called an empty slice, and it returns the entire DataFrame:
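A minimal sketch of an empty slice on the instruments DataFrame (rebuilt here so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# Empty slice: no start, stop, or step, so every row is kept
everything = df[:]
print(everything.equals(df))  # True
print(len(everything))        # 10
```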
Remember, the slice function requires at least the stop parameter. With indexing syntax, we can instead specify just one parameter, whether it is the start, the stop, or the step. The generic format is df[start:stop] when one colon is used, and df[start:stop:step] when two colons are used.
The default values remain the same (start=0 and step=1), and the default value of stop is the length of the DataFrame, so the slice runs through the last row. Example of one parameter:
import pandas as pd

# Creating a DataFrame with "instrument", "family", and "material" columns
df = pd.DataFrame([
    {"instrument": "harp", "family": "strings", "material": "wood"},
    {"instrument": "violin", "family": "strings", "material": "spruce"},
    {"instrument": "guitar", "family": "strings", "material": "mahogany"},
    {"instrument": "clarinet", "family": "woodwind", "material": "mpingo"},
    {"instrument": "recorder", "family": "woodwind", "material": "boxwood"},
    {"instrument": "flute", "family": "woodwind", "material": "silver"},
    {"instrument": "xylophone", "family": "percussion", "material": "wood"},
    {"instrument": "marimba", "family": "percussion", "material": "rosewood"},
    {"instrument": "trombone", "family": "brass", "material": "brass"},
    {"instrument": "trumpet", "family": "brass", "material": "brass"}])

# start=7
# default values for stop (the length of df) and step (1)
# => the slice contains only the rows at index 7 and larger:
df[7:]
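The other two single-parameter cases can be sketched the same way, assuming the same instruments DataFrame with its default integer index (rebuilt here so the block runs on its own): one slice sets only stop, the other only step.

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# Only stop: rows at index 0, 1, and 2
first_three = df[:3]
print(list(first_three["instrument"]))  # ['harp', 'violin', 'guitar']

# Only step (note the two colons): every third row, starting at index 0
every_third = df[::3]
print(list(every_third["instrument"]))
# ['harp', 'clarinet', 'xylophone', 'trumpet']
```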
After some consideration, we conclude that learning four instruments would require too many hours of practice. Instead, we decide to narrow our options down to either a string instrument or a percussion instrument. To do this, we can concatenate two different slices of our DataFrame:
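One way to do this is a minimal sketch with pd.concat, assuming the instruments DataFrame with its default integer index (rebuilt here so the block runs on its own): the string instruments sit at index 0 to 2 and the percussion instruments at index 6 and 7.

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

strings = df[0:3]       # harp, violin, guitar
percussion = df[6:8]    # xylophone, marimba

# Stack the two slices vertically; the original index labels are kept
shortlist = pd.concat([strings, percussion])
print(list(shortlist.index))  # [0, 1, 2, 6, 7]
```

Because pd.concat keeps the original index labels, passing ignore_index=True is an option if a fresh 0-based index is preferred.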
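When generating slice objects, we can also use negative numbers. A negative start or stop counts from the bottom of the DataFrame: -1 refers to the last row, -2 to the second-to-last row, and so on. A negative step moves through the rows in reverse order, from the bottom upward.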
For example, say we want to look at the last row of our DataFrame:
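A minimal sketch of negative slice values on the instruments DataFrame (rebuilt here so the block runs on its own), covering a negative start and a negative step:

```python
import pandas as pd

df = pd.DataFrame({
    "instrument": ["harp", "violin", "guitar", "clarinet", "recorder",
                   "flute", "xylophone", "marimba", "trombone", "trumpet"],
    "family": ["strings"] * 3 + ["woodwind"] * 3
              + ["percussion"] * 2 + ["brass"] * 2,
    "material": ["wood", "spruce", "mahogany", "mpingo", "boxwood",
                 "silver", "wood", "rosewood", "brass", "brass"],
})

# start=-1: begin at the last row; stop defaults to the end of df
last_row = df[-1:]
print(list(last_row["instrument"]))  # ['trumpet']

# The last three rows
last_three = df[-3:]

# step=-1: walk through the rows from the bottom upward
reversed_df = df[::-1]
print(list(reversed_df.index[:3]))  # [9, 8, 7]
```

Note the colon in df[-1:]: without it, df[-1] is treated as a column lookup and raises a KeyError, because df has no column labeled -1.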