Creating a DataFrame from a TSV file using Pandas


Many datasets are provided in a tab-separated value file format (file extension .tsv). Although pd.read_csv is mostly used to read files with comma-separated values, it can also handle tab-separated files. With this function, there are two primary ways to read in a file: from a URL or from your local computer.

Reading in a TSV file from a URL

Many datasets are publicly available online and accessible by URL. One example is the Callisto Crater Database provided by the Lunar and Planetary Institute, a partner of NASA. This data details the different craters on Callisto, a moon of Jupiter.

To load this dataset, pass the URL as a string to the read_csv function. To make sure the dataset is read properly, we will use the sep parameter. CSV files are usually separated by commas, so the default is sep=','. Our dataset is separated by tabs, so we set the separator to '\t'.
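To see why sep matters, here is a minimal sketch that reads a small in-memory tab-separated sample with and without sep="\t". The sample rows are made up for illustration and are not taken from the Callisto Crater Database.

```python
import io
import pandas as pd

# Hypothetical two-row sample, made up for illustration (not the real crater data).
tsv_text = "Name\tLatitude\tLongitude\nAlpha\t10\t20\nBeta\t-5\t300\n"

# With the default sep=",", the lines contain no commas, so each line becomes one column.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep="\t", the three tab-separated fields become three columns.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 3)
```

If a DataFrame loads with a single wide column, a missing or wrong sep value is a likely cause.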

Additionally, for this dataset, the column names are not in the first row of the file. To handle this, we will set the header parameter to 1, so read_csv reads the column headers from the second row in the file (row numbering starts at 0).
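The effect of header=1 can be sketched on a small in-memory sample whose first line is a title rather than column names. The sample text and the name sample_df are assumptions made for this illustration.

```python
import io
import pandas as pd

# Hypothetical sample: line 0 is a title, line 1 holds the column names.
tsv_text = (
    "Made-up crater list\n"
    "Name\tLatitude\tLongitude\n"
    "Alpha\t10\t20\n"
    "Beta\t-5\t300\n"
)

# header=1 takes the column names from the second line and ignores the line before it.
sample_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=1)

print(list(sample_df.columns))  # ['Name', 'Latitude', 'Longitude']
print(len(sample_df))           # 2
```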

import pandas as pd
df = pd.read_csv("https://www.lpi.usra.edu/research/cc/ccraters", sep="\t", header=1)
df
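|  | Name | Crater Type | Latitude | Longitude | central dome | central pit/ring | rim | rim (predicted) | pedestal | ejecta | rings | Terrace Width | Rimwall Width | Rim Height | Depth | Depth Method | Terrain | Image | Resolution | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Valhalla | MV | 15 | 56 | NaN | NaN | NaN | 980.0 | NaN | ~1800 | ~3800 | NaN | NaN | NaN | NaN | NaN | dc | numerous | 1.0 | numerous furrows, ridges, central bright floor... |
| 1 | Asgard | MV | 32 | 142 | NaN | NaN | NaN | 660.0 | NaN | ~1400 | ~1880 | NaN | NaN | NaN | NaN | NaN | dc | 20606.21 | 3.0 | numerous furrows, central bright floor deposit... |
| 2 | ZS05:232 | MV | -5 | 232 | NaN | NaN | NaN | NaN | NaN | NaN | ~550 | NaN | NaN | NaN | NaN | NaN | dc | 20617.21 | 1.7 | furrows |
| 3 | ZN34:358 | MV | 34 | 358 | NaN | NaN | NaN | NaN | NaN | NaN | ~350 | NaN | NaN | NaN | NaN | NaN | dc | 16421.43 | 1.7 | furrows; Galileo target |
| 4 | ZS64:350 | PA | -64 | 350 | NaN | NaN | NaN | 295.0 | ~450 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16418.06 | 2.7 | oblique view; Galileo target |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 125 | ZN11:321 | CP | 11 | 321 | NaN | 8 | 45.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.88 | sh | dc | 16421.27 | 1.8 | NaN |
| 126 | ZN44:001 | CP | 44 | 1 | NaN | 9 | 44.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | cm | 16426.34 | 1.0 | see Anarr |
| 127 | ZN39:262 | CP | 39 | 262 | NaN | 8 | 44.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 20616.57 | 1.7 | NaN |
| 128 | ZN28:016 | CP | 28 | 16 | NaN | 9 | 40.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16424.30 | 1.1 | NaN |
| 129 | ZN68:339 | CP | 68 | 339 | NaN | 7 | 38.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16426.46 | 1.0 | NaN |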
Loading a TSV file from a URL

Reading in a TSV file from your local computer

In this next example, we will continue to work with the same dataset as above, the Callisto Crater Database. Once this database is downloaded, the default file name is ccraters.tsv.

To read in a TSV file from your computer:

  • Make sure the TSV file is in the same folder as the Python file you are working with.
  • Pass the file name as a string to the read_csv function.

import pandas as pd
df = pd.read_csv("ccraters.tsv", sep="\t", header=1)
df
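|  | Name | Crater Type | Latitude | Longitude | central dome | central pit/ring | rim | rim (predicted) | pedestal | ejecta | rings | Terrace Width | Rimwall Width | Rim Height | Depth | Depth Method | Terrain | Image | Resolution | Note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Valhalla | MV | 15 | 56 | NaN | NaN | NaN | 980.0 | NaN | ~1800 | ~3800 | NaN | NaN | NaN | NaN | NaN | dc | numerous | 1.0 | numerous furrows, ridges, central bright floor... |
| 1 | Asgard | MV | 32 | 142 | NaN | NaN | NaN | 660.0 | NaN | ~1400 | ~1880 | NaN | NaN | NaN | NaN | NaN | dc | 20606.21 | 3.0 | numerous furrows, central bright floor deposit... |
| 2 | ZS05:232 | MV | -5 | 232 | NaN | NaN | NaN | NaN | NaN | NaN | ~550 | NaN | NaN | NaN | NaN | NaN | dc | 20617.21 | 1.7 | furrows |
| 3 | ZN34:358 | MV | 34 | 358 | NaN | NaN | NaN | NaN | NaN | NaN | ~350 | NaN | NaN | NaN | NaN | NaN | dc | 16421.43 | 1.7 | furrows; Galileo target |
| 4 | ZS64:350 | PA | -64 | 350 | NaN | NaN | NaN | 295.0 | ~450 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16418.06 | 2.7 | oblique view; Galileo target |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 125 | ZN11:321 | CP | 11 | 321 | NaN | 8 | 45.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.88 | sh | dc | 16421.27 | 1.8 | NaN |
| 126 | ZN44:001 | CP | 44 | 1 | NaN | 9 | 44.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | cm | 16426.34 | 1.0 | see Anarr |
| 127 | ZN39:262 | CP | 39 | 262 | NaN | 8 | 44.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 20616.57 | 1.7 | NaN |
| 128 | ZN28:016 | CP | 28 | 16 | NaN | 9 | 40.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16424.30 | 1.1 | NaN |
| 129 | ZN68:339 | CP | 68 | 339 | NaN | 7 | 38.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | dc | 16426.46 | 1.0 | NaN |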
Loading a TSV file from your computer
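A bare file name such as "ccraters.tsv" is resolved relative to the current working directory, which is often, but not always, the folder containing your Python file. The sketch below shows one way to check that the file exists before reading it. It writes a small made-up sample to a temporary folder so that it runs on its own; the file name sample_craters.tsv and its contents are hypothetical stand-ins for a downloaded file.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for a downloaded file: a title line, a header line, two data rows.
sample = (
    "Made-up crater list\n"
    "Name\tLatitude\tLongitude\n"
    "Alpha\t10\t20\n"
    "Beta\t-5\t300\n"
)

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "sample_craters.tsv"  # hypothetical file name
    path.write_text(sample, encoding="utf-8")

    # Checking first lets us report the full path we looked for.
    if not path.exists():
        raise FileNotFoundError(f"Expected a TSV file at {path.resolve()}")

    # read_csv accepts a pathlib.Path as well as a plain string.
    local_df = pd.read_csv(path, sep="\t", header=1)

print(local_df.shape)  # (2, 3)
```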

More examples are provided in our read_csv guide. Remember, you need to add sep="\t" if your file is in TSV format.
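As a side note, pandas also provides pd.read_table, whose default separator is a tab. The sketch below, using a made-up in-memory sample, suggests the two calls give the same result for a simple tab-separated input.

```python
import io
import pandas as pd

# Hypothetical sample, made up for illustration.
tsv_text = "Name\tLatitude\nAlpha\t10\nBeta\t-5\n"

# read_csv needs sep="\t"; read_table uses a tab separator by default.
from_read_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")
from_read_table = pd.read_table(io.StringIO(tsv_text))

print(from_read_csv.equals(from_read_table))  # True
```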

Pandas Documentation

The full documentation for read_csv is available in the pandas documentation.