Creating a DataFrame from a TSV file using Pandas

Many datasets are distributed as tab-separated value files (file extension .tsv). Although pd.read_csv is mostly used for comma-separated files, it can also read tab-separated ones. This guide covers two ways to read a file with this function: from a URL and from your local computer.

Reading in a TSV file from a URL

Many datasets are publicly available online and accessible by URL. One example is the Callisto Crater Database provided by the Lunar and Planetary Institute, a partner of NASA. It describes the craters on Callisto, a moon of Jupiter.

To load this dataset, pass the URL as a string to the read_csv function. To make sure the dataset is parsed properly, we will use the sep parameter. CSV files are usually separated by commas, so the default is sep=','. Our dataset is separated by tabs, so we specify the separator as '\t'.
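
The effect of sep can be seen without downloading anything. The following is a minimal sketch that reads a small in-memory sample twice, once with the default separator and once with sep="\t". The sample text and the variable names are made up for illustration and are not part of the Callisto file.

```python
import io
import pandas as pd

# Made-up two-row sample; fields are separated by tab characters.
sample = "Name\tLatitude\tLongitude\nValhalla\t15\t56\nAsgard\t32\t142\n"

# With the default sep=',' each whole line lands in a single column.
as_csv = pd.read_csv(io.StringIO(sample))

# With sep='\t' each line is split on tabs into three columns.
as_tsv = pd.read_csv(io.StringIO(sample), sep="\t")

print(as_csv.shape)  # (2, 1)
print(as_tsv.shape)  # (2, 3)
```

A DataFrame with a single wide column is usually the first sign that the separator does not match the file.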

Additionally, for this dataset, the column headers are not in the first row of the file. To fix this, we set the header parameter to 1. Rows are counted from 0, so read_csv takes the column headers from the second row of the file.
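
As a small illustration of the header parameter, the sketch below reads an in-memory sample whose first line is a title rather than the column names. The sample text is invented for this example and only mimics the layout described above. It is not the real content of the Callisto file.

```python
import io
import pandas as pd

# Made-up sample: a title line first, then the real column headers.
sample = (
    "Example crater list\n"
    "Name\tLatitude\tLongitude\n"
    "Valhalla\t15\t56\n"
    "Asgard\t32\t142\n"
)

# header=1 takes the column names from the second line (rows are counted
# from 0) and discards the line above it.
df_demo = pd.read_csv(io.StringIO(sample), sep="\t", header=1)

print(list(df_demo.columns))
print(len(df_demo))
```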

import pandas as pd
df = pd.read_csv("https://www.lpi.usra.edu/research/cc/ccraters", sep="\t", header=1)
df

	Name	Crater Type	Latitude	Longitude	central dome	central pit/ring	rim	rim (predicted)	pedestal	ejecta	rings	Terrace Width	Rimwall Width	Rim Height	Depth	Depth Method	Terrain	Image	Resolution	Note
0	Valhalla	MV	15	56	NaN	NaN	NaN	980.0	NaN	~1800	~3800	NaN	NaN	NaN	NaN	NaN	dc	numerous	1.0	numerous furrows, ridges, central bright floor...
1	Asgard	MV	32	142	NaN	NaN	NaN	660.0	NaN	~1400	~1880	NaN	NaN	NaN	NaN	NaN	dc	20606.21	3.0	numerous furrows, central bright floor deposit...
2	ZS05:232	MV	-5	232	NaN	NaN	NaN	NaN	NaN	NaN	~550	NaN	NaN	NaN	NaN	NaN	dc	20617.21	1.7	furrows
3	ZN34:358	MV	34	358	NaN	NaN	NaN	NaN	NaN	NaN	~350	NaN	NaN	NaN	NaN	NaN	dc	16421.43	1.7	furrows; Galileo target
4	ZS64:350	PA	-64	350	NaN	NaN	NaN	295.0	~450	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16418.06	2.7	oblique view; Galileo target
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
125	ZN11:321	CP	11	321	NaN	8	45.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.88	sh	dc	16421.27	1.8	NaN
126	ZN44:001	CP	44	1	NaN	9	44.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	cm	16426.34	1.0	see Anarr
127	ZN39:262	CP	39	262	NaN	8	44.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	20616.57	1.7	NaN
128	ZN28:016	CP	28	16	NaN	9	40.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16424.30	1.1	NaN
129	ZN68:339	CP	68	339	NaN	7	38.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16426.46	1.0	NaN

Loading a TSV file from a URL

Reading in a TSV file from your local computer

In this next example, we will continue to work with the same dataset as above, the Callisto Crater Database. Once this database is downloaded, the default file name is ccraters.tsv.

To read in a TSV file from your computer:

Make sure the TSV file is in the same folder as the Python file you are working with.
Pass the file name as a string to the read_csv function.

import pandas as pd
df = pd.read_csv("ccraters.tsv", sep="\t", header=1)
df

	Name	Crater Type	Latitude	Longitude	central dome	central pit/ring	rim	rim (predicted)	pedestal	ejecta	rings	Terrace Width	Rimwall Width	Rim Height	Depth	Depth Method	Terrain	Image	Resolution	Note
0	Valhalla	MV	15	56	NaN	NaN	NaN	980.0	NaN	~1800	~3800	NaN	NaN	NaN	NaN	NaN	dc	numerous	1.0	numerous furrows, ridges, central bright floor...
1	Asgard	MV	32	142	NaN	NaN	NaN	660.0	NaN	~1400	~1880	NaN	NaN	NaN	NaN	NaN	dc	20606.21	3.0	numerous furrows, central bright floor deposit...
2	ZS05:232	MV	-5	232	NaN	NaN	NaN	NaN	NaN	NaN	~550	NaN	NaN	NaN	NaN	NaN	dc	20617.21	1.7	furrows
3	ZN34:358	MV	34	358	NaN	NaN	NaN	NaN	NaN	NaN	~350	NaN	NaN	NaN	NaN	NaN	dc	16421.43	1.7	furrows; Galileo target
4	ZS64:350	PA	-64	350	NaN	NaN	NaN	295.0	~450	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16418.06	2.7	oblique view; Galileo target
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
125	ZN11:321	CP	11	321	NaN	8	45.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	0.88	sh	dc	16421.27	1.8	NaN
126	ZN44:001	CP	44	1	NaN	9	44.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	cm	16426.34	1.0	see Anarr
127	ZN39:262	CP	39	262	NaN	8	44.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	20616.57	1.7	NaN
128	ZN28:016	CP	28	16	NaN	9	40.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16424.30	1.1	NaN
129	ZN68:339	CP	68	339	NaN	7	38.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	dc	16426.46	1.0	NaN

Loading a TSV file from your computer
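
If the file is stored in a different folder, read_csv can be given a path instead of a bare file name. The sketch below is one way to do this with pathlib. It writes a made-up sample to a temporary folder so that it runs on its own. The folder, the file name sample.tsv, and the sample text are all hypothetical stand-ins for your own download location.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample standing in for a downloaded file.
sample = "Example crater list\nName\tLatitude\tLongitude\nValhalla\t15\t56\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical location; replace with the folder that holds your file.
    tsv_path = Path(tmp) / "sample.tsv"
    tsv_path.write_text(sample, encoding="utf-8")

    # read_csv accepts a path object as well as a plain string.
    df_local = pd.read_csv(tsv_path, sep="\t", header=1)

print(df_local.loc[0, "Name"])
```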

More examples are provided in our read_csv guide. Remember, you need to add sep="\t" if your file is in TSV format:

Guide: Creating a DataFrame from a CSV file using Pandas

Pandas Documentation

The full documentation for read_csv is available in the pandas documentation.