This document shows the contents of the GNSS-IR data files distributed by the PSMSL, and how to use them in Python. Here we are using the Pandas data analysis toolbox, and we'll also need to have the matplotlib library available to create the plots.
import pandas as pd
We're going to be using data from Newlyn in south-west England here. First, we read in the data file. This is a zip file containing a csv file, and pandas will read the csv within the zip seamlessly.
We're ignoring the header lines beginning with a # (although these contain valuable metadata, so do consider them later), telling pandas to use the time column as the index of the returned data, and parsing the date strings into Python datetime objects.
data = pd.read_csv('https://psmsl.org/data/gnssir/data/main/10049.zip', comment='#', index_col='time', parse_dates=True)
Let's take a look at what the data looks like by showing the first few rows.
data.head()
Full information on these fields is available on the PSMSL website, but we'll give some details here.
Each row of the data represents the analysis from one pass of a GNSS satellite over the location. We process the reflections off the surface during this pass, and use this to calculate a height - here called raw_height, given in metres.
But there are some complications. Firstly, you'll notice that there are three entries for the timestep '2014-01-01T00:34:45'. This isn't an error: each GNSS satellite transmits on a number of different frequencies (recorded in the 'signal' column), which we process separately, so you can get multiple heights for a single timestep.
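To see this, you can select all the rows for a single timestep. Here's a minimal sketch using a toy frame mimicking the file layout (the signal codes and heights below are placeholders, not real Newlyn values):

```python
import pandas as pd

# Toy frame mimicking the GNSS-IR file layout (illustrative values only):
# one satellite pass can yield a height per processed signal.
toy = pd.DataFrame(
    {'prn': [1, 1, 1],
     'signal': ['L1', 'L2', 'L5'],
     'raw_height': [52.41, 52.39, 52.43]},
    index=pd.to_datetime(['2014-01-01 00:34:45'] * 3),
)
toy.index.name = 'time'

# Selecting a single timestep returns all the rows for that pass
one_pass = toy.loc['2014-01-01 00:34:45']
print(one_pass)
print(len(one_pass), 'heights for this timestep')
```

With the real data frame, `data.loc['2014-01-01 00:34:45']` works the same way.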
Also, as the satellite passes overhead, the sea won't be stationary; usually it will be rising or falling with the tide. This introduces errors in the raw height, which we've tried to account for in the adjusted_height field (more on that in a minute).
For convenience, we've also included our best estimate of the astronomical tide at the site.
Let's take a look at one day of data.
day = data.loc['2021-07-11']
day['raw_height'].plot(style='.')
That looks a bit scruffy, but you can sort of see the two paths. Let's try the adjusted height.
day['raw_height'].plot(style='.', legend=True)
day['adjusted_height'].plot(style='.', legend=True)
Much better! We call this adjustment Ḣ - you can investigate it by looking at the difference between the two series. Now let's compare that to the estimates of the tidal cycle.
day['adjusted_height'].plot(marker='.',color='#ff7f0e',linestyle='none', legend=True)
day['fitted_tide'].plot(marker='.',color='#2ca02c',linestyle='none', legend=True)
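The Ḣ adjustment applied to each pass can be recovered by differencing the adjusted and raw series. Here's a minimal sketch with illustrative numbers (with the real data, `day['adjusted_height'] - day['raw_height']` does the same):

```python
import pandas as pd

# Illustrative values only: raw and adjusted heights for three passes
times = pd.to_datetime(['2021-07-11 01:00', '2021-07-11 02:00', '2021-07-11 03:00'])
raw = pd.Series([52.40, 52.95, 53.60], index=times, name='raw_height')
adjusted = pd.Series([52.45, 52.97, 53.55], index=times, name='adjusted_height')

# The correction applied to each pass is the difference of the two series
hdot_adjustment = adjusted - raw
print(hdot_adjustment)
```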
Next, let's compare this data with some data from the traditional tide gauge at Newlyn. We'll get the data from the IOC Sea Level Monitoring Facility. Note that data from that site is real-time data that isn't quality controlled; a better source for Newlyn data is the British Oceanographic Data Centre, but here I've picked a day when there are no issues with the data.
We need to do a bit of work to convert the tide gauge data into a form we can plot in python. We'll be reading it in using the requests library, and using the standard datetime library to convert the date strings.
import requests
import datetime
url = 'http://ioc-sealevelmonitoring.org/service.php?query=data&code=newl2&timestart=2021-07-11&timestop=2021-07-12&format=json'
r = requests.get(url)
json = r.json()
t = [datetime.datetime.strptime(x['stime'].rstrip(),'%Y-%m-%d %H:%M:%S') for x in json]
h = [x['slevel'] for x in json]
tide_gauge_data = pd.Series(h,index=t)
Now we'll plot the adjusted GNSS-IR against the tide gauge data
day['adjusted_height'].plot(marker='.',linestyle='none',legend=True,label='GNSS-IR')
tide_gauge_data.plot(linestyle='-',legend=True,label='Tide Gauge')
The shape looks correct, but the vertical datum is clearly wrong. We've used the direct data from the GNSS to express the GNSS-IR heights relative to the reference ellipsoid, while the tide gauge data is relative to its own local datum. But with some detective work, we can compare them directly.
In this case, I happen to know that the tide gauge datum is Admiralty Chart Datum (ACD), as is the Newlyn data on the main PSMSL site. And we can get the datum information from the Newlyn datum page:
ACD is 3.899m above RLR
RLR is 46.034m above the ellipsoid
So adding both these values to the tide gauge data should put it in the same reference frame as the GNSS-IR data.
day['adjusted_height'].plot(marker='.',linestyle='none',legend=True,label='GNSS-IR')
shifted_tide_gauge_data = tide_gauge_data + 3.899 + 46.034
shifted_tide_gauge_data.plot(linestyle='-',legend=True,label='Tide Gauge')
Much, much better! There still looks to be a slight offset, most likely due to uncertainties about the exact location of the antenna phase centre.
The header of the data files and the site information page contain the estimate of the ellipsoidal height for the GNSS we've used along with other important information - see the metadata explanation page for further information.
As mentioned above, each of the given heights is from a particular frequency from one satellite passing over the site. We've provided the identifier of the satellite (the PRN number), and the channel analysed (the 'signal' variable).
You can also use the 'prn' field to identify which of the GNSS constellations the satellite belonged to, as the PRN numbers are divided into groups of 100 as follows:
1-99: GPS
101-199: GLONASS
201-299: Galileo
301-399: BeiDou
The constellations and frequencies available at each site will change over time, as they depend on the equipment installed and the information and data format provided by the supplier.
Let's take a look at what's been available from Newlyn for each constellation by counting the number of observations available each day. First, add a 'constellation' variable to the data frame by dividing 'prn' by 100 and rounding down, and split into the different constellations.
data['constellation'] = data['prn'] // 100
gps = data[data['constellation'] == 0]['adjusted_height']
glonass = data[data['constellation'] == 1]['adjusted_height']
galileo = data[data['constellation'] == 2]['adjusted_height']
beidou = data[data['constellation'] == 3]['adjusted_height']
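If you want readable labels rather than integer codes, the constellation number can also be mapped to a name. This is a small sketch following the grouping used here (the mapping dict is our own convenience, not a field in the file):

```python
import pandas as pd

# PRN numbers are grouped in hundreds, as described above
prn = pd.Series([4, 107, 211, 305])
constellation = prn // 100

# Map the integer code to a readable constellation name
names = {0: 'GPS', 1: 'GLONASS', 2: 'Galileo', 3: 'BeiDou'}
print(constellation.map(names).tolist())  # ['GPS', 'GLONASS', 'Galileo', 'BeiDou']
```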
Next, count and plot the number of observations each week for each constellation
gps_count = gps.resample('W').count()
glonass_count = glonass.resample('W').count()
galileo_count = galileo.resample('W').count()
beidou_count = beidou.resample('W').count()
gps_count.plot(label='GPS', legend=True)
glonass_count.plot(label='GLONASS', legend=True)
galileo_count.plot(label='Galileo', legend=True)
beidou_count.plot(label='BeiDou', legend=True)
So originally almost all the data comes from GPS (apart from short campaigns with other constellations, mainly GLONASS); then GLONASS and Galileo are introduced, and finally BeiDou. This is typical: the number of available satellites, and hence the frequency of observations, will increase with time.
We also provide azimuth and elevation variables. These give the average position of the satellite during the part of the pass overhead that produces good data. They are included in case you wish to dig further into differences between reflections off certain parts of the water visible to the GNSS antenna (e.g. inside and outside a harbour).
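For example, you could restrict the analysis to passes reflecting off one sector of water by filtering on azimuth. Here's a minimal sketch with toy values (the 200 to 300 degree sector is arbitrary, chosen purely for illustration):

```python
import pandas as pd

# Toy frame with a mean azimuth per pass (illustrative values only)
data = pd.DataFrame({
    'azimuth': [45.0, 250.0, 310.0, 220.0],
    'adjusted_height': [52.4, 52.6, 52.5, 52.7],
})

# Keep only passes whose mean azimuth lies in the chosen sector
sector = data[(data['azimuth'] >= 200) & (data['azimuth'] < 300)]
print(len(sector), 'passes in sector')  # 2 passes in sector
```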
As an example, here's a histogram of the elevations at Newlyn - we need to explicitly use the matplotlib plotting library here.
import matplotlib.pyplot as plt
elevation_hist = plt.hist(data['elevation'], 100)
So the mean elevation is usually about 12.5 degrees, although you can see a couple of lower peaks in the plot, suggesting there are regular low-elevation passes too.
Let's try a polar histogram of the mean azimuth. We need to convert degrees to radians, and make sure it's plotted with north at the top and increasing azimuths moving clockwise.
import math
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
# But we want north at the top and to go clockwise
ax.set_theta_offset(math.pi/2)
ax.set_theta_direction(-1)
azimuth_histogram = plt.hist(data['azimuth'] / 180.0 * math.pi,72)
So the largest number of passes occurs to the northwest.