This document demonstrates how we calculate the daily means displayed in the plots on the site pages (e.g. Brest) from the distributed high-frequency data. For simplicity, we're using the pandas data analysis library, although similar code could be written without it.
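For example, a bare-bones version of the loading step below could be written with just the standard library. This is a sketch, producing a list of dictionaries rather than a data frame:

import csv
import io
import urllib.request
import zipfile

# Fetch the zipped CSV and parse the first member, skipping '#' comment lines
url = 'https://psmsl.org/data/gnssir/data/main/10175.zip'
with urllib.request.urlopen(url) as response:
    archive = zipfile.ZipFile(io.BytesIO(response.read()))
with archive.open(archive.namelist()[0]) as member:
    text = io.TextIOWrapper(member, encoding='utf-8')
    rows = list(csv.DictReader(line for line in text if not line.startswith('#')))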
import pandas as pd
We'll use Brest as an example - this code loads the data into a data frame, then we'll look at the first few rows.
data = pd.read_csv('https://psmsl.org/data/gnssir/data/main/10175.zip', comment='#', index_col='time', parse_dates=True)
data.head()
| time | raw_height | adjusted_height | fitted_tide | prn | signal | azimuth | elevation |
|---|---|---|---|---|---|---|---|
| 2013-01-01 00:10:00 | 48.834 | 48.801 | 48.654 | 121 | 1 | 162.049 | 14.870 |
| 2013-01-01 00:10:00 | 48.912 | 48.879 | 48.654 | 121 | 2 | 162.049 | 14.870 |
| 2013-01-01 00:15:25 | 48.825 | 48.886 | 48.663 | 15 | 1 | 163.440 | 11.382 |
| 2013-01-01 00:15:25 | 48.881 | 48.943 | 48.663 | 15 | 2 | 163.440 | 11.382 |
| 2013-01-01 00:24:02 | 48.837 | 49.019 | 48.690 | 22 | 1 | 260.133 | 11.550 |
See the general example notebook for an overview of what the columns mean. We're going to create daily averages by taking a simple mean of all observations on each day. To prevent that mean being biased by how the observations are distributed over the tidal cycle, we'll compute it on residual data (data with the tide removed). We'll also use the adjusted height variable, which accounts for vertical movement of the water as the satellite passes overhead. You'll notice this operation results in values around zero - don't worry, we'll add the mean sea level value back in later.
data['residual'] = data['adjusted_height'] - data['fitted_tide']
data['residual'].head()
time
2013-01-01 00:10:00    0.147
2013-01-01 00:10:00    0.225
2013-01-01 00:15:25    0.223
2013-01-01 00:15:25    0.280
2013-01-01 00:24:02    0.329
Name: residual, dtype: float64
Before creating the average, we're going to need to separate the data into the different signals used. There's uncertainty over the exact location measurements are taken from within the GNSS sensor (the antenna phase centre), and this location can be different for signals of different frequencies.
We'll create an extra variable indicating which GNSS constellation each satellite belongs to (0=GPS, 1=GLONASS, 2=Galileo, 3=BeiDou) - remember, the 'prn' variable encodes this information, with the hundreds digit giving the constellation.
data['constellation'] = (data['prn'] // 100)
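As a quick sanity check (not needed for the calculation itself), we can look at the range of 'prn' values assigned to each constellation:

# Optional check: the range of prn values that ended up in each constellation
data.groupby('constellation')['prn'].agg(['min', 'max'])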
Now, we want to group the variables by constellation and signal - this will give a separate data frame for each combination of constellation and signal. We'll also display how many observations we have for each combination.
grouper = data.groupby(['constellation','signal'])
grouper['residual'].count()
constellation  signal
0              1         32393
               2         59248
               5         15801
1              1         44970
               2         50893
2              1         12823
               5         17899
               6          6019
               7         15131
               8          5780
3              2          4194
               6          5658
               7           657
Name: residual, dtype: int64
So: lots of GPS and GLONASS data, a small amount from Galileo, and a tiny bit of BeiDou. Next we calculate the daily averages, considering each constellation/signal group separately.
We'll do some rearranging - we'll convert this into a dictionary where the keys are the (constellation, signal) pairs and the values are the residual series for that channel. We'll show the top of the GPS L2 channel, i.e. (0, 2).
grouped_data = {x:y['residual'] for x,y in grouper}
grouped_data[(0,2)].head()
time
2013-01-01 00:15:25    0.280
2013-01-01 00:24:02    0.252
2013-01-01 00:34:52    0.305
2013-01-01 03:06:17    0.090
2013-01-01 04:13:27    0.269
Name: residual, dtype: float64
Now, we'll take the daily mean of each channel in turn - again, we'll display GPS L2 as an example. The groupby call simply splits the data into daily chunks and takes the mean of each chunk.
grouped_daily_means = {x:grouped_data[x].groupby(pd.Grouper(freq='1D')).mean() for x in grouped_data}
grouped_daily_means[(0,2)]
time
2013-01-01    0.072087
2013-01-02   -0.089381
2013-01-03   -0.145706
2013-01-04   -0.155955
2013-01-05   -0.209450
                ...
2021-08-07    0.193375
2021-08-08    0.093263
2021-08-09    0.098267
2021-08-10    0.001000
2021-08-11   -0.062250
Freq: D, Name: residual, Length: 3145, dtype: float64
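Incidentally, for a series with a DatetimeIndex, grouping with pd.Grouper(freq='1D') is equivalent to resampling, so this would produce the same dictionary:

# Equivalent form using resample instead of groupby/pd.Grouper
grouped_daily_means = {x: grouped_data[x].resample('1D').mean() for x in grouped_data}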
Time for some more rearranging - let's convert this into one data frame, with each column representing one of the channels.
daily_means = pd.DataFrame(grouped_daily_means)
daily_means.head()
| time | (0, 1) | (0, 2) | (0, 5) | (1, 1) | (1, 2) | (2, 1) | (2, 5) | (2, 6) | (2, 7) | (2, 8) | (3, 2) | (3, 6) | (3, 7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-01-01 | 0.047077 | 0.072087 | NaN | 0.103316 | 0.059714 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-02 | -0.123200 | -0.089381 | NaN | -0.078947 | -0.108478 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-03 | -0.163111 | -0.145706 | NaN | -0.078048 | -0.086765 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-04 | -0.155438 | -0.155955 | NaN | -0.128389 | -0.144182 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-05 | -0.190313 | -0.209450 | NaN | -0.183200 | -0.163000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Each column is labelled by its (constellation, signal) pair. You'll see that at the start of the record, we only have data from the main two channels of GPS and GLONASS. Next we want to account for the slight differences in antenna phase centre between channels by comparing them to a reference channel. We've used GPS L2 for this, as we know it's present in all the records we've processed.
daily_above_reference = daily_means.sub(daily_means[(0,2)],axis=0)
daily_above_reference.head()
| time | (0, 1) | (0, 2) | (0, 5) | (1, 1) | (1, 2) | (2, 1) | (2, 5) | (2, 6) | (2, 7) | (2, 8) | (3, 2) | (3, 6) | (3, 7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-01-01 | -0.025010 | 0.0 | NaN | 0.031229 | -0.012373 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-02 | -0.033819 | 0.0 | NaN | 0.010434 | -0.019097 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-03 | -0.017405 | 0.0 | NaN | 0.067658 | 0.058941 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-04 | 0.000517 | 0.0 | NaN | 0.027566 | 0.011773 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-05 | 0.019137 | 0.0 | NaN | 0.026250 | 0.046450 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Now we just take the average of all the rows in this table to create a mean offset for each channel. Obviously it'll be zero for GPS L2.
channel_offset = daily_above_reference.mean()
channel_offset
0  1   -0.019434
   2    0.000000
   5   -0.003012
1  1   -0.001982
   2    0.002267
2  1   -0.023632
   5   -0.006386
   6    0.007127
   7   -0.003123
   8    0.000877
3  2   -0.029690
   6    0.006919
   7    0.006083
dtype: float64
Subtract these values from the relevant columns to put all the channels in approximately the same reference frame as the GPS L2 data.
adjusted_daily_means = daily_means.sub(channel_offset)
adjusted_daily_means.head()
| time | (0, 1) | (0, 2) | (0, 5) | (1, 1) | (1, 2) | (2, 1) | (2, 5) | (2, 6) | (2, 7) | (2, 8) | (3, 2) | (3, 6) | (3, 7) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-01-01 | 0.066511 | 0.072087 | NaN | 0.105298 | 0.057447 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-02 | -0.103766 | -0.089381 | NaN | -0.076965 | -0.110746 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-03 | -0.143677 | -0.145706 | NaN | -0.076065 | -0.089032 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-04 | -0.136003 | -0.155955 | NaN | -0.126407 | -0.146449 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2013-01-05 | -0.170878 | -0.209450 | NaN | -0.181218 | -0.165267 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
And finally we can calculate our daily means by averaging across whichever channels are available each day (NaN values are skipped by default)!
gnss_daily_anomaly = adjusted_daily_means.mean(axis=1)
gnss_daily_anomaly.head()
time
2013-01-01    0.075336
2013-01-02   -0.095214
2013-01-03   -0.113620
2013-01-04   -0.141203
2013-01-05   -0.181703
Freq: D, dtype: float64
We want to get it back in the reference frame of the original data - the information needed is stored in the original file header, but I've just copied the values here. It works like this:
ellipsoidal_height = 67.899  # height of the antenna above the GRS80 ellipsoid (m), from the file header
reflector_height = 16.802  # height of the antenna above the reflecting surface (m), from the file header
gnss_daily_mean = gnss_daily_anomaly + ellipsoidal_height - reflector_height
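If you'd rather not copy the numbers by hand, you could pull them out of the '#' comment lines at the top of the file. A hypothetical sketch, assuming the file has been saved locally as 10175.csv and that the header labels the values roughly as below (check the real header for the exact wording):

with open('10175.csv') as f:
    for line in f:
        if not line.startswith('#'):
            break  # header comments finished
        if 'ellipsoidal height' in line.lower():  # illustrative label, not the real format
            ellipsoidal_height = float(line.rstrip().split(':')[-1])
        elif 'reflector height' in line.lower():  # illustrative label, not the real format
            reflector_height = float(line.rstrip().split(':')[-1])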
gnss_daily_mean.plot()
[Plot: the GNSS-IR daily mean series for Brest]
Finally, let's compare that with some tide gauge data. The best place to obtain quality-controlled tide gauge data for Brest is from the supplying authority, SHOM, but UHSLC have an ERDDAP server we can use to get daily means without any extra effort.
We skip the second line of the file using the parameter skiprows=[1] (as Python uses zero-based indexing), as this line contains the units of the parameters. The sea level is in millimetres, but we'll deal with that in a moment.
uhslc_url = 'https://uhslc.soest.hawaii.edu/erddap/tabledap/global_daily_fast.csv?' \
'sea_level%2Ctime&time%3E=2013-01-01T00%3A00%3A00Z&time%3C=2022-01-01T00%3A00%3A00Z&uhslc_id=822'
tide_gauge_data_mm = pd.read_csv(uhslc_url, parse_dates=True, skiprows=[1], index_col='time')
tide_gauge_data_mm.head()
| time | sea_level |
|---|---|
| 2013-01-01 12:00:00+00:00 | NaN |
| 2013-01-02 12:00:00+00:00 | NaN |
| 2013-01-03 12:00:00+00:00 | 4042.0 |
| 2013-01-04 12:00:00+00:00 | 4006.0 |
| 2013-01-05 12:00:00+00:00 | 3988.0 |
We need to convert that from millimetres to metres, and then apply a similar correction as in the general example notebook to put it in the same reference frame as the GNSS-IR data, then we can plot them both.
In the case of Brest, I can use the information on the PSMSL datum page for Brest, plus the knowledge that the UHSLC record there uses the HZ 1996 datum, to transform the data into approximately the same reference frame as the GNSS-IR data.
tide_gauge_data = tide_gauge_data_mm / 1000 + 2.959 + 44.094  # mm to m, then datum shifts from the PSMSL datum page (see above)
tide_gauge_data.plot(legend=True, label='Tide Gauge', style='C1-')
gnss_daily_mean.plot(legend=True, label='GNSS-IR', style='C0-')
[Plot: tide gauge and GNSS-IR daily mean series for Brest]
There's still a slight offset, perhaps due to issues around the antenna phase centre of GPS L2 itself, and because the epochs at which the sea level and GNSS-IR data were referred to the GRS80 ellipsoid differ slightly, but you can see the records match well.
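If you want to put a number on that offset, one option is to align the two series on calendar days and take the mean difference. A sketch (note the UHSLC daily values are stamped at 12:00 UTC, so we move them to midnight first):

# Align the tide gauge days (stamped 12:00 UTC) with the GNSS-IR days (midnight)
tg = tide_gauge_data['sea_level'].copy()
tg.index = tg.index.tz_localize(None).normalize()
(gnss_daily_mean - tg).mean()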