Joataman Webpages

Extra-Galactic Hydrogen Line Observations

A System for Observing Neutral Hydrogen Line Emissions in Nearby Galaxies


The results are broken down into a number of smaller sub-activities, which are listed in chronological order.

Initial Activities

The first tasks were to build up the 21 cm receiver chain and do a 'first light' drift-scan with the antenna pointed so that the beam crosses the galactic equator near the galactic centre.

The data acquisition software used for the first light run (CFRAD2.exe) was provided by Michiel Klaassen.  This console application requires pre-configuration of the dongle via the SDR# application.  While running, it collects IQ data from the dongle and, every 5 minutes, performs a complex FFT and writes the result out to a text file.

The result from near to the Galactic centre was encouraging...

...where the green line is the CFRAD2.exe result and the blue line is the result from the Leiden/Argentine/Bonn (LAB) survey website.  The similarity is convincing proof that the system is working at least for 'local' (within our Galaxy) hydrogen line signals.

Developing Custom Applications

In order to keep the original data from observations, and so allow various experiments in the analysis phase, it was decided to use the original 'rtl_sdr.exe' console application in the data acquisition role (it saves raw IQ data to a file) and to write my own analysis software to process those IQ data files 'off-line' after the observation.  The large IQ data files from the longer observation times (up to 1 hour - corresponding to a file size of over 17 GB) required the original 'rtl_sdr.exe' code to be modified to use 64-bit variables.
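As a rough check on the file size quoted above, the arithmetic can be sketched as follows. The 2.4 MS/s sample rate is an assumption inferred from the file sizes quoted on this page (it is not stated explicitly), and each sample is taken to be one 8-bit I byte plus one 8-bit Q byte.

```python
# Rough size of a raw IQ capture.
# ASSUMPTION: 2.4 MS/s sample rate, inferred from the quoted file size.
SAMPLE_RATE_HZ = 2_400_000
BYTES_PER_SAMPLE = 2          # one 8-bit I value + one 8-bit Q value

INT32_MAX = 2**31 - 1         # largest signed 32-bit value
UINT32_MAX = 2**32 - 1        # largest unsigned 32-bit value


def capture_size_bytes(seconds: float) -> int:
    """Bytes written for a capture of the given duration."""
    return int(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds)


one_hour = capture_size_bytes(3600)

print(f"1 hour capture: {one_hour / 1e9:.2f} GB")
print(f"exceeds signed 32-bit range:   {one_hour > INT32_MAX}")
print(f"exceeds unsigned 32-bit range: {one_hour > UINT32_MAX}")
```

Under this assumption a one-hour file is several times larger than any 32-bit counter can represent, which is consistent with the need for 64-bit variables if the original code counted bytes in a 32-bit integer.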

A data acquisition GUI was written which launches the modified 'rtl_sdr.exe' application to capture data.

Detecting the Magellanic Clouds

While receiving HI signals from within our own galaxy is interesting, of more interest to the author is the detection of extra-galactic HI signals.  While this has been done for northern skies (see F1EHN's excellent description of reception of HI signals from M31 and M33), for NRARAO here in the southern hemisphere more suitable (and in NRARAO's case - visible) targets are the Magellanic Clouds (see details under the 'External Galaxies' tab).

First Try for the Large Magellanic Cloud

The first try for the Large Magellanic Cloud produced this result (the data acquisition run started a little late as the PC crashed due to insufficient hard disk space and so the LMC was moving out of the beam at the beginning of the run)...

The graphical display part of the analysis GUI was not completed at this stage, so results were written out to a .CSV file and simply plotted in Open Office Calc.  The corresponding LAB survey result for the RA of the 'late' data run gives a rough confirmation...

Acquiring a longer time 'dark frame' allowed a reduction in noise in the result...

...which shows a nice similarity to the LAB survey result above and confirmation of the result.

LMC Results Obtained with Analysis GUI

Data analysis GUI software was completed and the data was processed via that application (no need for using OO Calc) from this point onwards.

The results of the first and second attempts at receiving HI signals from the LMC were obtained using a coarse setting of 64 FFT bins...

...which once again corresponds nicely to the LAB survey results...

The observation was repeated two days later (this one on time) and gave this result...

...which again looks like the LAB survey for centre of the LMC...

First Try for the Small Magellanic Cloud

The first attempt at detecting the Small Magellanic Cloud produced this result...

...which is similar to the LAB results (but notice how the 'dark frame' subtraction has reduced the foreground intra-galactic HI signal near 0 km/s by about 10 dB)...

Comparison Between 8-bit and 1-bit Data Results

The collection of raw IQ data makes for very large files - approximately 17.3 GB per hour.  To see the effect of compressing that data on the results, data was first processed using the original 8-bit IQ values, then processed again with data in 1-bit form (data taking two values: +1 and -1).   The 1-bit conversion was done by replacing negative values in the 8-bit range (-128 to -1) with -1, and replacing positive 8-bit values (0 to +127) with +1.   The results are shown below; reading left to right - LAB survey result, 8-bit result, 1-bit result...

Comparison of 8-bit and 1-bit Results

As can be seen, some noise is added to the result when using 1-bit data, but the essential result is preserved.  Since the 1-bit data can be packed into the bits of an unsigned 8-bit byte, thereby reducing the size of the saved data files by a factor of 8, consideration should be given as to whether the small increase in noise is a good trade-off.
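The 1-bit conversion and packing described above can be sketched as follows. This is a minimal illustration, not the analysis software used here: the function names are hypothetical, and it assumes the samples have already been converted to signed 8-bit values (-128 to +127) as the description implies.

```python
import numpy as np


def to_one_bit(iq_signed: np.ndarray) -> np.ndarray:
    """Map signed 8-bit values to -1/+1: negative -> -1, zero and positive -> +1."""
    return np.where(iq_signed < 0, -1, 1).astype(np.int8)


def pack_one_bit(one_bit: np.ndarray) -> np.ndarray:
    """Pack -1/+1 values into bytes, 8 values per byte (+1 -> bit 1, -1 -> bit 0)."""
    return np.packbits(one_bit > 0)


def unpack_one_bit(packed: np.ndarray, count: int) -> np.ndarray:
    """Recover the first `count` -1/+1 values from packed bytes."""
    bits = np.unpackbits(packed)[:count]
    return np.where(bits == 1, 1, -1).astype(np.int8)


# Eight signed 8-bit samples collapse into a single packed byte.
samples = np.array([-128, -1, 0, 127, 5, -7, 33, -90], dtype=np.int8)
one_bit = to_one_bit(samples)
packed = pack_one_bit(one_bit)
restored = unpack_one_bit(packed, samples.size)
```

Eight input bytes become one output byte, which is the factor-of-8 saving mentioned above. The original amplitudes cannot be recovered; only the signs survive.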

Note: separate 'dark frame' files needed to be created for the 8-bit and 1-bit cases.

Improvements in 'Dark Frame' Subtraction and Results Graph Corrections

As mentioned previously, the employment of 'dark frame subtraction' greatly assists in the extraction of weak HI signals  - especially extra-galactic HI signals.

Some further details about the method used for this project at NRARAO can be found on the 'Data Analysis' page ("Selection of the 'Dark Frame' Method").

Note that the 'subtraction' term is not used in the mathematical sense (the process is actually a mathematical division), but refers to the notion of the removal of undesired responses.

A comment was also made that the weak HI line signals are like pimples on the back of an elephant, further hidden by the gross amplitude response ripple of the RTL across the passband as seen on the right.  It can be observed that the HI signals in this raw data plot are so small as to be undetectable.

The raw data is the combination of system noise and sky noise, shaped by the system frequency and amplitude response.  We want to cancel out the system responses and effects of system noise - to reveal the obscured HI line signals.

It is only by dividing this raw data by a separate set of raw data taken from another part of the sky with a low level of hydrogen signals (the 'dark frame' data set) that the signals from the first raw data file can be extracted.  If the two raw data files were exactly the same shape then the division would produce a flat horizontal line across the graph.  If, however, the first raw data set has signals which are not present in the second 'dark frame' raw data set, they would emerge and appear in the results graph.
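The division described above can be illustrated with a small synthetic example. This is a sketch only: the function name, the passband shape and the 1% signal are invented for illustration, and expressing the ratio in dB is an assumption rather than a description of the actual analysis software.

```python
import numpy as np


def dark_frame_correct(target_power: np.ndarray, dark_power: np.ndarray) -> np.ndarray:
    """Divide the target spectrum by the dark-frame spectrum, bin by bin; return dB."""
    ratio = target_power / dark_power
    return 10.0 * np.log10(ratio)


# Synthetic illustration: a large, rippled passband shape common to both spectra.
bins = np.arange(64)
passband = 1000.0 * (1.0 + 0.3 * np.cos(2 * np.pi * bins / 64))

dark = passband.copy()
target = passband.copy()
target[40] *= 1.01            # a weak "signal": 1% excess power in one bin

corrected_db = dark_frame_correct(target, dark)
```

The passband ripple divides out to a flat 0 dB line, and the 1% excess in bin 40 emerges as a feature of about 0.04 dB. That is a small difference between two large numbers, which is why any mismatch between the two files matters so much.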

It can readily be seen that the HI signals appearing in the 'dark frame' corrected data are the result of very small variations between two raw data files with very large signals.  Therefore, any perturbations in either file can degrade the quality of the 'dark frame' subtraction process.  In an ideal world the two files should be identical except for the presence of the target signals in the target raw data file.  Unfortunately, this is not the case in practice and possible errors creep in from a range of sources...

Different combinations of these effects cause a 'scoliosis' effect on the graph and some means to 'straighten' out the graph is useful...

Cropping the Left and Right Extremities

The first step to straighten (linearise) the graph is to crop off the lowest and highest 12.5% of the spectrum (the left and right extremities of the graph), as these areas are well down on the anti-aliasing filter skirt and are next to impossible to rehabilitate.
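A minimal sketch of this cropping step is shown below. It assumes the spectrum is already in frequency order (lowest frequency first); the function name is hypothetical.

```python
import numpy as np


def crop_band_edges(spectrum: np.ndarray, fraction: float = 0.125) -> np.ndarray:
    """Drop `fraction` of the bins from each end of a frequency-ordered spectrum."""
    n_drop = int(round(spectrum.size * fraction))
    return spectrum[n_drop: spectrum.size - n_drop]


# With 64 bins, 8 bins are removed from each end, leaving the central 48.
spectrum = np.arange(64, dtype=float)
cropped = crop_band_edges(spectrum)
```

Cropping 12.5% from each end keeps the central 75% of the passband.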

Getting a Good Dark Frame

The next step is to obtain a good 'dark frame'.  As mentioned previously, if the target was HI line signals from within our own galaxy, the best 'dark frame' will be obtained by collecting data from an area of the sky which has a low level of HI signals - otherwise there will be a large cancellation of the sought after HI line signals from the processing.  This process is complicated for 'local' (intra-galactic) HI line signals because they are present at every possible pointing - it is only possible to find a position in the sky where they are at a minimum.   Generally speaking this is roughly in the direction of the Galactic Poles.

For this project, where extra-galactic signals from the Magellanic Clouds are the target, quite conveniently both Clouds are moving away from us at such a rate as to doppler-shift their band of HI line signals well away from the signals produced by the 'local' HI line radiation within our own galaxy.  This simplifies the selection of the area of sky, as we actually welcome cancellation of the 'local' intra-galactic HI line signals.  Even so, my first choice was not a particularly good one as discussed in the next paragraph...

In order to maximise the probability of having the same RFI in both the target data and the dark frame data (thereby maximising the cancellation of those unwanted signals), both sets of data were taken with the same fixed antenna azimuth and elevation settings - i.e., the dish was not moved between data acquisition sessions.  I used RadioEyes to locate a position in the sky at the same declination (-69 degrees) as the LMC (my principal target cloud) and identified RA = 22H as a 'quiet' spot in HI line signals at that declination.  This 'dark frame' worked well for the first several days of data collection (see 'Results'), but using it for data runs a week or so later produced problems.

What followed was a week or so of trying to track down why the earlier and later runs behaved differently.  I had made some changes to the analysis software in the interim, so I thought I had introduced a 'bug'.  Eventually I realised that the first runs were done on overcast days, when the temperature during the 'dark file' and target data runs was essentially the same.  On subsequent days the weather was sunny, with a large difference in heat energy impinging on the RF front end mounted on the antenna (early morning shade versus late afternoon full sun).

The times of the acquisition of dark files, therefore, needed to be as close as possible to the target data acquisition to minimise temperature variations.  Because of the extended nature of the Clouds, this meant that about 2 hours after the passage of the LMC through the antenna beam was as close as it was possible to get in time.  An RA of 7:20:00 was chosen (DEC -69 degrees), which had the added benefit of having a similar level of intra-galactic ('local') HI signals as in the direction of the LMC and so these undesirable signals would tend to cancel out.

It was thought initially that every target LMC data run would have to be accompanied by a trailing 'dark frame' data run, but (at least for the 3 days so far analysed) a single 'dark frame' data run was found to be useful for a number of different runs on different days.

This convenience may not hold for the future.

Flattening the Result Graph

A number of steps have been taken in order to correct the results graph which is distorted by the above effects.  One effect noted was that different signal levels into the RTL dongle produced both a variable 'tilt' on the result, and a different 'ripple' across the passband.  This, I presume, is due to differing combinations of noise energy from sources subjected to different passband shapes dependent on where in the RF chain they insert themselves.   To minimise variations between observations it was decided to set the dongle gain at maximum (49.6 dB) and enable the 'digital AGC' option.  A number of tests done indicated that observation to observation variations were much reduced without any discernible loss in sensitivity.

Note: I observed some unexplained gain instability (in 'Auto' presenting itself as a cyclic variation in level) when the dongle was fed high levels of signal in order to get the maximum 'swing of bits' in the recorded data.  For manual gain settings this presented as a sudden drop in gain of some 20 dB or so after 5 or 10 minutes of operation.  Running the dongle at 49.6 dB with digital AGC enabled and adjusting the signal into the dongle such that only about 5 bits were being exercised in the recorded data seemed to eliminate this problem.  This is just an observation, not a statement of fact, and the effect may be entirely local.

The result graphs for this gain configuration were found to have minimal 'tilt', but a residual near-parabolic 'scoliosis'. 

Controls were introduced to correct for this in the analysis software as shown on the right...
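One way to picture the removal of a near-parabolic bow is a least-squares quadratic fit to the signal-free part of the graph, which is then subtracted everywhere. This is a swapped-in illustration of the idea, not the controls in the analysis software. The function name, the synthetic bow and line profile, and the velocity window are all hypothetical.

```python
import numpy as np


def remove_parabolic_baseline(velocity, spectrum_db, signal_window):
    """Fit a quadratic to bins outside `signal_window` (lo, hi) and subtract it everywhere."""
    lo, hi = signal_window
    baseline_mask = (velocity < lo) | (velocity > hi)
    coeffs = np.polyfit(velocity[baseline_mask], spectrum_db[baseline_mask], deg=2)
    return spectrum_db - np.polyval(coeffs, velocity)


# Synthetic data: a parabolic bow plus a Gaussian "line" (values are illustrative only).
velocity = np.linspace(-100.0, 500.0, 61)                       # km/s
bow = 1e-6 * (velocity - 200.0) ** 2 + 0.02                     # residual curvature
line = 0.1 * np.exp(-0.5 * ((velocity - 270.0) / 20.0) ** 2)    # wanted signal

flattened = remove_parabolic_baseline(velocity, bow + line, signal_window=(170.0, 370.0))
```

Excluding the signal window from the fit matters: if the line itself were included, the fit would absorb part of the wanted signal, which is exactly the kind of 'manufactured' result that needs to be guarded against.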

In any situation where data has been 'processed' careful checks must be made to try and ensure results haven't been 'manufactured'.

A valid way of attempting this check is to compare results with the known science (checking against results from professional radio astronomers) combined with checking for repeatability.

For the set of LMC results, the results should look similar to this result obtained from the LAB survey on-line data base (using a beamwidth of 6 degrees for the NRARAO 3 m TVRO dish @ 21 cm)...

Result for LMC from LAB Survey Using Beamwidth = 6 Degrees

Three days' results were analysed and processed using the display linearisation controls...

Results from Three Days for the LMC - 27th, 29th, 30th April 2016

These results demonstrate a similarity with the LAB survey results (especially when the different aspect ratios of the two sets of graphs are taken into account - see below) and a high degree of repeatability between the three separate days' results.

LMC Result for 30th April with LAB Survey Results Superimposed

I am fairly happy with these results - especially the day-to-day repeatability.

Improving S/N by Integrating Daily Runs

In order to gain a better S/N, up to this point the data has been analysed using 64 FFT bins, resulting in a bin resolution of about 8 km/s.  The plot is the summation of many FFT results.  For 64 bins, 64 data samples are used per FFT, so for a data run of 5.37 x 10^9 samples, nearly 84 million FFT results are summed - providing almost 40 dB S/N improvement over a single FFT result.
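The figures in the paragraph above can be reproduced with a short sketch. Two things are assumptions here rather than statements from this page: the 2.4 MS/s sample rate (inferred from the quoted file sizes) and the square-root law for the noise reduction from averaging, which is the usual radiometer behaviour and matches the "almost 40 dB" figure. The function names are hypothetical.

```python
import numpy as np

C_KM_S = 299_792.458           # speed of light, km/s
F_HI_HZ = 1_420_405_752.0      # rest frequency of the neutral hydrogen line
SAMPLE_RATE_HZ = 2_400_000.0   # ASSUMPTION: inferred from the quoted file sizes


def bin_width_km_s(n_bins: int) -> float:
    """Doppler velocity width of one FFT bin."""
    return C_KM_S * (SAMPLE_RATE_HZ / n_bins) / F_HI_HZ


def averaging_gain_db(n_spectra: float) -> float:
    """S/N improvement from averaging n power spectra (noise falls as sqrt(n))."""
    return 10.0 * np.log10(np.sqrt(n_spectra))


def averaged_power_spectrum(iq: np.ndarray, n_bins: int) -> np.ndarray:
    """Split complex samples into blocks of n_bins, FFT each block, average |FFT|^2."""
    n_blocks = iq.size // n_bins
    blocks = iq[: n_blocks * n_bins].reshape(n_blocks, n_bins)
    power = np.abs(np.fft.fft(blocks, axis=1)) ** 2
    # fftshift puts the spectrum in frequency order, lowest frequency first
    return np.fft.fftshift(power.mean(axis=0))


n_ffts = 5.37e9 / 64
print(f"bin width: {bin_width_km_s(64):.2f} km/s")
print(f"spectra averaged: {n_ffts / 1e6:.1f} million")
print(f"averaging gain: {averaging_gain_db(n_ffts):.1f} dB")

# Small demonstration of the averaging function with a synthetic complex tone
n = np.arange(64 * 10)
tone = np.exp(2j * np.pi * 8 * n / 64)
demo_spectrum = averaged_power_spectrum(tone, 64)
```

Under these assumptions the bin width comes out just under 8 km/s and the averaging gain just under 40 dB, in line with the values quoted above.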

It should be noted that the antenna system is a fixed position drift-scan configuration.  Only 40 minutes of data can be collected each day as the target passes through the antenna beamwidth.  Other systems which have tracking ability can sit right on the nose of the target for perhaps several hours and so get a better S/N in one data run than is possible here.

Although 64 FFT bins provide a good improvement to S/N, a problem arises if there are RFI spikes in the result which have not been cancelled out by the 'dark frame'.  The energy of such an RFI spike is spread too far across the velocity scale because of the coarse resolution and so cannot be easily filtered out without distorting the true shape of the plot.

To improve the velocity resolution, the number of bins can be increased.  For the exercise here described, the number of FFT bins has been increased to 512 - a factor of 8 w.r.t. the original 64 bins.  However, this results in the number of FFT results summed being reduced by a factor of 8, with the resultant loss in S/N.

Integrating Daily Runs

To recover the loss in S/N caused by using 512 FFT bins instead of 64 FFT bins, results from successive days can be summed.  This is because the day-to-day variation in doppler shift for the LMC is small and so successive days' velocity results line up with each other. This may not be the case for other targets (e.g., the SMC shows much greater variation in VLSR - hence doppler shift - over time).

The plot of summing 7 days' results is shown on the right which largely regains the loss in S/N - but now has the resolution to see narrow RFI spikes.
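The trade-off between bin count and days of integration can be sketched numerically. This assumes noise falls as the square root of the number of spectra combined, and that the daily spectra are already aligned in velocity; the function names and the synthetic noise are for illustration only. Averaging is used here instead of summing, which differs only by a scale factor.

```python
import numpy as np


def averaging_gain_db(n: float) -> float:
    """S/N change from combining n independent spectra (noise falls as sqrt(n))."""
    return 10.0 * np.log10(np.sqrt(n))


def integrate_days(daily_spectra) -> np.ndarray:
    """Average same-length daily spectra bin by bin."""
    return np.mean(np.stack(daily_spectra), axis=0)


loss_db = averaging_gain_db(8)    # 8x fewer FFTs per run with 512 bins instead of 64
gain_db = averaging_gain_db(7)    # 7 days combined
net_db = gain_db - loss_db

print(f"loss from 512 bins: {loss_db:.2f} dB")
print(f"gain from 7 days:   {gain_db:.2f} dB")
print(f"net change:         {net_db:.2f} dB")

# Synthetic check: 7 days of unit-variance noise, 512 bins each
rng = np.random.default_rng(0)
days = [rng.normal(0.0, 1.0, 512) for _ in range(7)]
stacked = integrate_days(days)
print(f"noise after stacking: {stacked.std():.2f} (single day: 1.00)")
```

Seven days recovers all but a fraction of a dB of the loss from moving to 512 bins, which matches the observation that the integration 'largely regains' the S/N.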

Although many RFI spikes have been cancelled out, it is not a perfect process, as can be seen.

Effect of Mismatch in RFI Patterns

The plot shows the effect of a mismatch between the RFI pattern in the 'dark frame' and the target data.  The negative RFI spike in the bottom left-hand corner is the result of RFI which is present in the 'dark frame', but not in the target data.  Consequently the 'dark frame' applies a cancellation to a non-existent RFI spike in the target data - causing a negative spike in the results.

The other RFI spike near the peak of the plot is the result of an RFI spike in the target data not matched by a spike in the 'dark frame', and so it is not cancelled out.

Some means of mitigating against these effects would improve the quality of the result plots.  Some attempts to pursue this path are described in the next section.

Mitigating Against the Effects of RFI

When it comes to radio astronomy, RFI is a plague on all houses.  It is a particular difficulty here at NRARAO as the antenna is up in the air about 3.5 m from the ground, with the observatory site being near to the edge of a small cliff (70 m high) with a clear take-off towards a city of 5 million.

When RFI spikes remain after 'dark frame' correction, some means of removing them from the plot result is useful.  A simple moving average filter is not appropriate, as that process, while reducing the peak amplitude of the spike, also spreads the energy across adjacent bins - turning a sharp spike into a rectangular 'pulse' shape as shown below...

...which introduces artifacts into the plot.

Fortunately, RFI spikes are what are termed 'outliers' - that is, they are large deviations from the values of their adjacent FFT bin neighbours.  A median filter is a much better filter for this type of interference: for each bin, the values of that bin and a number of adjacent bins (the number is selectable) are sorted, and the middle value of the sorted group is placed in the bin being processed.  This median filtering process is very effective in removing RFI spikes in the velocity plot as shown below, which is median filtered, but not smoothed...

...and which, because the RFI spikes have been removed, can now be further processed by a moving average filter...

...which, finally, shows that the 512 FFT bin result has more detail than the 64 FFT bin result, but still has a reasonably smooth profile.
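The difference between the two filters described in this section can be sketched with synthetic data. The function names, window widths and test profile are hypothetical; this is an illustration of the technique, not the analysis software itself.

```python
import numpy as np


def median_filter(spectrum: np.ndarray, width: int = 5) -> np.ndarray:
    """Replace each bin with the median of itself and its neighbours (odd width)."""
    half = width // 2
    padded = np.pad(spectrum, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)


def moving_average(spectrum: np.ndarray, width: int = 5) -> np.ndarray:
    """Smooth with a boxcar of the given odd width."""
    half = width // 2
    padded = np.pad(spectrum, half, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


# Synthetic 512-bin profile with one narrow RFI spike well away from the signal
bins = np.arange(512)
profile = np.exp(-0.5 * ((bins - 300) / 30.0) ** 2)
spiky = profile.copy()
spiky[100] += 5.0

averaged_only = moving_average(spiky)              # spike smeared into a 5-bin pulse
cleaned = moving_average(median_filter(spiky))     # spike removed, then smoothed
```

With the moving average alone, the single-bin spike of height 5 becomes a rectangular pulse of height 1 across five bins. With the median filter applied first, the spike disappears and the broad profile is left almost untouched.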

This overall result (integration of successive days' results, RFI excision and smoothing) will be helpful for the next, more difficult target - external galaxies at greater distances than the Magellanic Clouds.

Matching Plots with the LAB Survey Results

One of the nice things about the above activity is that plots obtained by observation can be verified against the results from an extensive professional radio astronomy survey of neutral hydrogen line emissions. There is an online form which can be filled in with target coordinates, antenna beamwidth and plot velocity limits.

For example, here is a composite plot which superimposes the result (in blue) from the 7-day integration of data (as described in the previous section) with the LAB survey results (in green) using a 6° beamwidth...

...which shows a fairly good correlation between the two.

It can be seen that the match is not perfect, with some small deviations from the LAB Survey result.  This is to be expected, as even small changes in the position coordinates (say 1°) entered into the form result in large changes in the returned profile.  This is because the LMC is an extended object (actually larger in angular size than the 6° beamwidth of the antenna) and the resultant profile is the summation of small, separate regions with different velocities.  Moving the 6° beamwidth antenna by just 1° causes many regions to drop out of the beam and other regions to appear in it.  This would not be the case for a more distant object where the angular spread of the object does not fill the beamwidth of the antenna.

The plot variation with a small (1°) change in declination is shown below...

...which indicates that attempting to get a perfect match for observations of the LMC using a 6° beamwidth antenna with the LAB Survey results is not a sensible endeavour.  To further illustrate the point, the right-most plot shows what the LAB survey returns for a 0.6° beamwidth antenna - meaning that if the 3 m TVRO dish doesn't have a 6° beamwidth as per theory, further deviations from the LAB Survey could be expected.

All in all, I am more than satisfied with the correlation between my observed results and the LAB survey results.

Negative Results with NGC300

After a week or so collecting NGC300 data and integrating there is no verifiable result.  As forewarned in 'External Galaxies', this target appears to be beyond the capabilities of the system described here.  As noted there...

"Looking at the LAB Survey results for a 6° beamwidth, the comparison shows NGC300 is nearly 9 dB down on the brightness temperature of M33...

...and while the radial velocities are virtually mirror-images of each other, NGC300 is closer to the 0 km/s rest velocity - further complicating the extraction of the NGC300 signal from the foreground 'Milky Way' HI signals."

A combination of a weak source, a less-than-optimum system and local RFI are likely the cause of the negative result.

Negative Results with the Magellanic Stream

As mentioned in 'External Galaxies', there is a large 'stream' of HI high velocity clouds (HVCs) associated with the Small and Large Magellanic Clouds which extends over a wide arc in the sky as shown by the green curve below...

...note the SMC and LMC positions at the right-hand end of the curve.

The 3 m TVRO dish was positioned and the data acquisition timed to observe the position in the stream as marked by the purple dot.

A positive result was not expected due to the weak signal, but it was considered to be worth a try.

The selected position in the stream is shown in LAB Survey results for simulated 30 m and 3 m dish diameters...

...while focussing on the range from -300 km/s to -100 km/s shows a peak temperature of about 0.12 K using a 3 m dish...

...which indicates it is well below the result from the LMC, by about 17 dB (a ratio of roughly 50).

So the negative result is hardly a surprise.

Due to the high level of RFI at this location (even at 1.4 GHz), searching for weak signals is not practical.

Improving Agreement with LAB Survey Results (LMC)

By integrating 4 'dark files' to obtain a single, smoother 'dark file' and using this for integrating 10 days of LMC data (spread over an elapsed period of 30 days) a further reduction in noise is achieved...

...resulting in a better agreement with the LAB Survey result.

Another good verification of the LMC results.

Improving Agreement with LAB Survey Results (SMC)

By integrating 3 hours of off-target data to obtain a better 'dark file', and using this for an integration of 7 consecutive days of SMC data, a better match with the LAB Survey results is achieved...

...and provides a nice verification of the detection of the HI signals from the SMC.

Overall Conclusion

The detection of extra-galactic HI signals from the Magellanic Clouds (LMC and SMC) at a distance of 160,000 ly is feasible for the amateur radio astronomer.

Detection success of the Magellanic Clouds has been achieved with a 3 metre diameter re-purposed TVRO mesh dish (without tracking), combined with a cheap ($20) DVB-T dongle.

Detection of much weaker targets (NGC300 and HVCs in the Magellanic Stream) has been unsuccessful - consistent with expectations.