Long-running WSPR tx crashing WSJT; recreating issue and narrowing down parameters #wsjt-x-crashing #macOS


Stuart Ogawa
 

I am performing long-running WSPR tx antenna experiments using 4 different antennas (3 sky loops and 1 vertical) from 160m to 6m; this long-duration tx experimentation and testing has spanned 2.5+ years.

3 months ago I purchased an ICOM 7300 (recent firmware) and connected it to a Mac Mini M1 (2020) running Big Sur 11.6.5 with 16 gigs of RAM. Per the Mac Activity Monitor, total RAM used never exceeds 10 gigs (running WSJT, Chrome, and Activity Monitor...no other apps running).

Successfully configured WSJT v2.5.4 on the Mac. Duty cycle set to 90%. No band hopping; 6m only for this particular antenna experiment. Began non-stop tx.

The next morning I noticed that WSJT had stopped transmitting, so I quit WSJT and restarted it. For the past half month I have been trying to narrow down and resolve this WSJT transmit failure.

I run this setup alongside 4 Zachtek WSPR transmitters non-stop; the 4 Zachteks have been operational for the past 2.5+ years. I needed more power than the Zachtek provides (200 milliwatts) to extend and expand my experiments, hence the purchase of the Icom 7300.

I read through all Mac OS transmit fail issues (in this forum) and a sufficient number of the Windows transmit fail issues (in this forum).

What these threads do not state (unless I missed it) is how many hours, days, or weeks people transmit non-stop with their Mac or Windows machine and the Icom 7300 before a crash occurs. For the past 3 months I have typically had to quit and restart WSJT every 15 to 18 hours when transmitting non-stop at 90% duty cycle. 90% = approx. 20 WSPR transmissions/hour.
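For reference, WSPR runs in 2-minute transmit/receive periods, so an hour holds at most 30 transmit slots. A minimal sketch of that arithmetic, comparing the nominal slot count for a given Tx percentage with the observed averages reported in this thread (about 20/hr at 90%, about 14/hr at 50%). It only does the arithmetic; it does not model how WSJT-X actually schedules transmissions.

```python
# Nominal WSPR slot arithmetic: 2-minute T/R periods -> 30 periods per hour.
WSPR_PERIOD_MIN = 2
SLOTS_PER_HOUR = 60 // WSPR_PERIOD_MIN  # 30


def nominal_tx_per_hour(tx_pct: float) -> float:
    """Upper bound if exactly tx_pct percent of the hourly slots were used for Tx."""
    return SLOTS_PER_HOUR * tx_pct / 100


def observed_slot_fraction(tx_per_hour: float) -> float:
    """Fraction of the 30 hourly slots actually used for Tx."""
    return tx_per_hour / SLOTS_PER_HOUR


# Observed averages reported in this thread: (Tx % setting, Tx per hour).
for pct, observed in [(90, 20), (50, 14)]:
    print(f"Tx {pct}%: nominal {nominal_tx_per_hour(pct):.0f}/hr, "
          f"observed {observed}/hr ({observed_slot_fraction(observed):.0%} of slots)")
```

The gap between nominal and observed rates is simply what the logs show; the sketch makes no claim about its cause.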

I decided to dive deep to understand and characterize the software issue I am encountering. (I have a software development and big data/analytics background with deep research experience.)

I am wondering if anyone has encountered or tested the issue as documented below:

-------------

What I have done based on recommendations here as well as ones found on the net:

* Installed a toroid donut (mix type 43) on the USB cable (to trap RF). No resolution.

* I downgraded from WSJT 2.5.4 to 2.5.2; one member in this forum suggested that 2.5.2 was more stable for the Mac OS. No resolution.

Impact - none of these suggestions resolved the issue when running WSJT with WSPR long-term

---------

I then performed the following experiments to help identify, narrow, and characterize the issue:

Experiments 1a, b, c, d, e, and f May 7 to May 10 (baseline testing)

Configuration: WSJT 2.5.2, Mac Mini M1 (2020). OS = Big Sur 11.6.5. 10 watts TX.
* WSJT tx set to verbose mode = every transmission logged and displayed in WSJT primary operational screen
* quit all apps on mac except WSJT, Chrome Browser, and Mac Activity Monitor running
* fresh start WSJT; no tx logged in primary log screen
* begin non stop transmission at 90% duty cycle 6m WSPR
** note - when I set tx duty cycle to 90%, this equates to roughly 19 to 21 WSPR transmissions/hr; I use 20 WSPR tx/hr as an average for all my experiments to help narrow down the issue
* note - when WSJT just starts, Mac Activity Monitor displays approx 150 megs of RAM used by WSJT and 11 threads activated (receiving mode); when tx is running, then 13 threads activated
* I ran this aforementioned test N = 6 times
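Reading Activity Monitor by eye makes it hard to line RAM up with Tx counts over 15+ hours. A minimal sketch of a logger that samples a process's resident set size (RSS) with the standard `ps -o rss= -p PID` command every 2 minutes (one WSPR period) and appends it to a CSV. Assumptions: the WSJT-X process is named `wsjtx` (check with `pgrep -x wsjtx`), and RSS from `ps` may not match Activity Monitor's Memory column exactly, since Activity Monitor may report a different memory metric.

```python
import csv
import subprocess
import sys
import time
from datetime import datetime


def parse_rss_kb(ps_output: str) -> int:
    """Parse the output of `ps -o rss= -p PID` (resident set size in KiB)."""
    return int(ps_output.strip())


def sample_rss_mb(pid: int) -> float:
    """Return the resident set size of `pid` in MiB, using the system `ps`."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_rss_kb(out) / 1024


def log_rss(pid: int, path: str, interval_s: int = 120) -> None:
    """Append (timestamp, RSS MiB) rows to a CSV until the process exits."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            try:
                rss = sample_rss_mb(pid)
            except (subprocess.CalledProcessError, ValueError):
                break  # ps fails (or prints nothing) once the PID is gone
            writer.writerow([datetime.now().isoformat(timespec="seconds"), f"{rss:.1f}"])
            f.flush()  # keep the file current in case the machine is left unattended
            time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python rss_log.py <pid> <out.csv>   (process name `wsjtx` is an assumption)
    log_rss(int(sys.argv[1]), sys.argv[2])
```

Sampling once per WSPR period gives roughly one RAM reading per slot, which can then be matched against the Tx count in the primary window.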


Results and observations:
* As time unfolds, two specific observations at WSJT failure:

First observation
** as the WSPR TX count increases over time, the amount of RAM used by WSJT, as captured by Activity Monitor, grows
** critical window observation: in all 6 experiments, when WSJT RAM utilization reaches 180 to 185 megs per Activity Monitor, WSJT fails to transmit

Second observation
** critical window observation: in all 6 experiments, WSJT fails when between 285 and 310 WSPR TX messages have been logged in the primary window
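The two failure windows can be tied together with simple arithmetic. At the ~20 Tx/hr average, 285 to 310 transmissions works out to roughly 14 to 15.5 hours, close to the 15 to 18 hour restart interval reported earlier, and growth from the ~150 meg baseline works out to roughly 0.1 megs (MB) per logged transmission. A minimal sketch, assuming RAM grows linearly with the transmission count (an assumption, not something the measurements establish).

```python
# Experiment 1 figures from this thread (N = 6 runs).
BASELINE_MB = 150.0        # WSJT RAM at a fresh start
FAIL_MB = (180.0, 185.0)   # RAM window at failure
FAIL_TX = (285, 310)       # logged Tx count window at failure
TX_PER_HOUR = 20           # average at the 90% setting


def hours_to_reach(tx_count: float, tx_per_hour: float) -> float:
    """Runtime needed to log tx_count transmissions at a steady rate."""
    return tx_count / tx_per_hour


def mb_per_tx(fail_mb: float, tx_count: float, baseline_mb: float = BASELINE_MB) -> float:
    """Average RAM growth per logged transmission, assuming linear growth."""
    return (fail_mb - baseline_mb) / tx_count


lo_h, hi_h = (hours_to_reach(n, TX_PER_HOUR) for n in FAIL_TX)
print(f"Time to failure: {lo_h:.1f} to {hi_h:.1f} h")
# Lowest rate: least growth over the most transmissions; highest: the reverse.
print(f"Growth: {mb_per_tx(FAIL_MB[0], FAIL_TX[1]):.3f} to "
      f"{mb_per_tx(FAIL_MB[1], FAIL_TX[0]):.3f} MB per Tx")
```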

------------

Experiment 2
* same configuration as Experiment 1 with one exception (scientific method approach)
** changed the WSJT preferences display to NOT display TX messages in the primary screen. all other parameters exactly the same
* execute test
* WSJT failed sometime within 15 hours; I was at work when the failure occurred, so I cannot provide the number of TX messages (probably available in a log file)

Interesting failure observation

* Unlike Experiment 1 and derivatives, when WSJT failed, the progress bar displaying tx or rx status stopped in the middle of the progression. I have not seen that in previous experiments and observations.

----------

Experiment 3
* Same configuration as Experiment 1
* Executed same steps as 1
* Waited for WSJT failure to occur
* Once the failure occurred, I performed the following test steps:
** did NOT quit application
** selected the Erase button on the primary screen, which clears all displayed TX WSPR transmissions
** selected the TX button

Outcome
* TX began and then I went to work
* Came home and saw the same outcome as in Experiment 2...progress bar stopped midway through the progression; < 15 hours

----------

Has anyone done nonstop transmissions like this and encountered similar issues?

Given there is plenty of free Mac RAM per Activity Monitor, the issue feels like a WSJT software memory management/allocation or stack overflow issue...even when the verbose display is turned off.

Your insights appreciated.

thanks.

-stu
wb6yrw


Stuart Ogawa
 

Starting next set of experiments to understand and characterize the WSJT long running tx failure.

Next set of experiments: 4a, b, c, d, e, f

Same configuration as Experiment 1 with the following exception:
* changed TX percentage (duty tx cycle) from 90% to 50%
* quit WSJTx and began non stop transmission
* will perform this exactly for 4a, b, c, d, e and f without any config changes
* will summarize experiment outcome results early next week

---------

Results and observations from experiment 4a

* average number of wspr transmissions/hr = 14 using 50% tx duty cycle
* WSJT ran for 25 hours before failure
* The progress bar continued operating, so part of the software was still operational; this was not a full P0 / Sev 0 (app fully down / locked / completely unresponsive) as documented in the experiment 2 and 3 outcomes
* 349 WSPR contacts logged before failure
* WSJT RAM usage grew from a 150.1 meg baseline to 242 megs before failure

------------

* Observation notes from experiment 4a:
** RAM utilization before failure was materially higher at 50% tx duty cycle than the 180 to 190 megs used by WSJT at 90% duty cycle; the delta is approx. 55 more megs of RAM holding WSPR tx log data before failure. Correspondingly, more WSPR transmissions were logged before failure. This is an important telltale.
** The reduced TX percentage (from 90% to 50%) of course means less Tx data is created and stored. One could reason that I was able to run longer before failure (mean time before failure of 25 hrs as opposed to 14 to 18 hrs) because WSJT at 50% duty cycle logged fewer WSPR Tx entries over the same time frame.
** I was able to log 349 WSPR WSJT contacts at 50% duty cycle (experiment 4a) versus approx. 280 to 330 WSPR WSJT contacts at 90% duty cycle (experiments 1 to 3); not a big delta between these (so far). Yet I wonder whether the program's garbage collection algorithm and method are correct; will I get more contacts using 50% duty cycle for the same time because the program has latency doing garbage cleanup? While it is too early to tell without more experiments, it looks like if you don't quit WSJT and you transmit up to a certain ceiling, call it 350 entries (for now) in the WSJT primary screen, you will get a tx failure.
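The 4a numbers can be checked the same way. At ~14 Tx/hr, 349 transmissions is about 24.9 hours, matching the 25 hours observed, but the average growth per transmission (about 0.26 megs) is more than double the roughly 0.11 megs seen in Experiment 1. A minimal sketch, assuming linear growth and using the midpoints of the Experiment 1 windows (a simplification of mine, not the author's method).

```python
# Figures reported in this thread. Experiment 1 uses the midpoints of its
# reported windows (180-185 MB, 285-310 Tx); that simplification is assumed.
RUNS = {
    # name: (baseline MB, RAM at failure MB, Tx logged at failure, Tx per hour)
    "Exp 1 (90%)": (150.0, 182.5, 297.5, 20),
    "Exp 4a (50%)": (150.1, 242.0, 349, 14),
}


def summarize(base_mb: float, fail_mb: float, tx_at_fail: float, tx_per_hour: float) -> tuple:
    """Return (MB per logged Tx, hours to reach tx_at_fail), assuming linear growth."""
    return (fail_mb - base_mb) / tx_at_fail, tx_at_fail / tx_per_hour


for name, run in RUNS.items():
    growth, hours = summarize(*run)
    print(f"{name}: {growth:.3f} MB per Tx, about {hours:.1f} h to {run[2]:g} Tx")
```

Because the per-Tx growth differs between the two runs, RAM at failure and Tx count at failure do not track each other one-to-one.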

---------------

Question

Curious whether other WSJT users who eventually have a crash/failure, regardless of operating system, encounter it when they hit 330 to 379 transmission entries AND without quitting WSJT. If you quit WSJT before 250 tx entries, you may never encounter this crash.


Chuck Moore
 

Stu,

Like you I have an IC-7300. Unfortunately, keeping it operating when running FT-8 is hit or miss. The consensus has been that RF is getting into the USB line and causing havoc with control of the rig. I am not convinced and think it is actually RF getting into the audio codecs inside the radio.

I have two stations, about 300 miles apart, and each is equipped with Yaesu radios and the Yaesu SCU-17. Each operates reliably at full power with no issues, one at 100 watts and the other at 200 watts. I can pull the Yaesu rig and SCU-17 in either location and replace them with the IC-7300, and problems start. At 10 watts out (minimum output) I can make one or two transmissions and, boom, Windows plays the cascading downward tones. The screen then displays multiple messages that the audio codecs are not working. Above 10 watts the computers balk as soon as the transmitter is keyed.

I have tried multiple fixes, including toroids (32 material) on power cables, USB cable, coax, etc. I tried the Icom-recommended Tripp-Lite USB cable with integral ferrite cores for RFI suppression. No joy. I can re-insert the Yaesu rigs and SCU-17, and life is back to normal.

The Icom was purchased in the fall of 2021 and is currently awaiting my next trip to HRO in Virginia, where I will place it on consignment. It is the second Icom radio I have purchased in the last decade, and both have been disappointing.

Please keep us posted on what your solution is.

73
Chuck WD4HXG

On May 13, 2022, at 6:07 AM, stuartogawa@... wrote:


Michael Black
 

If you look at my QRZ page, I have links to USB adaptors which break the shield. There are both USB-A and USB-B adaptors. You can put those on almost any USB device (not the hub, though) and reduce the RFI running around your setup. This is because everybody ties the USB shield to pin 4 on the USB plug and also ties the common power return to chassis; both are the wrong thing to do.
Having your station properly grounded also matters, so please describe your shack grounding system. The most common mistake I see is the "rod in the ground outside the shack" which is not tied to the main house ground. And when that is grounded correctly, the 2nd most common mistake is that the shack PC is then grounded to the house ground instead of lifting the ground pin to ground.
Mike W9MDB





On Saturday, May 14, 2022, 06:25:19 AM CDT, Chuck Moore via groups.io <wd4hxg@...> wrote:


Stuart Ogawa
 

Hi Chuck

Thank you for your shared experience. A couple of observations:

* In Experiments 1, 2, and 3 I used 90 watts output. In my informal testing documented above (prior to these experiments) I ran 1 watt for the prior 2 months. So far, power output has not made a difference for WSPR. That said, I will not rule out power-related output causing crashes, but I have 168 hours running non-stop at 90 watts output with results similar to 1 watt output...and the variable that changes is WSJT hitting around 180 +/- megs of RAM utilization before a crash occurs.

* Your experience leads me to think that there could in fact be two different areas of concern. I do not run FT8 at all, and WSPR does not crash until 180 +/- megs of RAM. Perhaps RF, regardless of power, is affecting the USB signal between WSJT running FT8 and the transceiver. I cannot speak to that. I noticed that most users discussed FT8 crashing, but no one posted about WSPR crashing.

Michael,

Thank you. I will look at your page. Those types of USB adapters are inexpensive to add. I will buy and install them as one of the backlog of experiments to run after the Experiment 4 series.

As for ground: I operate my lab in my garage. Behind my garage wall is where the water main feeds the house through copper pipe. I happened to have spare RG8U, so I hose-clamped the outer RG8U shield directly to the copper pipe. This feed goes into my lab. The other end of the feed line connects to a backplane plate, and I connect my gear (Icom 7300, antenna tuner (LDG AT 1000 Proii), and antenna switch) to this plate.

I think I have a reasonable ground setup, but I am open to other suggestions.

------------

Experiment 4b - results so far.....this is curious

7:30PM PST

* 21hrs, 45 minutes have elapsed since experiment 4b began
* WSJT WSPR still transmitting; 90 watts output, 50% duty cycle
* 308 WSJT WSPR transmissions successfully logged

Observations
* Activity Monitor RAM at this juncture is 167.3 megs utilized; 30.9% LESS RAM than expected at this point
* I would have expected approximately 213.7 megs utilized with 308 WSJT WSPR transmissions successfully logged at this point; simple proportion math based on recorded outcomes from prior experiments where I logged WSJT RAM usage and number of WSJT WSPR transmissions until failure.
* Based on prior recorded values, I would normally have predicted a WSJT failure within the next 2 to 3 hours, given the RAM utilization and the number of WSPR TX logged
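The 213.7 meg expectation appears to come from scaling 4a's failure RAM by the Tx count (242 × 308 / 349 ≈ 213.6 megs). A minimal sketch of that calculation next to a baseline-plus-growth alternative (the alternative is an assumption of mine, not the method used above). The 30.9% figure appears to match the shortfall divided by the 150.1 meg baseline; relative to the expected value itself, the shortfall is closer to 22%.

```python
# Experiment 4a at failure and the Experiment 4b snapshot, from this thread.
FAIL_4A_MB, FAIL_4A_TX = 242.0, 349
BASELINE_MB = 150.1
SNAP_4B_MB, SNAP_4B_TX = 167.3, 308


def expected_pure_proportion(tx: int) -> float:
    """Scale 4a's failure RAM by Tx count (ignores the startup baseline)."""
    return FAIL_4A_MB * tx / FAIL_4A_TX


def expected_baseline_plus_growth(tx: int) -> float:
    """Baseline plus 4a's average per-Tx growth (assumes linear growth)."""
    return BASELINE_MB + (FAIL_4A_MB - BASELINE_MB) * tx / FAIL_4A_TX


for label, fn in [("pure proportion", expected_pure_proportion),
                  ("baseline + growth", expected_baseline_plus_growth)]:
    expected = fn(SNAP_4B_TX)
    shortfall = expected - SNAP_4B_MB
    print(f"{label}: expected {expected:.1f} MB, shortfall {shortfall:.1f} MB "
          f"({shortfall / expected:.1%} of expected, "
          f"{shortfall / BASELINE_MB:.1%} of baseline)")
```

Under either model, 4b at 308 transmissions sits well below what 4a's run would predict, which is the anomaly noted above.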

No configuration changes have been made between Experiment 4a and 4b...just a quit and restart of WSJT.

No explanation. Very peculiar.

Continuing to let experiment 4b run its course.

------------

Side observation:

* It reached 90+ F at our home here in Silicon Valley today; the garage was, and still is, in the low 80s.

* Despite this closed-environment temperature and running 90 watts output at 50% duty cycle for the past 22+ hours, the Icom 7300 temperature bar graph stabilizes at 30% above the zero/off position. Yes, this device runs pretty cool in the grand scheme of things.

* It appears there are three fan speeds (very low, low, and moderate) up to the 30% point on the temp bar. My best guess from reading the bar graph by eye is that low speed kicks in around 20%, and beyond 20% the fan appears to move to moderate speed. It doesn't really bother me, because I just come to the lab to check on things and then walk away after recording values and observations.