Exspans Systems Inc Forum

rakesh

Member
Registered:
Posts: 52
Reply with quote  #1 
Hi,

Today I encountered a situation where the Automan product itself crashed, and none of the Automan scripts worked for the entire day. The task failed with a CPU timeout (S322).

IEF196I IEF472I AUTOMAN AUTOMAN - COMPLETION CODE - SYSTEM=322

To work around this, I started the task with TIME=1400.

But I am wondering how other shops manage if the Automan product itself crashes for some reason.

Thank you
Zamin

Member
Registered:
Posts: 67
Reply with quote  #2 
I am guessing that you did not read the setup manual, or use the example JCL.
syster

Junior Member
Registered:
Posts: 3
Reply with quote  #3 
The TIME parameter is pretty basic, and it is one that most system programmers learn about the first time they make the mistake of not including it. Now that you know, you will not make that mistake again. When AutoMan detects most x22 abends, it tries to shut everything down as quickly as possible. Other than 322 and 522 due to failure to include the TIME parameter, it is susceptible to A22 as a result of a force cancel. It will not respond to a 222. What other crashes, ones that do not affect the entire system, is AutoMan susceptible to?
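
For anyone scanning a log for these codes, here is a minimal Python sketch that pulls the system completion code out of an IEF472I line like the one in the first post and attaches a short meaning for the x22 codes named above. The function name and the lookup table are made up for illustration; they are not part of AutoMan.

```python
import re

# Short meanings for the x22 system completion codes mentioned in this thread.
X22_CODES = {
    "222": "cancelled by a CANCEL command without a dump",
    "322": "CPU time limit exceeded (check the TIME parameter)",
    "522": "wait time limit exceeded (check the TIME parameter)",
    "A22": "address space ended by a FORCE command",
}

# Matches the job name and system code in an IEF472I message.
_IEF472I = re.compile(
    r"IEF472I\s+(\S+)\s+.*COMPLETION CODE - SYSTEM=([0-9A-F]{3})"
)


def explain_completion(line):
    """Return (jobname, code, meaning), or None if the line is not IEF472I."""
    m = _IEF472I.search(line)
    if m is None:
        return None
    job, code = m.group(1), m.group(2)
    meaning = X22_CODES.get(code, "not one of the x22 codes listed here")
    return job, code, meaning


if __name__ == "__main__":
    sample = "IEF196I IEF472I AUTOMAN AUTOMAN - COMPLETION CODE - SYSTEM=322"
    print(explain_completion(sample))
```

Lines that carry no IEF472I message simply return None, so the function can be run over a whole log.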
Grazillda

Member
Registered:
Posts: 48
Reply with quote  #4 
A few years ago I asked why they did not modify the address space control internally to stop timeouts, and so not need the TIME parameter. I was told then that there was always someone who needed an exception, so they would have needed some sort of override parameter, and it was easier to use the existing facility than to provide an override. It is easy enough to issue a STOP command when you want it down, so they could modify it internally as far as I am concerned. The only advantage I can see to using the TIME parameter is that it gives a way of creating and testing a serious failure. It is probably a bit of a non-issue though, because this kind of thing is caught in the sandbox while testing, a long time before it ever gets into production, where it might affect something important.
rakesh

Member
Registered:
Posts: 52
Reply with quote  #5 
I am not sure what other scenarios can cause Automan to crash, because I have only been using it for a few months. How about others here? Have you ever seen it crash? If so, how do you manage those situations?
Zamin

Member
Registered:
Posts: 67
Reply with quote  #6 
I have had situations with new releases, but not crashes as such. One recent one was a failure to clean up during an error situation, while it was coming up. But that was fixed. Have you tried cancelling it? I think the only way to make it fall over is failure to use the TIME parameter, as in your case, or mischief by forcibly cancelling it. When used properly it seems rock solid to me.

BTW, Grazillda, I agree with the decision to use the TIME parameter. It exists, it gives a way of testing exactly the scenario of a catastrophic failure, and it bypasses the need for more complex control block mangling.
zboxassist

Member
Registered:
Posts: 89
Reply with quote  #7 
AutoMan could be enhanced to enable the ARM service to automatically restart AUTOMAN.


Grazillda

Member
Registered:
Posts: 48
Reply with quote  #8 
I FORCEd AutoMan once. I think it was V2.9. I had never read the messages manual about expansion messages, so I ignored the red highlighted text about not automating expansion messages, and it went into some sort of loop. I should have read the messages manual before trying to send an email when one of those occurred. It went down and came back up OK. I have not had to do that since. Whatever the problem was that caused it to do the expansion was fixed in the next revision. The current versions seem very solid to me.
automan

Moderator
Registered:
Posts: 136
Reply with quote  #9 
The Automatic Restart Manager is a good idea. There are not a lot of ways to make AutoMan halt. I know someone, who shall remain nameless, who inadvertently used the global procid variable from VARDEF PROCNAME=&pname in a stoP statement in a scheduled item, and also had the shutdown verification message automated, so it came down when they were not expecting it. That was a user mistake, which was soon fixed. The user mistake mentioned in the first comment was also easily fixed. There may still be circumstances under which it needs to be forcibly cancelled. There may be user mistakes, such as writing combinations of command lists and message intercepts, that could induce loops requiring forcible cancellation, but for the most part a simple stoP command will be sufficient. It is not possible to provide a really powerful tool and also protect the user from abusing or misusing it. AutoMan is well protected by STAE type processes, so even if a malicious coder overwrote storage, it would fail fairly gracefully. It is advisable to fully test scenarios in a sandbox before placing them in production. Not doing so is also a user mistake. If someone does find a way to cause it to fail that is not a simple usage error, you can be sure it will be fixed quite quickly.
Zamin

Member
Registered:
Posts: 67
Reply with quote  #10 
Has anyone tried executing an assembler program using the exec command, with bad code in it? I will try later today to see if that will do it.
syster

Junior Member
Registered:
Posts: 3
Reply with quote  #11 
Oh no. I can see that there is going to be a whole lot of creative mischief going on.
rakesh

Member
Registered:
Posts: 52
Reply with quote  #12 
Today Automan on my server crashed for some reason, and I don't see any trace of the reason for the crash in the logs. But when I try to start it, I get the error below.

AUT0064S already active
IEF404I AUTOMAN - ENDED - TIME=17.47.18
$HASP395 AUTOMAN ENDED

I tried a second restart of the task and it is running fine now.

What might be going wrong here?
Grazillda

Member
Registered:
Posts: 48
Reply with quote  #13 
That message means you are trying to run 2 identical copies at the same time, so the second one fails initialization and ends. You have to make sure that the first copy is fully terminated before trying to start it again.
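
As a rough illustration of "make sure the first copy is fully terminated", the Python sketch below counts JES2 start and end messages for the task in a captured log and reports whether any copy still appears to be active. It assumes the log contains the standard `$HASP373 jobname STARTED` and `$HASP395 jobname ENDED` messages; the function name is hypothetical, and a live system check (for example an operator display command) would be more reliable than log scanning.

```python
import re


def safe_to_start(log_lines, jobname="AUTOMAN"):
    """Return True if every start of jobname seen in the log has ended."""
    started = re.compile(rf"\$HASP373\s+{re.escape(jobname)}\s+STARTED")
    ended = re.compile(rf"\$HASP395\s+{re.escape(jobname)}\s+ENDED")
    active = 0  # copies started but not yet ended
    for line in log_lines:
        if started.search(line):
            active += 1
        elif ended.search(line):
            active = max(0, active - 1)
    return active == 0


if __name__ == "__main__":
    log = [
        "$HASP373 AUTOMAN STARTED",   # first copy, still running
        "$HASP373 AUTOMAN STARTED",   # second copy started by mistake
        "AUT0064S already active",
        "$HASP395 AUTOMAN ENDED",     # only the second copy ended
    ]
    print(safe_to_start(log))
```

Counting rather than looking only at the last message matters here: in the situation described in post #12, the ENDED message belongs to the duplicate copy, not to the original one.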

If there are no traces in the log, what are the symptoms? How do you know it "crashed"? Do you mean the same thing as everyone else when you say "crash"?

I have tried all sorts of things to try to make AutoMan go away unceremoniously. It seems to handle errors and recover fine. From the appearance of what you have posted here, it just looks like you forgot you had a copy active and tried to restart it. So the error seems to be BKC.
Zamin

Member
Registered:
Posts: 67
Reply with quote  #14 
Grazillda, no need to be rude. He is obviously very junior and just learning, so none of this BKC stuff.

If there was an incident, there is no way it left no trace in the log. If you are confident there was a crash, you should post the log. Someone here will see something.
rakesh

Member
Registered:
Posts: 52
Reply with quote  #15 
I found the task was down when I was searching for something else.

I see the errors below in the log:

AMAN955E AMCMCSCN Abend RC=840C4000 Reason=00000011
IEF196I AMAN955E AMCMCSCN Abend RC=840C4000 Reason=00000011
AMAN995I in SCANREPL PSW=078D2000 9E68877C
IEF196I AMAN995I in SCANREPL PSW=078D2000 9E68877C
AMAN R0=000012B6 1E620F3F 1E65F0BC 1E6206BC
IEF196I AMAN R0=000012B6 1E620F3F 1E65F0BC 1E6206BC
AR0=00000000 00000000 00000000 00000000
IEF196I AR0=00000000 00000000 00000000 00000000
AMAN R4=1DFF214E 00000001 1E620616 1E620EFE
IEF196I AMAN R4=1DFF214E 00000001 1E620616 1E620EFE
AR4=01010037 01010037 01010037 00000000
IEF196I AR4=01010037 01010037 01010037 00000000
AMAN R8=000009CB 1E65F0BC 00000002 00000041
IEF196I AMAN R8=000009CB 1E65F0BC 00000002 00000041
AR8=00000000 00000000 00000000 01010037
IEF196I AR8=00000000 00000000 00000000 01010037
AMAN RC=1E6886A8 1E620580 9E674150 1E620F00
IEF196I AMAN RC=1E6886A8 1E620580 9E674150 1E620F00
ARC=00000000 00000000 00000000 00000000
IEF196I ARC=00000000 00000000 00000000 00000000
AMAN801I Monitor Recovering from Abend
IEF196I AMAN801I Monitor Recovering from Abend
AMAN802S Too many monitor abends. Shutdown ordered
IEF196I AMAN802S Too many monitor abends. Shutdown ordered
AMAN899I Monitor shutdown in progress
IEF196I AMAN899I Monitor shutdown in progress
---------------
This is not a clean shutdown. That might be the reason it failed to come up when I tried the Start command.

Sorry, I can't recreate the scenario.

I had a similar issue a month back, which I posted on the forum on June 5th.
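
The AMAN955E line in the log above can be unpacked with a short Python sketch. It assumes the RC field follows the usual z/OS completion code layout (one flag byte, then a 12-bit system code, then a 12-bit user code); the layout of AutoMan's own message is not documented in this thread, so treat that as an assumption. The function name is made up for illustration.

```python
import re

# Matches the RC and Reason fields of the abend message shown in the log.
_ABEND = re.compile(r"Abend RC=([0-9A-F]{8}) Reason=([0-9A-F]{8})")


def decode_abend(line):
    """Split an 8-digit abend RC into system and user completion codes.

    Assumed layout: ffsssuuu (flag byte, system code, user code).
    """
    m = _ABEND.search(line)
    if m is None:
        return None
    rc = int(m.group(1), 16)
    system = (rc >> 12) & 0xFFF   # middle 12 bits
    user = rc & 0xFFF             # low 12 bits
    reason = int(m.group(2), 16)
    return {
        "system": f"S{system:03X}",   # system codes are shown in hex
        "user": f"U{user:04d}",       # user codes are shown in decimal
        "reason": f"{reason:02X}",
    }


if __name__ == "__main__":
    line = "AMAN955E AMCMCSCN Abend RC=840C4000 Reason=00000011"
    print(decode_abend(line))
```

Under that assumption the logged value 840C4000 is a system 0C4 with no user code, and reason 11 on an 0C4 conventionally points to a page-translation exception, i.e. a reference to storage that was not valid at the time.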