DLI EPCDC32 User's Guide
1.9.19.0
AutoPing

AutoPing can monitor a network device and perform a task if the device stops responding. It can also monitor a group of devices, in which case the task is executed only if none of the group members respond. The task is either a list of outlets to reboot or a script to execute.

Common configuration

autoping_common.png
Common AutoPing settings

Be sure to enable AutoPing operation by ticking the "Enable AutoPing" checkbox. Certain reset procedures may turn it off automatically.

The following parameters are used for AutoPing operation:

  • Time between pings: This is the time between each ping check of an address. 60 seconds should be useful for most applications.
  • Ping timeout to reboot: This sets the maximum time for which consecutive communication attempts may fail. Any failure beyond this time limit will cause the task to be executed. For example, with a timeout of 300 seconds and 30 seconds between pings, the task will be executed if a target system fails to respond to any pings for 330 seconds: the first ping sent after the 300-second limit is the one at 330 seconds, and it has also failed. Since occasional network overloads and missed packets can occur during normal network operation, be sure to choose a reasonable time. If configured to do so, AutoPing may handle certain failures immediately instead of waiting for the timeout (see below).
  • Ping responses to enable autoping: To ensure a reliable connection, AutoPing will only be enabled after this many successful pings. We do not recommend changing this (the default is 10) unless you must configure your controller before connecting it to the target devices.
  • Times to attempt reboot consecutively: If you have an unreliable target device, limit the number of times it will be rebooted by entering that value here. For example, entering 5 will execute the task up to 5 times before giving up. A successful ping will reset the counter.
  • Total times to attempt reboot: Similar to the above, but limits the total number of times the device will be rebooted, even with successful pings in between.
  • Device reboot delay: After rebooting a device with a cold-boot power-off, a waiting period should occur before the IP address is re-checked by AutoPing. This delay allows the device to reboot. Windows and Linux servers can force automatic file system checks which may take several minutes to complete. Enter a safe value here; for example, entering 600 would cause the power controller to start checking the server for normal operation 10 minutes after reboot. If a script is to be triggered, take any delays in the script into account when choosing this setting, so that the script thread completes before the delay elapses. The timer starts when the script thread starts executing.
  • Handle failures immediately instead of waiting for timeout: Some AutoPing target types may return an explicit error (TCP RST, HTTP 500, etc.). Enabling this option invokes the task as soon as such an error is seen instead of waiting for the timeout to pass (during which the error condition could have disappeared and no action would have been taken). Consider your setup and the AutoPing action before enabling this option. For example, you shouldn't enable it if the AutoPing action is to power cycle a server, one of the server's services is the only target of the AutoPing entry, and you need to shut that service down temporarily for maintenance.
  • Activate enabled entries without trial on service restoration: By default, on initial power-up, enabled entries still need to wait for a certain number of successful ping responses before AutoPing actions are taken. This makes sure the targets have come online too (on the assumption that they might also have suffered a power failure and may need time to recover). This option can be used to disable this additional check.
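
The timing and counter rules above can be sketched in Lua. This is only an illustrative model, not the controller's implementation. The function names first_trigger_time and new_attempt_counter are hypothetical, pings are assumed to be sent at exact multiples of the interval starting when the target first fails, and it is an assumption that reaching either reboot limit stops further attempts.

```lua
-- Hypothetical helper, not part of the controller's scripting API.
-- Assumes failed pings occur at 0, interval, 2*interval, ... seconds.
-- Returns the time of the first failed ping strictly beyond the timeout.
function first_trigger_time(timeout, interval)
    return (math.floor(timeout / interval) + 1) * interval
end

-- Hypothetical model of the two reboot-attempt limits.
function new_attempt_counter(max_consecutive, max_total)
    local consecutive, total = 0, 0
    local counter = {}

    -- Call each time the task is executed.
    -- Returns true while further attempts are still allowed.
    function counter.task_executed()
        consecutive = consecutive + 1
        total = total + 1
        return consecutive < max_consecutive and total < max_total
    end

    -- A successful ping resets only the consecutive counter.
    function counter.ping_succeeded()
        consecutive = 0
    end

    return counter
end
```

With a 300-second timeout and 30 seconds between pings, first_trigger_time(300, 30) gives 330, matching the example in the parameter list.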

Ping target configuration

To actually use AutoPing, add one or more AutoPing targets (IP addresses) to the list. The button is used to remove a target from the list.

Below is an example autoping configuration with four targets:

autoping_individual.png
Individual AutoPing settings

The checkbox to the left of the IP address is used to start or stop target monitoring. Confirm your action with the button. This button is also used to link a list of outlets or a script line to the AutoPing target.

The current AutoPing item and target state is indicated as follows:

ap_explanation_individual.png
AutoPing target and item state indication

You can select the outlets on which to perform the trigger action by ticking their respective checkboxes.

You can select a scripting action to perform when the AutoPing item triggers (by default the selected outlets are cycled). The action must be a function defined in the scripting server, like

function action_to_perform()
    -- statements to run when the AutoPing item triggers
end

or e.g.

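function action_to_perform(selected_outlets)
    -- statements to run when the AutoPing item triggers;
    -- selected_outlets holds the indices of the selected outlets
end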

In the second form the argument selected_outlets (any other name will do) will receive a table of the 1-based indices of outlets selected (e.g. {1,3,6}). The order of outlets in the table is unspecified; use table.sort in the script function if you rely on a particular order.
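
As a minimal sketch of the second form, the function below sorts the outlet indices before acting on them. The names cycle_in_order and cycle_outlet are hypothetical; cycle_outlet is only a stand-in that prints a message, because this sketch makes no assumption about the controller's real outlet API.

```lua
-- Hypothetical stand-in for whatever the script really does to an outlet.
function cycle_outlet(index)
    print("cycling outlet " .. index)
end

-- Example action taking the table of selected outlet indices.
function cycle_in_order(selected_outlets)
    -- The order of the incoming table is unspecified, so sort a copy.
    local ordered = {}
    for i, index in ipairs(selected_outlets) do
        ordered[i] = index
    end
    table.sort(ordered)

    for _, index in ipairs(ordered) do
        cycle_outlet(index)
    end
    return ordered
end
```

Calling cycle_in_order({6, 1, 3}) processes outlets 1, 3 and 6 in that order, whatever order they were passed in.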

The stats column shows some statistics:

  • TX — the number of pings sent to the target IP address;
  • RX — the number of pongs received back so far;
  • HIT — the number of times the trigger action was executed.

In the sample image, three targets are being monitored (74.125.87.103, 67.122.199.250, and 192.168.0.93). 192.168.0.93 seems to be a very reliable, well-connected device: 823 pings were sent to it and 822 pongs were received back. Chances are very good that the 823rd pong will arrive soon. The reboot task (script function toggle_stuff_and_log) was never executed.

It looks like 192.168.0.92 failed hard. The task (cycle outlets 3, 5, 6) was executed 5 times in a row but the target did not respond, so monitoring was automatically disabled.

74.125.87.103 and 67.122.199.250 form a group: the trigger task will be performed if they both lose 5 sequential packets at the same time. This has happened 2 times so far. Monitoring a group of several reliable, geographically separate external IP addresses (in this example they belong to Google and Digital Loggers respectively) can be very useful for detecting a stuck ADSL modem or some other no-Internet condition.
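
The group rule can be illustrated with a short Lua sketch. It is a hypothetical model rather than the controller's code: group_should_trigger and its arguments are invented for illustration, and the table is assumed to map each group member's address to its current number of consecutive failed pings.

```lua
-- Hypothetical model of the group rule.
-- failed_in_a_row: address -> current count of consecutive failed pings.
-- threshold: number of sequential failures that counts as "not responding".
function group_should_trigger(failed_in_a_row, threshold)
    local members = 0
    for _, failures in pairs(failed_in_a_row) do
        members = members + 1
        if failures < threshold then
            return false -- at least one member is still responding
        end
    end
    -- Trigger only for a non-empty group in which every member has failed.
    return members > 0
end
```

For the two-address group in the example, the function returns true only when both addresses have reached the threshold of 5 at the same time.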

Action on local network failures

AutoPing is designed to control the operation of remote hosts. You usually don't want to, for example, cycle power to all servers just because you turned on the same subnet restriction. AutoPing therefore tries not to trigger if the problem might be local to the unit itself. For example, if you detach the Ethernet cable from the unit, you'll see messages similar to the following:

kernel: eth0: link down
config.net: Interface "eth0" is down
autoping: ping x.y.z.t: no usable route to host, ..., not considered a failure

and no actions will be performed. A similar situation will occur if you reconfigure the controller to use a new IP network from which old addresses are unreachable.

Use the link:// scheme to check for local link loss.

Advanced ping targets

AutoPing targets don't have to be IP addresses. If you enter a hostname, it will be resolved before sending each request. If the name resolution fails, it is assumed to be a local error and, as described above, no action is taken. If a name is resolved to multiple IP addresses, a random one is chosen.

By default, AutoPing checks targets using the ICMP protocol. A variety of other ping target kinds can be used if you specify a URL instead of simply an IP address or hostname. Supported URL schemes include:

  • icmp: an explicit specification of the "regular" ping protocol, e.g. icmp://192.168.0.1 is equivalent to 192.168.0.1 (note that no trailing slash is used);
  • link: checks whether the physical link is present on the wired (link://eth0) or wireless (link://wlan0) interface (useful because higher-level targets will usually ignore link loss);
  • tcp: causes AutoPing to try to establish a TCP connection to the given port, e.g. tcp://192.168.0.1:22 can be used to check that a service is listening on TCP port 22 (usually SSH) of 192.168.0.1 (note that no trailing slash is used);
  • http and https: cause AutoPing to perform an HTTP/HTTPS GET request for the given URL, e.g. http://www.digital-loggers.com/index.html can be used to check that the web server is responding and can serve its main page.
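
The way a target string maps to a check type can be sketched as follows. This is an illustrative guess at the classification, not the controller's parser: target_scheme is a hypothetical function, and it assumes that any target without a scheme:// prefix is treated as a plain ICMP target.

```lua
-- Hypothetical classifier: returns the URL scheme of an AutoPing target,
-- or "icmp" when the target is a bare IP address or hostname.
function target_scheme(target)
    -- Capture the letters before "://", if the target starts with a scheme.
    local scheme = target:match("^(%a+)://")
    if scheme then
        return scheme:lower()
    end
    return "icmp"
end
```

Under these assumptions, target_scheme("192.168.0.1") and target_scheme("icmp://192.168.0.1") both give "icmp", while target_scheme("tcp://192.168.0.1:22") gives "tcp".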

AutoPing events

The most often encountered AutoPing events are:

  • pinging ... (timeout)
  • ping ... succeeded (time)
  • ping ... failed (time)

The time is the request round-trip time, in seconds. Note that it is purely informative and can't be used as a measure of target response time unless it is on the order of hundreds of milliseconds or more.

Several failures in a row trigger AutoPing actions which are reported with corresponding events:

  • item ... (addresses...) failed [failures/max]
  • item ... (addresses...) failed over (max) times in a row, disabling

As described above, local network failures don't count toward the failure count, but generate these notifications instead:

  • no usable route to host, possibly due to local network outage, not considered a failure (when a request isn't being sent)
  • ping ... not received (time), possibly due to local network outage, not considered a failure (when an outage occurred after a request has been sent)

The events associated with item trial before enabling are self-explanatory:

  • item ... (addresses...) enable approved
  • item ... (addresses...) enable cancelled
  • item ... (addresses...) trial restarted due to address list changes