Troubleshooting Logs
The Connectivity Platform offers live log monitoring, enabling real-time inspection of device and solution data and status, as well as error and warning messages.
Device Logs
The real-time device connectivity logs provide insight into the traffic incoming to the connectivity component and enable troubleshooting of device misbehavior.
Connectivity logs are JSON formatted with the following structure.
Key | Description |
---|---|
connectionId | Identifier associated with the active connection. Use to build the tracking_id for internal events (see Solution logs). |
direction | INBOUND or OUTBOUND |
domainId | Public sub-domain (empty if "protocol=api") |
event | Event type (See Event types table) |
id | Device Identity |
ipAddress | IP address of the client (empty if "protocol=api") |
key | The path targeted by the event (e.g. topic, resource or signal ID) |
message | The description of the event |
protocol | Protocol used: http, mqtt, api (from scripting) |
time | Time of the event in date time format |
timestamp | Time of the event in microseconds |
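A device connectivity log entry following this structure might look like the sketch below (all field values are invented for illustration), parsed here with Python's standard `json` module:

```python
import json

# Hypothetical device connectivity log entry, following the
# structure described above (all values are invented).
raw = '''{
  "connectionId": "c-12345",
  "direction": "INBOUND",
  "domainId": "mysolution",
  "event": "data_in",
  "id": "MYDEVICE01",
  "ipAddress": "203.0.113.7",
  "key": "data_in",
  "message": "Message received",
  "protocol": "mqtt",
  "time": "2023-01-01T12:00:00Z",
  "timestamp": 1672574400000000
}'''

entry = json.loads(raw)
print(entry["event"], entry["id"])  # data_in MYDEVICE01
```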
Event Types
Filter | Type | Description |
---|---|---|
Data | data_in | Messages received from the device. |
Device Status | deleted / expired / provisioned | Device state changes |
Connection Status | connect / disconnect | Connection or disconnection events. Disconnection events also carry a disconnect_reason. |
Error / Warning | debug | Unexpected events happening during connection, authentication or message parsing. |
Error Logs
Connectivity error logs help you understand why a client fails to connect or to publish data to the platform. The most notable errors are detailed below:
Message | Description | What to do |
---|---|---|
MQTT permission denied | The device does not have publish permission | Grant the permission on the device page in the UI |
MQTT subscription denied | The device does not have subscribe permission | Grant the permission on the device page in the UI |
[ScriptAsync] Dispatcher publish failed | The script trigger failed | No action needed; it keeps retrying in the background |
Message persist_status: timestamp too far in the future | The timestamp is too far ahead of the current time | Do not publish future-dated data; note the timestamp is in microseconds |
Message persist_status: timestamp too far in the past | The timestamp is too far behind the current time | Do not publish data older than 10 years; note the timestamp is in microseconds |
Error: values should be map | When publishing batch data, values must be a map | Use a map for values |
Error: timestamp is not number | When publishing batch data, timestamp must be a valid number | Use a valid numeric timestamp |
Error: timestamp and values are required. | When publishing batch data, both keys must be provided | Follow the expected data format when publishing |
Error: signal not exists | The signal was not created before publishing data | Publish to config_io first to define the signal |
Error: resource not found | The resource was not created before publishing data | Create the resource in the UI first |
Error: invalid resource value | The resource value does not match its definition | Match the resource definition: number, boolean or string |
Error: invalid data_in | data_in does not parse to a valid JSON map | Publish a valid JSON map |
Error: invalid data_in json parsing error | data_in is not a valid JSON string | Publish a valid JSON string |
Error: invalid signal value | data_in is not in a valid format | data_in must contain key-value pairs, where the key is the signal name and the value is the signal value |
Error: invalid config_io | config_io is not a valid JSON map | Publish a valid config_io |
Error: invalid config_io json parsing error | config_io is not a valid JSON string | Publish a valid JSON string |
Message process: Signal definition should be a map. | The signal definition is not a valid map | Publish the definition as a map |
Error: invalid config_io value | The signal definition is not a valid value | The definition must be a JSON map |
MQTT save_signal_error payload too large | The published payload is too large | Send a smaller payload (less than 1 MB) |
MQTT rate limit error | There are more in-flight messages than the maximum allowed | Send at a slower pace |
save data to db has an error | The database write failed | Try again later |
MQTT bad_payload_format | The payload is not in a valid format | The payload must be a valid JSON map |
MQTT invalid_format | The resource value format does not match the resource's defined type | Publish the expected data type |
MQTT invalid payload not valid json | The payload is not a valid JSON array | Batch publishes must be a JSON array |
MQTT invalid topic | The publish topic is invalid | Publish to a valid topic |
MQTT publisher_failed | The event trigger failed | Try again later |
MQTT Connection failed | The connection rate limit was reached | Try again later |
unsupported_topic | The topic is not valid | Publish only to supported topics |
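Many of the errors above relate to malformed payloads. The sketch below builds well-formed single and batch payloads in Python, as implied by those error messages (the signal names are invented; the batch shape — an array of entries each carrying a numeric microsecond "timestamp" and a "values" map — is inferred from the errors above):

```python
import json
import time

# Single data_in message: a JSON map of signal name -> signal value.
single = json.dumps({"temperature": 21.5, "door_open": False})

# Batch message: a JSON array of entries; each entry requires both
# "timestamp" (a number, in microseconds) and "values" (a map).
now_us = int(time.time() * 1_000_000)
batch = json.dumps([
    {"timestamp": now_us, "values": {"temperature": 21.5}},
    {"timestamp": now_us - 1_000_000, "values": {"temperature": 21.4}},
])

# Sanity-check the batch shape before publishing.
for item in json.loads(batch):
    assert isinstance(item["timestamp"], (int, float))
    assert isinstance(item["values"], dict)
```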
Solution Logs
Note
Currently, solution logs are only available in the logging system (see Historical Logs). A real-time view in the portal will be added in a later version.
Solution logs are JSON formatted with the following structure.
Key | Description |
---|---|
Solution_id | The solution namespace Id in scope of the log event |
Severity | A numerical value following the Syslog severity standard, from 0 (Emergency) to 7 (Debug) |
Timestamp | Unix timestamp of the event in milliseconds |
Type | Type of the event defining the content of the data field (See below) |
Service | The IoT service ID |
Service_type | The single character referencing the type of service |
Event | The event ID |
Tracking_id | Operation ID causing the event log |
Message | A human readable textual information about the event |
Data | Additional contextual information in JSON format |
Code | A numerical status code following HTTP standard, reflecting the execution result of the operation |
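Since Severity follows the Syslog convention, the numeric value maps directly to a severity name. A minimal Python sketch reading a solution log entry (the field values are illustrative, not real log output):

```python
import json

# Syslog severity levels, 0 (Emergency) through 7 (Debug).
SYSLOG_SEVERITY = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

# Hypothetical solution log entry following the structure above.
raw = ('{"Solution_id": "k972nv8vb8hk0000", "Severity": 3, '
       '"Timestamp": 1672574400000, "Type": "call", '
       '"Message": "list devices failed", "Code": 500}')

entry = json.loads(raw)
print(SYSLOG_SEVERITY[entry["Severity"]])  # Error
```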
Solution Log types
Found in the "type" key of the log entries.
Type | Description | Example |
---|---|---|
config | Configuration events | Adding a service configuration to a solution. |
event | Runtime event trigger | Incoming message from device is transmitted to scripting for processing. |
call | Service operation request from scripting | Solution script calls the connectivity service to list devices. |
script | Logs emitted by the solution script using the "print" function. | |
Configuration Logs
Every modification of a solution object produces a structured log, providing troubleshooting and audit capability for the system. Configuration logs follow the same JSON structure as the other types of solution logs (see the Solution Logs structure table).
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional contextual information
The log is then emitted on the system's Standard-Out interface. This structure enables the logging system to index solution events and perform monitoring, statistics and alerting based on user requirements. See the Configuration Specification document for additional information.
Routing Logs
To help users troubleshoot IoT solutions, logs are emitted to reflect runtime execution in the system for both event handling and service calls. The resulting logs follow a structure consistent with the configuration logs (see Solution Configuration Logs) and the scripting logs (see Script Logs), as follows:
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional contextual information as follows:
- Elapsed: The duration in microseconds required for the request execution.
- Processing_time: The amount of processing time in microseconds required for the request execution.
- The processing of a request could be on hold and therefore use less computing time than the elapsed time. Conversely, a request could be executed by parallel processes and use more processing time than the wall-clock time.
- Data_in: The size in bytes of the request payload.
- Data_out: The size in bytes of the response payload.
- Memory_usage: The size in bytes of the memory used by the scripting execution. For event triggers only.
This information enables statistical reporting, monitoring and alerting for the IoT solution. Runtime execution is processing-heavy, and a typical instance of the Service API is expected to handle up to 10,000 executions per second. Producing a dedicated log for each execution would not be viable from a resource-cost perspective. To limit this issue, produced logs are aggregated and buffered before being emitted.
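The aggregation described above can be pictured with a simplified sketch (this is an illustration of the general technique, not the platform's actual implementation): per-event counters accumulate in a buffer and are flushed as a single aggregated entry.

```python
from collections import defaultdict

# Simplified sketch of log aggregation: instead of one log entry per
# execution, accumulate metrics per (service, event) pair and emit a
# single aggregated entry when the buffer is flushed.
buffer = defaultdict(lambda: {"count": 0, "elapsed_us": 0, "data_in": 0})

def record(service, event, elapsed_us, data_in):
    slot = buffer[(service, event)]
    slot["count"] += 1
    slot["elapsed_us"] += elapsed_us
    slot["data_in"] += data_in

def flush():
    # Emit one aggregated entry per key, then reset the buffer.
    entries = [{"service": s, "event": e, **m}
               for (s, e), m in buffer.items()]
    buffer.clear()
    return entries

record("device2", "data_in", 1200, 64)
record("device2", "data_in", 800, 32)
aggregated = flush()
print(aggregated[0]["count"])  # 2
```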
Scripting Logs
To enable troubleshooting, script developers use Lua's native "print" function, which would traditionally write to the application's Standard-Out. Because the script runs in a virtual machine, its output is handled by the scripting engine, and log entries emitted by the script must be relayed to the infrastructure logging system. The log throughput can be high: a user can create a recursive loop of "print" calls in a script, and thousands of scripts can run in parallel. Logs are therefore not forwarded to the engine's Standard-Out channel; instead, they are written to dedicated log files following the same approach as solution routing logs (see Routing Logs), where a file storage folder location is configured for the logging agent to read the files directly. Log files are limited to 100 MB, and a maximum of 10 files are kept. Each log entry follows the common structure (see the Solution Logs structure table).
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional parameters provided by the script log function
Because the engine cannot aggregate logs produced by a user script without potentially losing their meaning, these logs are not aggregated. This can cause a large number of logs to be emitted, which can strain system resources. To prevent this, by default only logs of severity "Error" or above are emitted and lower-severity entries are discarded. For granular troubleshooting of low-severity entries, the Script Engine exposes an API operation to temporarily enable the emission of all logs for a configurable duration of at most 2 hours.
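The default severity filtering can be sketched as follows (illustrative only; the actual engine behaviour is controlled through its API, and the function name here is invented). Note that in the Syslog convention "Error or above" means a numerically lower or equal value:

```python
# Sketch of default script-log filtering: only entries of severity
# "Error" (3) or more severe (numerically lower) are emitted, unless
# verbose mode has been temporarily enabled via the engine API.
ERROR = 3  # Syslog: 0=Emergency ... 7=Debug

def should_emit(severity, verbose_enabled=False):
    return verbose_enabled or severity <= ERROR

print(should_emit(7))        # False: Debug entries discarded by default
print(should_emit(3))        # True: Error entries always emitted
print(should_emit(7, True))  # True: verbose mode emits everything
```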
Historical Logs
Logs are emitted by the platform into the infrastructure logs. Therefore, if configured, you should be able to access them through your infrastructure log system (a link is integrated in the platform user interface). Typical log systems are Elasticsearch or OpenSearch with Kibana.
In the log system, you can search for Solution logs and device error logs (data messages are not emitted to the log system).
Query desired log
You can now refine your log search by filtering for specific criteria, such as device ID, over a defined period.
Note
Keep in mind, selecting a broad time range may return a substantial amount of log hits.
Narrow down your search as follows:
- The desired infrastructure namespace
- The desired time window
- The desired solution namespace by its solution ID (application or connector)
- The desired platform component (see the Components table) using the key kubernetes.container_name (subject to differences based on the log system)
Example for searching ingest information about a device MYDEVICE01 in a solution with ID k972nv8vb8hk0000:
Query: ("k972nv8vb8hk0000.m2" AND kubernetes.container_name: haproxy) OR (kubernetes.container_name: connectivity_hub AND (k972nv8vb8hk0000 OR MYDEVICE01))
Components of interest
The following notable component names can be used to search for relevant logs:
Component ID | Description | What logs |
---|---|---|
pegasus_api, pegasus_dispatcher, pegasus_script_engine | Core components of the platform | Solution logs |
bizapi | The administration backend component | Logs relevant to platform administration |
sphinx_api | The WebServices gateway component | Logs relevant to Solution APIs |
connectivity_hub | The connectivity component (Device2) | Device logs |
haproxy | Ingress reverse proxy | Connection logs coming from the Admin portal, devices and clients of the Web-Service service. Each connection ending is registered; a normal closure log includes "--". See the HAProxy documentation for other closure codes |