Troubleshooting Logs
The Connectivity Platform offers live log monitoring, enabling real-time inspection of device and solution data and status, as well as error and warning messages.
Device Logs
The real-time device connectivity logs provide insight into the traffic incoming to the connectivity component and enable troubleshooting of device misbehavior.
Connectivity logs are JSON formatted with the following structure.
Key | Description |
---|---|
connectionId | Identifier associated with the active connection. Use to build the tracking_id for internal events (see Solution logs). |
direction | INBOUND or OUTBOUND |
domainId | Public sub-domain (empty if "protocol=api") |
event | Event type (See Event types table) |
id | Device Identity |
ipAddress | IP address of the client (empty if "protocol=api") |
key | The path targeted by the event (e.g. topic, resource or signal ID) |
message | The description of the event |
protocol | Protocol used: http, mqtt, api (from scripting) |
time | Time of the event in date time format |
timestamp | Time of the event in microseconds |
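A device connectivity log entry following this structure might look like the sketch below (all field values are invented for illustration), parsed here with Python's standard `json` module:

```python
import json

# Hypothetical device connectivity log entry, following the
# structure described above (all values are invented).
raw = '''{
  "connectionId": "c-12345",
  "direction": "INBOUND",
  "domainId": "mysolution",
  "event": "data_in",
  "id": "MYDEVICE01",
  "ipAddress": "203.0.113.7",
  "key": "data_in",
  "message": "Message received",
  "protocol": "mqtt",
  "time": "2023-01-01T12:00:00Z",
  "timestamp": 1672574400000000
}'''

entry = json.loads(raw)
print(entry["event"], entry["id"])  # data_in MYDEVICE01
```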
Event Types
Filter | Type | Description |
---|---|---|
Data | data_in | Messages received from the device. |
Device Status | deleted / expired / provisioned | Device state changes |
Connection Status | connect / disconnect | Connection or disconnection events. Disconnection events also carry a disconnect_reason. |
Error / Warning | debug | Unexpected events happening during connection, authentication or message parsing. |
Error Logs
Connectivity error logs help you understand why a client fails to connect or to publish data to the platform. The most notable errors are detailed below:
Message | Description | What to do |
---|---|---|
MQTT permission denied | The device does not have publish permission | Grant the permission on the device page in the UI |
MQTT subscription denied | The device does not have subscribe permission | Grant the permission on the device page in the UI |
[ScriptAsync] Dispatcher publish failed | The script trigger failed | No action needed; it keeps retrying in the background |
Message persist_status: timestamp too far in the future | The timestamp is too far ahead of the current time | Do not publish future-dated data; note the timestamp is in microseconds |
Message persist_status: timestamp too far in the past | The timestamp is too far behind the current time | Do not publish data older than 10 years; note the timestamp is in microseconds |
Error: values should be map | When publishing batch data, values must be a map | Use a map for values |
Error: timestamp is not number | When publishing batch data, timestamp must be a valid number | Use a valid numeric timestamp |
Error: timestamp and values are required. | When publishing batch data, both keys must be provided | Follow the expected data format when publishing |
Error: signal not exists | The signal was not created before publishing data | Publish to config_io first to define the signal |
Error: resource not found | The resource was not created before publishing data | Create the resource in the UI first |
Error: invalid resource value | The resource value does not match its definition | Match the resource definition: number, boolean or string |
Error: invalid data_in | data_in does not parse to a valid JSON map | Publish a valid JSON map |
Error: invalid data_in json parsing error | data_in is not a valid JSON string | Publish a valid JSON string |
Error: invalid signal value | data_in is not in a valid format | data_in must contain key-value pairs, where the key is the signal name and the value is the signal value |
Error: invalid config_io | config_io is not a valid JSON map | Publish a valid config_io |
Error: invalid config_io json parsing error | config_io is not a valid JSON string | Publish a valid JSON string |
Message process: Signal definition should be a map. | The signal definition is not a valid map | Publish the definition as a map |
Error: invalid config_io value | The signal definition is not a valid value | The definition must be a JSON map |
MQTT save_signal_error payload too large | The published payload is too large | Send a smaller payload (less than 1 MB) |
MQTT rate limit error | There are more in-flight messages than the maximum allowed | Send at a slower pace |
save data to db has an error | The database write failed | Try again later |
MQTT bad_payload_format | The payload is not in a valid format | The payload must be a valid JSON map |
MQTT invalid_format | The resource value format does not match the resource's defined type | Publish the expected data type |
MQTT invalid payload not valid json | The payload is not a valid JSON array | Batch publishes must be a JSON array |
MQTT invalid topic | The publish topic is invalid | Publish to a valid topic |
MQTT publisher_failed | The event trigger failed | Try again later |
MQTT Connection failed | The connection rate limit was reached | Try again later |
unsupported_topic | The topic is not valid | Publish only to supported topics |
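Many of the errors above relate to malformed payloads. The sketch below builds well-formed single and batch payloads in Python, as implied by those error messages (the signal names are invented; the batch shape — an array of entries each carrying a numeric microsecond "timestamp" and a "values" map — is inferred from the errors above):

```python
import json
import time

# Single data_in message: a JSON map of signal name -> signal value.
single = json.dumps({"temperature": 21.5, "door_open": False})

# Batch message: a JSON array of entries; each entry requires both
# "timestamp" (a number, in microseconds) and "values" (a map).
now_us = int(time.time() * 1_000_000)
batch = json.dumps([
    {"timestamp": now_us, "values": {"temperature": 21.5}},
    {"timestamp": now_us - 1_000_000, "values": {"temperature": 21.4}},
])

# Sanity-check the batch shape before publishing.
for item in json.loads(batch):
    assert isinstance(item["timestamp"], (int, float))
    assert isinstance(item["values"], dict)
```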
Solution Logs
Note
Currently, solution logs are only available in the logging system (see Historical Logs). A real-time view in the portal will be added in a later version.
Solution logs are JSON formatted with the following structure.
Key | Description |
---|---|
Solution_id | The solution namespace Id in scope of the log event |
Severity | A numerical value following the Syslog severity standard, from 0 (Emergency) to 7 (Debug) |
Timestamp | Unix timestamp of the event in milliseconds |
Type | Type of the event defining the content of the data field (See below) |
Service | The IoT service ID |
Service_type | The single character referencing the type of service |
Event | The event ID |
Tracking_id | Operation ID causing the event log |
Message | A human readable textual information about the event |
Data | Additional contextual information in JSON format |
Code | A numerical status code following HTTP standard, reflecting the execution result of the operation |
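Since Severity follows the Syslog convention, the numeric value maps directly to a severity name. A minimal Python sketch reading a solution log entry (the field values are illustrative, not real log output):

```python
import json

# Syslog severity levels, 0 (Emergency) through 7 (Debug).
SYSLOG_SEVERITY = [
    "Emergency", "Alert", "Critical", "Error",
    "Warning", "Notice", "Informational", "Debug",
]

# Hypothetical solution log entry following the structure above.
raw = ('{"Solution_id": "k972nv8vb8hk0000", "Severity": 3, '
       '"Timestamp": 1672574400000, "Type": "call", '
       '"Message": "list devices failed", "Code": 500}')

entry = json.loads(raw)
print(SYSLOG_SEVERITY[entry["Severity"]])  # Error
```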
Solution Log types
Found in the "type" key of the log entries.
Type | Description | Example |
---|---|---|
config | Configuration events | Adding a service configuration to a solution. |
event | Runtime event trigger | Incoming message from device is transmitted to scripting for processing. |
call | Service operation request from scripting | Solution script calls the connectivity service to list devices. |
script | Logs emitted by the solution script using the "print" function. | |
Configuration Logs
Every modification of a solution object produces a structured log, providing troubleshooting and audit capability for the system. Configuration logs follow the same JSON structure as the other types of solution logs (see the Solution Logs structure table).
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional contextual information
The log is then emitted on the system's Standard-Out interface. This structure enables the logging system to index solution events and perform monitoring, statistics and alerting based on user requirements. See the Configuration Specification document for additional information.
Routing Logs
To help users troubleshoot IoT solutions, logs are emitted to reflect runtime execution in the system for both event handling and service calls. The resulting logs follow a structure consistent with the configuration logs (see Solution Configuration Logs) and the scripting logs (see Script Logs), as follows:
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional contextual information as follows:
- Elapsed: The duration in microseconds required for the request execution.
- Processing_time: The amount of processing time in microseconds required for the request execution.
- The processing of a request could be on hold and therefore use less computing time than the elapsed time. Conversely, a request could be executed by parallel processes and use more processing time than the wall-clock time.
- Data_in: The size in bytes of the request payload.
- Data_out: The size in bytes of the response payload.
- Memory_usage: The size in bytes of the memory used by the scripting execution. For event triggers only.
This information enables statistical reporting, monitoring and alerting for the IoT solution. Runtime execution is processing-heavy, and a typical instance of the Service API is expected to handle up to 10,000 executions per second. Producing a dedicated log for each execution would not be viable from a resource-cost perspective. To limit this issue, produced logs are aggregated and buffered before being emitted.
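The aggregation described above can be pictured with a simplified sketch (this is an illustration of the general technique, not the platform's actual implementation): per-event counters accumulate in a buffer and are flushed as a single aggregated entry.

```python
from collections import defaultdict

# Simplified sketch of log aggregation: instead of one log entry per
# execution, accumulate metrics per (service, event) pair and emit a
# single aggregated entry when the buffer is flushed.
buffer = defaultdict(lambda: {"count": 0, "elapsed_us": 0, "data_in": 0})

def record(service, event, elapsed_us, data_in):
    slot = buffer[(service, event)]
    slot["count"] += 1
    slot["elapsed_us"] += elapsed_us
    slot["data_in"] += data_in

def flush():
    # Emit one aggregated entry per key, then reset the buffer.
    entries = [{"service": s, "event": e, **m}
               for (s, e), m in buffer.items()]
    buffer.clear()
    return entries

record("device2", "data_in", 1200, 64)
record("device2", "data_in", 800, 32)
aggregated = flush()
print(aggregated[0]["count"])  # 2
```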
Scripting Logs
To enable troubleshooting, script developers use Lua's native "print" function, which would traditionally write to the application's Standard-Out. Because the script runs in a virtual machine, its output is handled by the scripting engine, and log entries emitted by the script must be relayed to the infrastructure logging system. The log throughput can be high: a user can create a recursive loop of "print" calls in a script, and thousands of scripts can run in parallel. Logs are therefore not forwarded to the engine's Standard-Out channel; instead, they are written to dedicated log files following the same approach as solution routing logs (see Routing Logs), where a file storage folder location is configured for the logging agent to read the files directly. Log files are limited to 100 MB, and a maximum of 10 files are kept. Each log entry follows the common structure (see the Solution Logs structure table).
The key 'Data' carries diverse definitions depending on log type:
- Data: Additional parameters provided by the script log function
Because the engine cannot aggregate logs produced by a user script without potentially losing their meaning, these logs are not aggregated. This can cause a large number of logs to be emitted, which can strain system resources. To prevent this, by default only logs of severity "Error" or above are emitted and lower-severity entries are discarded. For granular troubleshooting of low-severity entries, the Script Engine exposes an API operation to temporarily enable the emission of all logs for a configurable duration of at most 2 hours.
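The default severity filtering can be sketched as follows (illustrative only; the actual engine behaviour is controlled through its API, and the function name here is invented). Note that in the Syslog convention "Error or above" means a numerically lower or equal value:

```python
# Sketch of default script-log filtering: only entries of severity
# "Error" (3) or more severe (numerically lower) are emitted, unless
# verbose mode has been temporarily enabled via the engine API.
ERROR = 3  # Syslog: 0=Emergency ... 7=Debug

def should_emit(severity, verbose_enabled=False):
    return verbose_enabled or severity <= ERROR

print(should_emit(7))        # False: Debug entries discarded by default
print(should_emit(3))        # True: Error entries always emitted
print(should_emit(7, True))  # True: verbose mode emits everything
```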
Historical Logs
Logs are emitted by the platform into the infrastructure logs. Therefore, if configured, you should be able to access them through your infrastructure log system (a link is integrated in the platform user interface). Typical log systems are Elasticsearch or OpenSearch with Kibana.
In the log system, you can search for Solution logs and device error logs (data messages are not emitted to the log system).
Query desired log
You can now refine your log search by filtering for specific criteria, such as device ID, over a defined period.
Note
Keep in mind, selecting a broad time range may return a substantial amount of log hits.
Narrow down your search as follows:
- The desired infrastructure namespace
- The desired time window
- The desired solution namespace by its solution ID (application or connector)
- The desired platform component (see the Components table) using the key kubernetes.container_name (subject to differences based on the log system)
Example for searching ingest information about a device MYDEVICE01 in a solution with ID k972nv8vb8hk0000:
Query: ("k972nv8vb8hk0000.m2" AND kubernetes.container_name: haproxy) OR (kubernetes.container_name: connectivity_hub AND (k972nv8vb8hk0000 OR MYDEVICE01))
Components of interest
The following notable component names can be used to search for relevant logs:
Component ID | Description | What logs |
---|---|---|
pegasus_api, pegasus_dispatcher, pegasus_script_engine | Core components of the platform | Solution logs |
bizapi | The administration backend component | Logs relevant to platform administration |
sphinx_api | The WebServices gateway component | Logs relevant to Solution APIs |
connectivity_hub | The connectivity component (Device2) | Device logs |
haproxy | Ingress reverse proxy | Connection logs coming from the Admin portal, devices and clients of the Web-Service service. Each connection ending is registered; a normal closure log includes "--". See the HAProxy documentation for other closure codes |