Software interface
Several software interfaces are available to monitor the status of the RECS®|Box system: the Management WebGUI, a Redfish API, and a proprietary REST API that provides XML-based monitoring and management functionality.
Management WebGUI
The Management WebGUI is available on every RECS®|Box unit. It can be accessed with any common browser at the assigned IP address on the default port 80. The views described below depend on the device and its assembly.
In general, the following status symbols have the same meaning on every page:
OK: Everything is OK. Also indicated by a green line in a graph.
Warning: Something is wrong, but the system is still fully functional. The system should be checked so that the problem does not get worse. Also indicated by a yellow line in a graph.
Critical error: The system must be checked immediately and may have to be shut down to prevent hardware damage. Also indicated by a red line in a graph.
On the left side is a menu, which can be toggled by clicking the menu button in the upper left corner of the screen. The menu contains the following items:
Dashboard: General overview of the managed system, installed nodes and health status
Management: Power control and monitoring for all nodes and fans
Network: VLAN configuration and configuration of the management network
Composition: Configuration of PCIe resources
Users: User management
Settings: System-wide configuration settings
Time: System time settings
Firmware: Firmware updates and overview of software versions
Logs: Logs from the management software about system health and Java messages
Dashboard
The Dashboard is seen first when opening the WebGUI and displays the summarized system health status.
Management
In this view, nodes can be turned on or off with a quick menu, which opens when clicking on the gear symbol of a node.
Multiple nodes can be controlled at once via the panel “Batch-Control Nodes”.
The view also shows fan monitoring data and allows a detailed look at the temperature map of the system's baseboard.
By clicking on a node label, the respective Node Management view is opened.
Furthermore the view displays the summarized system health status.
Node Management
This view allows controlling the power state of the selected node and monitoring its detailed status values and graphs.
It is also possible to change KVM settings or open a console to the node.
If the node is running and the RECSDeamon is installed on it, even more detailed data is shown.
Network
The network view allows changing the settings of the management port. This port is used to access the web interface and all APIs.
In addition to that, VLANs of the node network can be configured and assigned to the ports of the nodes and the backpanel.
Composition
This view allows the configuration of the PCIe resources in the form of composed nodes.
A composed node is a reserved bundle of resources, which utilize PCIe functions.
A wizard leads through the process of creating such composed nodes.
Users
This view provides user management. Users can be created, edited, or deleted.
Additionally, IPMI passwords can be set.
Settings
This view allows changing system-wide preferences (e.g. regarding the interfaces of the system).
Time
Here, the system time can be set either manually or using NTP.
Firmware
This view shows the currently installed versions of the firmware and management software.
Furthermore, it is possible to update those software components.
Logs
In the System Events tab of this view, the status changes of the sensors, fans, and boards can be seen.
In the Java Messages tab, all messages regarding the software can be found.
Several filters can be set for both tabs at the top.
The whole log can be downloaded as a ZIP file containing the individual logfiles.
Redfish API
The management software also features a Redfish API.
The documentation is available on GitHub.
REST API
Access
The REST API is accessible via the management IP address or the hostname of the system. The base URL of the API has the format https://host/REST/.
Accessing the REST API requires HTTP Basic authentication. The authenticated user has to be in the “Admin” or “User” group to be able to execute the POST/PUT management calls.
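The access rules above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the host name `recs.example`, the credentials, and the helper name `build_request` are placeholders that do not appear in the RECS documentation. The request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_request(host: str, resource: str, user: str, password: str,
                  method: str = "GET") -> urllib.request.Request:
    """Create a request for https://<host>/REST/<resource> with a Basic auth header."""
    url = f"https://{host}/REST/{resource.lstrip('/')}"
    # HTTP Basic authentication: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical host and credentials, for illustration only.
req = build_request("recs.example", "rcu", "admin", "secret")
print(req.get_method(), req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen`. The curl examples in this document use `-k`, which skips TLS certificate verification; the equivalent in Python would require an explicitly relaxed `ssl` context and should only be used in trusted networks.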
Components
The REST API makes all hardware components in the cluster available as XML trees in software. The following components are supported by the API:
Component | Description |
---|---|
rcu | A RECS Computing Unit (RCU) represents the overall system |
backplane | A backplane holds sensors and controls fans |
baseboard | A baseboard can be equipped with zero or more nodes |
node | A single node |
fan | A single fan |
RCU
The main entry point of this API is the RECS Computing Unit (RCU).
Request:
curl -X GET -k -i https://host/REST/rcu
Response:
<rcu name="RCUMaster (192.168.XX.YY)" fanSpeed="100" fanProfile="Manual" health="OK" ip="192.168.XX.YY" lastSensorUpdate="1701160258937" type="RECS|Box Deneb" id="RCU_10995770589198"> <temperature> <sensor name="Backplane 1 temp. 0" unit="°C" health="OK">26,2</sensor> <sensor name="Backplane 1 temp. 1" unit="°C" health="OK">32,1</sensor> <sensor name="Backplane 1 temp. 2" unit="°C" health="OK">23,0</sensor> <sensor name="Backplane 1 temp. 3" unit="°C" health="OK">28,0</sensor> <sensor name="Backplane 1 temp. 4 (PCIe-Switch)" unit="°C" health="OK">47,2</sensor> <sensor name="Backplane 1 temp. 5 (Ethernet-Switch)" unit="°C" health="OK">36,1</sensor> <sensor name="Backplane 2 temp. 0" unit="°C" health="OK">26,0</sensor> <sensor name="Backplane 2 temp. 1" unit="°C" health="OK">32,0</sensor> <sensor name="Backplane 2 temp. 2" unit="°C" health="OK">23,1</sensor> <sensor name="Backplane 2 temp. 3" unit="°C" health="OK">28,1</sensor> <sensor name="Backplane 2 temp. 4 (PCIe-Switch)" unit="°C" health="OK">47,0</sensor> <sensor name="Backplane 2 temp. 5 (Ethernet-Switch)" unit="°C" health="OK">36,1</sensor> <sensor name="Backplane 3 temp. 0" unit="°C" health="OK">32,2</sensor> <sensor name="Backplane 3 temp. 1" unit="°C" health="OK">44,2</sensor> <sensor name="Backplane 3 temp. 2" unit="°C" health="OK">26,1</sensor> <sensor name="Backplane 3 temp. 3" unit="°C" health="OK">36,0</sensor> <sensor name="Backplane 3 temp. 4 (PCIe-Switch)" unit="°C" health="OK">74,0</sensor> <sensor name="Backplane 3 temp. 
5 (Ethernet-Switch)" unit="°C" health="OK">52,1</sensor> <sensor name="Node average temperature" unit="°C" health="OK">38.279927272086105</sensor> <sensor name="Node highest temperature" unit="°C" health="OK">62.24058727899749</sensor> <sensor name="RCU infrastructure highest temperature" unit="°C" health="OK">74.00064867543587</sensor> </temperature> <backplane>RCU_10995770589198_BP_1</backplane> <backplane>RCU_10995770589198_BP_2</backplane> <backplane>RCU_10995770589198_BP_3</backplane> <baseboard>RCU_10995770589198_BB_1</baseboard> <baseboard>RCU_10995770589198_BB_2</baseboard> <baseboard>RCU_10995770589198_BB_3</baseboard> <baseboard>RCU_10995770589198_BB_4</baseboard> <baseboard>RCU_10995770589198_BB_6</baseboard> <baseboard>RCU_10995770589198_BB_7</baseboard> <baseboard>RCU_10995770589198_BB_8</baseboard> <baseboard>RCU_10995770589198_BB_9</baseboard> <fan>RCU_10995770589198_Fan_DENEB_1</fan> <fan>RCU_10995770589198_Fan_DENEB_2</fan> <fan>RCU_10995770589198_Fan_DENEB_3</fan> <node>RCU_10995770589198_BB_1_0</node> <node>RCU_10995770589198_BB_1_2</node> <node>RCU_10995770589198_BB_1_3</node> <node>RCU_10995770589198_BB_1_4</node> <node>RCU_10995770589198_BB_1_5</node> <node>RCU_10995770589198_BB_1_6</node> <node>RCU_10995770589198_BB_1_7</node> <node>RCU_10995770589198_BB_1_8</node> <node>RCU_10995770589198_BB_1_9</node> <node>RCU_10995770589198_BB_1_10</node> <node>RCU_10995770589198_BB_1_11</node> <node>RCU_10995770589198_BB_1_12</node> <node>RCU_10995770589198_BB_1_13</node> <node>RCU_10995770589198_BB_1_14</node> <node>RCU_10995770589198_BB_1_15</node> <node>RCU_10995770589198_BB_2_0</node> <node>RCU_10995770589198_BB_2_1</node> <node>RCU_10995770589198_BB_2_2</node> <node>RCU_10995770589198_BB_3_0</node> <node>RCU_10995770589198_BB_3_1</node> <node>RCU_10995770589198_BB_3_2</node> <node>RCU_10995770589198_BB_4_0</node> <node>RCU_10995770589198_BB_4_1</node> <node>RCU_10995770589198_BB_4_2</node> <node>RCU_10995770589198_BB_6_0</node> 
<node>RCU_10995770589198_BB_6_1</node> <node>RCU_10995770589198_BB_6_2</node> <node>RCU_10995770589198_BB_7_0</node> <node>RCU_10995770589198_BB_7_1</node> <node>RCU_10995770589198_BB_7_2</node> <node>RCU_10995770589198_BB_7_3</node> <node>RCU_10995770589198_BB_7_4</node> <node>RCU_10995770589198_BB_7_5</node> <node>RCU_10995770589198_BB_7_6</node> <node>RCU_10995770589198_BB_7_8</node> <node>RCU_10995770589198_BB_7_9</node> <node>RCU_10995770589198_BB_7_10</node> <node>RCU_10995770589198_BB_7_11</node> <node>RCU_10995770589198_BB_7_12</node> <node>RCU_10995770589198_BB_7_13</node> <node>RCU_10995770589198_BB_7_14</node> <node>RCU_10995770589198_BB_7_15</node> <node>RCU_10995770589198_BB_8_0</node> <node>RCU_10995770589198_BB_8_1</node> <node>RCU_10995770589198_BB_8_2</node> <node>RCU_10995770589198_BB_8_3</node> <node>RCU_10995770589198_BB_8_4</node> <node>RCU_10995770589198_BB_8_5</node> <node>RCU_10995770589198_BB_8_6</node> <node>RCU_10995770589198_BB_8_7</node> <node>RCU_10995770589198_BB_8_9</node> <node>RCU_10995770589198_BB_8_10</node> <node>RCU_10995770589198_BB_8_11</node> <node>RCU_10995770589198_BB_8_12</node> <node>RCU_10995770589198_BB_8_13</node> <node>RCU_10995770589198_BB_8_14</node> <node>RCU_10995770589198_BB_8_15</node> <node>RCU_10995770589198_BB_9_0</node> <node>RCU_10995770589198_BB_9_1</node> <node>RCU_10995770589198_BB_9_2</node> <node>RCU_10995770589198_BB_9_3</node> <node>RCU_10995770589198_BB_9_4</node> <node>RCU_10995770589198_BB_9_5</node> <node>RCU_10995770589198_BB_9_6</node> <node>RCU_10995770589198_BB_9_7</node> <node>RCU_10995770589198_BB_9_8</node> <node>RCU_10995770589198_BB_9_10</node> <node>RCU_10995770589198_BB_9_11</node> <node>RCU_10995770589198_BB_9_12</node> <node>RCU_10995770589198_BB_9_13</node> <node>RCU_10995770589198_BB_9_14</node> <node>RCU_10995770589198_BB_9_15</node> <power> <sensor name="RCU total power usage" unit="W" health="OK">2024.3027830888711</sensor> <sensor name="RCU infrastructure power usage" unit="W" 
health="OK">59.600615599470274</sensor> <sensor name="RCU power usage (Node)" unit="W" health="OK">467.0563244508067</sensor> <sensor name="RCU power usage (PEG)" unit="W" health="OK">1497.6458430385942</sensor> </power> </rcu>
Attributes:
Attribute | Description | Unit | Data type |
---|---|---|---|
name | Name of the RCU | - | String |
fanSpeed | Current speed setting of the fans in the RCU | % | Integer |
fanProfile | Current fan profile of the RCU | - | String |
health | Health status of the RCU (OK, Warning, Critical) | - | String |
ip | IP address of the RCU | - | String |
kvmNode | ID of the node to which the KVM system is switched (optional) | - | String |
lastSensorUpdate | Timestamp of the last sensor update | ms | Long |
type | Type of the RCU | - | String |
id | ID for referencing the component | - | String |
Nested elements:
Element | Description | Unit | Data type |
---|---|---|---|
temperature | List of temperature sensors | °C | Double |
backplane | IDs of backplanes, which are installed in the RCU | - | String |
baseboard | IDs of baseboards, which are installed in the RCU | - | String |
fan | IDs of fans, which are installed in the RCU | - | String |
node | IDs of nodes, which are installed in the RCU | - | String |
power | List of power sensors | W | Double |
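The following Python sketch shows one way to read the sensor lists from an RCU response. The sample is shortened from the response above. Two points are assumptions rather than documented behaviour: sensor values may use either a decimal comma or a decimal point (both appear in the sample response), and `lastSensorUpdate` is interpreted as milliseconds since the Unix epoch. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Shortened copy of the RCU response shown above.
SAMPLE = """<rcu name="RCUMaster (192.168.XX.YY)" fanSpeed="100" fanProfile="Manual"
     health="OK" lastSensorUpdate="1701160258937" id="RCU_10995770589198">
  <temperature>
    <sensor name="Backplane 1 temp. 0" unit="°C" health="OK">26,2</sensor>
    <sensor name="Node average temperature" unit="°C" health="OK">38.279927272086105</sensor>
  </temperature>
  <backplane>RCU_10995770589198_BP_1</backplane>
  <power>
    <sensor name="RCU total power usage" unit="W" health="OK">2024.3027830888711</sensor>
  </power>
</rcu>"""


def sensor_value(text: str) -> float:
    """Convert a sensor reading to float, accepting ',' or '.' as decimal separator."""
    return float(text.strip().replace(",", "."))


def read_sensors(root: ET.Element, group: str) -> dict:
    """Return {sensor name: value} for one nested element such as 'temperature'."""
    return {s.get("name"): sensor_value(s.text)
            for s in root.findall(f"./{group}/sensor")}


root = ET.fromstring(SAMPLE)
temperatures = read_sensors(root, "temperature")
power = read_sensors(root, "power")
# Assumption: the timestamp counts milliseconds since 1970-01-01 UTC.
updated = datetime.fromtimestamp(int(root.get("lastSensorUpdate")) / 1000, tz=timezone.utc)
print(root.get("health"), temperatures, power, updated.isoformat())
```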
Backplane
Request:
curl -X GET -k -i https://host/REST/backplane/RCU_10995770589198_BP_1
Response:
<backplane rcuPosition="1" health="OK" lastSensorUpdate="1701160658937" id="RCU_10995770589198_BP_1"> <current/> <fan>RCU_10995770589198_Fan_DENEB_1</fan> <fan>RCU_10995770589198_Fan_DENEB_2</fan> <fan>RCU_10995770589198_Fan_DENEB_3</fan> <temperature> <sensor name="Backplane 1 temp. 0" unit="°C" health="OK">26,1</sensor> <sensor name="Backplane 1 temp. 1" unit="°C" health="OK">32,0</sensor> <sensor name="Backplane 1 temp. 2" unit="°C" health="OK">23,0</sensor> <sensor name="Backplane 1 temp. 3" unit="°C" health="OK">28,1</sensor> <sensor name="Backplane 1 temp. 4 (PCIe-Switch)" unit="°C" health="OK">47,2</sensor> <sensor name="Backplane 1 temp. 5 (Ethernet-Switch)" unit="°C" health="OK">36,2</sensor> </temperature> </backplane>
Attributes:
Attribute | Description | Unit | Data type |
---|---|---|---|
rcuPosition | Position of the backplane inside the RCU | - | Integer |
health | Health status of the backplane (OK, Warning, Critical) | - | String |
lastSensorUpdate | Timestamp of the last sensor update | ms | Long |
id | ID for referencing the component | - | String |
Nested elements:
Element | Description | Unit | Data type |
---|---|---|---|
fan | IDs of fans, which are associated to the backplane | - | String |
temperature | List of temperature sensors | °C | Double |
The API offers backplaneList, which returns a list of the IDs of all backplanes within the system.
<backplaneList>
<backplane>RCU_10995770589198_BP_1</backplane> <backplane>RCU_10995770589198_BP_2</backplane> <backplane>RCU_10995770589198_BP_3</backplane>
</backplaneList>
Baseboard
Request:
curl -X GET -k -i https://host/REST/baseboard/RCU_10995770589198_BB_3
Response:
<baseboard type="COM Express" expansionBoardInserted="false" rcuPosition="3" health="OK" lastSensorUpdate="1701161214932" id="RCU_10995770589198_BB_3"> <node>RCU_10995770589198_BB_3_0</node> <node>RCU_10995770589198_BB_3_1</node> <node>RCU_10995770589198_BB_3_2</node> <power> <sensor name="Baseboard 3 infrastructure power" unit="W" health="OK">7,51</sensor> <sensor name="Baseboard 3 power usage (Node + PEG)" unit="W" health="OK">54.91125326153526</sensor> <sensor name="Baseboard 3 power usage (Node)" unit="W" health="OK">54.91125326153526</sensor> <sensor name="Baseboard 3 power usage (PEG)" unit="W" health="OK">0.0</sensor> </power> <temperature> <sensor name="Baseboard 3 temp. 0" unit="°C" health="OK">42,4</sensor> <sensor name="Baseboard 3 temp. 1" unit="°C" health="OK">41,4</sensor> <sensor name="Baseboard 3 temp. 2" unit="°C" health="OK">43,6</sensor> <sensor name="Baseboard 3 temp. 3" unit="°C" health="OK">25,0</sensor> <sensor name="Baseboard 3 temp. 4" unit="°C" health="OK">36,5</sensor> <sensor name="Baseboard 3 temp. 5" unit="°C" health="OK">43,4</sensor> <sensor name="Baseboard 3 temp. 6" unit="°C" health="OK">46,4</sensor> <sensor name="Baseboard 3 temp. 7" unit="°C" health="NONE">255,0</sensor> <sensor name="Baseboard 3 temp. 8 (PCIe-Switch)" unit="°C" health="OK">50,1</sensor> <sensor name="Baseboard 3 temp. 9 (Ethernet-Switch)" unit="°C" health="OK">45,1</sensor> </temperature> <voltage> <sensor name="Baseboard 3 voltage (12 V Input)" unit="V" health="OK">11,90</sensor> </voltage> </baseboard>
Attributes:
Attribute | Description | Unit | Data type |
---|---|---|---|
type | Type of the baseboard | - | String |
expansionBoardInserted | Indicates whether an expansion board is inserted | - | Boolean |
rcuPosition | Position of the baseboard inside the RCU | - | Integer |
health | Health status of the baseboard (OK, Warning, Critical) | - | String |
lastSensorUpdate | Timestamp of the last sensor update | ms | Long |
id | ID for referencing the component | - | String |
Nested elements:
Element | Description | Unit | Data type |
---|---|---|---|
fan | IDs of fans, which are associated to the baseboard | - | String |
node | IDs of nodes, which are installed on the baseboard | - | String |
power | List of power sensors | W | Double |
temperature | List of temperature sensors | °C | Double |
voltage | List of voltage sensors | V | Double |
The API offers baseboardList, which returns a list of the IDs of all baseboards within the system.
<baseboardList>
<baseboard>RCU_10995770589198_BB_1</baseboard> <baseboard>RCU_10995770589198_BB_2</baseboard> <baseboard>RCU_10995770589198_BB_3</baseboard> <baseboard>RCU_10995770589198_BB_4</baseboard> <baseboard>RCU_10995770589198_BB_6</baseboard> <baseboard>RCU_10995770589198_BB_7</baseboard> <baseboard>RCU_10995770589198_BB_8</baseboard> <baseboard>RCU_10995770589198_BB_9</baseboard>
</baseboardList>
Node
Request:
curl -X GET -k -i https://host/REST/node/RCU_10995770589198_BB_1_0
Response:
<node baseboardPosition="0" name="Node 1-1" type="Jetson" maxPowerUsage="21" powerState="On" health="OK" lastSensorUpdate="1701161458933" id="RCU_10995770589198_BB_1_0"> <baseboard>RCU_10995770589198_BB_1</baseboard> <deamon/> <power> <sensor name="Overall Node 1-1 power" unit="W" health="OK">20.280571457632558</sensor> <sensor name="Node 1-1 power" unit="W" health="OK">20,28</sensor> </power> <processor type="GPU" cores="256" threads="0" maxSpeedMHz="1120" manufacturer="NVIDIA" model="Pascal GP10B" /> <processor instructionSet="ARM-A64" architecture="ARM" type="CPU" cores="2" threads="2" maxSpeedMHz="2000" manufacturer="NVIDIA" model="Denver 2" /> <processor instructionSet="ARM-A64" architecture="ARM" type="CPU" cores="4" threads="4" maxSpeedMHz="2000" manufacturer="ARM" model="Cortex-A57" partNumber="1SX280LN3F43E2VG" /> <temperature> <sensor name="Node 1-1 inlet temperature" unit="°C" health="OK">20.001274723105443</sensor> <sensor name="Node 1-1 outlet temperature" unit="°C" health="OK">23.06943277191329</sensor> </temperature> <voltage> <sensor name="Baseboard 1 voltage (12 V Input)" unit="V" health="OK">11,91</sensor> </voltage> </node>
Attributes:
Attribute | Description | Unit | Data type |
---|---|---|---|
baseboardPosition | Position of the node on the baseboard | - | Integer |
name | Name of the node | - | String |
type | Type of the node | - | String |
maxPowerUsage | Maximum power the node can draw | W | Integer |
powerState | Power state of the node (Off, On, Soft-off, Standby, Hibernate) | - | String |
health | Health status of the node (OK, Warning, Critical) | - | String |
lastSensorUpdate | Timestamp of the last sensor update | ms | Long |
id | ID for referencing the component | - | String |
macAddressCompute | MAC address of the NIC connected to the compute network (optional) | - | String |
macAddressMgmt | MAC address of the NIC connected to the management network (optional) | - | String |
Nested elements:
Element | Description | Unit | Data type |
---|---|---|---|
baseboard | ID of the baseboard hosting the node | - | String |
deamon | List of deamon sensors (optional) | - | Mixed |
power | List of power sensors | W | Double |
processor | List of processors of this node with detailed information | - | - |
temperature | List of temperature sensors | °C | Double |
voltage | List of voltage sensors | V | Double |
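As an illustration of how the `processor` elements can be evaluated, the sketch below sums the core counts per processor type. The sample is shortened from the node response above; the function name `cores_by_type` is illustrative and not part of the API.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the node response shown above.
SAMPLE = """<node name="Node 1-1" type="Jetson" powerState="On" health="OK"
      id="RCU_10995770589198_BB_1_0">
  <baseboard>RCU_10995770589198_BB_1</baseboard>
  <processor type="GPU" cores="256" threads="0" maxSpeedMHz="1120"
             manufacturer="NVIDIA" model="Pascal GP10B" />
  <processor instructionSet="ARM-A64" architecture="ARM" type="CPU" cores="2"
             threads="2" maxSpeedMHz="2000" manufacturer="NVIDIA" model="Denver 2" />
  <processor instructionSet="ARM-A64" architecture="ARM" type="CPU" cores="4"
             threads="4" maxSpeedMHz="2000" manufacturer="ARM" model="Cortex-A57" />
</node>"""


def cores_by_type(node_xml: str) -> dict:
    """Sum the 'cores' attribute of all processor elements, grouped by 'type'."""
    totals = {}
    for proc in ET.fromstring(node_xml).findall("processor"):
        kind = proc.get("type")
        totals[kind] = totals.get(kind, 0) + int(proc.get("cores"))
    return totals


print(cores_by_type(SAMPLE))
```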
The API offers nodeList, which returns a list of the IDs of all nodes within the system.
Request:
curl -X GET -k -i https://host/REST/node
Response:
<nodeList> <node>RCU_10995770589198_BB_1_0</node> <node>RCU_10995770589198_BB_1_2</node> <node>RCU_10995770589198_BB_1_3</node> <node>RCU_10995770589198_BB_1_4</node> <node>RCU_10995770589198_BB_1_5</node> <node>RCU_10995770589198_BB_1_6</node> <node>RCU_10995770589198_BB_1_7</node> <node>RCU_10995770589198_BB_1_8</node> <node>RCU_10995770589198_BB_1_9</node> <node>RCU_10995770589198_BB_1_10</node> <node>RCU_10995770589198_BB_1_11</node> <node>RCU_10995770589198_BB_1_12</node> <node>RCU_10995770589198_BB_1_13</node> <node>RCU_10995770589198_BB_1_14</node> <node>RCU_10995770589198_BB_1_15</node> <node>RCU_10995770589198_BB_2_0</node> <node>RCU_10995770589198_BB_2_1</node> <node>RCU_10995770589198_BB_2_2</node> <node>RCU_10995770589198_BB_3_0</node> <node>RCU_10995770589198_BB_3_1</node> <node>RCU_10995770589198_BB_3_2</node> <node>RCU_10995770589198_BB_4_0</node> <node>RCU_10995770589198_BB_4_1</node> <node>RCU_10995770589198_BB_4_2</node> <node>RCU_10995770589198_BB_6_0</node> <node>RCU_10995770589198_BB_6_1</node> <node>RCU_10995770589198_BB_6_2</node> <node>RCU_10995770589198_BB_7_0</node> <node>RCU_10995770589198_BB_7_1</node> <node>RCU_10995770589198_BB_7_2</node> <node>RCU_10995770589198_BB_7_3</node> <node>RCU_10995770589198_BB_7_4</node> <node>RCU_10995770589198_BB_7_5</node> <node>RCU_10995770589198_BB_7_6</node> <node>RCU_10995770589198_BB_7_8</node> <node>RCU_10995770589198_BB_7_9</node> <node>RCU_10995770589198_BB_7_10</node> <node>RCU_10995770589198_BB_7_11</node> <node>RCU_10995770589198_BB_7_12</node> <node>RCU_10995770589198_BB_7_13</node> <node>RCU_10995770589198_BB_7_14</node> <node>RCU_10995770589198_BB_7_15</node> <node>RCU_10995770589198_BB_8_0</node> <node>RCU_10995770589198_BB_8_1</node> <node>RCU_10995770589198_BB_8_2</node> <node>RCU_10995770589198_BB_8_3</node> <node>RCU_10995770589198_BB_8_4</node> <node>RCU_10995770589198_BB_8_5</node> <node>RCU_10995770589198_BB_8_6</node> <node>RCU_10995770589198_BB_8_7</node> 
<node>RCU_10995770589198_BB_8_9</node> <node>RCU_10995770589198_BB_8_10</node> <node>RCU_10995770589198_BB_8_11</node> <node>RCU_10995770589198_BB_8_12</node> <node>RCU_10995770589198_BB_8_13</node> <node>RCU_10995770589198_BB_8_14</node> <node>RCU_10995770589198_BB_8_15</node> <node>RCU_10995770589198_BB_9_0</node> <node>RCU_10995770589198_BB_9_1</node> <node>RCU_10995770589198_BB_9_2</node> <node>RCU_10995770589198_BB_9_3</node> <node>RCU_10995770589198_BB_9_4</node> <node>RCU_10995770589198_BB_9_5</node> <node>RCU_10995770589198_BB_9_6</node> <node>RCU_10995770589198_BB_9_7</node> <node>RCU_10995770589198_BB_9_8</node> <node>RCU_10995770589198_BB_9_10</node> <node>RCU_10995770589198_BB_9_11</node> <node>RCU_10995770589198_BB_9_12</node> <node>RCU_10995770589198_BB_9_13</node> <node>RCU_10995770589198_BB_9_14</node> <node>RCU_10995770589198_BB_9_15</node> </nodeList>
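A nodeList can be grouped by baseboard with a few lines of Python. The sketch assumes that a node ID consists of the baseboard ID followed by an underscore and the node position, which matches the examples in this document (node `RCU_10995770589198_BB_1_0` reports baseboard `RCU_10995770589198_BB_1` and `baseboardPosition="0"`) but is not stated as a guaranteed format. When in doubt, the `baseboard` element of the node response or the `/baseboard/{baseboard_id}/node` resource is the authoritative source.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened nodeList in the format shown above.
SAMPLE = ("<nodeList> <node>RCU_10995770589198_BB_1_0</node> "
          "<node>RCU_10995770589198_BB_1_2</node> "
          "<node>RCU_10995770589198_BB_2_0</node> </nodeList>")


def group_by_baseboard(node_list_xml: str) -> dict:
    """Map baseboard ID -> list of node positions (assumed ID layout, see lead-in)."""
    groups = defaultdict(list)
    for node in ET.fromstring(node_list_xml).findall("node"):
        # Split at the last underscore: "<baseboard_id>_<position>"
        baseboard_id, _, position = node.text.strip().rpartition("_")
        groups[baseboard_id].append(int(position))
    return dict(groups)


print(group_by_baseboard(SAMPLE))
```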
Fan
Request:
curl -X GET -k -i https://host/REST/fan/RCU_10995770589198_Fan_DENEB_1
Response:
<fan position="DENEB_1" installed="true" nominalSpeed="100" rpm="11760" health="OK" lastSensorUpdate="0" id="RCU_10995770589198_Fan_DENEB_1" />
Attributes:
Attribute | Description | Unit | Data type |
---|---|---|---|
position | Position of the fan | - | String |
installed | Indicates whether the fan is installed | - | Boolean |
nominalSpeed | Nominal speed of the fan | % | Integer |
rpm | Actual rotational speed of the fan | rpm | Integer |
health | Health status of the fan (OK, Warning, Critical) | - | String |
lastSensorUpdate | Timestamp of the last sensor update | ms | Long |
id | ID for referencing the component | - | String |
The API offers fanList, which returns a list of the IDs of all fans within the system.
Request:
curl -X GET -k -i https://host/REST/fan
Response:
<fanList> <fan>RCU_10995770589198_Fan_DENEB_1</fan> <fan>RCU_10995770589198_Fan_DENEB_2</fan> <fan>RCU_10995770589198_Fan_DENEB_3</fan> </fanList>
Endpoints
The resources are split into monitoring resources (for pure information gathering) and management resources (for changing the system configuration or state).
Monitoring
For monitoring the following resources are available:
Resource | Description | HTTP method |
---|---|---|
/rcu | Returns information about the RCU | GET |
/backplane | Returns a backplaneList with all backplane IDs of the RCU | GET |
/backplane/{backplane_id} | Returns information about the backplane with the given ID | GET |
/baseboard | Returns a baseboardList with all baseboard IDs of the RCU | GET |
/baseboard/{baseboard_id} | Returns information about the baseboard with the given ID | GET |
/baseboard/{baseboard_id}/node | Returns a nodeList with all node IDs that are installed on the baseboard with the given ID | GET |
/node | Returns a nodeList with all node IDs of the RCU | GET |
/node/{node_id} | Returns information about the node with the given ID | GET |
/fan | Returns a fanList with all fan IDs of the RCU | GET |
/fan/{fan_id} | Returns information about the fan with the given ID | GET |
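The list resources and the per-component resources in the table above fit together: each ID returned by a list can be appended to the corresponding path. The following sketch derives the detail URLs from any of the list responses. The host name and the function name `detail_urls` are placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# fanList response as shown in the Fan section.
FAN_LIST = ("<fanList> <fan>RCU_10995770589198_Fan_DENEB_1</fan> "
            "<fan>RCU_10995770589198_Fan_DENEB_2</fan> "
            "<fan>RCU_10995770589198_Fan_DENEB_3</fan> </fanList>")


def detail_urls(host: str, list_xml: str) -> list:
    """Turn a backplaneList, baseboardList, nodeList or fanList into detail URLs."""
    root = ET.fromstring(list_xml)
    if not root.tag.endswith("List"):
        raise ValueError(f"not a list response: <{root.tag}>")
    kind = root.tag[:-len("List")]  # e.g. "fanList" -> "fan"
    return [f"https://{host}/REST/{kind}/{item.text.strip()}"
            for item in root.findall(kind)]


for url in detail_urls("recs.example", FAN_LIST):
    print(url)
```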
Management
The management of individual components can be found under the “manage” path of the component.
Resource | Description | HTTP method | Parameter |
---|---|---|---|
/node/{node_id}/manage/power_on | Turns on the node with the given ID and returns updated node | POST | |
/node/{node_id}/manage/power_button | Turns on/off the node with the given ID and returns updated node | POST | |
/node/{node_id}/manage/power_off | Turns off the node with the given ID and returns updated node | POST | |
/node/{node_id}/manage/reset | Resets the node with the given ID and returns updated node | POST | |
/node/{node_id}/manage/sleep | Puts the node with the given ID into the sleep state and returns updated node | POST | |
/node/{node_id}/manage/select_kvm | Switches the KVM port of the RCU to the node with the given ID and returns updated node | PUT | |
/node/{node_id}/manage/set_bootsource | Sets the boot source of the node with the given ID and returns updated node | PUT | source={NONE,HDD,CD,PXE,USBSTICK},persistent={true,false} |
/rcu/manage/set_fans | Sets the overall fan speed of the RCU and returns the current status of the RCU | PUT | percent={value} |
/rcu/manage/set_fan_profile | Sets the fan profile of the RCU and returns the current status of the RCU | PUT | profile={manual,auto} |
/fan/{fan_id} | Sets the speed of the fan with the given ID and returns the current status of the fan | PUT | percent={value} |
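The node management calls from the table above can be assembled programmatically, as in the sketch below. It only builds the HTTP method and URL and does not send anything. How parameters such as `source` and `persistent` are transmitted is not specified in this document; the sketch assumes a URL query string, which should be verified against a real system. The host name and the function name `node_action` are illustrative.

```python
from urllib.parse import urlencode

# HTTP method per node management action, taken from the table above.
NODE_ACTIONS = {
    "power_on": "POST", "power_button": "POST", "power_off": "POST",
    "reset": "POST", "sleep": "POST",
    "select_kvm": "PUT", "set_bootsource": "PUT",
}
BOOT_SOURCES = {"NONE", "HDD", "CD", "PXE", "USBSTICK"}


def node_action(host: str, node_id: str, action: str, **params) -> tuple:
    """Return (HTTP method, URL) for a management call on one node."""
    if action not in NODE_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "set_bootsource" and params.get("source") not in BOOT_SOURCES:
        raise ValueError("source must be one of " + ", ".join(sorted(BOOT_SOURCES)))
    url = f"https://{host}/REST/node/{node_id}/manage/{action}"
    if params:
        # Assumption: parameters are passed as a query string.
        url += "?" + urlencode(params)
    return NODE_ACTIONS[action], url


print(node_action("recs.example", "RCU_10995770589198_BB_1_0", "power_on"))
print(node_action("recs.example", "RCU_10995770589198_BB_1_0", "set_bootsource",
                  source="PXE", persistent="true"))
```

Keeping the method lookup in one table mirrors the documentation and makes it harder to send a PUT call as POST by mistake.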
Errors
Information about the success or failure of management requests is returned via HTTP status codes. See RFC 2616 for an overview of the defined HTTP status codes.
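A client can classify the returned status code by its class, as in the following sketch. Which specific codes the REST API returns is not listed in this document, so the example relies only on the general HTTP status classes (2xx success, 4xx client error, 5xx server error). The function name is illustrative.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short description of an HTTP status code and its class."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not registered in the standard library
    if 200 <= code < 300:
        outcome = "success"
    elif 400 <= code < 500:
        outcome = "client error"
    elif 500 <= code < 600:
        outcome = "server error"
    else:
        outcome = "other"
    return f"{code} {phrase}: {outcome}"


for code in (200, 401, 500):
    print(describe_status(code))
```

A 401 response, for example, would be expected when the Basic authentication credentials are missing or wrong.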
Prometheus
A Prometheus exporter is built in and can be enabled. It is accessible at https://host/metrics/ or http://host/metrics/ and requires HTTP Basic authentication.
The main advantage of the Prometheus exporter over the other APIs is that it exports its metrics dynamically, so metrics can be added or removed at runtime after hardware has been changed or hot-plugged. This makes it possible to export only the metrics of those microservers that are actually plugged in. Because the RECS follows a modular approach and every RECS can be equipped with different carrier blades and microserver configurations, this is highly relevant. Traditional monitoring tools that do not support dynamic metrics require regular manual changes to their configuration files.
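The effect of dynamic metrics can be observed by comparing two scrapes of the metrics endpoint. The sketch below extracts the metric names from text in the Prometheus exposition format and reports which names appeared between two scrapes. The metric names in the sample (`recs_node_1_power` and so on) are invented for illustration; the names actually exported by the RECS are not listed in this document.

```python
def metric_names(exposition_text: str) -> set:
    """Collect metric names from Prometheus text exposition format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at '{' (start of labels) or at the first whitespace.
        names.add(line.split("{", 1)[0].split(None, 1)[0])
    return names


# Hypothetical scrapes before and after a microserver is plugged in.
before = "# TYPE recs_node_1_power gauge\nrecs_node_1_power 20.3\nrecs_fan_rpm{fan=\"1\"} 11760\n"
after = before + "recs_node_2_power 18.1\n"

added = metric_names(after) - metric_names(before)
removed = metric_names(before) - metric_names(after)
print("added:", sorted(added), "removed:", sorted(removed))
```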
Prometheus Configuration
Prometheus needs very little configuration to automatically parse all information and write it into a database. This makes all metrics easily accessible.
- job_name: 'RECS_Master'
  scrape_interval: 1s
  scrape_timeout: 1s
  static_configs:
    - targets: ['192.168.0.100']
  basic_auth:
    username: 'user'
    password: 'password'
Grafana Dashboard
It is recommended to use Grafana as a graphical dashboard to display the captured metrics. A pre-built Grafana dashboard is publicly available at https://grafana.com/grafana/dashboards/14622. It can be integrated into Grafana using the “Import” function. It automatically reads the available metrics from the database and dynamically adapts to the number of available microservers, see the following picture: