Differences

This shows you the differences between two versions of the page.

doc_trecs:software_interface [2022/04/26 13:21] vordoc_trecs:software_interface [2023/11/28 08:54] (current) – [Components] bil
Line 1: Line 1:
 ====== Software interface ====== ====== Software interface ======
  
-There are several software interfaces available to monitor the status of the t.RECS<sup>(r)</sup> system. These are the Management Web****GUI and a REST API providing XML based monitoring and management functionality. +There are several software interfaces available to monitor the status of the t.RECS<sup>(r)</sup> system. These are the Management Web****GUI, a Redfish API and a proprietary REST API providing XML based monitoring and management functionality.
  
 ===== Management WebGUI ===== ===== Management WebGUI =====
Line 12: Line 12:
 |{{ :documentation:statuswarning.png?nolink |}} |Warning. Something is wrong, but the system is still fully functional. The system has to be checked so the problem doesn't get worse. Indicated by a yellow line in a graph.| |{{ :documentation:statuswarning.png?nolink |}} |Warning. Something is wrong, but the system is still fully functional. The system has to be checked so the problem doesn't get worse. Indicated by a yellow line in a graph.|
 |{{ :documentation:statuscritical.png?nolink |}} |Critical Error. The system must be checked immediately and may have to be shut down to prevent hardware damage. Indicated by a red line in a graph.| |{{ :documentation:statuscritical.png?nolink |}} |Critical Error. The system must be checked immediately and may have to be shut down to prevent hardware damage. Indicated by a red line in a graph.|
 +
 +On the left side is a menu, which can be toggled by clicking the menu button in the upper left corner of the screen. The menu contains the following items:
 +
 +[[doc_trecs:software_interface#Dashboard|Dashboard]]: General overview of the managed system, installed nodes and health status\\
 +[[doc_trecs:software_interface#Management|Management]]: Power control and monitoring for all nodes and fans\\
 +[[doc_trecs:software_interface#Network|Network]]: VLAN configuration and settings of the management network\\
 +[[doc_trecs:software_interface#Composition|Composition]]: Configuration of PCIe resources\\
 +[[doc_trecs:software_interface#Users|Users]]: User management\\
 +[[doc_trecs:software_interface#Settings|Settings]]: System-wide configuration settings\\
 +[[doc_trecs:software_interface#Time|Time]]: System time settings\\
 +[[doc_trecs:software_interface#Firmware|Firmware]]: Firmware updates and overview of software versions\\
 +[[doc_trecs:software_interface#Logs|Logs]]: Logs from the management software about system health and java messages.\\
  
 ==== Dashboard ==== ==== Dashboard ====
  
-Figure 1 shows the Dashboard of the Management Web****GUI which is seen first when opening the Web****GUI. It is organized into three columns. The first is on the left-hand side and contains the following:+The Dashboard is seen first when opening the Web****GUI and displays the summarized system health status.
  
-[[documentation:software_interface#Overview|Overview:]] General overview of all managed RCU<sup></sup>s, RPU<sup></sup>s, installed nodes and health status\\ +<imgcaption Dashboard View|> 
-[[documentation:software_interface#Management|Management:]] Selection of every managed RCU and RPU in the rack with a sensor view button for the Arneb\\ +{{:doc_trecs:t.recs_dashboard.png?direct|Dashboard View}}</imgcaption>
-[[documentation:software_interface#Global settings|Global settings:]] IP filter and firmware update\\ +
-[[documentation:software_interface#Log Viewer|Log:]] Logs from the management software about system health and java messages. The logs can be downloaded as a zipfile\\+
  
-The second colum contains the buttons and sliders to manipulate the system. While the third colum is mostly for history information like power usage and temperature graphs.+==== Management ====
  
-==== Overview ====+In this view, nodes can be turned on or off with a quick menu, which opens when clicking on the gear symbol of a node.\\ 
 +Multiple nodes can be controlled at once via the panel "Batch-Control Nodes".\\ 
 +The view also shows fan monitoring data and allows a detailed look at the temperature map of the system's baseboard.\\ 
 +By clicking on a node label, the respective [[doc_trecs:software_interface#Node Management|Node Management]] view is opened.\\ 
 +Furthermore the view displays the summarized system health status.
  
-All units that are installed in the rack and that are managed by the software are summarized on this page. +<imgcaption Management View|> 
-The total power usage is summed up over all managed units.+{{:doc_trecs:t.recs_management.png?direct|Management View}}</imgcaption>
  
-<imgcaption WebGUI Overview|> 
-{{:doc_trecs:t.recs_dashboard.png?direct&500 |Dashboard}}</imgcaption> 
  
-==== Management ==== +=== Node Management ===
  
-An overview of the selected unit can be seen in this tab. The fans can be regulated by dragging the slider to the desired percentage. And multiple nodes can be selected. By clicking on a node the [[documentation:management#node management|Node management]] page of the node is shown. +This view allows controlling the power state of the selected node and monitoring its detailed status values and graphs.\\ 
 +It is also possible to change KVM settings or open a console to the node.\\ 
 +If the node is running and the [[documentation:recsdaemon|RECSDaemon]] is installed on it, even more detailed data is shown.
  
-<imgcaption web-gui-rcu-overview|>{{ :documentation:web-gui-rcu-overview.jpg?direct&500 |Adds an ImageCaption tag}}</imgcaption>+<imgcaption Node Management View|> 
 +{{:doc_trecs:t.recs_node-management.png?direct|Node Management View}}</imgcaption>
  
-A quick menu to control a node can be opened by klicking on the gear next to an CXP node. In this menu the node can be switched on and off and the KVM can be switched to the node. 
  
-<imgcaption web-gui-node-control|>{{ :documentation:web-gui-node-control.jpg?nolink&300 |Management pop-pu for Apalis nodes}}</imgcaption>+=== Network ===
  
-<WRAP round tip> +The network view allows changing the settings of the management port. This port is used to access the web interface and all APIs.\\ 
-Apalis nodes do not show a management pop-up button due to size constraints.\\ Click on the node button while pressing the "Shift" key to open the management pop-up instead of navigating to the node view. +In addition to that, VLANs of the node network can be configured and assigned to the ports of the nodes and the backpanel.
-</WRAP>+
  
-<WRAP round tip> +<imgcaption Network View|> 
-When pressing the "Shift" key while clicking, the "Select all" and "Select none" buttons select only nodes currently on or nodes currently off, respectively. +{{:doc_trecs:t.recs_network.png?direct|Network View}}</imgcaption>
-</WRAP>+
  
-=== Node management === 
  
-On this page the selected node can be controlled and detailed status values and graphs can be seen.\\ +=== Composition ===
-By klicking on the arrow, pointing downwards in the upper bar next to the nodename, the other nodes of the unit can be chosen.+
  
-<imgcaption web-gui-cxp-node-view|>{{ :documentation:web-gui-cxp-node-view.jpg?direct&500 |Node management}}</imgcaption>+This view allows the configuration of the PCIe resources in the form of composed nodes.\\ 
 +A composed node is a reserved bundle of resources, which utilize PCIe functions.\\ 
 +A wizard leads through the process of creating such composed nodes.
  
-==== Global settings ====+<imgcaption Composition View|> 
 +{{:doc_trecs:t.recs_composition.png?direct|Composition View}}</imgcaption>
  
-All IP<sup></sup>s that are allowed to access the Nagios interface have to be listet here.\\ 
-The firmware for the whole RECS<sup>(r)</sup>%%|%%Box can be uploaded here by klicking on the "Upload Firmware File" button and selecting the file. The update-process starts right after the file was uploaded.\\ 
-For the update process **all modules will be powered off!**\\ 
  
-<imgcaption web-gui-global_settings|>{{ :documentation:web-gui-global_settings.jpg?direct&500 |Golobal settings tab}}</imgcaption>+=== Users ===
  
-==== Log viewer ====+This view features the user management. Users can be created, edited or deleted.\\ 
 +Additionally, IPMI passwords can be set.
  
-In the system healths tab of the log page the status changes of the sensors, fan and boards can be seen.+<imgcaption Users View|> 
 +{{:doc_trecs:t.recs_users.png?direct|Users View}}</imgcaption>
  
-<imgcaption web-gui-log-standart_view|>{{ :documentation:web-gui-log-standart_view.jpg?direct&500 |System health log}}</imgcaption> 
  
-In the java tab of the log page all messages regarding the software can be found.+=== Settings ===
  
-<imgcaption web-gui-log-java_messages-view|>{{ :documentation:web-gui-log-java_messages-view.jpg?direct&500 |Java messages}}</imgcaption>+This view allows changing system-wide preferences (e.g. regarding the interfaces of the system).
  
 +<imgcaption Settings View|>
 +{{:doc_trecs:t.recs_settings.png?direct|Settings View}}</imgcaption>
 +
 +
 +=== Time ===
 +
 +Here, the system time can be set either manually or using NTP.
 +
 +<imgcaption Time View|>
 +{{:doc_trecs:t.recs_time.png?direct|Time View}}</imgcaption>
 +
 +
 +=== Firmware ===
 +
 +This view shows the currently installed versions of the firmware and management software.\\
 +Furthermore, it is possible to update those software components.
 +
 +<imgcaption Firmware View|>
 +{{:doc_trecs:t.recs_firmware.png?direct|Firmware View}}</imgcaption>
 +
 +
 +=== Logs ===
 +
 +In the System Events tab of this view, the status changes of the sensors, fan and boards can be seen.\\
 +In the Java Messages tab, all messages regarding the software can be found.\\
 Several filters can be set for both tabs at the top.\\ Several filters can be set for both tabs at the top.\\
-At the bottom the whole log can be downloaded as a ZIP file containing the individual logfiles.+The whole log can be downloaded as a ZIP file containing the individual logfiles. 
 + 
 +<imgcaption Logs View|> 
 +{{:doc_trecs:t.recs_logs.png?direct|Logs View}}</imgcaption> 
  
 ===== Redfish API ===== ===== Redfish API =====
  
-The documentation of the RECS<sup>(r)</sup>%%|%%Box Redfish API can be seen at [[https://christmann.github.io/recs-redfish-api/index.html|Github]].+The management software also features a Redfish API.\\ 
 +The documentation is available on [[https://christmann.github.io/recs-redfish-api/index.html|GitHub]].
  
 ===== REST API ===== ===== REST API =====
Line 86: Line 127:
 ==== Access ==== ==== Access ====
  
-The RECS<sup>(r)</sup>%%|%%Box Management API is accessible via the IP-Address or the hostname of the TOR-Master of the cluster. The basic URL of the API has the format ''https://TOR-Master/REST/'' or ''http://TOR-Master/REST/''.+The REST API is accessible via the management IP-Address or the hostname of the system. The basic URL of the API has the format ''https://host/REST/''.
  
 Accessing the REST API requires HTTP Basic authentication. The authenticated user has to be in the "Admin" or "User" group to be able to execute the POST/PUT management calls. Accessing the REST API requires HTTP Basic authentication. The authenticated user has to be in the "Admin" or "User" group to be able to execute the POST/PUT management calls.
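The access rules above can be sketched in Python. This is a minimal illustration, not part of the management software: the helper name ''build_request'' and the credentials ''admin''/''secret'' are hypothetical, and ''host'' stands for the management IP address or hostname as in the URL format above. The sketch only builds the request object and does not contact a system.

```python
import base64
import urllib.request


def build_request(host, path, user, password, method="GET"):
    """Build a request for https://<host>/REST/<path> with HTTP Basic authentication."""
    url = f"https://{host}/REST/{path.lstrip('/')}"
    # HTTP Basic authentication: base64 of "user:password" in the Authorization header
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical credentials; replace with a user from the "Admin" or "User" group.
req = build_request("host", "rcu", "admin", "secret")
print(req.get_method(), req.full_url)
```

Sending the request would be done with ''urllib.request.urlopen(req)''. Systems with a self-signed certificate may additionally need a suitable ''ssl'' context, which is what the ''-k'' option does in the curl examples on this page.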
Line 92: Line 133:
 ==== Components ==== ==== Components ====
  
-The RECS<sup>(r)</sup>%%|%%Box Management API makes all hardware components in the cluster available as XML trees in software. The following components are supported by the API: \\+The REST API makes all hardware components in the cluster available as XML trees in software. The following components are supported by the API: \\
  
 ^ Attribute ^ Description ^ ^ Attribute ^ Description ^
-|''node'' | A single node| +|''rcu'' |A RECS Computing Unit (RCU) represents the overall system|
-|''backplane'' |A backplane can be equipped with zero or more baseboards|+
 |''baseboard'' |A baseboard can be equipped with zero or more nodes| |''baseboard'' |A baseboard can be equipped with zero or more nodes|
-|''rcu'' |A RECS<sup>(r)</sup>%%|%%Box Computing Unit (RCU) can be equipped with zero or more baseboards| +|''node'' |A single node|
-|''rack'' |A rack consists of several RCU****s|+
  
-Many resources also return lists of components. These are named according to the scheme <component name>List (e.g. nodeList, rcuList) and contain the elements of the list.+=== RCU ===
  
-Example of a backplaneList: +The main entrypoint of this API is the RECS Computing Unit (RCU).
- +
-<code xml><backplaneList> +
-<backplane position="1" id="RCU_84055620466592_BP_1" infrastructurePower="0.0"  +
-lastSensorUpdate="1465470151268"> +
-<temperatures>24.0</temperatures> +
-<temperatures>25.0</temperatures> +
-<temperatures>26.0</temperatures> +
-<temperatures>27.0</temperatures> +
-<temperatures>28.0</temperatures> +
-</backplane> +
-</backplaneList></code> +
- +
-=== Node ===+
  
-Example XML:+Request: 
 +<code bash> 
 +curl -X GET -k -i https://host/REST/rcu 
 +</code>
  
-<code xml><node baseboardPosition="0" maxPowerUsage="44" actualNodePowerUsage="32.426884399865166" +Response: 
-actualPEGPowerUsage="15.12053962324833" actualPowerUsage="47.54742402311349" architecture="x86"  +<code xml> 
-baseboardId="RCU_84055620466592_BB_1" health="OK" id="RCU_84055620466592_BB_1_0" inletTemperature="20.0"  +<rcu name="RCUMaster (192.168.XX.YY)" fanSpeed="100" fanProfile="Manual" health="OK" ip="192.168.XX.YY" kvmNode="RCU_10995770589198_BB_1_0" lastSensorUpdate="1700812747947" type="t.RECS" id="RCU_10995770589198"> 
-lastSensorUpdate="1465470151268" macAddressCompute="70:b3:d5:56:40:48" outletTemperature="20.0" state="1"  +  <temperature> 
-highestTemperature="20.0" voltage="12.072700851453936"/></code> +    <sensor name="Node average temperature" unit="°C" health="OK">35.60748455616409</sensor> 
 +    <sensor name="Node highest temperature" unit="°C" health="OK">35.60748455616409</sensor> 
 +    <sensor name="RCU infrastructure highest temperature" unit="°C" health="OK">43.02157752743136</sensor> 
 +  </temperature> 
 +  <baseboard>RCU_10995770589198_BB_1</baseboard> 
 +  <fan>RCU_10995770589198_Fan_TRECS_1</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_2</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_3</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_4</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_5</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_6</fan> 
 +  <node>RCU_10995770589198_BB_1_0</node> 
 +  <node>RCU_10995770589198_BB_1_1</node> 
 +  <node>RCU_10995770589198_BB_1_2</node> 
 +  <power> 
 +    <sensor name="RCU total power usage" unit="W" health="OK">50.67613209098747</sensor> 
 +    <sensor name="RCU infrastructure power usage" unit="W" health="OK">0.0</sensor> 
 +    <sensor name="RCU power usage (Node)" unit="W" health="OK">50.67613209098747</sensor> 
 +  </power> 
 +</rcu> 
 +</code>
  
-The following table shows the possible attributes (some are optional) and their meaning: \\+Attributes: \\
  
 ^ Attribute ^ Description ^ Unit ^ Data type ^ ^ Attribute ^ Description ^ Unit ^ Data type ^
-|''id''|Unique ID for referencing the component|-|String| +|''name'' |Name of the RCU|-|String| 
-|''actualPowerUsage'' |Actual power consumption of a node (Node + PEG)|W|Double| +|''fanSpeed'' |Current speed setting of the fans in the RCU|%|Integer| 
-|''actualNodePowerUsage'' |Actual power consumption of a node (Node only)|W|Double| +|''fanProfile'' |Current fan profile of the RCU|-|String| 
-|''actualPEGPowerUsage'' |Actual power consumption of a PEG card|W|Double| +|''health'' |Health status of the RCU (OK, Warning, Critical)|-|String| 
-|''maxPowerUsage'' |Maximum power the node can draw|W|Integer| +|''ip'' |IP address of the RCU|-|String| 
-|''baseboardId'' |ID of the baseboard which hosts the node|-|String| +|''kvmNode'' |ID of the node to which the KVM system is switched (optional)|-|String|
-|''baseboardPosition'' |Position of the node on the baseboard|-|Integer| +
-|''state'' |Power state of the node (0=Off, 1=On, 2=Soft-off, 3=Standby, 4=Hibernate)|-|Integer| +
-|''architecture'' |Architecture (x86, arm, UNKNOWN)|-|String| +
-|''health'' |Health status of the node (OK, Warning, Critical)|-|String| +
-|''inletTemperature'' |Temperature of the inlet air|°C|Double| +
-|''outletTemperature'' |Temperature of the outlet air|°C|Double| +
-|''highestTemperature'' |Highest temperature measured on the node's baseboard|°C|Double| +
-|''voltage'' |Supply voltage of the baseboard|V|Double|+
 |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long| |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long|
-|''macAddressCompute'' |MAC address of the NIC connected to the compute network (optional)|-|String| +|''type'' |Type of the RCU|-|String| 
-|''macAddressMgmt'' |MAC address of the NIC connected to the management network (optional)|-|String|+|''id'' |ID for referencing the component|-|String|
  
-In accordance to the component node the API offers nodeList which returns multiple instances of node.+Nested elements: \\
  
-=== Backplane ===+^ Element ^ Description ^ Unit ^ Data type ^ 
 +|''temperature'' |List of temperature sensors|°C|Double| 
 +|''baseboard'' |ID of the baseboard which is installed in the RCU|-|String| 
 +|''fan'' |ID****s of fans, which are installed in the RCU|-|String| 
 +|''node'' |ID****s of nodes, which are installed in the RCU|-|String| 
 +|''power'' |List of power sensors|W|Double| 
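As an illustration of how a client might read the attributes and nested elements described above, the following sketch parses a shortened copy of the example response with Python's standard library. The function name ''summarize_rcu'' and the keys of the returned dictionary are assumptions made for this example and are not defined by the API.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example response above
RCU_XML = """<rcu name="RCUMaster (192.168.XX.YY)" fanSpeed="100" fanProfile="Manual" health="OK" id="RCU_10995770589198">
  <temperature>
    <sensor name="Node average temperature" unit="°C" health="OK">35.60748455616409</sensor>
  </temperature>
  <baseboard>RCU_10995770589198_BB_1</baseboard>
  <node>RCU_10995770589198_BB_1_0</node>
  <node>RCU_10995770589198_BB_1_1</node>
  <power>
    <sensor name="RCU total power usage" unit="W" health="OK">50.67613209098747</sensor>
  </power>
</rcu>"""


def summarize_rcu(xml_text):
    """Collect a few attributes, the node IDs and the power sensors of an <rcu> element."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "health": root.get("health"),
        "fan_speed_percent": int(root.get("fanSpeed")),
        # <node> elements only carry the ID; details come from /node/{node_id}
        "nodes": [node.text for node in root.findall("node")],
        "power": {s.get("name"): float(s.text) for s in root.findall("power/sensor")},
    }


summary = summarize_rcu(RCU_XML)
print(summary["nodes"])
```

The IDs found in the ''baseboard'', ''fan'' and ''node'' elements can then be used to request the details of each component.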
 + 
 + 
 +=== Baseboard ===
  
-Example XML:+Request: 
 +<code bash> 
 +curl -X GET -k -i https://host/REST/baseboard/RCU_10995770589198_BB_1 
 +</code>
  
-<code xml><backplane position="1" id="RCU_84055620466592_BP_1" infrastructurePower="0.0"  +Response: 
-lastSensorUpdate="1465470151268"> +<code xml> 
-<temperatures>24.0</temperatures> +<baseboard type="t.RECS" expansionBoardInserted="false" health="OK" lastSensorUpdate="1700825809191" rcuPosition="1" id="RCU_10995770589198_BB_1"> 
-<temperatures>25.0</temperatures> +  <fan>RCU_10995770589198_Fan_TRECS_1</fan> 
-<temperatures>26.0</temperatures> +  <fan>RCU_10995770589198_Fan_TRECS_2</fan> 
-<temperatures>27.0</temperatures> +  <fan>RCU_10995770589198_Fan_TRECS_3</fan> 
-<temperatures>28.0</temperatures> +  <fan>RCU_10995770589198_Fan_TRECS_4</fan> 
-</backplane></code>+  <fan>RCU_10995770589198_Fan_TRECS_5</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_6</fan> 
 +  <node>RCU_10995770589198_BB_1_0</node> 
 +  <node>RCU_10995770589198_BB_1_1</node> 
 +  <node>RCU_10995770589198_BB_1_2</node> 
 +  <power> 
 +    <sensor name="Baseboard infrastructure power" unit="W" health="OK">0,00</sensor> 
 +    <sensor name="Baseboard power usage (Node + PEG)" unit="W" health="OK">120.57697622394205</sensor> 
 +    <sensor name="Baseboard power usage (Node)" unit="W" health="OK">120.57697622394205</sensor> 
 +  </power> 
 +  <temperature> 
 +    <sensor name="Baseboard temp. 0" unit="°C" health="OK">23,0</sensor> 
 +    <sensor name="Baseboard temp. 1" unit="°C" health="OK">25,1</sensor> 
 +    <sensor name="Baseboard temp. 2" unit="°C" health="OK">20,0</sensor> 
 +    <sensor name="Baseboard temp. 3" unit="°C" health="OK">48,8</sensor> 
 +    <sensor name="Baseboard temp. 4" unit="°C" health="OK">23,0</sensor> 
 +    <sensor name="Baseboard temp. 5" unit="°C" health="OK">22,2</sensor> 
 +    <sensor name="Baseboard temp. 6" unit="°C" health="OK">20,0</sensor> 
 +    <sensor name="Baseboard temp. 7 (PCIe-Switch)" unit="°C" health="OK">40,2</sensor> 
 +    <sensor name="Baseboard temp. 8 (Ethernet-Switch)" unit="°C" health="OK">43,1</sensor> 
 +  </temperature> 
 +  <voltage> 
 +    <sensor name="Baseboard voltage (12 V Node 1)" unit="V" health="OK">12,05</sensor> 
 +    <sensor name="Baseboard voltage (12 V Node 2)" unit="V" health="OK">12,14</sensor> 
 +    <sensor name="Baseboard voltage (12 V Node 3)" unit="V" health="OK">12,12</sensor> 
 +  </voltage> 
 +</baseboard> 
 +</code>
  
-The attributes have the following meaning: \\+Attributes: \\
  
 ^ Attribute ^ Description ^ Unit ^ Data type ^ ^ Attribute ^ Description ^ Unit ^ Data type ^
-|''id'' |Unique ID for referencing the component|-|String| +|''type'' |Type of the baseboard|-|String| 
-|''position'' |Position of the backplane in the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|Integer| +|''expansionBoardInserted'' |Indicates whether an expansion board is inserted|-|Boolean| 
-|''infrastructurePower'' |Power usage of the infrastructure components on the backplane|W|Double|+|''health'' |Health status of the baseboard (OK, Warning, Critical)|-|String|
 |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long| |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long|
-|''temperatures'' |List of temperatures measured on the backplane|°C|Double|+|''rcuPosition'' |Position of the baseboard inside the RCU|-|Integer| 
 +|''id'' |ID for referencing the component|-|String|
  
-In accordance to the component backplane the API offers backplaneList which returns multiple instances of backplane.+Nested elements: \\
  
-=== Baseboard ===+^ Element ^ Description ^ Unit ^ Data type ^ 
 +|''fan'' |ID****s of fans, which are associated to the baseboard|-|String| 
 +|''node'' |ID****s of nodes, which are installed on the baseboard|-|String| 
 +|''power'' |List of power sensors|W|Double| 
 +|''temperature'' |List of temperature sensors|°C|Double| 
 +|''voltage'' |List of voltage sensors|V|Double|
  
-Example XML: 
  
-<code xml><baseboard rcuPosition="6" baseboardType="APLS" id="RCU_84055620466592_BB_6" infrastructurePower="9.8"  +=== Node ===
-lastSensorUpdate="1465470151268" rcuId="RCU_84055620466592"> +
-<nodeId>RCU_84055620466592_BB_6_1</nodeId> +
-<nodeId>RCU_84055620466592_BB_6_2</nodeId> +
-<nodeId>RCU_84055620466592_BB_6_3</nodeId> +
-<temperatures>20.0</temperatures> +
-<temperatures>20.0</temperatures> +
-<temperatures>20.0</temperatures> +
-<temperatures>20.0</temperatures> +
-<temperatures>20.0</temperatures> +
-</baseboard></code>+
  
-The attributes have the following meaning: \\+Request: 
 +<code bash> 
 +curl -X GET -k -i https://host/REST/node/RCU_10995770589198_BB_1_0 
 +</code> 
 + 
 +Response: 
 +<code xml> 
 +<node baseboardPosition="0" health="OK" lastSensorUpdate="1700825860193" name="Node 1" type="COM-HPC Server" maxPowerUsage="44" powerState="On" id="RCU_10995770589198_BB_1_0"> 
 +  <baseboard>RCU_10995770589198_BB_1</baseboard> 
 +  <deamon/> 
 +  <power> 
 +    <sensor name="Overall Node 1 power" unit="W" health="OK">121.1986045856893</sensor> 
 +    <sensor name="Node 1 power" unit="W" health="OK">121,2</sensor> 
 +  </power> 
 +  <processor instructionSet="x86-64" architecture="x86" type="CPU" cores="4" threads="8" maxSpeedMHz="2800" manufacturer="Intel" model="Xeon E3-1505M v5" /> 
 +  <temperature> 
 +    <sensor name="Node 1 inlet temperature" unit="°C" health="OK">23.0416142616874</sensor> 
 +    <sensor name="Node 1 outlet temperature" unit="°C" health="OK">48.733755007535635</sensor> 
 +  </temperature> 
 +  <voltage> 
 +    <sensor name="Baseboard voltage (12 V Node 1)" unit="V" health="OK">12,09</sensor> 
 +  </voltage> 
 +</node> 
 +</code> 
 + 
 +Attributes: \\
  
 ^ Attribute ^ Description ^ Unit ^ Data type ^ ^ Attribute ^ Description ^ Unit ^ Data type ^
-|''id'' |Unique ID for referencing the component|-|String| +|''baseboardPosition'' |Position of the node on the baseboard|-|Integer| 
-|''rcuId'' |Unique ID of the RECS<sup>(r)</sup>%%|%%Box Computing Unit hosting the baseboard|-|String| +|''health'' |Health status of the node (OK, Warning, Critical)|-|String|
-|''rcuPosition'' |Position of the baseboard inside the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|Integer| +
-|''infrastructurePower'' |Power usage of the infrastructure components on the baseboard|W|Double|+
 |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long| |''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long|
-|''baseboardType'' |Type of the baseboard (CXP, APLS)|-|String| +|''name'' |Name of the node|-|String| 
-|''nodeId'' |List of ID****s of the nodes installed on the baseboard|-|String| +|''type'' |Type of the node|-|String| 
-|''temperatures'' |List of temperatures measured on the backplane|°C|Double|+|''maxPowerUsage'' |Maximum power the node can draw|W|Integer| 
 +|''powerState'' |Power state of the node (Off, On, Soft-off, Standby, Hibernate)|-|String| 
 +|''id''|ID for referencing the component|-|String| 
 +|''macAddressCompute'' |MAC address of the NIC connected to the compute network (optional)|-|String| 
 +|''macAddressMgmt'' |MAC address of the NIC connected to the management network (optional)|-|String|
  
-In accordance to the component baseboard the API offers baseboardList which returns multiple instances of baseboard.+Nested elements: \\
  
-=== RCU ===+^ Element ^ Description ^ Unit ^ Data type ^ 
 +|''baseboard'' |ID of the baseboard hosting the node|-|String| 
 +|''deamon'' |List of deamon sensors (optional)|-|Mixed| 
 +|''power'' |List of power sensors|W|Double| 
 +|''processor'' |List of processors of this node with detailed information|-|-| 
 +|''temperature'' |List of temperature sensors|°C|Double| 
 +|''voltage'' |List of voltage sensors|V|Double|
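Some sensor values in the example responses on this page use a decimal comma (e.g. ''121,2'') while others use a decimal point. A client that converts the values to numbers may want to accept both notations. The following sketch shows one way to do that, using a shortened copy of the node response above. The helper names ''sensor_value'' and ''read_sensors'' are hypothetical, and the assumption that no thousands separators occur is based only on the examples shown here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the node response above
NODE_XML = """<node name="Node 1" powerState="On" id="RCU_10995770589198_BB_1_0">
  <power>
    <sensor name="Overall Node 1 power" unit="W" health="OK">121.1986045856893</sensor>
    <sensor name="Node 1 power" unit="W" health="OK">121,2</sensor>
  </power>
  <voltage>
    <sensor name="Baseboard voltage (12 V Node 1)" unit="V" health="OK">12,09</sensor>
  </voltage>
</node>"""


def sensor_value(text):
    """Convert a sensor reading to float, accepting ',' or '.' as decimal separator."""
    return float(text.strip().replace(",", "."))


def read_sensors(xml_text):
    """Map each sensor name to a (value, unit) tuple for the known sensor groups."""
    root = ET.fromstring(xml_text)
    readings = {}
    for group in ("power", "temperature", "voltage"):
        for sensor in root.findall(f"{group}/sensor"):
            readings[sensor.get("name")] = (sensor_value(sensor.text), sensor.get("unit"))
    return readings


readings = read_sensors(NODE_XML)
print(readings["Node 1 power"])
```

The same helper could be applied to the sensor groups of the ''rcu'' and ''baseboard'' responses, since they share the ''sensor'' element layout.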
  
-Example XML:+The API offers nodeList, which returns a list of the IDs of all nodes within the system.
  
-<code xml><rcu rcuType="ANTARES" fanSpeed="60" fanProfile="adjust_by_temperature" rackId="RCK_1" name="RECSMaster (RCU) on 192.168.56.195" rackPosition="0" id="RCU_84055620466592" lastSensorUpdate="1465470151268"> +Request: 
-<backplaneId>RCU_84055620466592_BP_1</backplaneId> +<code bash> 
-<baseboardId>RCU_84055620466592_BB_1</baseboardId> +curl -X GET -k -i https://host/REST/node 
-<baseboardId>RCU_84055620466592_BB_2</baseboardId> +</code>
-<baseboardId>RCU_84055620466592_BB_3</baseboardId> +
-<baseboardId>RCU_84055620466592_BB_4</baseboardId> +
-<baseboardId>RCU_84055620466592_BB_5</baseboardId> +
-<baseboardId>RCU_84055620466592_BB_6</baseboardId> +
-</rcu></code>+
  
-The attributes have the following meaning: \\ +Response: 
 +<code xml> 
 +<nodeList> 
 +  <node>RCU_10995770589198_BB_1_0</node> 
 +  <node>RCU_10995770589198_BB_1_1</node> 
 +  <node>RCU_10995770589198_BB_1_2</node> 
 +</nodeList> 
 +</code>
  
-^ Attribute ^ Description ^ Unit ^ Data type ^ 
-|''id'' |Unique ID for referencing the component|-|String| 
-|''rackId'' |ID of the rack which hosts the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|String| 
-|''rackPosition'' |Position of the RECS<sup>(r)</sup>%%|%%Box Computing Unit in the rack|-|Integer| 
-|''name'' |Name of the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|String| 
-|''ip'' |IP address of the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|String| 
-|''rcuType'' |Type of the RECS<sup>(r)</sup>%%|%%Box Computing Unit (SIRIUS, ARNEB, ANTARES)|-|String| 
-|''kvmNode'' |ID of the node to which the KVM system is switched (optional)|-|String| 
-|''fanSpeed'' |Current speed setting of the fans in the RECS<sup>(r)</sup>%%|%%Box Computing Unit|%|Integer| 
-|''fanProfile'' |Current fan profileof the RECS<sup>(r)</sup>%%|%%Box Computing Unit|%|Integer| 
-|''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long| 
-|''backplaneId'' |List of ID****s of backplanes which are installed in the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|String| 
-|''baseboardId'' |List of ID****s of baseboards which are installed in the RECS<sup>(r)</sup>%%|%%Box Computing Unit|-|String| 
- 
-In accordance to the component rcu the API offers rcuList which returns multiple instances of rcu. 
  
-=== Rack ===+=== Fan ===
  
-Example XML:+Request: 
 +<code bash> 
 +curl -X GET -k -i https://host/REST/fan/RCU_10995770589198_Fan_TRECS_1 
 +</code>
  
-<code xml><rack description="Default rack" id="RCK_1"> +Response: 
-<rcuId>RCU_84055620466592</rcuId> +<code xml> 
-</rack></code>+<fan position="TRECS_1" installed="true" nominalSpeed="100" rpm="11766" health="OK" lastSensorUpdate="0" id="RCU_10995770589198_Fan_TRECS_1" /> 
 +</code>
  
-The attributes have the following meaning: \\+Attributes: \\
  
 ^ Attribute ^ Description ^ Unit ^ Data type ^ ^ Attribute ^ Description ^ Unit ^ Data type ^
-|''id'' |Unique ID for referencing the component|-|String| +|''position'' |Position of the fan|-|String| 
-|''description ''|Description of the rack|-|String| +|''installed'' |Indicates whether the fan is installed|-|Boolean| 
-|''rcuId ''|List of ID****s of RECS<sup>(r)</sup>%%|%%Box Computing Units which are installed in the rack|-|String|+|''nominalSpeed'' |Nominal speed of the fan|%|Integer| 
 +|''rpm'' |Actual rotational speed of the fan|rpm|Integer| 
 +|''health'' |Health status of the fan (OK, Warning, Critical)|-|String| 
 +|''lastSensorUpdate'' |Timestamp of the last sensor update|ms|Long| 
 +|''id''|ID for referencing the component|-|String|
  
-In accordance to the component rack the API offers rackList which returns multiple instances of rack.+The API offers fanList, which returns a list of the IDs of all fans within the system.
  
-==== Resources ====+Request: 
 +<code bash> 
 +curl -X GET -k -i https://host/REST/fan 
 +</code> 
 + 
 +Response: 
 +<code xml> 
 +<fanList> 
 +  <fan>RCU_10995770589198_Fan_TRECS_1</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_2</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_3</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_4</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_5</fan> 
 +  <fan>RCU_10995770589198_Fan_TRECS_6</fan> 
 +</fanList> 
 +</code> 
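The list responses only contain IDs, so a client usually issues one further request per ID to get the details. The following sketch derives those detail URLs from a list response. The function name ''detail_urls'' is hypothetical and ''https://host/REST'' is the placeholder base URL used throughout this page. The sketch relies on the element name inside the list (''fan'', ''node'') matching the path segment of the detail request, as it does in the examples above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fanList response above
FAN_LIST_XML = """<fanList>
  <fan>RCU_10995770589198_Fan_TRECS_1</fan>
  <fan>RCU_10995770589198_Fan_TRECS_2</fan>
</fanList>"""


def detail_urls(list_xml, base_url="https://host/REST"):
    """Return one detail URL per ID contained in a list response (fanList, nodeList)."""
    root = ET.fromstring(list_xml)
    # Each child looks like <fan>ID</fan>; the tag doubles as the resource path
    return [f"{base_url}/{child.tag}/{child.text.strip()}" for child in root]


urls = detail_urls(FAN_LIST_XML)
for url in urls:
    print(url)
```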
 + 
 + 
 +==== Endpoints ====
  
 The resources are split into monitoring resources (for pure information gathering) and management resources (for changing the system configuration or state). The resources are split into monitoring resources (for pure information gathering) and management resources (for changing the system configuration or state).
Line 261: Line 381:
  
 ^ Attribute ^ Description ^ HTTP Method ^ ^ Attribute ^ Description ^ HTTP Method ^
-|''/node'' |Returns a nodeList with all nodes of the cluster|GET| +|''/rcu'' |Returns information about the RCU|GET| 
-|''/node/{node_id}'' |Returns information about the node with the given ID|GET| +|''/baseboard'' |Returns a baseboardList with all baseboard IDs of the RCU|GET|
-|''/baseboard'' |Returns a baseboardList with all baseboards of the cluster|GET|+
 |''/baseboard/{baseboard_id}'' |Returns information about the baseboard with the given ID|GET| |''/baseboard/{baseboard_id}'' |Returns information about the baseboard with the given ID|GET|
-|''/baseboard/{baseboard_id}/node'' |Returns a nodeList with all nodes that are installed on the baseboard with the given ID|GET| +|''/baseboard/{baseboard_id}/node'' |Returns a nodeList with all node IDs that are installed on the baseboard with the given ID|GET| 
-|''/backplane'' |Returns a backplaneList with all backplanes of the cluster|GET| +|''/node'' |Returns a nodeList with all node IDs of the RCU|GET| 
-|''/backplane/{backplane_id}'' |Returns information about the backplane with the given ID|GET| +|''/node/{node_id}'' |Returns information about the node with the given ID|GET| 
-|''/rcu'' |Returns an rcuList with all RECS<sup>(r)</sup>%%|%%Box Computing Units of the cluster|GET| +|''/fan'' |Returns a fanList with all fan IDs of the RCU|GET| 
-|''/rcu/{rcu_id}'' |Returns information about the RECS<sup>(r)</sup>%%|%%Box Computing Unit with the given ID|GET| +|''/fan/{fan_id}'' |Returns information about the fan with the given ID|GET|
-|''/rcu/{rcu_id}/baseboard'' |Returns a baseboardList with all baseboards that are installed in the RECS<sup>(r)</sup>%%|%%Box Computing Unit with the given ID|GET| +
-|''/rcu/{rcu_id}/backplane'' |Returns a backplaneList with all backplanes that are installed in the RECS<sup>(r)</sup>%%|%%Box Computing Unit with the given ID|GET| +
-|''/rcu/{rcu_id}/node'' |Returns a nodeList with all nodes that are installed in the RECS<sup>(r)</sup>%%|%%Box Computing Unit with the given ID|GET| +
-|''/rack'' |Returns a rackList with all racks of the cluster|GET| +
-|''/rack/{rack_id}'' |Returns information about the rack with the given ID|GET| +
-|''/rack/{rack_id}/rcu'' |Returns a rcuList with all RECS<sup>(r)</sup>%%|%%Box Computing Units that are installed in the rack with the given ID|GET|+
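
The monitoring resources in the table above are plain GET requests, so a client mostly needs to fill in the ID placeholders. The following is a minimal sketch, not the vendor's client: the base URL ''https://host/REST'' is a placeholder assumption, the helper names ''monitoring_url'' and ''fetch'' are hypothetical, and any authentication the API may require is left out.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; replace with the address of your management controller.
BASE_URL = "https://host/REST"


def monitoring_url(template, base_url=BASE_URL, **ids):
    """Fill ID placeholders such as {node_id} in a resource path from the table."""
    # Percent-encode every ID so unusual characters cannot break the path.
    quoted = {key: urllib.parse.quote(str(value), safe="") for key, value in ids.items()}
    return base_url.rstrip("/") + template.format(**quoted)


def fetch(url, timeout=10):
    """Send a GET request and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(monitoring_url("/node"))
    print(monitoring_url("/baseboard/{baseboard_id}/node", baseboard_id="BB_1"))
```

The list resources (''/node'', ''/baseboard'', ''/fan'') return IDs, which can then be passed to the per-ID resources to retrieve the details.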
  
 === Management === === Management ===
Line 282: Line 395:
  
 ^ Attribute ^ Description ^ HTTP method ^ Parameter ^ ^ Attribute ^ Description ^ HTTP method ^ Parameter ^
-|''/node/{node_id}/manage/power_on'' |Turns on the node with the given ID and returns updated node XML|POST| | +|''/node/{node_id}/manage/power_on'' |Turns on the node with the given ID and returns updated node|POST| | 
-|''/node/{node_id}/manage/power_off'' |Turns off the node with the given ID and returns updated node XML|POST| | +|''/node/{node_id}/manage/power_button'' |Turns on/off the node with the given ID and returns updated node|POST| | 
-|''/node/{node_id}/manage/reset'' |Resets the node with the given ID and returns updated node XML|POST| | +|''/node/{node_id}/manage/power_off'' |Turns off the node with the given ID and returns updated node|POST| | 
-|''/node/{node_id}/manage/select_kvm'' |Switches the KVM port of the RECS<sup>(r)</sup>%%|%%Box Computing Unit containing the node to the node with the given ID and returns updated node XML|PUT| | +|''/node/{node_id}/manage/reset'' |Resets the node with the given ID and returns updated node|POST| | 
-|''/rcu/{rcu_id}/manage/set_fans'' |Sets the overall fan speed of the RCU with the given ID and returns the curent status of the RCU|PUT|percent={value}| +|''/node/{node_id}/manage/sleep'' |Puts the node with the given ID into sleep state and returns updated node|POST| | 
-|''/rcu/{rcu_id}/manage/set_fan_profile'' |Sets the fan profile of the RCU with the given ID and returns the curent status of the RCU (Possible values: manual, increase_by_temperature, adjust_by_temperature)|PUT|percent={value}|+|''/node/{node_id}/manage/select_kvm'' |Switches the KVM port of the RCU to the node with the given ID and returns updated node|PUT| | 
 +|''/node/{node_id}/manage/set_bootsource'' |Sets the boot source of the node with the given ID and returns updated node|PUT|source={NONE,HDD,CD,PXE,USBSTICK},persistent={true,false}| 
 +|''/rcu/manage/set_fans'' |Sets the overall fan speed of the RCU and returns the current status of the RCU|PUT|percent={value}| 
 +|''/rcu/manage/set_fan_profile'' |Sets the fan profile of the RCU and returns the current status of the RCU|PUT|profile={manual,auto}| 
 +|''/fan/{fan_id}'' |Sets the speed of the fan with the given ID and returns the current status of the fan|PUT|percent={value}|
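
The management calls in the table above differ only in path, HTTP method and parameters. The sketch below builds such requests without sending them. It rests on assumptions: the base URL is a placeholder, the helper name ''management_request'' is hypothetical, and the parameters are passed as a URL query string, which the table does not state explicitly.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; replace with the address of your management controller.
BASE_URL = "https://host/REST"


def management_request(path, method, params=None, base_url=BASE_URL):
    """Build (but do not send) a management request for the given path and method."""
    url = base_url.rstrip("/") + path
    if params:
        # Assumption: parameters are transmitted as a query string.
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, method=method)


if __name__ == "__main__":
    # Hypothetical node ID, for illustration only.
    power_on = management_request("/node/N1/manage/power_on", "POST")
    set_fans = management_request("/rcu/manage/set_fans", "PUT", {"percent": 50})
    boot = management_request(
        "/node/N1/manage/set_bootsource",
        "PUT",
        {"source": "PXE", "persistent": "true"},
    )
    for request in (power_on, set_fans, boot):
        print(request.get_method(), request.full_url)
```

A request built this way can be sent with ''urllib.request.urlopen(request)''. Each call returns the updated node, RCU or fan status, as listed in the table.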
  
 === Errors === === Errors ===
Line 295: Line 412:
 ===== Prometheus ===== ===== Prometheus =====
  
-A Prometheus exporter is built in and can be enabled. It is accessible at ''https://TOR-Master/metrics/'' or ''http://TOR-Master/metrics/'' and requires HTTP basic authentication. +A Prometheus exporter is built in and can be enabled. It is accessible at ''https://host/metrics/'' or ''http://host/metrics/'' and requires HTTP basic authentication. 
  
-The main advantage of the Prometheus exporter over the other APIs is that it exports its metrics dynamically, so metrics are added or removed at runtime when hardware is changed or hotplugged. This makes it possible to export metrics only for the microservers that are actually plugged in. Because the RECS<sup>(r)</sup>%%|%%Box is modular and every RECS<sup>(r)</sup>%%|%%Box can be equipped with different carrier blades and microserver configurations, this matters in practice: traditional monitoring tools that do not support dynamic metrics require regular manual changes to their configuration files. +The main advantage of the Prometheus exporter over the other APIs is that it exports its metrics dynamically, so metrics are added or removed at runtime when hardware is changed or hotplugged. This makes it possible to export metrics only for the microservers that are actually plugged in. Because the RECS is modular and every RECS can be equipped with different carrier blades and microserver configurations, this matters in practice: traditional monitoring tools that do not support dynamic metrics require regular manual changes to their configuration files. 
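
To see which metrics the exporter currently offers, the metrics endpoint can be read directly and the metric names extracted from the Prometheus text format. The sketch below is illustrative only: the helper names ''fetch_metrics'' and ''metric_names'' are hypothetical, the credentials are placeholders, and the metric names in the sample are invented rather than taken from the actual exporter.

```python
import base64
import urllib.request


def fetch_metrics(url, user, password, timeout=10):
    """Read the metrics endpoint using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the label block '{' or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


if __name__ == "__main__":
    # Invented sample output; real metric names depend on the installed hardware.
    sample = (
        "# HELP example_node_temperature Temperature of a node\n"
        "# TYPE example_node_temperature gauge\n"
        'example_node_temperature{node="1"} 41.5\n'
        'example_node_temperature{node="2"} 39.0\n'
        "example_fan_speed 50\n"
    )
    print(sorted(metric_names(sample)))
```

Calling ''metric_names'' on the output of ''fetch_metrics("https://host/metrics/", user, password)'' before and after hotplugging a microserver shows how the exported set changes without any configuration edits.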
  
 ==== Prometheus Configuration ==== ==== Prometheus Configuration ====