OVH Monitoring

The monitoring system continuously pings every server on which it is enabled.
Suppose a server's power supply fails and the server shuts down.

When the server goes offline, an email like the following arrives:
Dear Customer,

Our monitoring system has just detected a fault on your server ns3068700.ip-137-74-4.eu.
The fault was noticed on 2017-09-05 19:46:05

Our team of technicians on site (operational 24/7), has been informed
of the fault and will intervene on your machine.

Please be aware that other interventions may currently be in progress and
an intervention lasts on average 30 minutes per machine.

We are therefore not able to give you more details on the starting time
of the intervention.

You can see a general display of the machines currently in fault and
in intervention across our network at the following address:

status.ovh.ie/vms/

Your server is in rack W01B12

You will receive an email as soon as a technician takes charge of your
server. Meanwhile, you have can reboot it via your manager.

Logs:
— PING ns3068700.ip-137-74-4.eu (137.74.4.43) from 213.186.33.13: 56(84) bytes of data.
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable

— 137.74.4.43 ping statistics — 10 packets transmitted, 0 packets received, +6 errors, 100% packet loss
---------------------
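
The summary line at the end of the log is what tells you the server was considered down. As a minimal sketch, assuming the summary always has the wording shown in the email above, it can be parsed like this (the function names `parse_ping_summary` and `is_down` are made up for illustration and are not part of any OVH tool):

```python
import re

# Matches the summary line exactly as it is worded in the notification email.
# The leading "+" before the error count is optional.
SUMMARY_RE = re.compile(
    r"(?P<sent>\d+) packets transmitted, "
    r"(?P<received>\d+) packets received, "
    r"\+?(?P<errors>\d+) errors, "
    r"(?P<loss>\d+)% packet loss"
)


def parse_ping_summary(line: str) -> dict:
    """Extract the four counters from a ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no ping summary found in line")
    return {name: int(value) for name, value in match.groupdict().items()}


def is_down(summary: dict) -> bool:
    """Treat the host as down only when every packet was lost (an assumption)."""
    return summary["received"] == 0 and summary["loss"] == 100


line = ("137.74.4.43 ping statistics: 10 packets transmitted, "
        "0 packets received, +6 errors, 100% packet loss")
print(is_down(parse_ping_summary(line)))
```

The threshold in `is_down` is a guess for the example; the emails do not say what packet loss level actually triggers a fault.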

A task to fix the problem is created for a data center technician.
After the work is done, the technician writes a report:

Dear Customer,

The intervention on ns3068700.ip-137-74-4.eu has been completed.

This operation was closed at 2017-09-05 20:07:26

Here are the details of this operation:
Power Suply replacement
Date 2017-09-05 19:59:10, damian Z made Power Suply replacement:
Diagnosis:
HS power

Actions:
Replacing the power supply. Server restart.

result:
Boot OK. Server on login screen. Ping OK, services started.

And the server is back online. This is exactly where the monitoring saves time.
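
To put a number on that, the two timestamps from the emails above (fault noticed, operation closed) can be subtracted. This is a small sketch that assumes both timestamps use the same format and time zone, which the emails do not state explicitly; `downtime_seconds` is a hypothetical helper name.

```python
from datetime import datetime

# Timestamp layout used in the notification emails.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def downtime_seconds(noticed: str, closed: str) -> int:
    """Seconds between the fault being noticed and the operation being closed."""
    start = datetime.strptime(noticed, TIMESTAMP_FORMAT)
    end = datetime.strptime(closed, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds())


# Power supply incident: noticed 19:46:05, closed 20:07:26.
seconds = downtime_seconds("2017-09-05 19:46:05", "2017-09-05 20:07:26")
print(f"{seconds // 60} min {seconds % 60} s")  # prints "21 min 21 s"
```

So from detection to a replaced power supply and a booted server took a little over 21 minutes, without the customer opening a ticket.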

In another case the server was running, but the network connector came loose and the server was no longer reachable over the internet:
Dear Customer,

Our monitoring system has just detected a fault on your server ns300584.ip-91-121-64.eu.
The fault was noticed on 2017-07-09 18:56:07

Our team of technicians on site (operational 24/7), has been informed
of the fault and will intervene on your machine.

Please be aware that other interventions may currently be in progress and
an intervention lasts on average 30 minutes per machine.

We are therefore not able to give you more details on the starting time
of the intervention.

You can see a general display of the machines currently in fault and
in intervention across our network at the following address:

status.ovh.ie/vms/

Your server is in rack 05C03

You will receive an email as soon as a technician takes charge of your
server. Meanwhile, you have can reboot it via your manager.

Logs:
— PING ns300584.ip-91-121-64.eu (91.121.64.171) from 213.186.33.13: 56(84) bytes of data.
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable

— 91.121.64.171 ping statistics — 10 packets transmitted, 0 packets received, +6 errors, 100% packet loss
---------------------

The technician fixed it and wrote a report:
Dear Customer,

The intervention on ns300584.ip-91-121-64.eu has been completed.

This operation was closed at 2017-07-09 19:10:46

Here are the details of this operation:
Network connector
Date 2017-07-09 19:09:29, marc S made Network connector:
Serveur on login whithout ping.

Up/Down eth0.

Server on login, services started.
Ping ok.

Now suppose the server was rebooted from the console and the reboot takes a long time, for example more than 5 minutes.

A notification arrives saying the server is unreachable:
Dear Customer,

Our monitoring system has just detected a fault on your server ns3076628.ip-164-132-207.eu.
The fault was noticed on 2017-10-12 14:14:05

Our team of technicians on site (operational 24/7), has been informed
of the fault and will intervene on your machine.

Please be aware that other interventions may currently be in progress and
an intervention lasts on average 30 minutes per machine.

We are therefore not able to give you more details on the starting time
of the intervention.

You can see a general display of the machines currently in fault and
in intervention across our network at the following address:

status.ovh.ie/vms/

Your server is in rack G137B10

You will receive an email as soon as a technician takes charge of your
server. Meanwhile, you have can reboot it via your manager.

Logs:
— PING ns3076628.ip-164-132-207.eu (164.132.207.51) from 213.186.33.13: 56(84) bytes of data.
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable
From 213.186.33.13: Destination Host Unreachable

— 164.132.207.51 ping statistics — 10 packets transmitted, 0 packets received, +6 errors, 100% packet loss
---------------------

If the server comes back up on its own, a notification arrives saying it is online again, and the technician's task to check the server is cancelled:
Dear Customer,

On 2017-10-12 14:14:05, we noticed a fault on your server and we have
scheduled an intervention in order to fix this fault.

However, on 2017-10-12 14:27:08 our monitoring system did not detect any fault on
your dedicated server ns3076628.ip-164-132-207.eu

We did not intervene on your machine. We do not know the origin of this
fault.

The scheduled intervention has been cancelled from our list.

Logs:
— PING ns3076628.ip-164-132-207.eu (164.132.207.51) from 213.186.33.13: 56(84) bytes of data.
From 213.186.33.13: Host is alive
From 213.186.33.13: Host is alive
From 213.186.33.13: Host is alive

— 164.132.207.51 ping statistics — 10 packets transmitted, 10 packets received, 0 errors, 0% packet loss
---------------------
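
The behaviour in these two emails (a fault opens an intervention, a later successful ping cancels it) can be modelled as a tiny state machine. This is only an illustration of the observed behaviour, not OVH's actual implementation; the names `TicketState` and `InterventionTicket` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TicketState(Enum):
    OPEN = "open"            # fault detected, technician not yet done
    CANCELLED = "cancelled"  # host answered pings again before the intervention
    COMPLETED = "completed"  # technician finished the intervention


@dataclass
class InterventionTicket:
    server: str
    state: TicketState = TicketState.OPEN

    def on_ping_result(self, host_alive: bool) -> None:
        """A successful ping cancels a ticket that is still open."""
        if self.state is TicketState.OPEN and host_alive:
            self.state = TicketState.CANCELLED

    def on_technician_done(self) -> None:
        """A technician report closes a ticket that is still open."""
        if self.state is TicketState.OPEN:
            self.state = TicketState.COMPLETED


ticket = InterventionTicket("ns3076628.ip-164-132-207.eu")
ticket.on_ping_result(host_alive=False)  # still rebooting, ticket stays open
ticket.on_ping_result(host_alive=True)   # server answered, ticket is cancelled
print(ticket.state)
```

Once a ticket is cancelled or completed, later events leave it unchanged, which matches the "intervention has been cancelled from our list" wording in the email.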

Another case: the server's user broke the OS configuration and the server stopped working.

An email notification about the server check arrives:
Dear Customer,

Please note that our technical teams will intervene on your server ns3053426.ip-164-132-205.eu in 15 minutes in order to carry out the following intervention:

Diagnosis interface boot (rescue)

The report:
Dear Customer,

The intervention on ns3053426.ip-164-132-205.eu has been completed.

This operation was closed at 2017-08-05 13:49:15

Here are the details of this operation:
Diagnosis interface boot (rescue)
Date 2017-08-05 13:37:21, yann M made Diagnosis interface boot (rescue):
Here are the details of the operation performed:
The server gets stuck in BIOS setting during boot.

Actions:
BIOS boot order correction.
Rebooting the server to «rescue» mode (Linux)

result:
Boot OK. Rescue mode accessible.

recommendations:
Configuration / error to be corrected by the customer

The access details for rescue mode are then sent by email:
Dear Customer,

Your server has been started in 'Rescue' mode. This has either been requested by you in the OVH manager or a technician has had to do this because of an error that needs to be resolved in Rescue mode.

This mode means that a basic Linux/BSD system has been launched on your server through the network. This is not the system installed on your server and none of your disks have been mounted.

A web interface is available for you to carry out diagnosis on your server (hard disk, raid, ram, CPU) and to browse your file systems using the following details:
164.132.205.52:444
— user: root
— password: 1Ae0JzEA4z9r

You may connect to your server through SSH with the following details:
— IP: 164.132.205.52
— user: root
Password: 1Ae0JzEA4z9r

You can now carry out the maintenance required to the repair your server.

For example, you can:
— check and update your network configuration files,
— check and decontaminate your firewall if required,
— check and update your LILO or GRUB (or to configure another Netboot via the network)
— launch a manual check of your file systems,
— carry out backup or restoration of data,
— etc.
You can read more about rescue mode here.

Q: If I block incoming ICMP packets, will the monitoring be unable to check whether the server is up?
A: Yes, that is correct. But the technicians work through KVM, and after checking the server they will simply disable monitoring.
You will receive a notification like this from the data center technician after the server check:
Dear Customer,

The intervention on ns3081547.ip-145-239-66.eu has been completed.

This operation was closed at 2017-08-16 13:43:43

Here are the details of this operation:
Monitoring disabled
Date 2017-08-16 13:42:51, sebastien T made Monitoring disabled:
We have detected a firewall which is too restrictive and is preventing our monitoring equipment from pinging your server. We have disabled the monitoring to prevent our technicians from intervening on your server.

Nmap scan report for 2.gra2.ovh.abcd.network (145.239.66.66)
Host is up (0.0049s latency).
Not shown: 998 filtered ports
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http

Nmap done: 1 IP address (1 host up) scanned in 4.85 seconds

Here's a guide for more information:

docs.ovh.ca/en/guides-network-firewall.html#ovh-monitoring

You may re-enable the monitoring of your server from your OVH manager.
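
If you want to keep a restrictive firewall and still leave monitoring enabled, the usual approach is to allow ICMP echo requests from the monitoring hosts only. The sketch below just builds the iptables commands as strings. It assumes 213.186.33.13 (the ping source seen in the logs above) is one of the monitoring addresses; the full, current list is an assumption you need to check against the OVH guide, and `icmp_allow_rules` is a hypothetical helper.

```python
import ipaddress


def icmp_allow_rules(monitoring_ips):
    """Build iptables commands that accept ping requests from the given IPv4 hosts."""
    rules = []
    for ip in monitoring_ips:
        # Raises ValueError on anything that is not a valid IPv4 address.
        address = ipaddress.IPv4Address(ip)
        rules.append(
            f"iptables -A INPUT -s {address} -p icmp "
            "--icmp-type echo-request -j ACCEPT"
        )
    return rules


# Only the address visible in the email logs; the real list may be longer.
for rule in icmp_allow_rules(["213.186.33.13"]):
    print(rule)
```

Because `-A` appends to the end of the chain, these rules only help if they end up before whatever rule drops ICMP traffic.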
