Built YourServerIsDown.com as a side project that we needed for our startup... anyone else have the issue of not finding out quickly enough if your server went down?
For our app this is super important: if our server goes down, users can still download the app but get stuck at the sign-in flow. There are subscription services out there that do more in-depth monitoring, but this is all we needed.
I listed an alternative solution below for those wanting to build or customize their own. Ours just gets the job done, is quick to set up, and lets you avoid the monthly Twilio/SMS fees.
Another alternative we received as feedback, for those interested: "If anyone wants an AWS-native way, and assuming it has an ALB, you can target the ELB 503 metric via a CloudWatch alarm and create an output to an SNS topic that goes to Slack, or use AWS Chatbot/Q, or set a number as the destination for SMS via SNS."
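For anyone who wants to try the AWS-native route from that feedback, here is a rough sketch of what the alarm definition could look like. It only builds the arguments for CloudWatch's PutMetricAlarm call; the alarm name, threshold, and one-minute period are assumptions for illustration, and the load balancer dimension and SNS topic ARN are placeholders you would replace with your own.

```python
def alb_503_alarm_params(load_balancer_dimension, sns_topic_arn, threshold=1):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call.

    load_balancer_dimension is the ALB's CloudWatch dimension value,
    for example "app/my-alb/0123456789abcdef" (placeholder).
    """
    return {
        "AlarmName": "alb-503-spike",            # hypothetical name
        "Namespace": "AWS/ApplicationELB",       # namespace for ALB metrics
        "MetricName": "HTTPCode_ELB_503_Count",  # 503s generated by the load balancer
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer_dimension}],
        "Statistic": "Sum",
        "Period": 60,                            # assumption: evaluate each minute
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",      # no data means no 503s were seen
        "AlarmActions": [sns_topic_arn],         # SNS topic fans out to Slack/SMS
    }


# With boto3 installed and AWS credentials configured, this would be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alb_503_alarm_params(dim, arn))
```

Note that this metric only counts 503s produced by the load balancer itself. Errors returned by the app behind it show up under the separate target metrics, so you may want a second alarm for those.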
If the service has no monthly fee, how is it being paid for?
It's a one-time $4.99 fee (covers a year of monitoring or 50 downtime events, whichever comes first).
That’s an annual subscription.
I suppose you're right, unless your server goes down more than 50 times. I saw it as credits that expire in a year. It would be a bit scary to offer monitoring in perpetuity for $5 if they didn't expire.
Consider changing your landing page to reflect the price (“Only $5 Annually!”). The reason I asked was because the way it is now makes it look like the service is being offered for free, which made me think it was a phishing scheme.
I appreciate the feedback! Just implemented this, hadn't thought of that. Cheers.
How does it compare with https://www.site24x7.com?
Who’s this built for and what is the use case?
This tool wouldn’t be useful for most (if not all) enterprise services I’ve worked for. For enterprise, you want fully featured synthetics services such as ThousandEyes, plus an internal monitoring and alerting system.
Also you typically don’t want to expose your health endpoint to the outside world. It’s a security risk.
It's aimed at indie devs/startups shipping ideas quickly. We built it for ourselves while we were starting an app on the AWS free tier, which occasionally went down when usage spiked. It notified us so we could fix things quickly, before losing users who could download the app but not create an account. It can be set up in 30 seconds without needing to code anything, so it's mainly for coders who want a quick and easy solution.
So we're not aiming for enterprise on this one. We kept the pricing accessible and the features minimal.
As for the health endpoint, as long as it only returns a 200 status code (without disclosing info like tokens, resource info, or server configuration), the risk is minimal.
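To make the "only returns a 200" point concrete, here is a minimal sketch of such an endpoint using Python's standard library. The `/health` path, the `ok` body, and port 8080 are assumptions for illustration, not anything the service requires beyond an unauthenticated 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path):
    """Return (status, body) for a request path; the body never carries config or tokens."""
    if path == "/health":
        return 200, b"ok"
    return 404, b""


class HealthHandler(BaseHTTPRequestHandler):
    # Avoid advertising the Python version in the Server response header.
    server_version = "health"
    sys_version = ""

    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(port=8080):
    """Block forever serving the health endpoint (port is an assumption)."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the listener. In a real app you would more likely add the same route to your existing web framework so that a 200 also reflects that the app process is alive.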
First and foremost, I love a good side hustle.
With that being said, I find these kinds of notifications produce more false positives than correct downtime detections. That ends up costing more time checking and double-checking.
On the other hand, if you are running a service with no users and you have downtime... did you really have downtime?
If you run a service and you have downtime and no one reports it, did you have downtime?
I don't even check for my services. If something goes down, I'll find out via email from one or more of my customers. It happens very rarely.
If a tree falls in the forest and no one is around to hear it, does it make a sound?
You bring up a good point. I think it's less of a problem for more established companies that don't face unexpected outages too often. When we were starting out with our mobile app, however, this wasn't the case, and each outage meant lost downloads, which were critical for getting early feedback. I see it as a bigger pain point for early founders/small teams whose servers could see a lot of volatility.
So far we haven't encountered any false positives (been using it for around 6 months) but perhaps with the wrong endpoint that could be a problem. I'll keep an eye out for that.
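One common way to cut down on false positives is to alert only after several failed checks in a row, and only once per outage. The sketch below illustrates that idea; it is not a description of how YourServerIsDown works, and the threshold of 3 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DowntimeDetector:
    failures_needed: int = 3       # assumption: 3 misses in a row before alerting
    consecutive_failures: int = 0
    alerted: bool = False

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True only when an alert should be sent."""
        if check_passed:
            # Any success ends the streak and re-arms the alert.
            self.consecutive_failures = 0
            self.alerted = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_needed and not self.alerted:
            self.alerted = True    # send one alert per outage, not one per check
            return True
        return False
```

The trade-off is latency: with one check per minute and a threshold of 3, a real outage is reported a couple of minutes later than it would be with a single failed check.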
> I find these kinds of notifications produce more false positives than correct downtime detections
There are services like Textbelt that leave the trigger mechanisms all up to you and your local tools:
https://textbelt.com/
how do you determine if the server went down?
By checking a health endpoint. (I'm not the owner.)
Correct. It requires an unauthenticated endpoint that returns a 200 response. Usually this is the /health endpoint, but any URL we can ping works.
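As a rough illustration of that kind of check (a sketch under assumptions, not the service's actual code), a single probe can be written with the standard library. The 5-second timeout and the injectable `opener` parameter are choices made here for the example.

```python
import urllib.request


def is_up(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if a GET to url answers with HTTP 200 within timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, DNS failures, and timeouts, plus urllib's
        # URLError/HTTPError (urlopen raises HTTPError for 4xx/5xx responses).
        return False
```

Usage would look like `is_up("https://api.example.com/health")`, where the URL is a placeholder. The `opener` argument exists only so the function can be exercised without network access.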
OK, how does it actually work? I get that you'll check for 500 errors by hitting multiple endpoints every x units of time. But the number of endpoints you must check also keeps going up for your service. Today you start with 10 endpoints; 6 months down the line you need to check 10,000 endpoints every x units of time. How do you manage scaling this?
Right, we ping the servers every minute. Since we charge a one-time fee, the credits expire after a year, but the service is scalable. To answer your question, I'll give you some more context:
The architecture uses scalable AWS serverless components (Lambda, SQS, DynamoDB) and is well suited to handle a large increase in monitored endpoints. The primary scaling mechanism is the automatic concurrency scaling of the Lambda functions processing messages from SQS queues. Should we scale to 10,000 endpoints, we do expect some bottlenecks that would require optimizing (e.g. increasing Lambda timeouts/memory), but we'll cross that bridge when we get to it.
For the actual SMS sending, our numbers can send up to 100 SMS texts/second.
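To picture the fan-out side of a Lambda/SQS setup like this, here is a hypothetical sketch of how a scheduler might package endpoint URLs into SQS batches each minute. The message shape and function name are assumptions; the only AWS fact relied on is that SendMessageBatch accepts at most 10 entries per call.

```python
import json

SQS_BATCH_LIMIT = 10  # SendMessageBatch accepts at most 10 entries per call


def to_sqs_batches(urls):
    """Group URLs into lists of SendMessageBatch entries, at most 10 per batch."""
    entries = [
        # Ids only need to be unique within a batch; the body format is an assumption.
        {"Id": str(i), "MessageBody": json.dumps({"url": url})}
        for i, url in enumerate(urls)
    ]
    return [
        entries[start:start + SQS_BATCH_LIMIT]
        for start in range(0, len(entries), SQS_BATCH_LIMIT)
    ]
```

At 10,000 endpoints that works out to 1,000 batch calls per minute. On the alerting side, at the quoted 100 texts/second, a worst case where all 10,000 endpoints fail together would take about 100 seconds to notify everyone.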
Thank you for the detailed responses. So I understand that you have a Lambda function that fires a request to fetch a website URL from DynamoDB. Since Lambdas require a memory limit and a timeout, how much memory is each function using, and what is the timeout for a request (30s?)? Also, does each Lambda function handle a single URL, or are we doing asyncio/aiohttp stuff with a whole bunch of URLs in one go?
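For reference, the "whole bunch of URLs in one go" option could look roughly like the sketch below. It uses only the standard library's asyncio with a thread per blocking check rather than aiohttp, and it is an illustration of the batching idea, not a claim about what the service does. The `check` callable and the concurrency cap of 50 are assumptions.

```python
import asyncio


async def check_many(urls, check, max_concurrent=50):
    """Run a blocking check(url) -> bool for every URL, at most max_concurrent at a time."""
    limit = asyncio.Semaphore(max_concurrent)

    async def run_one(url):
        async with limit:
            # to_thread keeps the event loop free while the blocking check runs;
            # the default thread pool size also caps real parallelism.
            return url, await asyncio.to_thread(check, url)

    results = await asyncio.gather(*(run_one(url) for url in urls))
    return dict(results)
```

A single invocation would call `asyncio.run(check_many(urls, check))` and get back a mapping of URL to up/down. The per-request timeout then has to stay well under the function's own timeout, since one slow endpoint otherwise holds up the whole batch.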