Channel: THWACK: All Content - All Communities

FEATURE REQUEST - Allow changes to default SNMP Read and Write Communities


Summary

Provide a method via the GUI to set the default SNMP public and private strings that appear in the "Add Node" dialog.

 

Background

This request is born out of the need to manually migrate over 100 network device nodes from an old version of NPM to the latest and greatest. While a SQL import/export process is possible, it is not an option in the GUI.

Therefore, all devices were manually added via the "Add Node" dialog. Once changed, the public string field provided a quasi-drop-down list containing the alternate community. The private community field did not remember the previous entry. Previous versions of the product allowed manual editing of the file containing the value (see How to change the Add Node default SNMP Community String value). There is a standing question in the community, as of January 2019, that is still unanswered, so there should be some level of support for this change. The time taken to enter and test these strings can be significant if there are many nodes to add. Case number 00268108 was created with support for this issue as well.

 

Additionally, if an organization does not use the default community values, or changes them based on environment, this increases the potential for misconfiguration.

 

Approach

Since the newer versions of the product are compiled and less accessible for customization, the SNMP community string settings should be accessible via the GUI. An ideal location to store these strings would be in the Settings | All Settings | Credentials area. There is an existing SNMP v3 management interface (https://servername/Orion/Admin/Credentials/SNMPCredentialManager.aspx). If this interface could include earlier versions of SNMP, it could provide the alternate variables to the "Add Node" dialog.

 

Alternatively, the credentials could be stored under the "Product Specific Settings" area in "All Settings."

 

This approach follows the model used with NCM and Windows servers, where there may be multiple iterations of credentials used to access different devices. The stored data is presented as a choice when adding nodes requiring those protocols and credentials.

 

Related to this, it may also be helpful to include an option to change the default SNMP port. Some organizations change these settings to harden their infrastructure.


Virtual Classroom Playback Speed


To start, I love the increase in on-demand content that is available.

 

Can we get a feature to increase the playback speed? Preferably in 0.25x increments.

 

For me, it's easier to maintain focus at 1.25x to 1.75x playback speed.

AVAILABILITY QUERY IS DROPPING DATA


I have this SQL query that gives me the average availability of the previous month for the firewalls at my sites:

DECLARE @startOfCurrentMonth DATETIME  
SET @startOfCurrentMonth = DATEADD(month, DATEDIFF(month, 0, CURRENT_TIMESTAMP), 0)  
SELECT  
sub.SummaryMonth AS Month_Of,  
AVG (sub.AVERAGE_of_Availability) as Total_Average  
FROM  
(  
SELECT Nodes.VendorIcon AS Vendor_Icon,  
Nodes.Caption AS NodeName,  
Nodes.MachineType AS Machine_Type,  
AVG(ResponseTime.Availability) AS AVERAGE_of_Availability,  
CONVERT(DateTime,  
LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)),  
101) AS SummaryMonth  
FROM  
Nodes INNER JOIN ResponseTime ON (Nodes.NodeID = ResponseTime.NodeID)  
WHERE  
( datetime >= DATEADD(month, -1, @startOfCurrentMonth) AND datetime < @startOfCurrentMonth )  
AND   
(    (Nodes.Caption LIKE '%-FW')  
)  
GROUP BY CONVERT(DateTime, LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)), 101),  
Nodes.VendorIcon, Nodes.Caption, Nodes.MachineType  
) sub  
GROUP BY sub.SummaryMonth  
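
For reference, the date window selected by the WHERE clause above can be sketched outside the database. This is only an illustration in Python of what the DATEADD/DATEDIFF expressions compute; the function name `previous_month_window` is made up for this sketch and is not part of Orion or the query.

```python
from datetime import datetime
from typing import Tuple


def previous_month_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the calendar month before `now`, end exclusive."""
    # Same idea as DATEADD(month, DATEDIFF(month, 0, CURRENT_TIMESTAMP), 0):
    # midnight on the first day of the current month.
    start_of_current = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Same idea as DATEADD(month, -1, @startOfCurrentMonth).
    if start_of_current.month == 1:
        start_of_previous = start_of_current.replace(
            year=start_of_current.year - 1, month=12
        )
    else:
        start_of_previous = start_of_current.replace(month=start_of_current.month - 1)
    return start_of_previous, start_of_current


start, end = previous_month_window(datetime(2020, 1, 15, 9, 30))
print(start, end)
```

Rows qualify when `start <= DateTime < end`, so the window itself is fixed for the whole of the current month.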

 

However, the percentage that is returned keeps changing. I figured this was because there is some Availability retention setting that is less than 30 days or something. I've done some research, and it seems the Daily Statistics Retention setting is used for availability retention. I have mine set for 90 days, but the percentage was still changing.

 

I then played with the query a little bit and came up with this to see what data was getting pulled in:

 

DECLARE @startOfCurrentMonth DATETIME  
SET @startOfCurrentMonth = DATEADD(month, DATEDIFF(month, 0, CURRENT_TIMESTAMP), 0)  
SELECT Nodes.VendorIcon AS Vendor_Icon,  
Nodes.Caption AS NodeName,  
Nodes.MachineType AS Machine_Type,      
--find where responsetime.availability is pulling  
ResponseTime.Availability AS AVERAGE_of_Availability,  
CONVERT(DateTime,  
LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)),  
101) AS SummaryMonth,  
DateTime, ResponseTime.NodeID  
FROM  
Nodes INNER JOIN ResponseTime ON (Nodes.NodeID = ResponseTime.NodeID)  
WHERE  
( datetime >= DATEADD(month, -1, @startOfCurrentMonth) AND datetime < @startOfCurrentMonth )  
AND   
(    (Nodes.Caption LIKE '%-FW')  
)  
GROUP BY CONVERT(DateTime, LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)), 101),  
Nodes.VendorIcon, Nodes.Caption, Nodes.MachineType, ResponseTime.Availability, DateTime, ResponseTime.NodeID  
Order BY DateTime

 

It looks like it is just appending the data again? I figure this is because it is a view, maybe?

 

So, my question is: what am I doing wrong here? Where is ResponseTime.Availability getting pulled from for that view?

I'm just trying to pull in the average availability for devices that end in '-FW'.

 

Thank you all in advance.

Virtualized Databases, Friend or Foe?


Background

 

Fire-fighting mode for DBAs can be stressful when they have co-workers and managers breathing down their necks due to application slow-downs and/or outages. Logic says something changed, but what? In a worst-case scenario, the database instance itself looks fine, nothing changed within the database and the SQL being executed was running fine before. Of course, the SysAdmin says nothing is wrong with the physical server or storage which makes it even more questionable. Hmm, could you be running in a virtual machine (VM)? Is your VM resource starved and competing with other VMs?

 

According to Gartner’s Market Guide for Server Virtualization[1], “Hypervisor-based server virtualization is now mature, with 80% to 90% of server workloads running in a virtual machine (VM) for most midsize to large enterprises.” Additionally, anecdotal evidence states 70% of all databases are virtualized. In fact, here at SolarWinds, 50% of our database instances run in a VM. For all the benefits of virtualization like cost savings and ease of migrating workloads, the abstraction of the virtual layer from the physical hardware can introduce some challenges.

 

And let’s not forget the elephant in the room, snapshots. Many DBAs I’ve talked to are at a loss as to why SysAdmins and IT ops perform snapshots of their database instance VMs, which in turn can cause performance issues, especially if a memory snapshot is invoked which renders the VM inactive while the memory is written to disk. Database backups are best left to DBAs who ensure referential integrity is maintained to recover a database.

 

Which Metrics Matter?

 

If you find yourself running your database instances in a VMware VM, what do you need to look for to see if the VM your database is running in has problems? There are many metrics available, so let’s review the usual suspects.

 

CPU Ready

 

  • This metric indicates the VM (and the database trying to run inside it) was ready to run but instead sat idle waiting behind other VMs contending to control the same shared resources such as physical CPUs or memory.

    For example, a vSphere host has six physical CPUs, and two VMs are configured to each require four virtual CPUs (vCPUs) before they can run. This situation means only one VM can run at a time. You can eliminate the VMs queueing behind each other by either moving a VM to another host or configuring both VMs to require three or fewer virtual CPUs.

 

    • The term “oversubscription” simply means you’ve assigned more virtual resources than physical resources exist to run all VMs concurrently. It may seem a bit strange, but reducing the number of vCPUs assigned to a VM may dramatically increase its performance. Generally, oversubscription should not go above 5%. With the SolarWinds® Database Performance Analyzer (DPA) VM Option, an easy way to see how many physical CPUs your host server has is to view the Host tab on the VM CONFIG page.
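
The six-CPU example above can be checked with a few lines of Python. This is a deliberately simplified sketch that assumes, as the example does, that a VM needs all of its vCPUs available before it can run; real hypervisor schedulers are more flexible than this. The function name `can_run_concurrently` is hypothetical.

```python
from typing import Sequence


def can_run_concurrently(physical_cpus: int, vcpus_per_vm: Sequence[int]) -> bool:
    """True if every VM's vCPUs fit on the host's physical CPUs at the same time.

    Simplified model taken from the example in the text: each VM needs all of
    its vCPUs at once, so the VMs can only run together if the total fits.
    """
    return sum(vcpus_per_vm) <= physical_cpus


# Two VMs with four vCPUs each on a six-CPU host: one VM has to wait.
print(can_run_concurrently(6, [4, 4]))
# Reducing both VMs to three vCPUs lets them run side by side.
print(can_run_concurrently(6, [3, 3]))
```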

 

VM CPU Usage

  • Actively used CPU as a percent of total available virtual CPU in the virtual machine.

Host CPU Usage

 

  • Actively used CPU as a percent of total available CPU on the machine. If this number is high you might see VMs with high CPU ready and/or co-stop.
    • Active CPU is approximately equal to the ratio of the used CPU to the available CPU where: Available CPU = # of physical CPUs x clock rate.
    • When your database instance is running in a VM, with the VM Option, DPA automatically expands the data in the CPU tab to include this information along with other VM specific metrics.
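
As a rough illustration of the ratio described above, the following Python sketch computes host CPU usage from used CPU and Available CPU = # of physical CPUs x clock rate. The function name, the MHz units, and the sample numbers are assumptions chosen for the example, not values taken from DPA.

```python
def host_cpu_usage_pct(used_mhz: float, physical_cpus: int, clock_mhz: float) -> float:
    """Approximate host CPU usage as a percentage of available CPU."""
    # Available CPU = number of physical CPUs x clock rate (from the text).
    available_mhz = physical_cpus * clock_mhz
    if available_mhz <= 0:
        raise ValueError("physical_cpus and clock_mhz must be positive")
    return 100.0 * used_mhz / available_mhz


# Hypothetical host: six physical CPUs at 2,500 MHz with 9,000 MHz in use.
print(host_cpu_usage_pct(9000, 6, 2500))
```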

 

Co-Stop

 

  • This is the time a VM waits for a vCPU due to scheduling (lack of resources). So basically, your VM can be waiting on physical CPU resources in use by other VMs. If you see high Host CPU Usage, this is probably a sign there are too many VMs on this host and/or you need more physical CPU resources.

 

VM Memory Swap Rate

 

  • The “swap in” and “swap out” rates generally mean you have a shortage of physical memory on the host, so the memory is swapped out and in from disk.

 

VM Active Memory Usage

  • This is the memory in use as a percent of the memory configured for the VM.

 

Host Memory Usage

  • This is the memory usage on the host (consumed memory / total machine memory). If this is high (e.g., greater than 90%), this could indicate host memory over-commit, which could lead to high VM swap rates.
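
A minimal Python sketch of the host memory check described above: usage is consumed memory divided by total machine memory, flagged when it exceeds 90%. The function names, the GB units, and the sample figures are hypothetical; only the ratio and the 90% example threshold come from the text.

```python
def host_memory_usage_pct(consumed_gb: float, total_gb: float) -> float:
    """Host memory usage = consumed memory / total machine memory, as a percent."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return 100.0 * consumed_gb / total_gb


def is_overcommit_risk(usage_pct: float, threshold_pct: float = 90.0) -> bool:
    """Flag usage above the threshold (90% is the example used in the text)."""
    return usage_pct > threshold_pct


usage = host_memory_usage_pct(236, 256)  # hypothetical 256 GB host
print(usage, is_overcommit_risk(usage))
```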

VM Memory Overhead

  • This is simply the amount of memory used to run the VM. Over-configuring memory (or excess vCPU for that matter) will unnecessarily increase overhead. That said, there’s memory needed by ESXi itself and the virtual machine (virtual machine frame buffer).

 

VM Memory Balloon

  • The balloon driver reclaims pages on the server considered less valuable. The crux of this VMware proprietary technique is to match the behavior of a guest OS. You should only see this when the host is running low or out of physical memory.
  • If you see the virtual machine your database instance is running in has a certain percent of memory claimed by the balloon driver, look for memory swapping which could affect your VM’s performance. However, if you don’t see any swapping issues you don’t and won’t necessarily have a performance problem.

 

VM Disk Commands

  • Number of disk commands executed is an indication of how busy the disks are. That said, unless you see large queues developing and commands start to be aborted there isn’t a problem.
  • If you see aborted disk commands, then your storage is severely overloaded and can lead to serious application response issues.

 

VM Disk Usage

  • Available if you aren’t using an NFS datastore, it will show the average disk I/O rates across all virtual disks on the VM.

 

VM Read / Write Rates

  • VM disk read rate is the average amount of data read from the disk each second during the collection interval. For a VM, this is the rate at which data is read from each virtual disk to the virtual machine.
  • VM disk write rate is the average amount of data written to disk each second during the collection interval—simply the rate data is written to each virtual disk on the VM.

 

Host Disk Device Read / Write Rates

  • The host disk read-and-write rate is the average read/write rate across all disks/LUNs on the host. The rate represents the read/write throughput at the host level across all disks/LUNs and VMs running on the host.
    • If the database instance has I/O performance issues, you may have another VM on the same host causing the delays. Compare this metric to the physical I/O rate from the database instance. If the Host rate is higher, then it’s likely another VM is the problem. Otherwise, the VM your instance is running in may be causing too much of a demand on the underlying physical storage.
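
The comparison described above can be written as a small heuristic. This Python sketch assumes both rates are sampled over the same interval and in the same units; the function name, the return strings, and the optional `margin` parameter are hypothetical additions for illustration.

```python
def likely_io_source(host_rate: float, instance_rate: float, margin: float = 0.0) -> str:
    """Guess where I/O pressure comes from, following the rule of thumb in the text.

    host_rate:     read/write throughput across all disks/LUNs on the host
    instance_rate: physical I/O rate reported by the database instance
    margin:        hypothetical tolerance, e.g. 0.25 ignores a host rate that is
                   less than 25% above the instance rate
    """
    if host_rate > instance_rate * (1.0 + margin):
        return "another VM on the host"
    return "this VM"


print(likely_io_source(500.0, 120.0))
print(likely_io_source(130.0, 120.0, margin=0.25))
```

This only points to where to look next; it does not prove which VM is responsible.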

Host Max Disk Latency

  • This is the highest latency value across all disks used by this host.

 

Host Disk Latency

  • Read latency is the average amount of time to process a read command issued to a disk from the host (across all VMs). High disk latency indicates storage may be slow or overloaded.
  • Write latency is similar to read latency and is the average amount of time to process a write command issued to the specific disk across all VMs.
    • Disk Write Latency = Kernel Write Latency + Device Write Latency
  • Expected disk latencies will depend on the nature of the storage like read/write mix, randomness and I/O size along with the capability of the storage subsystem.
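
The relationship Disk Write Latency = Kernel Write Latency + Device Write Latency can be sketched as follows. The class and field names and the sample millisecond values are hypothetical; the point is that splitting the total shows whether the time is spent in the hypervisor kernel or at the storage device.

```python
from dataclasses import dataclass


@dataclass
class WriteLatency:
    kernel_ms: float  # time spent in the hypervisor kernel
    device_ms: float  # time spent at the storage device

    @property
    def total_ms(self) -> float:
        # Disk Write Latency = Kernel Write Latency + Device Write Latency
        return self.kernel_ms + self.device_ms

    def dominant(self) -> str:
        """Name the component contributing most of the total latency."""
        return "kernel" if self.kernel_ms > self.device_ms else "device"


sample = WriteLatency(kernel_ms=0.5, device_ms=12.0)
print(sample.total_ms, sample.dominant())
```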

 

In addition to these metrics being found in DPA, you can execute the “esxtop” command from your VMware ESXi host or look at various utilization metrics from the VMware ESXi console. SolarWinds Virtualization Manager also reports on all of these metrics and more in a friendlier format with both historical and real-time data.

 

 

 

Sample Nightmare Scenario Avoided

 

As I mentioned when I started off, a nightmare scenario could be when everything associated with the database instance seems fine—nothing changed. Since we’ve covered the essential VM metrics you should be monitoring, let’s walk through a hard-to-find problem for a database instance running in a VM using SolarWinds Database Performance Analyzer (DPA) with the VM Option. In the 2019.4 release of DPA, we expanded the VM option to go beyond the basic resource metrics to include additional HOST metrics and to make note of events, as seen in the DPA CPU tab in RESOURCES.

 

* Example of event logging in DPA 2019.4

 

 

Let’s walk through our sample “nightmare” scenario.

 

  • Problem ticket open for poor application performance response time
    • Users complained the morning of Monday, December 2 “around 8 a.m.” they experienced abnormally long wait times.

 

  • No outages were recorded from the IT Ops group

 

  • You go to DPA to look at the Database instance supporting the application
    • You notice a longer than normal wait occurrence on December 2, and the machine learning anomaly detection flags this time as a critical wait time delta from what is normally expected at this time of day.

 

 

  • You then look at the tab ADVISORS for additional data for this day.

    1. As it turns out, a specific query accounts for the top amount of execution time.


  • You select this query to find out more about it and what occurred at the time. On the QUERY DETAIL page, you see the longest wait time was for memory/CPU, so you click the green memory/CPU bar to drill down to the hour.

 

  • Once you get down to the hourly view, you see a noticeable spike in wait time in the morning hours when the application response time issue was occurring.
  • As you scroll down the page to the end where VM metrics are shown, you see the new co-stop metric where there’s a corresponding spike. By hovering over the annotation dots, you see during this time the VM was being moved via vMotion from one host to another.


  • Just as with snapshots, vMotion events can have a negative impact on the performance of the VM the database instance is running in. Without visibility into the virtualized infrastructure, it can be time consuming to find the culprit of poor performance. 
    With DPA, you can easily line up all of the resources for a specific time to pinpoint the problem as seen below.

 

Summary

 

With VMware’s 500,000 customers and tens of millions of VMs, virtualization is here to stay. Since many on-premises-to-cloud database migrations involve virtualization, e.g., Azure VM, many of the same challenges existing on-premises will exist in IaaS environments. DBAs don’t have to be virtual admins, but they do need to be aware of the environment their database instances run in and the impact those environments have on database performance.

 

That said, I’ve discovered many DPA customers have no idea there’s a purpose-built option for VMware that can be added to the product. It’s easy to see if you have the option by looking for the VIRTUALIZATION tab on the home page.

 

*  This all-in-one view lets you line up all your resources in a single view to look for problems on a specific date and time.

 

Our goal at SolarWinds is to listen to our customers which is why we’ve enhanced the VM option for DPA. If you are a DPA customer, be sure to utilize our THWACK® feature request page to request and vote on feature enhancements. Lastly, if you are currently running your database instances in a VM, we'd appreciate you taking this 60 second survey (and reward you with THWACK points).

 

 

 

 


[1] Gartner Market Guide for Server Virtualization, Published 24 April 2019, ID G00350674

Create one job that updates multiple devices with different scripts


Is there any way with NCM to update multiple devices via the same job with different scripts per device?  Let's say I have redundant internet connections and manually re-routing takes a different set of configurations for 3 different devices.  I'd like to pre-populate the scripts and have one place to launch it.

Adjust polling rate of Real-Time Polling in Perfstack


Not all devices can be polled at 1-second intervals, so it would be helpful if we could adjust the polling interval to 5 or 10 seconds in PerfStack.

Force update to map in Orion Maps - NPM


I created a map in Orion Maps but I got "unknown" in one half of the connection between switches. I realised I wasn't monitoring both sides of some connections, so I've fixed that but the map is not updating - still showing as "unknown".

 

The maps were last updated on the day I created them (3 days ago) so how can I force a refresh to see if monitoring both sides of the connection has worked?

 

Thanks.

LDAP Problem


We recently demoted our old 2008 R2 DC that WHD used for LDAP. After adding in a new 2016 DC, all LDAP connections started failing: clients could not log in and no LDAP connection was present in the client info. I changed to a different 2016 DC for LDAP and everything started working normally. I'm trying to figure out what the difference could be between the two 2016 DCs that would cause this. Any ideas?

Thanks


ChChChChanges... Coming to THWACK in February 2020


First, it's so good to be writing to the community again. I've been hibernating working on fun necessary projects. But before I share what's coming, I have a question: how many of you knew THWACK was born more than 16, almost 17 years ago? I've been at SolarWinds for 8-plus years solely working on the community and our user group program and I have to say, it's been a thrilling, inspiring, and eye-opening experience to witness what THWACK has become today.

 

But with age come changes. The external community vendor arena has grown sparse due to shifts in how businesses are engaging with customers, and few have been successful in building a community program to do what all of you have built here on THWACK. That said, here’s how we arrived where we are now:

 

  • July 2017, Jive (our current platform) sells to Aurea.
  • September 2017, Aurea breaks up the business and sells Jive-X (Jive external) to Lithium.
  • Following this, we went through a lengthy RFP process and ultimately signed with Khoros (aka Lithium), who hosts external communities for Cisco, HP, Microsoft, Spotify, and more.
  • December 2018 it was announced Lithium will EOL Jive-X at the end of 2020.

 

TL;DR we’re migrating platforms! We saw this coming and work has been underway to bring things to parity between the platforms.

 

GOING FROM HOSTED TO SaaS

 

Because I know this audience understands these annoying yet necessary circumstances, I'm going to geek out with you a bit.

 

Our current environment (Jive) is hosted. We were told for years we'd never be able to move to the cloud due to the number of customizations we house. Thankfully, our forever and always partner, sonofagum, came through with a plan.

 

All our customizations (the THWACK Store, Monthly Missions, SolarWinds Lab live chat, etc.) were previously written in Java and ran in-process in Jive, taking advantage of a rich set of available services and libraries. The new platform mandated all our non-trivial customizations run out-of-process and be hosted externally. This presented us with a lot of challenges: from authentication, to platform differences and migration incompatibilities, to scalability, to having to rewrite dependencies from scratch without the benefit of source code, all while learning the ins and outs of the new platform. Needless to say, we’ve been busy over here!

 

EVERYTHING has been rewritten in C# (.NET Core 3.0 on Linux) and Angular or Vue and now runs in AWS. There’s still a lot of work to do and everything may not be perfect on day one, but we’re committed to keeping THWACK the best user community on the interwebs.

 

WHEN?

 

The official migration date is February 20, 2020. That's right, 02.20.2020 or 20.02.2020 for most of the world. I don't know how that worked out, but I'm calling it next-gen binary. We'll have more updates in the weeks coming, but below is what you can expect.

 

 

WHAT’S CHANGING?

 

  • Khoros is a mobile-first platform, which is great for you and really, really painful for us. We’ve had to rethink our entire webpage structure—headers, navigation, body content, widgets, footers, etc. But it’s really helped us clean up some of the website real estate and we’ll be curious to see if mobile usage picks up. Currently <1% of you visit the website on devices smaller than an iPad or similar.

 

  • Forums are now called categories and each category houses multiple boards underneath it. Confused yet? Don’t read too much into the semantics, but I wanted to bring this up and share an example to help understand what this change means.
    • Current THWACK: Network Performance Monitor is a forum housing multiple types of content – discussions, documents, feature requests, etc.
    • Future THWACK: Network Performance Monitor is now a category and has three main boards underneath it (hierarchically speaking).
      • Network Performance Monitor – shows all content contained in the sub-boards but content cannot be posted here.
        • Network Performance Monitor Discussions – houses all NPM discussions.
        • Network Performance Monitor Documents – houses all NPM documents.
        • Network Performance Monitor Feature Requests – houses all NPM feature requests.
      • This means you’ll need to either follow the NPM category page or all the sub-boards depending on your preferences. Not a big deal, but something to note.

       

  • Polls and events are no more. I know what you’re thinking. Trust me, I grilled Khoros hard on this. Not too much to say here other than we’ll have backups of this content, but it won’t appear on the new platform. We may revisit later to assess what can be done.

       

  • Remember when I mentioned we’d never be able to move to the cloud due to the number of customizations we have? This is where things get real. Our entire gamification strategy is unique to SolarWinds. We invented something even experts in the industry have never seen done at the level we’ve taken it to, and successfully I might add. The Khoros gamification strategy is fundamentally different than what we’ve developed. Their ranking system is based purely on community activity and engagement whereas our current ranking system is determined by your point accumulation. Sure, we’re logging community activity and engagement, but it’s not apples to apples.
    • Before you start penning hate letters, your points are being carried over and the store will live on. I won’t go into detail as to how we made this happen, but we did. Going forward, you’ll continue to earn points, but there will be more defined ways upon which you can earn them. More to come on this later.
    • However, everyone’s level will start at 1 (Ready Player One anyone?). There’s simply no way to port the way your current level is determined to match how your level will be determined moving forward. The data is too different. It's like trying to direct connect a 300baud modem to an MPLS line. It's like trying to port your TRS80 Basic program to the cloud as a microservice. It's like trying to convert your "database" that was lovingly crafted from macros in Lotus 1-2-3 to SQL 2016. It's like trying to make Battlestar Galactica jokes to a bunch of overly-earnest LOTR fans. It's like... well, it's like trying to port a gamification system built from custom scripts and calculations into a completely different platform.

       

  • Oh, and you’ll need to reset your password the first time you log in on the new platform because security or something important like that.

       

ACTIONS YOU NEED TO TAKE

       

Prior to our migration date, February 20, 2020, you will need to take note of the following items:

       

  • You will need to take inventory of the places and people you follow as well as your bookmarks. These cannot be migrated with your profile details.
    • Steps to find the places you follow: navigate to your profile > click on “More” > click on “Places.”
    • Steps to find the people you follow are the same as above except you’ll choose “Connections”: navigate to your profile > click on “More” > click on “Connections.”
    • Steps to find your bookmarks are the same as above except you’ll choose “Bookmarks”: navigate to your profile > click on “More” > click on “Bookmarks.”

       

  • You will need to save a local copy of any content currently in draft mode.
    • If you have any drafted content that won’t be published before February 20, save it off THWACK! Drafts cannot be migrated.

       

Once the migration is complete, we’ll publish instructions on how to get the items above set up on the new platform. I recommend following these people to ensure you get the latest updates, or if you need to shoot us any questions: yumdarling, KMSigma, sonofagum, and me, DanielleH.

       

I’m exhausted. The team is exhausted. Did we want to spend the last 12 months working on this (while continuing our regular jobs I might add)? Absolutely not, but this is technology and it’s a constant game of keeping up. Writing all of this on paper knowing I’ve purposely left out 90% of what we've done makes me so proud to call these folks my team. I hear this audience knows a thing or two about migrations... I’m hoping you’ll bear with us through this transition and understand not everything will be perfect, but rest assured we'll be working around the clock to make it right. The backbone of this community, you, is all we need.

       

Oh, and #darktheme is coming.

Agent For Monitoring Solaris SPARC Operating Systems


There's already a Linux Agent in the works for SAM that addresses most of the limitations associated with Agentless monitoring of Solaris hosts. What about an Agent that addressed the same/similar limitations for the Solaris SPARC operating system?




