
I have been learning a little bit about systems thinking. I have spent a bit of time watching, listening to and reading John Seddon’s work. His observations of the UK public sector resonate with what I saw when I worked there.

A lot of what Seddon talks about is the industrialization of the service sector and government services. He explains why he thinks that the arbitrary use of management tools and targets is actually counterproductive. He advocates systems thinking and suggests ways of achieving a regime of continuous improvement using the Vanguard Method. (I am not going to comment on how the Vanguard Method could itself be construed as a tool.)

One important thing Seddon talks about is the principle of “failure demand”. This is additional demand upon a system that would not have been there if the system had done what it was supposed to do in the first place. In IT operations this sits neatly with the Visible Ops idea that you should break your work into what creates value for the business and what does not. Merging the two together, you have two levels of waste in Incident Management to work on reducing. I am going to classify them as definitely failure demand and probably failure demand:

Definitely Failure Demand

The following fit Seddon’s definition of failure demand:

  • Incidents logged due to an incident. When a customer or client needs to log another incident because the original one that she logged did not resolve the issue.
  • Incidents logged due to a recently completed change. If a change causes a breakage that results in a customer or client logging an incident.
  • Incidents logged to expedite an Incident. While a good service desk rep should not let this happen, it does happen, and it is failure demand.
  • Incidents logged to expedite a change. Seems to happen a bit – the squeaky wheel and all that.

Unfortunately, figuring out whether any of these things happen, and how often, involves trawling through the demand side of your ITSM system. This is a good thing to do anyway: any systems thinker will tell you that there is a lot of really good information about your system to be gleaned by examining the demand side.
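
As a rough sketch of what that trawl might produce, assume the incidents can be exported as a CSV with a header row and two columns, "id,reason". The export format and the four reason codes below are made up for illustration; your ITSM tool will record this differently, if it records it at all.

```bash
# Counts the four "definitely failure demand" categories in a CSV read
# from stdin. The column layout and the reason codes are hypothetical.
failure_demand_summary() {
  awk -F, '
    BEGIN {
      n = split("repeat_incident caused_by_change expedite_incident expedite_change", reasons, " ")
      for (i = 1; i <= n; i++) is_failure[reasons[i]] = 1
    }
    NR == 1 { next }                 # skip the header row
    { total++ }
    ($2 in is_failure) { failure++; count[$2]++ }
    END {
      for (i = 1; i <= n; i++) printf "%s=%d\n", reasons[i], count[reasons[i]]
      if (total > 0) printf "failure_demand=%d of %d (%d%%)\n", failure, total, int(100 * failure / total)
    }
  '
}

# Example with four made-up incidents
printf '%s\n' 'id,reason' '1,repeat_incident' '2,new_request' \
  '3,caused_by_change' '4,repeat_incident' | failure_demand_summary
```

The hard part is not the counting but deciding which reason applies to each incident, which is exactly the trawling described above.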

Probably Failure Demand

Here I am stretching Seddon’s definition to include some of the wisdom of Visible Ops and the principles of Devops.

Visible Ops suggests that you separate your daily work into tasks that add value to the business (i.e. the mission of your company, or that of your customers) and those that do not. The authors advocate finding ways to reduce the time spent on non-value-add tasks, especially the predictable stuff. Is it fair to say that incident management is fixing broken stuff? If so, it is only turning negative value back into nominal value. It definitely does not add value to the customer’s business. Does that mean it is failure demand?

Devops – the hot, new cultural force that is trying to propel IT operations from good enough to excellent – talks about not viewing application development and deployment / support as separate processes. Instead it suggests we view development and operations as a single system where one part affects the others. If that is the case, are incidents logged against an application failure demand against its development? Instead of declaring the app as “in production” and making incidents operational issues, we could treat these incidents as failure demand upon the development process; perhaps developers would then be more compelled to take a proactive role in incident management. Perhaps failure demand could also be another metric the project managers track.

A lot of the ideas I have just described are still half cooked. Any comments, thoughts or extra ingredients to add are welcome.

If you are trying to collect performance data from IBM’s Tivoli Directory Server, and you do not have Tivoli Directory Integrator installed, then you can still monitor some performance metrics with Zenoss.

The big Aha! moment for me was when I read that you can query some useful metrics with an LDAP client.

ldapsearch -h $LDAP_HOST -x -D "$LDAP_ADMIN_DN" -w "$LDAP_ADMIN_PASSWORD" -s base -b cn=monitor 'objectclass=*'
# extended LDIF
#
# LDAPv3
# base <cn=monitor> with scope baseObject
# filter: objectclass=*
# requesting: ALL
#
# MONITOR
dn: CN=MONITOR
version: IBM Tivoli Directory (SSL), 6.1
totalconnections: 50912
total_ssl_connections: 0
total_tls_connections: 0
currentconnections: 67
maxconnections: 1024
writewaiters: 0
readwaiters: 0
opsinitiated: 712582
livethreads: 1
opscompleted: 712581
entriessent: 620628
searchesrequested: 585265
searchescompleted: 585264
bindsrequested: 50917
bindscompleted: 50917
unbindsrequested: 50791
.......
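
Pulling a single value out of that output takes very little. The sketch below is only an illustration: monitor_value is a name invented here, and reading the gap between opsinitiated and opscompleted as "operations still in flight" is my assumption rather than something the TDS documentation is quoted as saying.

```bash
# Prints the value of one attribute; the cn=monitor output is read from stdin.
monitor_value() {
  awk -v attr="$1:" '$1 == attr { print $2; exit }'
}

# Three lines copied from the output above
SAMPLE='opsinitiated: 712582
livethreads: 1
opscompleted: 712581'

initiated=$(printf '%s\n' "$SAMPLE" | monitor_value opsinitiated)
completed=$(printf '%s\n' "$SAMPLE" | monitor_value opscompleted)
echo "operations in flight: $((initiated - completed))"
```

In real use the ldapsearch command would be piped straight into monitor_value instead of the sample text.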

So we no longer need to use IBM’s SNMP listener, which saves some time and maybe even some money. Zenoss allows you to run scripts, and so long as the script returns its results in the right format, Zenoss can graph them. Here’s how I did it.

1. The script

First we need a script to go get the data from TDS. As shown above, it is really only a simple LDAP search, but the output needs to conform to the Nagios plugin standard. So here is my script. Feel free to use and improve upon it.

#!/bin/bash

# Robert Hart http://robertjhart.wordpress.com/ July 2010
# Script to collect performance metrics from IBM Tivoli Directory Server 6.1
# For reference:
# http://community.zenoss.org/docs/DOC-4770
# http://nagiosplug.sourceforge.net/developer-guidelines.html#PLUGOUTPUT

# List of attributes to get.
# format: space separated - attribute,UOM (the UOM may be empty)
ATTRIBUTES="bindsrequested,c currentconnections,"

# Stuff to tweak
LDAP_HOST="$1"
LDAP_ADMIN_DN="<administrator distinguished name>"
LDAP_ADMIN_PASSWORD="<password>"
TMP_FILE=$(mktemp /tmp/zenoss-tds.XXXXXX)
# if needed, set to a real file
DEBUG="/dev/null"

# clean up the temporary file however the script exits
trap 'rm -f "$TMP_FILE"' EXIT

# this runs the LDAP search. The arguments are quoted so that a DN with
# spaces in it, and the * in the filter, reach ldapsearch intact.
function search
{
 ldapsearch -h "$LDAP_HOST" -x -D "$LDAP_ADMIN_DN" -w "$LDAP_ADMIN_PASSWORD" -s base -b cn=monitor 'objectclass=*'
}

# this parses the attribute value out of the ldapsearch output.
# usage: parse attribute,UOM
function parse
{
 echo "parsing $1" >> "$DEBUG"
 ATT=`echo "$1" | awk -F "," '{print $1}'`
 UOM=`echo "$1" | awk -F "," '{print $2}'`
 KEY=`grep "^$ATT:" "$TMP_FILE" | awk '{print $1}' | sed -e 's/://'`
 VALUE=`grep "^$ATT:" "$TMP_FILE" | awk '{print $2}'`
 # fail if the attribute was not in the output
 if [ -z "$VALUE" ]
 then
  return 1
 fi
 if [ -z "$UOM" ]
 then
  OUTPUT="$KEY=$VALUE;;;; $OUTPUT"
 else
  OUTPUT="$KEY=$VALUE[$UOM];;;; $OUTPUT"
 fi
}

# Lets do some work

if search > "$TMP_FILE"
then
 echo "command ran" >> "$DEBUG"
else
 echo "tds ldapsearch failed"
 exit 2
fi

for ATTRIBUTE in $ATTRIBUTES
do
 if parse "$ATTRIBUTE"
 then
  echo "parsed" >> "$DEBUG"
 else
  echo "tds could not find $ATTRIBUTE"
  exit 1
 fi
done

echo "tds |$OUTPUT"
exit 0

Save this script as a file on the Zenoss server, make the zenoss user the owner and give it execute permissions.

You should be able to test the script and get a result:

$ ./zenoss-tds.sh ldap.example.org
tds |currentconnections=69;;;; bindsrequested=50880[c];;;;
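
In the Nagios plugin output format, everything after the "|" is performance data: space separated items of the form label=value, each followed by semicolon separated warn, crit, min and max fields, which are left empty here. The sketch below pulls the label and the leading number out of each item, which is handy for checking a script's output by eye. perfdata_pairs is a name invented for this example, and this is not a copy of how the Zenoss parser works.

```bash
# Prints "label value" for each performance data item in a plugin output line.
# The body runs in a subshell so that "set -f" does not leak into the caller.
perfdata_pairs() (
  set -f                      # no filename expansion while we split on spaces
  perf=${1#*\|}               # keep only what follows the first "|"
  for item in $perf; do
    label=${item%%=*}         # text before the "="
    rest=${item#*=}           # text after the "="
    # cut at the first character that cannot be part of a number
    echo "$label ${rest%%[!0-9.-]*}"
  done
)

perfdata_pairs "tds |currentconnections=69;;;; bindsrequested=50880[c];;;;"
```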

2. Set up the template in Zenoss

First I created a device class because we have a couple of LDAP servers, and having the devices inherit from the device class is the more efficient way to do this.

I also set the LDAP monitor zProperties and bound the LDAP monitor template so we could graph LDAP response times too.

In the templates tab for the device class, I pulled down the menu in the Available Performance Templates section and selected “Add Template…”. Once you have given it a name, you end up on a page where you can add a data source. In the Data Sources section pull down the menu and select “Add Datasource…”. Give it a name, and set the source to COMMAND. Make sure you set the parser to Nagios, and make sure you pass the device name to your command, e.g. /opt/zenoss/scripts/zenoss-tds.sh ${devname}. Click Save, and then add DataPoints at the bottom.

When you create the DataPoints, remember to set the correct Type. Most of the metrics in TDS are running totals that only go back to zero when you restart the server, so COUNTER is probably the most appropriate type for them. A point-in-time value such as currentconnections is the exception and is better left as a GAUGE.
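
To see why the type matters: a running total only becomes a useful graph once it is turned into a rate between two samples. The sketch below shows that arithmetic using two bindsrequested values that appear in this post. The 300 second interval is an assumption, and reporting "reset" when the total goes backwards is a simplification chosen for this example, not a description of what Zenoss does internally.

```bash
# Rate per second between two samples of a running total.
# $1 = previous total, $2 = current total, $3 = seconds between samples
counter_rate() {
  if [ "$2" -lt "$1" ]; then
    echo "reset"              # the total went backwards, e.g. after a restart
    return 0
  fi
  awk -v prev="$1" -v cur="$2" -v secs="$3" \
    'BEGIN { printf "%.2f\n", (cur - prev) / secs }'
}

counter_rate 50880 50917 300   # prints 0.12 (binds per second)
```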

Once you have done that then you can go back to the template and add graph definitions. Then you can attach the appropriate data point to the graph.

3. See your Graph Loveliness

Lastly you need to bind your new template to the device class. You do that in the templates tab again. Remember to hold down the control key or you will deselect all the other templates in use here.

If you have not already done so, you can put a device into that class and model it. You should start to see data for the metrics you are collecting.

References

Monitoring using zencommands

Nagios Plugin Output

And thanks to Dan for his help too.

I notice that my notes from a couple of days ago get a lot of hits, so I feel inclined to be a bit more detailed about how to integrate Ubuntu Lucid Lynx into a Windows domain.

Setting up Kerberos

This is worth doing regardless of whether or not you set up PAM. You will see why later.

1. Install the following packages:

sudo apt-get install krb5-config krb5-user

During the installation process, it will ask you for your realm. Enter the realm for your Windows domain (talk to your Active Directory administrator if you don’t know it).

2. Edit /etc/krb5.conf

You will need to add a stanza for your realm to the [realms] section of /etc/krb5.conf. Something like this:

<REALM> = {
 kdc = <active directory server>
 admin_server = <active directory server>
 default_domain  = <domain.name>
 }

3. Test

kinit <user>

The user should be a Windows domain user. When challenged, input the Windows password for that account. Look for the ticket with the klist command.

Setting up PAM

If you want to sign in to your desktop / server using your Windows network credentials, then follow these steps.

1. Prerequisite - this is your get-out-of-jail-free card

Set up the root account. If you mess this up and need to fix it, you will need to be able to sign in as root, so I strongly recommend you do this first.

sudo passwd root

and then test:

su -

Glad we got that done – let’s move on.

2. Install the PAM module

sudo apt-get install libpam-krb5

Ubuntu sets up PAM for you, so that should be it.

3. Test

Try to log in to the computer with your Windows credentials. There must already be a local account, and its user name must be the same as the domain user name. It should just work. Run klist and you should also see a ticket. Nice!

Applications

Firefox

If you have kerberized web applications, or SPNEGO-enabled sites, then you can configure Firefox to use your Kerberos ticket to negotiate for you and log you in unchallenged. You need to type the following into the location bar (preferably in another tab – you don’t want to lose this page just yet):

about:config

Say that you will be careful and make sure the following attributes are set to true:

  • network.automatic-ntlm-auth.allow-proxies
  • network.negotiate-auth.allow-proxies
  • network.negotiate-auth.using-native-gsslib

Set the following to your local DNS sub-domain. This defines the scope of the trust for sites it will try to negotiate with:

  • network.automatic-ntlm-auth.trusted-uris
  • network.negotiate-auth.trusted-uris

The next time you visit such a web site, you will be logged in automatically.
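
If you would rather not click through about:config on every machine, the same preferences can be written as user_pref lines in a user.js file in the Firefox profile directory, which Firefox reads at start-up. The sketch below only prints those lines. The function name and the .example.org domain are placeholders, and where your profile directory lives is left for you to find.

```bash
# Prints user.js lines for the five preferences listed above.
# $1 = the DNS sub-domain to trust, e.g. .example.org (a placeholder)
firefox_kerberos_prefs() {
  for pref in network.automatic-ntlm-auth.allow-proxies \
              network.negotiate-auth.allow-proxies \
              network.negotiate-auth.using-native-gsslib; do
    printf 'user_pref("%s", true);\n' "$pref"
  done
  for pref in network.automatic-ntlm-auth.trusted-uris \
              network.negotiate-auth.trusted-uris; do
    printf 'user_pref("%s", "%s");\n' "$pref" "$1"
  done
}

firefox_kerberos_prefs .example.org
```

Redirect the output into user.js in the profile directory and restart Firefox for it to take effect.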

Pidgin

If your company uses Microsoft Office Communicator for IM, then you too can join the conversation with the SIPE plugin. Install it with this command:

sudo apt-get install pidgin-sipe

Once you have restarted Pidgin, then you can add an account. Here are some guidelines:

  • Protocol: Office Communicator
  • Username: Exchange email address
  • Login: <DOMAIN>\<User>
  • Password: <domain password>
  • In the advanced tab, set the server to your IM server.

Kerberos sign-in is meant to work, but with the version that ships here it has not really worked for me. Give it a try by blanking out the password and checking the Kerberos box, and see how you do.

Mounting Windows File Shares

Once you have a Kerberos ticket, you can mount file shares without providing a user name or password. You can use the connect to server form in the Places menu. Set the service type to be windows server, set the server to the fully qualified domain name of the Windows file server and enter the share name. You do not need to put a user name or domain in. Bookmark it if you like. The share should open up in Nautilus with no further prompting.

Evolution

I tried the evolution-mapi plugin which implements the Exchange MAPI protocol. It works but I found it sluggish and still very buggy. I would wait a little longer for anything more serious than testing.

That’s all folks!

If you have any other tricks that I have not mentioned then let us know.

Ubuntu 10.04, the Lucid Lynx, is now released to the wild. There is lots of stuff out there about all the cool features that are included. I am going to talk about some of the things that you probably won’t see in the reviews.

Kerberos

A fairly niche subject, but important to those in a Kerberos environment, or who want better integration in a Windows domain. In Ubuntu 10.04, setting up a Kerberos client just got a lot easier.

When you install the krb5-config package, it will ask you some questions about the realm you are in, etc. It does not do everything, but it does most of the work. I still had to edit /etc/krb5.conf to add in the hostname of the KDC, etc.

Installing libpam-krb5 does the right things to configure PAM. You can start to use it straight away, and it just works. It also creates a Kerberos ticket for you when you log in, which I don’t think it did before. This, to me, is a big deal. It means that I can use Firefox to go to kerberized and SPNEGO-enabled web sites without having to manually create a ticket beforehand. Same with kerberized ssh servers, and pidgin-sipe.

I tried libpam-ccreds too and it also just worked. Again, no messing with PAM configurations.

Connecting Pidgin to Office Communications Server

If you are in a Windows Domain and need to IM with your colleagues who are hanging out on Office Communicator, then pidgin-sipe does the job very well. If you hover over a buddy icon, then you can see what is on their calendar now. Pidgin also then sets your status according to what is on your calendar, so if you are scheduled for a meeting then it will set your status to busy at that time.

Mounting Windows File Shares

Once you have your Kerberos ticket, you can mount CIFS file shares in the domain by running:

gvfs-mount smb://server-fqdn/share

The share is then set up in GNOME and you are not challenged for credentials. That means that with libpam-krb5 and a login script, you can have all your Windows shares auto-magically mounted when you log in. Nice!
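
A minimal sketch of such a login script follows. The share URLs are placeholders, and the KLIST and MOUNT variables exist only so the function can be dry-run without touching anything; they default to the real klist and gvfs-mount. It relies on klist -s from the MIT Kerberos tools (the krb5-user package), which prints nothing and reports through its exit status whether there is a valid ticket.

```bash
# Hypothetical list of shares; replace with your own servers and shares.
SHARES="smb://fileserver.example.org/home smb://fileserver.example.org/projects"

mount_shares() {
  # Without a valid ticket gvfs-mount would prompt for credentials, so stop.
  if ! "${KLIST:-klist}" -s; then
    echo "no valid Kerberos ticket, skipping mounts" >&2
    return 1
  fi
  for share in $SHARES; do
    "${MOUNT:-gvfs-mount}" "$share" || echo "could not mount $share" >&2
  done
}
```

Calling mount_shares at the end of the script does the work. Running it as KLIST=true MOUNT=echo mount_shares prints the shares it would mount without mounting them.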

In my last post I extolled the virtues of mapping relationships between configuration items. As I was thinking about this more, it came to me that there are many similarities between this and social networking. In both we map and track relationships, and in both we use these relationships to derive value and aggregate information.

In my mind the social networking product that best fits the analogy is Facebook. You create a profile for yourself, you build up a network of relationships, and you have a stream of status information. Your friends aggregate this status information to form a single news feed about all the people they care about.

So in terms of a configuration management equivalent, we have a profile for each server, application and service in our data center. We then link them together with relationships. I would add more information to these links, like what type of link it is. Because servers don’t have the same privacy concerns we do, I would have the profile page show the relationship map recursively, as far as it goes up and down the pyramid, stating what is related to what and how. Each item mentioned should be hyperlinked to its own profile page for easy navigation.

The “Wall”, as it were, can be an aggregation of status messages from the incident/problem/change ticketing system, and maybe other areas like the network monitoring and performance monitoring tools that may be deployed. I think it would be really cool for system administrators to be able to reference items in a micro-blogging environment and have those references magically appear on the wall.

Very often the discipline of documentation does not come naturally to systems administrators. With something like relationship data, the need for it is not immediately obvious to them, and the benefits of capturing and sharing that data are probably a little esoteric. This means that without a direct and immediate benefit to them, getting SAs to input this data is going to be a chore.

The social networking paradigm would mean that entering data is a little more fun and interesting than filling in the forms you see in most configuration management databases. Hopefully this would be an added motivator to keep data up to date.

Why Relationships Matter

No, I am not talking about human relationships – although they matter too. I am talking about configuration management. Let me explain.

First off, let me clarify what I mean by configuration management. I am not talking about enforcing standard server builds and standard software configurations. I am talking about getting all the stuff in your data center into a database and being able to use that information. This is more like the ITIL view of configuration management.

The ITIL standard states that you should keep records of all “configuration items”, accurate details about them and also data like relationships between CIs and change history.

The first part just sounds like simple asset management to me and once your data center has grown to a certain size, then you are probably pretty good at that. It seems to me that the next level is relationship mapping and I am pretty sure that is where you get the most bang for your buck when it comes to ITIL CMDBs.

To illustrate my point, let’s build a word picture of how a set of relationships might look:

  • server 1 runs application A
  • application A provides the authentication service
  • server 2 runs application B
  • application B uses application A
  • application B provides the website service.

Now this is a very simple set of relationships and already we can draw some very useful insights:

  1. If the authentication service goes down, then the website breaks too.
  2. By looking just at the service items, we see the start of a service catalog. Not only that, it is easy to see the key assets needed to run this service.
  3. We can see the impact of issues like performance problems in the authentication service.
  4. We can perform impact analysis for change requests to any of these items.
  5. In the event of a disaster, we can see what order things need to be restored. In fact if we have time estimates for the restoration process of each of these items, we have the start of a project plan.

Immediately, we can see that there is huge value in relationship data. This is why it matters and why it is worth maintaining. Once this data is being collected, maintained and used, in my mind that is a big milestone in transitioning your systems team from an asset-centric, operations-oriented shop to a more service-oriented, and hence customer-focused, endeavor.
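
To make the impact analysis idea concrete, here is a small sketch that stores the five relationships from the word picture as "X Y" pairs, meaning X stops working if Y does, and then lists everything affected when one item fails. The item names and the flat text format are invented for this example; a real CMDB would hold this in a database.

```bash
# Each line reads "X Y": X stops working if Y does.
RELATIONSHIPS='appA server1
auth-service appA
appB server2
appB appA
website-service appB'

# Prints every item that is directly or indirectly affected when $1 fails.
impacted_by() {
  printf '%s\n' "$RELATIONSHIPS" | awk -v failed="$1" '
    { dependent[NR] = $1; dependency[NR] = $2 }
    END {
      down[failed] = 1
      # keep sweeping the list until a pass adds nothing new
      do {
        changed = 0
        for (i = 1; i <= NR; i++)
          if ((dependency[i] in down) && !(dependent[i] in down)) {
            down[dependent[i]] = 1
            print dependent[i]
            changed = 1
          }
      } while (changed)
    }'
}

impacted_by server1
```

Losing server1 takes out appA, auth-service, appB and website-service, while losing server2 only affects appB and website-service. Reversing the direction of the walk would give the restore order for disaster recovery.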

Goodbye Sun Microsystems

So the purchase of Sun Microsystems by Oracle is complete. Sun made great stuff, including my most favoritest UNIX. Solaris is the UNIX I worked the most on and I will always have a soft spot for it.

I would have loved to have worked for Sun too. Their corporate culture would have suited me very well. Oracle makes a great RDBMS, but in my opinion the quality of its application stack is suspect. My friend worked for Oracle a long time ago. He did not last very long. He claimed that the culture there was not fun, and the Larry reverence was too much for him.

I hope the future of Solaris and Java is assured, and I hope that Oracle learns something from Sun’s culture.

My favorite eulogy so far is James Gosling’s picture. My favorite comment under the picture is:

So long, and thanks for all the Glassfish.

I am reading Rise of the Creative Class by Richard Florida. I bought the paperback a few years back but the text was too small for me to read. A few weeks back I discovered that Kindle for PC runs on Linux with the help of WINE. I have found that with the text size cranked right up I can read books. This is a boon for me as I have been wanting to dig deeper into stuff than can really be done with the web alone. So I bought the ebook of Rise and started reading.

Thank goodness it is a good book, because it would have been really disappointing to have bought it twice and then found it to be rubbish. However, I stopped dead when I came across this statement, as it seems to me to be completely wrong:

If they behave unwisely or if their vision fails, “forking” may occur, whereby the disgruntled group takes the projects source code (which of course is not protected by any copyright or patent) and starts a new project with a new vision – as has sometimes happened.

Florida is generally correct about the patents, but dead wrong about the copyright. Part of the culture of open source is attribution, and even if you fork code, that code is still attributed to the original coder. This is enforced using copyright. Copyright is also what gives licenses like the GPL their teeth. By asserting copyright, authors are able to give you the freedoms of free and open source software. It is all over the GPL.

Granted, it does not change Mr Florida’s argument, but I did feel the need to nit-pick a point that is central to free software.

Last week this article bubbled up a few times on Twitter and other places. It was written by Tim Bray who, it seems, used to work for Sun.

He states that those who have been writing Enterprise Systems have been doing it wrong.

What I’m writing here is the single most important take-away from my Sun years, and it fits in a sentence: The community of developers whose work you see on the Web, who probably don’t know what ADO or UML or JPA even stand for, deploy better systems at less cost in less time at lower risk than we see in the Enterprise. This is true even when you factor in the greater flexibility and velocity of startups.

While this paragraph pleases the open source advocate in me, and I agree that there are a lot of really nice web apps out there, I think he is comparing apples with oranges.

An Enterprise System is not like microblogging, or sharing photos. In my mind, a better comparison on the web would be booking a flight. Have you seen a really kick-ass flight booking system lately?

The thing is, I want to agree with him. I believe him when he talks about how good the software these developers build is and how efficiently they do it. I use this stuff every day (heck, I am using one right now). This software is fantastic, but as of today, it is not enterprise software.

Later on in the article, Tim Bray goes on to advocate the purchase of vendor-supplied Enterprise Software such as Oracle. I have seen two implementations of Oracle E-Business Suite, one from afar, and one up close. One failed, and the other took years to bed in. I have also worked on or near similar systems, like laboratory systems in diagnostic labs, or MMIS systems, etc. All of them were supplied by external vendors, and all of them were implemented with much pain and suffering. The supplied software is incredibly complicated, not very good, and it always involves some amount of business change. This adds a lot of risk and causes a lot of upset amongst the user community.

My main point, however, is that this software is nothing like what was described in the quote above.

Yes, building your own HR system may be silly. But consider an in-house HR system that does not cause waves when implemented, because it automates the process the users already follow instead of forcing them to change their process so that it can be automated. In some respects, the effort needed to build that is perhaps the same as buying a COTS product, going through two years of business change pain, and then spending all the developer time needed to get it to integrate with the other silos languishing in your data center.

All that being said, I live in hope that somebody will come along and prove Mr Bray right by producing a high-quality, low-cost, low-risk enterprise system.
