Search Results: "Geoffrey Thomas"

13 September 2020

Russell Coker: Setting Up a Dell T710 Server with DRAC6

I've just got a Dell T710 server for LUV and I've been playing with the configuration options. Here's a list of what I've done and recommendations on how to do things. I decided not to try to write a step by step guide to doing stuff as the situation doesn't work for that. I think that a list of how things are broken and what to avoid is more useful.

BIOS

Firstly with a Dell server you can upgrade the BIOS from a Linux root shell. Generally when a server is deployed you won't want to upgrade the BIOS (why risk breaking something when it's working), so before deployment you probably should install the latest version. Dell provides a shell script with encoded binaries in it that you can just download and run; it's ugly but it works. The process of loading the firmware takes a long time (a few minutes) with no progress reports, so best to plug both PSUs in and leave it alone. At the end of loading the firmware a hard reboot will be performed, so upgrading your distribution while doing the install is a bad idea (Debian is designed to not lose data in this situation so it's not too bad).

IDRAC

IDRAC is the Integrated Dell Remote Access Controller. By default it will listen on all Ethernet ports, get an IP address via DHCP (using a different Ethernet hardware address to the interface the OS sees), and allow connections. Configuring it to be restrictive as to which ports it listens on may be a good idea (the T710 had 4 ports built in so having one reserved for management is usually OK). You need to configure a username and password for IDRAC that has administrative access in the BIOS configuration.

Web Interface

By default IDRAC will run a web server on the IP address it gets from DHCP, you can connect to that from any web browser that allows ignoring invalid SSL keys. Then you can use the username and password configured in the BIOS to login. IDRAC 6 on the PowerEdge T710 recommends IE 6.
To get an SSL cert that matches the name you want to use (and doesn't give browser errors) you have to generate a CSR (Certificate Signing Request) on the DRAC. The only field that matters is the CN (Common Name); the rest have to be something that Letsencrypt will accept. Certbot has the option --config-dir /etc/letsencrypt-drac to specify an alternate config directory; the SSL key for DRAC should be entirely separate from the SSL key for other stuff. Then use the --csr option to specify the path of the CSR file. When you run letsencrypt the file name of the output file you want will be in the format *_chain.pem. You then have to upload that to IDRAC to have it used. This is a major pain given the short (90 day) lifetime of letsencrypt certificates. Hopefully a more recent version of IDRAC has Certbot built in.

When you access RAC via ssh (see below) you can run the command racadm sslcsrgen to generate a CSR that can be used by certbot. So it's probably possible to write expect scripts to get that CSR, run certbot, and then put the SSL certificate back. I don't expect to use IDRAC often enough to make it worth the effort (I can tell my browser to ignore an outdated certificate), but if I had dozens of Dells I'd script it.

SSH

The web interface allows configuring ssh access which I strongly recommend doing. You can configure ssh access via password or via ssh public key. For ssh access set TERM=vt100 on the host to enable backspace as ^H. Something like TERM=vt100 ssh root@drac. Note that changing certain other settings in IDRAC such as enabling Smartcard support will disable ssh access.

There is a limit to the number of open sessions for managing IDRAC. When you ssh to the IDRAC you can run racadm getssninfo to get a list of sessions and racadm closessn -i NUM to close a session. The closessn command takes a -a option to close all sessions but that just gives an error saying that you can't close your own session because of programmer stupidity.
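The session commands above can also be driven from the client side. This is only a minimal sketch: it assumes the DRAC is reachable under the name drac (as in the ssh example above) and that it accepts a racadm command on the ssh command line instead of only at the interactive prompt. The DRAC_HOST variable and the function names are made up for illustration.

```shell
# Hypothetical helpers; DRAC_HOST and the function names are assumptions.
DRAC_HOST=${DRAC_HOST:-drac}

# Run one command on the DRAC, with TERM=vt100 so backspace works as ^H.
drac() {
    TERM=vt100 ssh "root@${DRAC_HOST}" "$@"
}

# List the open management sessions.
drac_sessions() {
    drac racadm getssninfo
}

# Close one session by the number shown in the getssninfo output.
drac_close_session() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: drac_close_session NUM" >&2
            return 2
            ;;
    esac
    drac racadm closessn -i "$1"
}
```

Running drac_sessions first and then drac_close_session with one of the listed numbers avoids the -a option, which as noted above fails on your own session.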
The IDRAC web interface also has an option to close sessions. If you get to the limits of both ssh and web sessions due to network issues then you presumably have a problem.

I couldn't find any documentation on how the ssh host key is generated. I Googled for the key fingerprint and didn't get a match so there's a reasonable chance that it's unique to the server (please comment if you know more about this).

Don't Use Graphical Console

The T710 is an older server and runs IDRAC6 (IDRAC9 is the current version). The Java based graphical console access doesn't work with recent versions of Java. The Debian package icedtea-netx has the javaws command for running the .jnlp file for the console. By default the web browser won't run this, so you download the .jnlp file and pass that as the first parameter to the javaws program, which then downloads a bunch of Java classes from the IDRAC to run. One error I got with Java 11 was "Exception in thread Timer-0 java.util.ServiceConfigurationError: java.nio.charset.spi.CharsetProvider: Provider sun.nio.cs.ext.ExtendedCharsets could not be instantiated"; Google didn't turn up any solutions to this. Java 8 didn't have that problem but had a "connection failed" error that some people reported as being related to the SSL key, but replacing the SSL key for the web server didn't help. The suggestion of running a VM with an old web browser to access IDRAC didn't appeal. So I gave up on this. Presumably a Windows VM running IE6 would work OK for this.

Serial Console

Fortunately IDRAC supports a serial console. Here's a page summarising Serial console setup for DRAC [1]. Once you have done that put console=tty0 console=ttyS1,115200n8 on the kernel command line and Linux will send the console output to the virtual serial port. To access the serial console from remote you can ssh in and run the RAC command console com2 (there is no option for using a different com port). The serial port seems to be unavailable through the web interface.
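On Debian the kernel command line is usually set through GRUB. The fragment below is a sketch of /etc/default/grub, assuming GRUB 2 is the boot loader and that the DRAC virtual serial port is the second serial unit (ttyS1, as in the kernel parameters above). The GRUB_TERMINAL and GRUB_SERIAL_COMMAND lines are an optional addition that is not part of the setup described above; they are meant to make the boot menu visible on the serial console as well.

```shell
# /etc/default/grub (fragment); run update-grub after editing.
# Kernel console on both the local screen and the DRAC virtual serial port.
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS1,115200n8"

# Optional (an assumption, not from the setup above): show the GRUB menu
# on the serial port too. Unit 1 corresponds to ttyS1.
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=1 --speed=115200 --word=8 --parity=no --stop=1"
```

The last console= entry is the one that becomes /dev/console, so with this ordering boot messages go to both places while the serial port is the primary console.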
If I was supporting many Dell systems I'd probably set up an ssh to JavaScript gateway to provide web based console access. It's disappointing that Dell didn't include this.

If you disconnect from an active ssh serial console then the RAC might keep the port locked, then any future attempts to connect to it will give the following error:
/admin1-> console com2
console: Serial Device 2 is currently in use
So far the only way I've discovered to get console access again after that is the command racadm racreset. If anyone knows of a better way please let me know. As an aside having racreset being so close to racresetcfg (which resets the configuration to default and requires a hard reboot to configure it again) seems like a really bad idea.

Host Based Management
deb http://linux.dell.com/repo/community/ubuntu xenial openmanage
The above apt sources.list line allows installing Dell management utilities (Xenial is old but they work on Debian/Buster). Probably the packages srvadmin-storageservices-cli and srvadmin-omacore will drag in enough dependencies to get it going. Here are some useful commands:
# show hardware event log
omreport system esmlog
# show hardware alert log
omreport system alertlog
# give summary of system information
omreport system summary
# show versions of firmware that can be updated
omreport system version
# get chassis temp
omreport chassis temps
# show physical disk details on controller 0
omreport storage pdisk controller=0
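
For a quick health check it may be handy to collect several of these reports in one go. This is a minimal sketch, assuming omreport is in PATH; the function name om_snapshot and the output file name in the usage comment are made up for illustration.

```shell
# Hypothetical helper: run a few of the omreport commands listed above
# and print each one under a header line.
om_snapshot() {
    for report in "system esmlog" "system alertlog" "chassis temps"; do
        echo "=== omreport $report ==="
        # $report is deliberately unquoted so it splits into two arguments.
        omreport $report || echo "(omreport $report failed)" >&2
    done
}

# Usage:
#   om_snapshot > "omreport-$(date +%Y%m%d).txt"
```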
RAID Control

The RAID controller is known as PERC (PowerEdge RAID Controller), the Dell web site has an rpm package of the perccli tool to manage the RAID from Linux. This is statically linked and appears to have different versions of libraries used to make it. The command perccli show gives an overview of the configuration, but the command perccli /c0 show (to give detailed information on controller 0) SEGVs and the kernel logs a "vsyscall attempted with vsyscall=none" message. Here's an overview of the vsyscall emulation issue [2]. Basically I could add vsyscall=emulate to the kernel command line and slightly reduce security for the system to allow system calls from perccli that are called from old libc code to work, or I could run code from a dubious source as root.

Some versions of IDRAC have a racadm raid command that can be run from a ssh session to perform RAID administration remotely, mine doesn't have that. As an aside the help for the RAC system doesn't list all commands and the Dell documentation is difficult to find so blog posts from other sysadmins are the best source of information.

I have configured IDRAC to have all of the BIOS output go to the virtual serial console over ssh so I can see the BIOS prompt me for PERC administration but it didn't accept my key presses when I tried to do so. In any case this requires downtime and I'd like to be able to change hard drives without rebooting.

I found vsyscall_trace on GitHub [3], it uses the ptrace interface to emulate vsyscall on a per process basis. You just run vsyscall_trace perccli and it works! Thanks Geoffrey Thomas for writing this! Here are some perccli commands:
# show overview
perccli show
# help on adding a vd (RAID)
perccli /c0 add help
# show controller 0 details
perccli /c0 show
# add a vd (RAID) of level RAID0 (r0) with the drive 32:0 (enclosure:slot from above command)
perccli /c0 add vd r0 drives=32:0
When a disk is added to a PERC controller about 525MB of space is used at the end for RAID metadata. So when you create a RAID-0 with a single device as in the above example all disk data is preserved by default except for the last 525MB. I have tested this by running a BTRFS scrub on a disk from another system after making it into a RAID-0 on the PERC.
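
To avoid typing vsyscall_trace each time, the perccli commands above can go through a small wrapper. This is a sketch only: the function name pcli is hypothetical, and it assumes both vsyscall_trace and perccli are in PATH and that it is run as root.

```shell
# Hypothetical wrapper: run perccli under vsyscall_trace so the old
# vsyscall calls are emulated per process via ptrace.
pcli() {
    if command -v vsyscall_trace >/dev/null 2>&1; then
        vsyscall_trace perccli "$@"
    else
        echo "vsyscall_trace not found in PATH" >&2
        return 127
    fi
}

# Usage:
#   pcli show
#   pcli /c0 show
```

Keeping the emulation per process this way means the kernel command line can stay at vsyscall=none for everything else on the system.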

2 September 2011

Asheesh Laroia: Debian bug squashing party at SIPB, MIT


(Photo credit: Obey Arthur Liu; originally on Picasa, license.)

Three weekends ago, I participated in a Debian bug squashing party. It was more fun than I had guessed! The event worked: we squashed bugs. Geoffrey Thomas (geofft) organized it as an event for MIT's student computing group, SIPB.

In this post, I'll review the good parts and the bad. I'll conclude with beaming photos of my two mentees and talk about the bugs they fixed. So, the good:

The event was a success, but as always, there are some things that could have gone more smoothly. Here's that list:

Still, it turned out well! I did three NMUs, corresponding to three patches submitted for release-critical bugs by my two mentees. Those mentees were:

[Photo: Jessica enjoying herself]

Jessica McKellar is a software engineer at Ksplice Oracle and a recent graduate of MIT's EECS program. She solved three release-critical bugs. This was her first direct contribution to Debian. In particular:

Jessica has since gotten involved in the Twisted project's personal package archive. Toward the end of the sprint, she explained, "I like fixing bugs. I will totally come to the next bug squashing party."

[Photo: Noah grinning]

Noah Swartz is a recent graduate of Case Western Reserve University where he studied Mathematics and played Magic. He is an intern at the MIT Media Lab where he contributes to DoppelLab in Joe Paradiso's Responsive Environments group. This was definitely his first direct contribution to Debian. It was also one of the most intense command-line experiences he has had so far. Noah wasn't originally planning to come, but we were having lunch together before the hackathon, and I convinced him to join us.

Noah fixed #625177, a fails-to-build-from-source (FTBFS) bug in nslint. The problem was that "-Wl" was instead written in all lowercase in the debian/rules file, as "-wl". Noah fixed that, making sure the package properly built in pbuilder, and then spent some quality time with lintian figuring out the right way to write a debian/changelog.

That's a wrap! We'll hopefully have one again in a few months, and before that, I hope to write up a guide so that we run things even more smoothly next time.