Search Results: "gcs"

20 June 2024

C.J. Collier: Signed NVIDIA drivers on Google Cloud Dataproc 2.2

Hello folks, I've been working this year on better integrating NVIDIA hardware with the Google Cloud Dataproc product (Hadoop on Google Cloud) running the default cluster node image. We have an open bug[1] in the initialization-actions repo regarding creation failures when secure boot is enabled. This is because with secure boot, kernel driver code has its signature verified before insmod places the symbols into kernel memory. The verification process involves reading trust root certificates from EFI variables, and validating that the signatures on the kernel driver either a) were made directly by one of the certificates in the boot sector or b) were made by certificates which chain up to one of them. This means that Dataproc disk images must have a certificate installed into them. My work on the internals will likely start producing images which have certificates from Google in them. In the meantime, however, our users are left without a mechanism to have both secure boot enabled and install out-of-tree kernel modules such as the NVIDIA GPU drivers.

To that end, I've got PR #83[2] open with the GoogleCloudDataproc/custom-images github repository. This PR introduces a new argument to the custom image creation script, --trusted-cert, whose value is the path to a DER-encoded certificate to be included in the certificate database in the EFI variables of the disk's boot sector. I've written up the instructions on creating a custom image with a trusted certificate here: https://github.com/cjac/custom-images/blob/secure-boot-custom-image/examples/secure-boot/README.md

Here is a set of commands that can be used to create a Dataproc custom image with the certificate installed to the EFI's db variable. You can run these commands from the root directory of a checkout such as this:

git clone https://github.com/cjac/custom-images.git --branch secure-boot-custom-image --single-branch
pushd custom-images
PROJECT_ID=your-project-here
PROJECT_NUMBER=your-project-nnnn-here
my_bucket=your-bucket-here
custom_image_zone=your-zone-here
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
        --member=serviceAccount:${PROJECT_NUMBER}-compute@developer.gserviceaccount.com \
        --role=roles/secretmanager.secretAccessor
gcloud config set project ${PROJECT_ID}
gcloud auth login
eval $(bash examples/secure-boot/create-key-pair.sh)
metadata="public_secret_name=${public_secret_name}"
metadata="${metadata},private_secret_name=${private_secret_name}"
metadata="${metadata},secret_project=${secret_project}"
metadata="${metadata},secret_version=${secret_version}"
#dataproc_version=2.1-debian11
dataproc_version=2.2-debian12
#customization_script=examples/secure-boot/install-nvidia-driver-debian11.sh
customization_script=examples/secure-boot/install-nvidia-driver-debian12.sh
#image_name="nvidia-open-kernel-bullseye-$(date +%F)"
image_name="nvidia-open-kernel-bookworm-$(date +%F)"
disk_size_gb="50"
python generate_custom_image.py \
    --image-name ${image_name} \
    --dataproc-version ${dataproc_version} \
    --trusted-cert "tls/db.der" \
    --customization-script ${customization_script} \
    --metadata "${metadata}" \
    --zone "${custom_image_zone}" \
    --disk-size "${disk_size_gb}" \
    --no-smoke-test \
    --gcs-bucket "${my_bucket}"
popd
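Once the image is built, you should be able to create a secure-boot-enabled cluster from it and confirm on a node that the signed driver loads. The commands below are my own sketch of that verification, not part of the PR; the cluster name and the shielded-VM flags are assumptions:

gcloud dataproc clusters create my-secure-cluster \
    --region "${custom_image_zone%-*}" \
    --image "projects/${PROJECT_ID}/global/images/${image_name}" \
    --shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring
# then, on a cluster node:
mokutil --sb-state          # expect "SecureBoot enabled"
modinfo -F signer nvidia    # expect the subject of the certificate enrolled in db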
I'd love to hear your feedback!

[1] https://github.com/GoogleCloudDataproc/initialization-actions/issues/1058
[2] https://github.com/GoogleCloudDataproc/custom-images/pull/83

7 December 2023

Daniel Kahn Gillmor: New OpenPGP certificate for dkg, December 2023

dkg's New OpenPGP certificate in December 2023

In December of 2023, I'm moving to a new OpenPGP certificate. You might know my old OpenPGP certificate, which had a fingerprint of C29F8A0C01F35E34D816AA5CE092EB3A5CA10DBA. My new OpenPGP certificate has a fingerprint of: D477040C70C2156A5C298549BB7E9101495E6BF7. Both certificates have the same set of User IDs:
  • Daniel Kahn Gillmor
  • <dkg@debian.org>
  • <dkg@fifthhorseman.net>
You can find a version of this transition statement signed by both the old and new certificates at: https://dkg.fifthhorseman.net/2023-dkg-openpgp-transition.txt The new OpenPGP certificate is:
-----BEGIN PGP PUBLIC KEY BLOCK-----
xjMEZXEJyxYJKwYBBAHaRw8BAQdA5BpbW0bpl5qCng/RiqwhQINrplDMSS5JsO/Y
O+5Zi7HCwAsEHxYKAH0FgmVxCcsDCwkHCRC7fpEBSV5r90cUAAAAAAAeACBzYWx0
QG5vdGF0aW9ucy5zZXF1b2lhLXBncC5vcmfUAgfN9tyTSxpxhmHA1r63GiI4v6NQ
mrrWVLOBRJYuhQMVCggCmwECHgEWIQTUdwQMcMIValwphUm7fpEBSV5r9wAAmaEA
/3MvYJMxQdLhIG4UDNMVd2bsovwdcTrReJhLYyFulBrwAQD/j/RS+AXQIVtkcO9b
l6zZTAO9x6yfkOZbv0g3eNyrAs0QPGRrZ0BkZWJpYW4ub3JnPsLACwQTFgoAfQWC
ZXEJywMLCQcJELt+kQFJXmv3RxQAAAAAAB4AIHNhbHRAbm90YXRpb25zLnNlcXVv
aWEtcGdwLm9yZ4l+Z3i19Uwjw3CfTNFCDjRsoufMoPOM7vM8HoOEdn/vAxUKCAKb
AQIeARYhBNR3BAxwwhVqXCmFSbt+kQFJXmv3AAALZQEAhJsgouepQVV98BHUH6Sv
WvcKrb8dQEZOvHFbZQQPNWgA/A/DHkjYKnUkCg8Zc+FonqOS/35sHhNA8CwqSQFr
tN4KzRc8ZGtnQGZpZnRoaG9yc2VtYW4ubmV0PsLACgQTFgoAfQWCZXEJywMLCQcJ
ELt+kQFJXmv3RxQAAAAAAB4AIHNhbHRAbm90YXRpb25zLnNlcXVvaWEtcGdwLm9y
ZxLvwkgnslsAuo+IoSa9rv8+nXpbBdab2Ft7n4H9S+d/AxUKCAKbAQIeARYhBNR3
BAxwwhVqXCmFSbt+kQFJXmv3AAAtFgD4wqcUfQl7nGLQOcAEHhx8V0Bg8v9ov8Gs
Y1ei1BEFwAD/cxmxmDSO0/tA+x4pd5yIvzgfGYHSTxKS0Ww3hzjuZA7NE0Rhbmll
bCBLYWhuIEdpbGxtb3LCwA4EExYKAIAFgmVxCcsDCwkHCRC7fpEBSV5r90cUAAAA
AAAeACBzYWx0QG5vdGF0aW9ucy5zZXF1b2lhLXBncC5vcmd7X4TgiINwnzh4jar0
Pf/b5hgxFPngCFxJSmtr/f0YiQMVCggCmQECmwECHgEWIQTUdwQMcMIValwphUm7
fpEBSV5r9wAAMuwBAPtMonKbhGOhOy+8miAb/knJ1cIPBjLupJbjM+NUE1WyAQD1
nyGW+XwwMrprMwc320mdJH9B0jdokJZBiN7++0NoBM4zBGVxCcsWCSsGAQQB2kcP
AQEHQI19uRatkPSFBXh8usgciEDwZxTnnRZYrhIgiFMybBDQwsC/BBgWCgExBYJl
cQnLCRC7fpEBSV5r90cUAAAAAAAeACBzYWx0QG5vdGF0aW9ucy5zZXF1b2lhLXBn
cC5vcmfCopazDnq6hZUsgVyztl5wmDCmxI169YLNu+IpDzJEtQKbAr6gBBkWCgBv
BYJlcQnLCRB3LRYeNc1LgUcUAAAAAAAeACBzYWx0QG5vdGF0aW9ucy5zZXF1b2lh
LXBncC5vcmcQglI7G7DbL9QmaDkzcEuk3QliM4NmleIRUW7VvIBHMxYhBHS8BMQ9
hghL6GcsBnctFh41zUuBAACwfwEAqDULksr8PulKRcIP6N9NI/4KoznyIcuOHi8q
Gk4qxMkBAIeV20SPEnWSw9MWAb0eKEcfupzr/C+8vDvsRMynCWsDFiEE1HcEDHDC
FWpcKYVJu36RAUlea/cAAFD1AP0YsE3Eeig1tkWaeyrvvMf5Kl1tt2LekTNWDnB+
FUG9SgD+Ka8vfPR8wuV8D3y5Y9Qq9xGO+QkEBCW0U1qNypg65QHOOARlcQnLEgor
BgEEAZdVAQUBAQdAWTLEa0WmnhUmDBdWXX0ZlYAa4g1CK/fXg0NPOQSteA4DAQgH
wsAABBgWCgByBYJlcQnLCRC7fpEBSV5r90cUAAAAAAAeACBzYWx0QG5vdGF0aW9u
cy5zZXF1b2lhLXBncC5vcmexrMBZe0QdQ+ZJOZxFkAiwCw2I7yTSF2Ox9GVFWKmA
mAKbDBYhBNR3BAxwwhVqXCmFSbt+kQFJXmv3AABcJQD/f4ltpSvLBOBEh/C2dIYa
dgSuqkCqq0B4WOhFRkWJZlcA/AxqLWG4o8UrrmwrmM42FhgxKtEXwCSHE00u8wR4
Up8G
=9Yc8
-----END PGP PUBLIC KEY BLOCK-----
When I have some reasonable number of certifications, I'll update the certificate associated with my e-mail addresses on https://keys.openpgp.org, in DANE, and in WKD. Until then, those lookups should continue to provide the old certificate.
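If you want to check the transition statement yourself, a minimal sketch (assuming both the old and the new certificate are already in your keyring):

curl -O https://dkg.fifthhorseman.net/2023-dkg-openpgp-transition.txt
gpg --verify 2023-dkg-openpgp-transition.txt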

9 August 2023

Antoine Beaupré: OpenPGP key transition

This is a short announcement to say that I have changed my main OpenPGP key. A signed statement is available with the cryptographic details but, in short, the reason is that I stopped using my old YubiKey NEO that I have worn on my keyring since 2015. I now have a YubiKey 5, which supports Ed25519 and features much shorter keys and faster decryption. It allowed me to move all my secret subkeys onto the key (including encryption keys) while retaining reasonable performance. I have written extensive documentation on how to do that: see OpenPGP key rotation and also YubiKey OpenPGP operations.
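For context, the core of moving a subkey onto such a card looks roughly like this in GnuPG; this is a generic sketch, not the procedure from the documentation mentioned above:

gpg --edit-key anarcat@torproject.org
gpg> key 1         # select the first subkey
gpg> keytocard     # move the selected secret subkey onto the YubiKey
gpg> save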

Warning on storing encryption keys on a YubiKey

People wishing to move their private encryption keys to such a security token should be very careful as there are special precautions to take for disaster recovery. I am toying with the idea of writing an article specifically about disaster recovery for secrets and backups, dealing specifically with cases of death or disabilities.

Autocrypt changes

One nice change is the impact on Autocrypt headers, which are considerably shorter. Before, the header didn't even fit on a single line in an email; it overflowed to five lines:
Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xsFNBEogKJ4BEADHRk8dXcT3VmnEZQQdiAaNw8pmnoRG2QkoAvv42q9Ua+DRVe/yAEUd03EOXbMJl++YKWpVuzSFr7IlZ+/lJHOCqDeSsBD6LKBSx/7uH2EOIDizGwfZNF3u7X+gVBMy2V7rTClDJM1eT9QuLMfMakpZkIe2PpGE4g5zbGZixn9er+wEmzk2mt20RImMeLK3jyd6vPb1/Ph9+bTEuEXi6/WDxJ6+b5peWydKOdY1tSbkWZgdi+Bup72DLUGZATE3+Ju5+rFXtb/1/po5dZirhaSRZjZA6sQhyFM/ZhIj92mUM8JJrhkeAC0iJejn4SW8ps2NoPm0kAfVu6apgVACaNmFb4nBAb2k1KWru+UMQnV+VxDVdxhpV628Tn9+8oDg6c+dO3RCCmw+nUUPjeGU0k19S6fNIbNPRlElS31QGL4H0IazZqnE+kw6ojn4Q44h8u7iOfpeanVumtp0lJs6dE2nRw0EdAlt535iQbxHIOy2x5m9IdJ6q1wWFFQDskG+ybN2Qy7SZMQtjjOqM+CmdeAnQGVwxowSDPbHfFpYeCEb+Wzya337Jy9yJwkfa+V7e7Lkv9/OysEsV4hJrOh8YXu9a4qBWZvZHnIO7zRbz7cqVBKmdrL2iGqpEUv/x5onjNQwpjSVX5S+ZRBZTzah0w186IpXVxsU8dSk0yeQskblrwARAQABzSlBbnRvaW5lIEJlYXVwcsOpIDxhbmFyY2F0QHRvcnByb2plY3Qub3JnPsLBlAQTAQgAPgIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIXgBYhBI3JAc5kFGwEitUPu3khUlJ7dZIeBQJihnFIBQkacFLiAAoJEHkhUlJ7dZIeXNAP/RsX+27l9K5uGspEaMH6jabAFTQVWD8Ch1om9YvrBgfYtq2k/m4WlkMh9IpT89Ahmlf0eq+V1Vph4wwXBS5McK0dzoFuHXJa1WHThNMaexgHhqJOs
 S60bWyLH4QnGxNaOoQvuAXiCYV4amKl7hSuDVZEn/9etDgm/UhGn2KS3yg0XFsqI7V/3RopHiDT+k7+zpAKd3st2V74w6ht+EFp2Gj0sNTBoCdbmIkRhiLyH9S4B+0Z5dUCUEopGIKKOSbQwyD5jILXEi7VTZhN0CrwIcCuqNo7OXI6e8gJd8McymqK4JrVoCipJbLzyOLxZMxGz8Ki0b9O844/DTzwcYcg9I1qogCsGmZfgVze2XtGxY+9zwSpeCLeef6QOPQ0uxsEYSfVgS+onCesSRCgwAPmppPiva+UlGuIMun87gPpQpV2fqFg/V8zBxRvs6YTGcfcQjfMoBHmZTGb+jk1//QAgnXMO7fGG38YH7iQSSzkmodrH2s27ZKgUTHVxpBL85ptftuRqbR7MzIKXZsKdA88kjIKKXwMmez9L1VbJkM4k+1Kzc5KdVydwi+ujpNegF6ZU8KDNFiN9TbDOlRxK5R+AjwdS8ZOIa4nci77KbNF9OZuO3l/FZwiKp8IFJ1nK7uiKUjmCukL0od/6X2rJtAzJmO5Co93ZVrd5r48oqUvjklzzsBNBFmeC3oBCADEV28RKzbv3dEbOocOsJQWr1R0EHUcbS270CrQZfb9VCZWkFlQ/1ypqFFQSjmmUGbNX2CG5mivVsW6Vgm7gg8HEnVCqzL02BPY4OmylskYMFI5Bra2wRNNQBgjg39L9XU4866q3BQzJp3r0fLRVH8gHM54Jf0FVmTyHotR/Xiw5YavNy2qaQXesqqUv8HBIha0rFblbuYI/cFwOtJ47gu0QmgrU0ytDjlnmDNx4rfsNylwTIHS0Oc7Pezp7MzLmZxnTM9b5VMprAXnQr4rewXCOUKBSto+j4rD5/77DzXw96bbueNruaupb2Iy2OHXNGkB0vKFD3xHsXE2x75NBovtABEBAAHCwqwEGAEIACAWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWZ4LegIbAgFACRB5IV
 JSe3WSHsB0IAQZAQgAHRYhBHsWQgTQlnI7AZY1qz6h3d2yYdl7BQJZngt6AAoJED6h3d2yYdl7CowH/Rp7GHEoPZTSUK8Ss7crwRmuAIDGBbSPkZbGmm4bOTaNs/gealc2tsVYpoMx7aYgqUW+t+84XciKHT+bjRv8uBnHescKZgDaomDuDKc2JVyx6samGFYuYPcGFReRcdmH0FOoPCn7bMW5mTPztV/wIA80LZD9kPKIXanfUyI3HLP0BPwZG4WTpKzJaalR1BNwu2oF6kEK0ymH3LfDiJ5Sr6emI2jrm4gH+/19ux/x+ST4tvm2PmH3BSQOPzgiqDiFd7RZoAIhmwr3FW4epsK9LtSxsi9gZ2vATBKO1oKtb6olW/keQT6uQCjqPSGojwzGRT2thEANH+5t6Vh0oDPZhrKUXRAAxHMBNHEaoo/M0sjZo+5OF3Ig1rMnI6XbKskLv6hu13cCymW0w/5E4XuYnyQ1cNC3pLvqDQbDx5mAPfBVHuqxJdRLQ3yDM/D2QIsxnkzQwi0FsJuni4vuJzWK/NHHDCvxMCh0YmSgbptUtgW8/niatd2Y6MbfRGxUHoctKtzqzivC8hKMTFrj4AbZhg/e9QVCsh5zSXtpWP0qFDJsxRMx0/432n9d4XUiy4U672r9Q09SsynB3QN6nTaCTWCIxGxjIb+8kJrRqTGwy/PElHX6kF0vQUWZNf2ITV1sd6LK/s/7sH+x4rzgUEHrsKr/qPvY3rUY/dQLd+owXesY83ANOu6oMWhSJnPMksbNa4tIKKbjmw3CFIOfoYHOWf3FtnydHNXoXfj4nBX8oSnkfhLILTJgf6JDFXfw6mTsv/jMzIfDs7PO1LK2oMK0+prSvSoM8bP9dmVEGIurzsTGjhTOBcb0zgyCmYVD3S48vZlTgHszAes1zwaCyt3/tOwrzU5JsRJVns+B/TUYaR/u3oIDMDygvE5ObWxXaFVnCC59r+zl0FazZ0ouyk2AYIR
 zHf+n1n98HCngRO4FRel2yzGDYO2rLPkXRm+NHCRvUA/i4zGkJs2AV0hsKK9/x8uMkBjHAdAheXhY+CsizGzsKjjfwvgqf84LwAzSDdZqLVE2yGTOwU0ESiArJwEQAJhtnC6pScWjzvvQ6rCTGAai6hrRiN6VLVVFLIMaMnlUp92EtgVSNpw6kANtRTpKXUB5fIPZVUrVdfEN06t96/6LE42tgifDAFyFTZY5FdHHri1GG/Cr39MpW2VqCDCtTTPVWHTUlU1ZG631BJ+9NB+ce58TmLr6wBTQrT+W367eRFBC54EsLNb7zQAspCn9pw1xf1XNHOGnrAQ4r9BXhOW5B8CzRd4nLRQwVgtw/c5M/bjemAOoq2WkwN+0mfJe4TSfHwFUozXuN274X+0Gr10fhp8xEDYuQM0qu6W3aDXMBBwIu0jTNudEELsTzhKUbqpsBc9WjwNMCZoCuSw/RTpFBV35mXbqQoQgbcU7uWZslLl9Wvv/C6rjXgd+GeX8SGBjTqq1ZkTv5UXLHTNQzPnbkNEExzqToi/QdSjFMIACnakeOSxc0ckfnsd9pfGv1PUyPyiwrHiqWFzBijzGIZEHxhNGFxAkXwTJR7Pd40a7RDxwbO6p/TSIIum41JtteehLHwTRDdQNMoyfLxuNLEtNYS0uR2jYI1EPQfCNWXCdT2ZK/l6GVP6jyB/olHBIOr+oVXqJh+48ki8cATPczhq3fUr7UivmguGwD67/4omZ4PCKtz1hNndnyYFS9QldEGo+AsB3AoUpVIA0XfQVkxD9IZr+Zu6aJ6nWq4M2bsoxABEBAAHCwXYEGAEIACACGwwWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWPerZAAKCRB5IVJSe3WSHkIgEACTpxdn/FKrwH0/LDpZDTKWEWm4416l13RjhSt9CUhZ/Gm2GNfXcVTfoF/jKXXgjHcV1DHjfLUPmPVwMdqlf5ACOiFqIUM2ag/OEARh356w
 YG7YEobMjX0CThKe6AV2118XNzRBw/S2IO1LWnL5qaGYPZONUa9Pj0OaErdKIk/V1wge8Zoav2fQPautBcRLW5VA33PH1ggoqKQ4ES1hc9HC6SYKzTCGixu97mu/vjOa8DYgM+33TosLyNy+bCzw62zJkMf89X0tTSdaJSj5Op0SrRvfgjbC2YpJOnXxHr9qaXFbBZQhLjemZi6zRzUNeJ6A3Nzs+gIc4H7s/bYBtcd4ugPEhDeCGffdS3TppH9PnvRXfoa5zj5bsKFgjqjWolCyAmEvd15tXz5yNXtvrpgDhjF5ozPiNp/1EeWX4DxbH2i17drVu4fXwauFZ6lcsAcJxnvCA28RlQlmEQu/gFOx1axVXf6GIuXnQSjQN6qJbByUYrdc/cFCxPO2/lGuUxnufN9Tvb51Qh54laPgGLrlD2huQeSD9Sxa0MNUjNY0qLqaReT99Ygb2LPYGSLoFVx9iZz6sZNt07LqCx9qNgsJwsdmwYsNpMuFbc7nkWjtlEqzsXZHTvYN654p43S+hcAhmmOzQZcew6h71fAJLciiqsPBnCEdgCGFAWhZZdPkMA==
After the change, the entire key fits on a single line, neat!
Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xjMEZHZPzhYJKwYBBAHaRw8BAQdAWdVzOFRW6FYVpeVaDo3sC4aJ2kUW4ukdEZ36UJLAHd7NKUFudG9pbmUgQmVhdXByw6kgPGFuYXJjYXRAdG9ycHJvamVjdC5vcmc+wpUEExYIAD4WIQS7ts1MmNdOE1inUqYCKTpvpOU0cwUCZHZgvwIbAwUJAeEzgAULCQgHAwUVCgkICwUWAgMBAAIeAQIXgAAKCRACKTpvpOU0c47SAPdEqfeHtFDx9UPhElZf7nSM69KyvPWXMocu9Kcu/sw1AQD5QkPzK5oxierims6/KUkIKDHdt8UcNp234V+UdD/ZB844BGR2UM4SCisGAQQBl1UBBQEBB0CYZha2IMY54WFXMG4S9/Smef54Pgon99LJ/hJ885p0ZAMBCAfCdwQYFggAIBYhBLu2zUyY104TWKdSpgIpOm+k5TRzBQJkdlDOAhsMAAoJEAIpOm+k5TRzBg0A+IbcsZhLx6FRIqBJCdfYMo7qovEo+vX0HZsUPRlq4HkBAIctCzmH3WyfOD/aUTeOF3tY+tIGUxxjQLGsNQZeGrQI
Note that I have implemented my own kind of ridiculous Autocrypt support for the Notmuch Emacs email client I use; see this elisp code. To import keys, I pipe the message into this script, which is basically just:
sq autocrypt decode | gpg --import
... thanks to Sequoia's best-of-class Autocrypt support.

Note on OpenPGP usage

While some have claimed OpenPGP's death, I believe those claims are overstated. Maybe it's just me, but I still use OpenPGP for my password management, to authenticate users and messages, and it's the interface to my YubiKey for authenticating with SSH servers. I understand people feel that OpenPGP is possibly insecure, counter-intuitive and full of problems, but I think most of those problems should instead be attributed to its current flagship implementation, GnuPG. I have tried to work with GnuPG for years, and it keeps surprising me with evilness and oddities. I have high hopes that the Sequoia project can bring some sanity into this space, and I also hope that RFC4880bis can eventually get somewhere so we have a more solid specification with more robust crypto. It's kind of a shame that this has dragged on for so long. Update: there's a separate draft called openpgp-crypto-refresh that might actually be adopted as the "OpenPGP RFC" soon! And it doesn't keep real work from happening in Sequoia and other implementations. Thunderbird rewrote their OpenPGP implementation with RNP (which was, granted, a bumpy road because it lost compatibility with GnuPG), and Sequoia now has a certificate store with trust management (but still no secret storage), preliminary OpenPGP card support and even a basic GnuPG compatibility layer. I'm also curious to try out the OpenPGP CA capabilities. So maybe it's just because I'm becoming an old fart that doesn't want to change tools, but so far I haven't seen a good incentive to switch away from OpenPGP, and haven't found a good set of tools that completely replaces it. Maybe OpenSSH's keys and CA can eventually replace it, but I suspect they will end up rebuilding most of OpenPGP anyway, just more slowly. If they do, let's hope they avoid the mistakes our community has made in the past, at least...

14 May 2023

C.J. Collier: Early Access: Inserting JSON data to BigQuery from Spark on Dataproc

Hello folks! We recently received a case letting us know that Dataproc 2.1.1 was unable to write to a BigQuery table with a column of type JSON. Although the BigQuery connector for Spark has had support for JSON columns since 0.28.0, the Dataproc images on the 2.1 line still cannot create tables with JSON columns or write to existing tables with JSON columns. The customer has graciously granted permission to share the code we developed to allow this operation. So if you are interested in working with JSON column tables on Dataproc 2.1, please continue reading! Use the following gcloud command to create your single-node Dataproc cluster:
IMAGE_VERSION=2.1.1-debian11
REGION=us-west1
ZONE=${REGION}-a
CLUSTER_NAME=pick-a-cluster-name
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --zone ${ZONE} \
    --single-node \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-type pd-ssd \
    --master-boot-disk-size 50 \
    --image-version ${IMAGE_VERSION} \
    --max-idle=90m \
    --enable-component-gateway \
    --scopes 'https://www.googleapis.com/auth/cloud-platform'
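The files below are meant to be run on the cluster itself; one way to get a shell there (my assumption, not part of the original case) is to SSH into the master node with gcloud:

gcloud compute ssh ${CLUSTER_NAME}-m --zone ${ZONE}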
The following file is the Scala code used to write JSON structured data to a BigQuery table using Spark. The file following this one can be executed from your single-node Dataproc cluster.

Main.scala:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Metadata, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.avro
import org.apache.avro.specific
  val env = "x"
  val my_bucket = "cjac-docker-on-yarn"
  val my_table = "dataset.testavro2"
    val spark = env match {
      case "local" =>
        SparkSession
          .builder()
          .config("temporaryGcsBucket", my_bucket)
          .master("local")
          .appName("isssue_115574")
          .getOrCreate()
      case _ =>
        SparkSession
          .builder()
          .config("temporaryGcsBucket", my_bucket)
          .appName("isssue_115574")
          .getOrCreate()
    }
  // create DF with some data
  val someData = Seq(
    Row("""{"name":"name1", "age": 10}""", "id1"),
    Row("""{"name":"name2", "age": 20}""", "id2")
  )
  val schema = StructType(
    Seq(
      StructField("user_age", StringType, true),
      StructField("id", StringType, true)
    )
  )
  val avroFileName = s"gs://${my_bucket}/issue_115574/someData.avro"
  
  val someDF = spark.createDataFrame(spark.sparkContext.parallelize(someData), schema)
  someDF.write.format("avro").mode("overwrite").save(avroFileName)
  val avroDF = spark.read.format("avro").load(avroFileName)
  // set metadata
  val dfJSON = avroDF
    .withColumn("user_age_no_metadata", col("user_age"))
    .withMetadata("user_age", Metadata.fromJson("""{"sqlType":"JSON"}"""))
  dfJSON.show()
  dfJSON.printSchema
  // write to BigQuery
  dfJSON.write.format("bigquery")
    .mode(SaveMode.Overwrite)
    .option("writeMethod", "indirect")
    .option("intermediateFormat", "avro")
    .option("useAvroLogicalTypes", "true")
    .option("table", my_table)
    .save()
repro.sh:
#!/bin/bash
PROJECT_ID=set-yours-here
DATASET_NAME=dataset
TABLE_NAME=testavro2
# We have to remove all of the existing spark bigquery jars from the local
# filesystem, as we will be using the symbols from the
# spark-3.3-bigquery-0.30.0.jar below.  Having existing jar files on the
# local filesystem will result in those symbols having higher precedence
# than the one loaded with the spark-shell.
sudo find /usr -name 'spark*bigquery*jar' -delete
# Remove the table from the bigquery dataset if it exists
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Create the table with a JSON type column
bq mk --table $PROJECT_ID:$DATASET_NAME.$TABLE_NAME \
  user_age:JSON,id:STRING,user_age_no_metadata:STRING
# Load the example Main.scala 
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show the table schema when we use "bq mk --table" and then load the avro
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
# Remove the table so that we can see that the table is created should it not exist
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Dynamically generate a DataFrame, store it to avro, load that avro,
# and write the avro to BigQuery, creating the table if it does not already exist
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show that the table schema does not differ from one created with a bq mk --table
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
Google BigQuery has supported JSON data since October of 2022, but until now, it has not been possible, on generally available Dataproc clusters, to interact with these columns using the Spark BigQuery Connector. JSON column type support was introduced in spark-bigquery-connector release 0.28.0.
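As a quick sanity check that the values really landed as JSON, a query along these lines should work (a sketch; the table and column names are taken from the example above):

bq query --use_legacy_sql=false \
  "SELECT id, JSON_VALUE(user_age, '$.age') AS age FROM dataset.testavro2"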

26 January 2022

Gunnar Wolf: Progvis Now in Debian proper! (unstable)

Progvis finally made it into Debian! What is it, you ask? It is a great tool to teach about memory management and concurrency. I first saw Progvis in the poster presentation its author, Filip Strömbäck, did last year at the 52nd ACM Technical Symposium on Computer Science Education (SIGCSE), immediately recognizing it as a tool I wanted to use in my classes and, it being free software, to make available for all interested Debian users. Quoting from the Progvis Web page:
This is a program visualization tool aimed at concurrent programs and related issues. The tool itself is mostly language agnostic, and relies on Storm to compile the provided code and provide basic debug information. The generated code is then inspected and instrumented to provide an experience similar to a basic debugger. The tool emphasizes a visual representation of the object hierarchy that is manipulated by the executed program to make it easy to understand how it looks. In particular, a visual representation is beneficial over a text representation since it makes it easier to find shared data that might need to be synchronized in a concurrent program. As mentioned, the tool is aimed at concurrent programs. Therefore, it allows spawning multiple threads running the same program to see if that affects the program's execution (this is mostly interesting if global variables are used). Furthermore, any spawned threads also appear in the tool, and the user may control them independently to explore possible race conditions or other synchronization errors. If enabled from the menu bar, the tool keeps track of reads and writes to the data structure in order to highlight basic race conditions in addition to deadlocks.
So, what is this Storm thing? Filip promptly informed me that Progvis is not just a pedagogical tool, or rather, that it is part of something bigger. Progvis is a program built using the Storm programming language platform. Storm is more than a compiler; it presents itself as a framework for creating languages, designed to make it easy to implement languages that can be extended with new syntax and semantics. Storm is much more than what I have explored, and can be used as an interactive compiler, or as a language server providing highlighting and completion in IDEs. But I won't dig much more into Storm (which is, of course, now also available in Debian, as well as the libraries built from the same source). Back to Progvis: it implements a very-close-to-C++ language, with some details changed to better suit its purpose. For example, instead of using the usual pthread implementation, its own thread model is used: thread creation is handled via int thread_id = thread_name(funcname, &params) instead of the more complex pthread_create() function (including details such as the thread object being passed by reference as a parameter). All in all, while I have not yet taken full advantage of this tool in my teaching, it has helped me show some hard-to-grasp concepts. A great tool; I hope you find it useful and enjoyable as well!

PS: I suggest you install the progvis-examples package to get started. You will find several dozen sample programs in /usr/share/doc/progvis-examples/examples; playing with them will help you better understand the tool and be able to better write your own programs.

22 January 2021

Enrico Zini: Polishing nspawn-runner

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub. gitlab-runner supports adding extra arguments to the custom scripts, and I can take advantage of that to pack all the various scripts that I prototyped so far into an all-in-one nspawn-runner command:
usage: nspawn-runner [-h] [-v] [--debug]
                     {chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml}
                     ...
Manage systemd-nspawn machines for CI runs.
positional arguments:
   {chroot-create,chroot-login,prepare,run,cleanup,gitlab-config,toml}
                        sub-command help
    chroot-create       create a chroot that serves as a base for ephemeral
                        machines
    chroot-login        enter the chroot to perform maintenance
    prepare             start an ephemeral system for a CI run
    run                 run a command inside a CI machine
    cleanup             cleanup a CI machine after it's run
    gitlab-config       configuration step for gitlab-runner
    toml                output the toml configuration for the custom runner
optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output
  --debug               verbose output
chroot maintenance

chroot-create and chroot-login are similar to what pbuilder, cowbuilder, schroot, debspawn and similar tools do. They only take a chroot name, and default the rest of the paths to where nspawn-runner expects things to be, under /var/lib/nspawn-runner.

gitlab-runner setup

nspawn-runner toml <chroot-name> outputs a snippet to add to /etc/gitlab-runner/config.toml to configure the CI. For example:
$ ./nspawn-runner toml buster
[[runners]]
  name="buster"
  url="TODO"
  token="TODO"
  executor = "custom"
  builds_dir = "/var/lib/nspawn-runner/.build"
  cache_dir = "/var/lib/nspawn-runner/.cache"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    config_args = ["gitlab-config"]
    config_exec_timeout = 200
    prepare_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    prepare_args = ["prepare", "buster"]
    prepare_exec_timeout = 200
    run_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    run_args = ["run"]
    cleanup_exec = "/home/enrico/dev/nspawn-runner/nspawn-runner"
    cleanup_args = ["cleanup"]
    cleanup_exec_timeout = 200
    graceful_kill_timeout = 200
    force_kill_timeout = 200
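Putting the pieces together, a complete setup could look like this (a sketch, assuming the defaults described above):

nspawn-runner chroot-create buster   # bootstrap the base chroot under /var/lib/nspawn-runner
nspawn-runner toml buster >> /etc/gitlab-runner/config.toml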
One needs to remember to set url and token, and the runner is configured.

The end, for now

This is it, it works! Time will tell what issues or ideas will come up: for now, it's a pretty decent first version. The various prepare, run, cleanup steps are generic enough that they can be used outside of gitlab-runner: feel free to build on them, and drop me a note if you find this useful!

Updated: issues noticed so far, that could go into a new version:

Enrico Zini: Assembling the custom runner

This post is part of a series about trying to set up a gitlab runner based on systemd-nspawn. I published the polished result as nspawn-runner on GitHub.

The plan

Back to custom runners, here's my plan:

The scripts

Here are the scripts, based on Federico's work. base.sh, with definitions sourced by all scripts:
MACHINE="run-$CUSTOM_ENV_CI_JOB_ID"
ROOTFS="/var/lib/gitlab-runner-custom-chroots/buster"
OVERLAY="/var/lib/gitlab-runner-custom-chroots/$MACHINE"
config.sh doing nothing:
#!/bin/sh
exit 0
prepare.sh starting the machine:
#!/bin/bash
source $(dirname "$0")/base.sh
set -eo pipefail
# trap errors as a CI system failure
trap "exit $SYSTEM_FAILURE_EXIT_CODE" ERR
logger "gitlab CI: preparing $MACHINE"
mkdir -p $OVERLAY
systemd-run \
  -p 'KillMode=mixed' \
  -p 'Type=notify' \
  -p 'RestartForceExitStatus=133' \
  -p 'SuccessExitStatus=133' \
  -p 'Slice=machine.slice' \
  -p 'Delegate=yes' \
  -p 'TasksMax=16384' \
  -p 'WatchdogSec=3min' \
  systemd-nspawn --quiet -D $ROOTFS \
    --overlay="$ROOTFS:$OVERLAY:/" \
    --machine="$MACHINE" --boot --notify-ready=yes
run.sh running the provided scripts in the machine:
#!/bin/bash
logger "gitlab CI: running $@"
source $(dirname "$0")/base.sh
set -eo pipefail
trap "exit $SYSTEM_FAILURE_EXIT_CODE" ERR
systemd-run --quiet --pipe --wait --machine="$MACHINE" /bin/bash < "$1"
cleanup.sh stopping the machine and removing the writable overlay directory:
#!/bin/bash
logger "gitlab CI: cleanup $@"
source $(dirname "$0")/base.sh
machinectl stop "$MACHINE"
rm -rf $OVERLAY
Trying out the plan

I tried a manual invocation of gitlab-runner, and it worked perfectly:
# mkdir /var/lib/gitlab-runner-custom-chroots/build/
# mkdir /var/lib/gitlab-runner-custom-chroots/cache/
# gitlab-runner exec custom \
    --builds-dir /var/lib/gitlab-runner-custom-chroots/build/ \
    --cache-dir /var/lib/gitlab-runner-custom-chroots/cache/ \
    --custom-config-exec /var/lib/gitlab-runner-custom-chroots/config.sh \
    --custom-prepare-exec /var/lib/gitlab-runner-custom-chroots/prepare.sh \
    --custom-run-exec /var/lib/gitlab-runner-custom-chroots/run.sh \
    --custom-cleanup-exec /var/lib/gitlab-runner-custom-chroots/cleanup.sh \
    tests
Runtime platform                                    arch=amd64 os=linux pid=18662 revision=775dd39d version=13.8.0
Running with gitlab-runner 13.8.0 (775dd39d)
Preparing the "custom" executor
Using Custom executor...
Running as unit: run-r1be98e274224456184cbdefc0690bc71.service
executor not supported                              job=1 project=0 referee=metrics
Preparing environment
Getting source from Git repository
Executing "step_script" stage of the job script
WARNING: Starting with version 14.0 the 'build_script' stage will be replaced with 'step_script': https://gitlab.com/gitlab-org/gitlab-runner/-/issues/26426
Job succeeded
Deploy

The remaining step is to deploy all this in /etc/gitlab-runner/config.toml:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "nspawn runner"
  url = "http://gitlab.siweb.local/"
  token = " "
  executor = "custom"
  builds_dir = "/var/lib/gitlab-runner-custom-chroots/build/"
  cache_dir = "/var/lib/gitlab-runner-custom-chroots/cache/"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.custom]
    config_exec = "/var/lib/gitlab-runner-custom-chroots/config.sh"
    config_exec_timeout = 200
    prepare_exec = "/var/lib/gitlab-runner-custom-chroots/prepare.sh"
    prepare_exec_timeout = 200
    run_exec = "/var/lib/gitlab-runner-custom-chroots/run.sh"
    cleanup_exec = "/var/lib/gitlab-runner-custom-chroots/cleanup.sh"
    cleanup_exec_timeout = 200
    graceful_kill_timeout = 200
    force_kill_timeout = 200
Next steps

My next step will be polishing all this in a way that makes deploying and maintaining a runner configuration easy.

2 February 2017

Paul Wise: FLOSS Activities January 2017

Changes

Issues

Review

Administration
  • Debian: reboot 1 non-responsive VM, redirect 2 users to support channels, redirect 1 contributor to xkb upstream, redirect 1 potential contributor, redirect 1 bug reporter to mirror team, ping 7 folks about restarting processes with upgraded libs, manually restart the sectracker process due to upgraded libs, restart the package tracker process due to upgraded libs, investigate failures connecting to the XMPP service, investigate /dev/shm issue on abel.d.o, clean up after rename of the fedmsg group.
  • Debian mentors: lintian/security updates & reboot
  • Debian packages: deploy 2 contributions to the live server
  • Debian wiki: unblacklist 1 IP address, whitelist 10 email addresses, disable 18 accounts with bouncing email, update email for 2 accounts with bouncing email, reported 1 Debian member as MIA, redirect 1 user to support channels, add 4 domains to the whitelist.
  • Reproducible builds: rescheduled Debian pyxplot:amd64/unstable for themill.
  • Openmoko: security updates & reboots.

Debian derivatives
  • Send the annual activity ping mail.
  • Happy new year messages on IRC, forward to the list.
  • Note that SerbianLinux does not provide source packages.
  • Expand URL shortener on SerbianLinux page.
  • Invite PelicanHPC, Netrunner, DietPi, Hamara Linux (on IRC), BitKey to the census.
  • Add research publications link to the census template
  • Fix Symbiosis sources.list
  • Enquired about SalentOS downtime
  • Fixed and removed some 404 BlankOn links (blog, English homepage)
  • Fixed changes to AstraLinux sources.list
  • Welcome Netrunner to the census

Sponsors

I renewed my support of Software Freedom Conservancy. The openchange 1:2.2-6+deb8u1 upload was sponsored by my employer. All other work was done on a volunteer basis.

27 July 2015

Michael Stapelberg: dh-make-golang: creating Debian packages from Go packages

Recently, the pkg-go team has been quite busy, uploading dozens of Go library packages in order to be able to package gcsfuse (a user-space file system for interacting with Google Cloud Storage) and InfluxDB (an open-source distributed time series database). Packaging Go library packages (!) is a fairly repetitive process, so before starting my work on the dependencies for gcsfuse, I started writing a tool called dh-make-golang. Just like dh-make itself, the goal is to automatically create (almost) an entire Debian package. As I worked my way through the dependencies of gcsfuse, I refined how the tool works, and now I believe it's good enough for a first release. To demonstrate how the tool works, let's assume we want to package the Go library github.com/jacobsa/ratelimit:
midna /tmp $ dh-make-golang github.com/jacobsa/ratelimit
2015/07/25 18:25:39 Downloading "github.com/jacobsa/ratelimit/..."
2015/07/25 18:25:53 Determining upstream version number
2015/07/25 18:25:53 Package version is "0.0~git20150723.0.2ca5e0c"
2015/07/25 18:25:53 Determining dependencies
2015/07/25 18:25:55 
2015/07/25 18:25:55 Packaging successfully created in /tmp/golang-github-jacobsa-ratelimit
2015/07/25 18:25:55 
2015/07/25 18:25:55 Resolve all TODOs in itp-golang-github-jacobsa-ratelimit.txt, then email it out:
2015/07/25 18:25:55     sendmail -t -f < itp-golang-github-jacobsa-ratelimit.txt
2015/07/25 18:25:55 
2015/07/25 18:25:55 Resolve all the TODOs in debian/, find them using:
2015/07/25 18:25:55     grep -r TODO debian
2015/07/25 18:25:55 
2015/07/25 18:25:55 To build the package, commit the packaging and use gbp buildpackage:
2015/07/25 18:25:55     git add debian && git commit -a -m 'Initial packaging'
2015/07/25 18:25:55     gbp buildpackage --git-pbuilder
2015/07/25 18:25:55 
2015/07/25 18:25:55 To create the packaging git repository on alioth, use:
2015/07/25 18:25:55     ssh git.debian.org "/git/pkg-go/setup-repository golang-github-jacobsa-ratelimit 'Packaging for golang-github-jacobsa-ratelimit'"
2015/07/25 18:25:55 
2015/07/25 18:25:55 Once you are happy with your packaging, push it to alioth using:
2015/07/25 18:25:55     git push git+ssh://git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git --tags master pristine-tar upstream
The ITP is often the most labor-intensive part of the packaging process, because any number of auto-detected values might be wrong: the repository owner might not be the "Upstream Author", the repository might not have a short description, the long description might need some adjustments, or the license might not be auto-detected.
midna /tmp $ cat itp-golang-github-jacobsa-ratelimit.txt
From: "Michael Stapelberg" <stapelberg AT debian.org>
To: submit@bugs.debian.org
Subject: ITP: golang-github-jacobsa-ratelimit -- Go package for rate limiting
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Package: wnpp
Severity: wishlist
Owner: Michael Stapelberg <stapelberg AT debian.org>
* Package name    : golang-github-jacobsa-ratelimit
  Version         : 0.0~git20150723.0.2ca5e0c-1
  Upstream Author : Aaron Jacobs
* URL             : https://github.com/jacobsa/ratelimit
* License         : Apache-2.0
  Programming Lang: Go
  Description     : Go package for rate limiting
 GoDoc (https://godoc.org/github.com/jacobsa/ratelimit)
 .
 This package contains code for dealing with rate limiting. See the
 reference (http://godoc.org/github.com/jacobsa/ratelimit) for more info.
TODO: perhaps reasoning
midna /tmp $
After filling in all the TODOs in the file, let's mail it out and get a sense of what else still needs to be done:
midna /tmp $ sendmail -t -f < itp-golang-github-jacobsa-ratelimit.txt
midna /tmp $ cd golang-github-jacobsa-ratelimit
midna /tmp/golang-github-jacobsa-ratelimit master $ grep -r TODO debian
debian/changelog:  * Initial release (Closes: TODO) 
midna /tmp/golang-github-jacobsa-ratelimit master $
After filling in these TODOs as well, let's have a final look at what we're about to build:
midna /tmp/golang-github-jacobsa-ratelimit master $ head -100 debian/**/*
==> debian/changelog <==                            
golang-github-jacobsa-ratelimit (0.0~git20150723.0.2ca5e0c-1) unstable; urgency=medium
  * Initial release (Closes: #793646)
 -- Michael Stapelberg <stapelberg@debian.org>  Sat, 25 Jul 2015 23:26:34 +0200
==> debian/compat <==
9
==> debian/control <==
Source: golang-github-jacobsa-ratelimit
Section: devel
Priority: extra
Maintainer: pkg-go <pkg-go-maintainers@lists.alioth.debian.org>
Uploaders: Michael Stapelberg <stapelberg@debian.org>
Build-Depends: debhelper (>= 9),
               dh-golang,
               golang-go,
               golang-github-jacobsa-gcloud-dev,
               golang-github-jacobsa-oglematchers-dev,
               golang-github-jacobsa-ogletest-dev,
               golang-github-jacobsa-syncutil-dev,
               golang-golang-x-net-dev
Standards-Version: 3.9.6
Homepage: https://github.com/jacobsa/ratelimit
Vcs-Browser: http://anonscm.debian.org/gitweb/?p=pkg-go/packages/golang-github-jacobsa-ratelimit.git;a=summary
Vcs-Git: git://anonscm.debian.org/pkg-go/packages/golang-github-jacobsa-ratelimit.git
Package: golang-github-jacobsa-ratelimit-dev
Architecture: all
Depends: ${shlibs:Depends},
         ${misc:Depends},
         golang-go,
         golang-github-jacobsa-gcloud-dev,
         golang-github-jacobsa-oglematchers-dev,
         golang-github-jacobsa-ogletest-dev,
         golang-github-jacobsa-syncutil-dev,
         golang-golang-x-net-dev
Built-Using: ${misc:Built-Using}
Description: Go package for rate limiting
 This package contains code for dealing with rate limiting. See the
 reference (http://godoc.org/github.com/jacobsa/ratelimit) for more info.
==> debian/copyright <==
Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: ratelimit
Source: https://github.com/jacobsa/ratelimit
Files: *
Copyright: 2015 Aaron Jacobs
License: Apache-2.0
Files: debian/*
Copyright: 2015 Michael Stapelberg <stapelberg@debian.org>
License: Apache-2.0
Comment: Debian packaging is licensed under the same terms as upstream
License: Apache-2.0
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at
 .
 http://www.apache.org/licenses/LICENSE-2.0
 .
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 .
 On Debian systems, the complete text of the Apache version 2.0 license
 can be found in "/usr/share/common-licenses/Apache-2.0".
==> debian/gbp.conf <==
[DEFAULT]
pristine-tar = True
==> debian/rules <==
#!/usr/bin/make -f
export DH_GOPKG := github.com/jacobsa/ratelimit
%:
	dh $@ --buildsystem=golang --with=golang
==> debian/source <==
head: error reading 'debian/source': Is a directory
==> debian/source/format <==
3.0 (quilt)
midna /tmp/golang-github-jacobsa-ratelimit master $
Okay, then. Let's give it a shot and see if it builds:
midna /tmp/golang-github-jacobsa-ratelimit master $ git add debian && git commit -a -m 'Initial packaging'
[master 48f4c25] Initial packaging                                                      
 7 files changed, 75 insertions(+)
 create mode 100644 debian/changelog
 create mode 100644 debian/compat
 create mode 100644 debian/control
 create mode 100644 debian/copyright
 create mode 100644 debian/gbp.conf
 create mode 100755 debian/rules
 create mode 100644 debian/source/format
midna /tmp/golang-github-jacobsa-ratelimit master $ gbp buildpackage --git-pbuilder
[…]
midna /tmp/golang-github-jacobsa-ratelimit master $ lintian ../golang-github-jacobsa-ratelimit_0.0\~git20150723.0.2ca5e0c-1_amd64.changes
I: golang-github-jacobsa-ratelimit source: debian-watch-file-is-missing
P: golang-github-jacobsa-ratelimit-dev: no-upstream-changelog
I: golang-github-jacobsa-ratelimit-dev: extended-description-is-probably-too-short
midna /tmp/golang-github-jacobsa-ratelimit master $
This package just built (as it should!), but occasionally one might need to disable a test and file an upstream bug about it. So, let's push this package to pkg-go and upload it:
midna /tmp/golang-github-jacobsa-ratelimit master $ ssh git.debian.org "/git/pkg-go/setup-repository golang-github-jacobsa-ratelimit 'Packaging for golang-github-jacobsa-ratelimit'"
Initialized empty shared Git repository in /srv/git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git/
HEAD is now at ea6b1c5 add mrconfig for dh-make-golang
[master c5be5a1] add mrconfig for golang-github-jacobsa-ratelimit
 1 file changed, 3 insertions(+)
To /git/pkg-go/meta.git
   ea6b1c5..c5be5a1  master -> master
midna /tmp/golang-github-jacobsa-ratelimit master $ git push git+ssh://git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git --tags master pristine-tar upstream
Counting objects: 31, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (25/25), done.
Writing objects: 100% (31/31), 18.38 KiB | 0 bytes/s, done.
Total 31 (delta 2), reused 0 (delta 0)
To git+ssh://git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git
 * [new branch]      master -> master
 * [new branch]      pristine-tar -> pristine-tar
 * [new branch]      upstream -> upstream
 * [new tag]         upstream/0.0_git20150723.0.2ca5e0c -> upstream/0.0_git20150723.0.2ca5e0c
midna /tmp/golang-github-jacobsa-ratelimit master $ cd ..
midna /tmp $ debsign golang-github-jacobsa-ratelimit_0.0\~git20150723.0.2ca5e0c-1_amd64.changes
[…]
midna /tmp $ dput golang-github-jacobsa-ratelimit_0.0\~git20150723.0.2ca5e0c-1_amd64.changes   
Uploading golang-github-jacobsa-ratelimit using ftp to ftp-master (host: ftp.upload.debian.org; directory: /pub/UploadQueue/)
[…]
Uploading golang-github-jacobsa-ratelimit_0.0~git20150723.0.2ca5e0c-1.dsc
Uploading golang-github-jacobsa-ratelimit_0.0~git20150723.0.2ca5e0c.orig.tar.bz2
Uploading golang-github-jacobsa-ratelimit_0.0~git20150723.0.2ca5e0c-1.debian.tar.xz
Uploading golang-github-jacobsa-ratelimit-dev_0.0~git20150723.0.2ca5e0c-1_all.deb
Uploading golang-github-jacobsa-ratelimit_0.0~git20150723.0.2ca5e0c-1_amd64.changes
midna /tmp $ cd golang-github-jacobsa-ratelimit 
midna /tmp/golang-github-jacobsa-ratelimit master $ git tag debian/0.0_git20150723.0.2ca5e0c-1
midna /tmp/golang-github-jacobsa-ratelimit master $ git push git+ssh://git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git --tags master pristine-tar upstream
Total 0 (delta 0), reused 0 (delta 0)
To git+ssh://git.debian.org/git/pkg-go/packages/golang-github-jacobsa-ratelimit.git
 * [new tag]         debian/0.0_git20150723.0.2ca5e0c-1 -> debian/0.0_git20150723.0.2ca5e0c-1
midna /tmp/golang-github-jacobsa-ratelimit master $
Thanks for reading this far, and I hope dh-make-golang makes your life a tiny bit easier. As dh-make-golang just entered Debian unstable, you can install it using apt-get install dh-make-golang. If you have any feedback, I'm eager to hear it.

27 March 2015

Olivier Berger: New short paper: Designing a virtual laboratory for a relational database MOOC with Vagrant, Debian, etc.

Here's a short preview of our latest accepted paper (to appear at CSEDU 2015), about the construction of VMs for the Relational Database MOOC using Vagrant, Debian, PostgreSQL (previous post), etc.:

Designing a virtual laboratory for a relational database MOOC

Olivier Berger, J Paul Gibson, Claire Lecocq and Christian Bac

Keywords: Remote Learning, Virtualization, Open Education Resources, MOOC, Vagrant

Abstract: Technical advances in machine and system virtualization are creating opportunities for remote learning to provide significantly better support for active education approaches. Students now, in general, have personal computers that are powerful enough to support virtualization of operating systems and networks. As a consequence, it is now possible to provide remote learners with a common, standard, virtual laboratory and learning environment, independent of the different types of physical machines on which they work. This greatly enhances the opportunity for producing re-usable teaching materials that are actually re-used. However, configuring and installing such virtual laboratories is technically challenging for teachers and students. We report on our experience of building a virtual machine (VM) laboratory for a MOOC on relational databases. The architecture of our virtual machine is described in detail, and we evaluate the benefits of using the Vagrant tool for building and delivering the VM.

TOC:
  • Introduction
    • A brief history of distance learning
    • Virtualization : the challenges
    • The design problem
  • The virtualization requirements
    • Scenario-based requirements
    • Related work on requirements
    • Scalability of existing approaches
  • The MOOC laboratory
    • Exercises and lab tools
    • From requirements to design
  • Making the VM as a Vagrant box
    • Portability issues
    • Delivery through Internet
    • Security
    • Availability of the box sources
  • Validation
    • Reliability Issues with VirtualBox
    • Student feedback and evaluation
  • Future work
    • Laboratory monitoring
    • More modular VMs
  • Conclusions
Bibliography
  • Alario-Hoyos et al., 2014
    Alario-Hoyos, C., Pérez-Sanagustín, M., Kloos, C. D., and Muñoz Merino, P. J. (2014).
    Recommendations for the design and deployment of MOOCs: Insights about the MOOC digital education of the future deployed in MiríadaX.
    In Proceedings of the Second International Conference on Technological Ecosystems for Enhancing Multiculturality, TEEM '14, pages 403-408, New York, NY, USA. ACM.
  • Armbrust et al., 2010
    Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., Lee, G., Patterson, D., Rabkin, A., Stoica, I., and Zaharia, M. (2010).
    A view of cloud computing.
    Commun. ACM, 53:50-58.
  • Billingsley and Steel, 2014
    Billingsley, W. and Steel, J. R. (2014).
    Towards a supercollaborative software engineering MOOC.
    In Companion Proceedings of the 36th International Conference on Software Engineering, pages 283-286. ACM.
  • Brown and Duguid, 1996
    Brown, J. S. and Duguid, P. (1996).
    Universities in the digital age.
    Change: The Magazine of Higher Learning, 28(4):11-19.
  • Bullers et al., 2006
    Bullers, Jr., W. I., Burd, S., and Seazzu, A. F. (2006).
    Virtual machines an idea whose time has returned: Application to network, security, and database courses.
    SIGCSE Bull., 38(1):102-106.
  • Chen and Noble, 2001
    Chen, P. M. and Noble, B. D. (2001).
    When virtual is better than real [operating system relocation to virtual machines].
    In Hot Topics in Operating Systems, 2001. Proceedings of the Eighth Workshop on, pages 133-138. IEEE.
  • Cooper, 2005
    Cooper, M. (2005).
    Remote laboratories in teaching and learning-issues impinging on widespread adoption in science and engineering education.
    International Journal of Online Engineering (iJOE), 1(1).
  • Cormier, 2014
    Cormier, D. (2014).
    Rhizo14-the MOOC that community built.
    INNOQUAL-International Journal for Innovation and Quality in Learning, 2(3).
  • Dougiamas and Taylor, 2003
    Dougiamas, M. and Taylor, P. (2003).
    Moodle: Using learning communities to create an open source course management system.
    In World conference on educational multimedia, hypermedia and telecommunications, pages 171-178.
  • Gomes and Bogosyan, 2009
    Gomes, L. and Bogosyan, S. (2009).
    Current trends in remote laboratories.
    Industrial Electronics, IEEE Transactions on, 56(12):4744-4756.
  • Hashimoto, 2013
    Hashimoto, M. (2013).
    Vagrant: Up and Running.
    O'Reilly Media, Inc.
  • Jones and Winne, 2012
    Jones, M. and Winne, P. H. (2012).
    Adaptive Learning Environments: Foundations and Frontiers.
    Springer Publishing Company, Incorporated, 1st edition.
  • Lowe, 2014
    Lowe, D. (2014).
    MOOLs: Massive open online laboratories: An analysis of scale and feasibility.
    In Remote Engineering and Virtual Instrumentation (REV), 2014 11th International Conference on, pages 1-6. IEEE.
  • Ma and Nickerson, 2006
    Ma, J. and Nickerson, J. V. (2006).
    Hands-on, simulated, and remote laboratories: A comparative literature review.
    ACM Computing Surveys (CSUR), 38(3):7.
  • Pearson, 2013
    Pearson, S. (2013).
    Privacy, security and trust in cloud computing.
    In Privacy and Security for Cloud Computing, pages 3-42. Springer.
  • Prince, 2004
    Prince, M. (2004).
    Does active learning work? A review of the research.
    Journal of engineering education, 93(3):223-231.
  • Romero-Zaldivar et al., 2012
    Romero-Zaldivar, V.-A., Pardo, A., Burgos, D., and Delgado Kloos, C. (2012).
    Monitoring student progress using virtual appliances: A case study.
    Computers & Education, 58(4):1058-1067.
  • Sumner, 2000
    Sumner, J. (2000).
    Serving the system: A critical history of distance education.
    Open learning, 15(3):267-285.
  • Watson, 2008
    Watson, J. (2008).
    Virtualbox: Bits and bytes masquerading as machines.
    Linux J., 2008(166).
  • Winckles et al., 2011
    Winckles, A., Spasova, K., and Rowsell, T. (2011).
    Remote laboratories and reusable learning objects in a distance learning context.
    Networks, 14:43-55.
  • Yeung et al., 2010
    Yeung, H., Lowe, D. B., and Murray, S. (2010).
    Interoperability of remote laboratories systems.
    iJOE, 6(S1):71-80.

18 April 2014

Richard Hartmann: higher security

Instant classic
Trusted:
NO, there were errors:
The certificate does not apply to the given host
The certificate authority's certificate is invalid
The root certificate authority's certificate is not trusted for this purpose
The certificate cannot be verified for internal reasons
Signature Algorithm: md5WithRSAEncryption
    Issuer: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Certificate Authority, CN=Snake Oil CA/emailAddress=ca@snakeoil.dom
    Validity
        Not Before: Oct 21 18:21:51 1999 GMT
        Not After : Oct 20 18:21:51 2001 GMT
    Subject: C=XY, ST=Snake Desert, L=Snake Town, O=Snake Oil, Ltd, OU=Webserver Team, CN=www.snakeoil.dom/emailAddress=www@snakeoil.dom
...
            X509v3 Subject Alternative Name: 
            email:www@snakeoil.dom

For your own pleasure:
openssl s_client -connect www.walton.com.tw:443 -showcerts

or just run
echo '
-----BEGIN CERTIFICATE-----
MIIDNjCCAp+gAwIBAgIBATANBgkqhkiG9w0BAQQFADCBqTELMAkGA1UEBhMCWFkx
FTATBgNVBAgTDFNuYWtlIERlc2VydDETMBEGA1UEBxMKU25ha2UgVG93bjEXMBUG
A1UEChMOU25ha2UgT2lsLCBMdGQxHjAcBgNVBAsTFUNlcnRpZmljYXRlIEF1dGhv
cml0eTEVMBMGA1UEAxMMU25ha2UgT2lsIENBMR4wHAYJKoZIhvcNAQkBFg9jYUBz
bmFrZW9pbC5kb20wHhcNOTkxMDIxMTgyMTUxWhcNMDExMDIwMTgyMTUxWjCBpzEL
MAkGA1UEBhMCWFkxFTATBgNVBAgTDFNuYWtlIERlc2VydDETMBEGA1UEBxMKU25h
a2UgVG93bjEXMBUGA1UEChMOU25ha2UgT2lsLCBMdGQxFzAVBgNVBAsTDldlYnNl
cnZlciBUZWFtMRkwFwYDVQQDExB3d3cuc25ha2VvaWwuZG9tMR8wHQYJKoZIhvcN
AQkBFhB3d3dAc25ha2VvaWwuZG9tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
gQC554Ro+VH0dJONqljPBW+C72MDNGNy9eXnzejXrczsHs3Pc92Vaat6CpIEEGue
yG29xagb1o7Gj2KRgpVYcmdx6tHd2JkFW5BcFVfWXL42PV4rf9ziYon8jWsbK2aE
+L6hCtcbxdbHOGZdSIWZJwc/1Vs70S/7ImW+Zds8YEFiAwIDAQABo24wbDAbBgNV
HREEFDASgRB3d3dAc25ha2VvaWwuZG9tMDoGCWCGSAGG+EIBDQQtFittb2Rfc3Ns
IGdlbmVyYXRlZCBjdXN0b20gc2VydmVyIGNlcnRpZmljYXRlMBEGCWCGSAGG+EIB
AQQEAwIGQDANBgkqhkiG9w0BAQQFAAOBgQB6MRsYGTXUR53/nTkRDQlBdgCcnhy3
hErfmPNl/Or5jWOmuufeIXqCvM6dK7kW/KBboui4pffIKUVafLUMdARVV6BpIGMI
5LmVFK3sgwuJ01v/90hCt4kTWoT8YHbBLtQh7PzWgJoBAY7MJmjSguYCRt91sU4K
s0dfWsdItkw4uQ==
-----END CERTIFICATE-----
' | openssl x509 -noout -text

At least they're secure against heartbleed.

6 December 2012

Olivier Berger: A 30 minutes introduction to git

I've been looking for a set of slides that I could have reused to make a 30-minute introduction to Git for my colleagues at the computer science department. Finding none that would be suitable (i.e. including graphical examples and not only being the canvas for a workshop), I tried to proceed with making one of my own. I've reused the document "git concepts simplified", whose content seemed quite good, and have converted it to a set of slides. Here's the result, which has been reworked a bit vs. the original "git concepts simplified":
An introduction to git from olberger
Here's the PDF version. Note that if you prefer a version that aligns more with the original, there's also an older version at: https://github.com/olberger/git-notes/tree/master/gcs

Update: for the lone reader that cannot attend my presentation, I recommend first reading the HTML version by Sitaram at "git concepts simplified", as the slides miss some details (which I've kept in the beamer notes only), and only then referring to my slides for some additions. If you want the org-mode source (containing the dot source of the diagrams), contact me. I'll maybe upload the source into some Git repo when I have enough demand/time ;)

17 October 2012

Gerfried Fuchs: Lindsey Stirling

Sometimes one stumbles upon artists by accident and immediately falls in love with them. A link to a video from Lindsey Stirling was dropped in a chat I was paying attention to at the time, and it immediately touched me. She's got style, and she's got great videos. I hope her plans to come to Europe work out soonish. Enjoy!


7 July 2012

Petter Reinholdtsen: Free Timetabling Software - nice free software

Included in Debian Edu / Skolelinux is a large collection of end-user and school-specific software. One of the packages, not installed by default but provided in the Debian archive for schools to install if they want to, is a system to automatically plan the school timetable using information about available teachers, classes and rooms, combined with the list of required courses and how many hours each topic should receive. The software is named FET, and it provides a graphical user interface to input the required information, saves the result in a fairly simple XML format, and generates timetables for both teachers and students. It is available for Linux, MacOSX and Windows. This is the feature list, lifted from the project web site:

I have not used it myself, as I am not involved in timetable planning at a school, but it seems to work fine when I test it. If you need to set up your school's timetable, and are tired of doing it manually, check it out. A quick summary of how to use it can be found in a blog post from MarvelSoft. If you find FET useful, please provide a recipe for the Debian Edu project in the Debian Edu HowTo section.

28 August 2011

Micha Lenk: Finally transitioning to a new GnuPG key

Finally I managed to write up a transition statement for my not-so-new, but stronger, GnuPG key. See below:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1,SHA256
I am transitioning my GPG key from an old 1024-bit key to a new 4096-bit key.
The old key will continue to be valid for some time, but I prefer all new
correspondence to be encrypted for the new key, and will be making all
signatures going forward with the new key.
If you have signed my old key, I would appreciate signatures on my new key as
well, provided that your signing policy permits that without reauthenticating
me.
The old key, which I am transitioning away from, is:
pub   1024D/99E141B4 2004-02-10
      Key fingerprint = 25FE 4741 4770 0558 949D  1DB1 58DD 3FE2 99E1 41B4
The new key, to which I am transitioning, is:
pub   4096R/51B85139 2009-06-18
      Key fingerprint = A3EB B41F C5AB D675 CEE4  1C45 EA6C A6B9 51B8 5139
Thanks in advance.
Cheers,
Micha Lenk
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iEYEARECAAYFAk5aOnkACgkQWN0/4pnhQbTxPgCgzRhREZlQiKJyI9UIdJLLs3Zq
bH4AnA1myFxgDWM7aUMHXgvvsujLTjiWiQIcBAEBCAAGBQJOWjp5AAoJEOpsprlR
uFE5gCsP/0dtCUPl9aQHV1MbQl7+bMofpsC2ikkpdZmrzi68jTG16We49BuzY+PV
S8FhXqg17/YxhKYDnDNTowfztUOyjAOJxy5vrqm3X5xiLwTqN3js9mra+vb4s35k
EVbKMzLLDhj3i0FjeargWEmJmm9cVhaZWKvOvQUhDJAilqvEQ0/50P7B8I+1YvtV
UHoQKbweTljVSlK5R1YPPy9i6r2/oZBYxK4nrknWwS+qPQ5luqelJd+mZdgQ6tow
7HIvtmPgCblJ+hYZWFpoZK6vxs8RaBbuCQcKwYArNhZT/v4TeD/LAaUmIkbQMyKV
J2TKuEHya4+5GMbtg6BGKeiZpleEHPnAq1AfvGpz6opkxjxCLG3RO3X8D3EuM3RW
mkq60mWM8+Zwu1yKbb62iHplp6jpyiQgdjJlB6eHjX7SdY7CvHgYZxGDx4kclP0/
HAdig2U1T+nG6Nn5XflmmKwvNLuKlQlIwJ5NeXyCONRnYvdomQn2hgvkjwMLCdFh
ulIhxa4UvDY7/aQNPZeOrvDHb2XYpiV3TwA9hLgQXXWd0FPmUMVVPpKpRjilaWth
Mtq/QiMGP5Mq/YxgLInRZHGyajDtE67RD4+RgHYOP50cP3UGPoB4ncc2EEtM2kfe
BzJrXPHmdtyGEA6Korl0YwUTRnYsqkqkY1VqDsO0UkOLlV7RAzb9
=9vw1
-----END PGP SIGNATURE-----

25 October 2010

Axel Beckert: ratpoison and focus follows mouse

I have been using ratpoison as the window manager on my ASUS EeePC netbook nemo for more than two years now. But although I'm very happy with ratpoison on the EeePC, there are two feature wishes which have been refused by upstream: one is more flexible window-name matching for the unmanage command; the other is focus follows mouse between ratpoison frames. Well, I always guessed that it was possible, but it took until now to find out how to implement focus follows mouse for ratpoison. There's an ancient but still useful tool called Not a Window Manager (nawm), a small awk-like interpreter offering mostly window-handling functions. The following .nawmrc implements focus follows mouse in nawm:
window newwin;  # stores window to raise
window lastwin; # stores previous window to prevent race conditions

leave {
    lastwin = currentwindow;
}

enter {
    newwin = pointerwindow();
    if (name(newwin) != "" && newwin != lastwin) {
        raise newwin;
        sync;
    }
}
The leave hook is necessary to prevent flapping between two windows when switching between them via ratpoison's commands. I also had to add the following hook to my .ratpoisonrc to work around some cases where ratpoison's own window switching didn't work anymore. This only happened with more than one frame; with one frame, banishing the mouse cursor was annoying, so I filtered out that case:
addhook switchwin exec if [ `ratpoison -c fdump | fgrep -o frame | wc -l` -gt 1 ]; then ratpoison -c banish; fi
Unfortunately nawm was removed from Debian Sid about a year ago due to being buggy and orphaned. There had been no upstream development for seven years or so, either. So for the moment you can get nawm either from Debian Lenny or from snapshot.debian.org. But I had to fix a segfault in nawm when calling name() on a window without a name to be able to use it at all, so you will probably have to rebuild it anyway with the following patch:
diff -u nawm-0.0.20030130/builtins.c nawm-0.0.20030130-patched/builtins.c
--- nawm-0.0.20030130/builtins.c        2010-10-25 06:00:02.000000000 +0200
+++ nawm-0.0.20030130-patched/builtins.c        2010-10-25 04:15:25.000000000 +0200
@@ -546,8 +546,12 @@
     *name = gcstrdup("");
   else
     {
-      *name = gcstrdup((char *)nm);
-      XFree(nm);
+      if ((char *)nm) {
+        *name = gcstrdup((char *)nm);
+        XFree(nm);
+      } else {
+        *name = gcstrdup("");
+      }
     }
 }
And yes, I'm thinking about adopting and reintroducing the nawm package into Debian Sid. But I'd prefer it if anyone could give me a hint on how to do this with more current and still maintained tools (or a patch against ratpoison :-). I looked into suckless-tools, but I haven't found anything in there which provides hooks on X events. And the Perl module Tk seems to be able to set X event hooks, but only within the application being written itself.
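For those who'd like to experiment without nawm, the same trick can be sketched directly against Xlib. The following is a rough, untested illustration of mine, not a drop-in replacement: it selects EnterNotify on the top-level windows that exist at startup and raises whatever window the pointer enters; tracking newly mapped windows and cooperating with ratpoison's own focus handling are left out.

/* focus-follows-mouse sketch with plain Xlib -- illustration only.
 * Build with: cc -o ffm ffm.c -lX11 */
#include <X11/Xlib.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    Window root, root_ret, parent_ret, *children;
    unsigned int n, i;
    XEvent ev;

    if (!dpy)
        return 1;
    root = DefaultRootWindow(dpy);

    /* Ask for enter events on every existing top-level window. */
    if (XQueryTree(dpy, root, &root_ret, &parent_ret, &children, &n)) {
        for (i = 0; i < n; i++)
            XSelectInput(dpy, children[i], EnterWindowMask);
        if (children)
            XFree(children);
    }

    /* Raise whatever window the pointer enters. */
    for (;;) {
        XNextEvent(dpy, &ev);
        if (ev.type == EnterNotify) {
            XRaiseWindow(dpy, ev.xcrossing.window);
            XFlush(dpy);
        }
    }
}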

20 August 2010

Marco Túlio Gontijo e Silva: Immix on GHC Summer of Code final report

My project. This part of this post assumes that the reader has read my last post. The Summer of Code is over. It was great to spend time working on GHC and to get paid for it. Although the implementation is not mature enough to be included in the repository, I'm happy with the state it is in now. I think it's a good start, and I plan to keep working on it. It's good to see how my motivation has increased now that the program is over and I'm free to not work on it if I want to. So I'm going to take a look at it again as soon as I've done the things I was postponing during the program. I've created a wiki page for it, and my plans are to implement and measure the "Remove partial lists" patch, and then to debug "Allocate in lines in minor GCs". Any help is very welcome.

11 August 2010

Marco Túlio Gontijo e Silva: Immix on GHC Summer of Code report #11

My project. This post assumes that the reader has read my last post. I had been trying for some time to find a reasonable benchmarking method (which computer to use, single-user or multi-user), and an e-mail from Simon Marlow has now calmed me down. He suggests that at this stage it's not necessary to run a long benchmark; a short one is enough to get an idea of what each change does to performance. For some time I was struggling with the benchmarks to get reasonable results. More than once I started writing a blog post analysing the results I had got, only to notice in the middle of the post that the results were inconsistent. This usually shows up as a big distortion in benchmarks that shouldn't show any change: for instance, for a patch that only changes code executed with +RTS -w -RTS, there was a 50% difference in execution time in default mode, that is, without +RTS -w -RTS. I think that just disabling CPU scaling and running a small number of tests, which makes it easier to reproduce a possibly inconsistent result, will be enough to get good data.

I've made a stack of patches, splitting out each part of my work. The stack does not reflect my preference among the patches; that is, the complete stack applied is not what I think is the best change to GHC. It's only a set of patches applied linearly, in a way that makes it easy to see the impact of each change on performance. The patches are:

Don't check for swept blocks in -DS. The checkHeap function assumed that the allocated part of a block contained only live objects and slop. This is not true for blocks that are collected using mark-sweep, so the code in this patch skips the test for that kind of block. The patch is only needed to make it possible to run the binary with the parameters +RTS -w -DS -RTS. I've already explained it. I didn't measure the difference in performance before this patch, since it only changes code that is not executed in the benchmarks: none of them are run with -DS.

Immix: allocate and free memory on lines. This is the main patch, with the initial implementation of Immix. Being the initial implementation, it has its problems, which are treated in the following patches. The comparison was done using three programs from nofib/gc: fibheaps, fulsom and constraints, as suggested by Simon Marlow. Two tests were made for each version of the compiler code: one using the default GC strategy (copy-collection, with mark-compact if there is little heap space left), and another using mark-sweep (or Immix, after my changes). Comparing the default GC strategy before and after this change, the program got 0.4% slower, with the collection of generation 0 being 0.1% slower and the collection of generation 1 0.4% slower. Comparing mark-sweep with Immix, it got 13.7% slower: 15% in GC0 and 7.2% in GC1. The change in memory used is irrelevant.

Choose between allocating in line or blocks in todo_block_full. The first improvement I made was to change the place where the allocation searches for free lines (see the sketch after this list). Most of the changes had been in the function alloc_for_copy() in rts/sm/Evac.c, which is bad because this function is called very often and should be kept fast, so I moved them to todo_block_full() in rts/sm/GCUtils.c. It got about 9.9% and 9.6% faster in default and sweep mode respectively, which makes it 1% faster and 2.8% slower than the original code.

Improvements in sweep(). I was bothered that the code I had written to free memory in lines was messy, and I thought it could be simplified and maybe even made faster. The overall execution time increased by 0.4%, but the GC1 time, which is the only one that should be affected, was reduced by 0.6%. In the default GC strategy there was also a 0.2% increase in time, so I guess this is within the margin of error. I have to investigate this one further, but it seems to have slightly improved the performance.

Line before increasing block size. In a conversation with Simon Marlow, we considered two options: one is to search for a line after trying to increase the block size, which is what I did in "Choose between allocating in line or blocks in todo_block_full"; the other is to search for a line first, and only increase the block size if a block (and not a line) was being used for allocation and no line is available. With this patch I changed the code to use the second option. Using sweep, the GC time got 0.5% slower, and with default it got 0.1% slower.

rts/sm/Sweep.c: Mark all BF_MEDIUM blocks as BF_FRAGMENTED. In my first implementation, blocks that contain objects bigger than a line (of medium size) are marked with BF_MEDIUM, and are treated like the usual blocks in mark-sweep: if the block is empty, it is freed; if it is very fragmented, it is marked to be collected using copy-collection. With this patch, instead of applying the mark-sweep rules, I just mark such blocks as fragmented and have them collected by copy-collection. This reduced the GC0 time by 3%, but caused an increase of 509.7% in the GC1 time. It was also the first time the memory used was reduced, by 25.4%. The results in the default mode are insignificant. What it looks like to me is that the copy-collection code is more efficient with memory, but also slower; marking blocks as BF_FRAGMENTED makes that algorithm be used even for full blocks.

rts/sm/Sweep.c: Don't use 3/4 heuristics to mark as BF_FRAGMENTED. The mark-sweep algorithm considered a block fragmented if more than 1/4 of its word groups were completely unmarked. I had kept this heuristic, and this change removes it, making more use of Immix and less of copy-collection. It didn't change the memory used, and made the code 10.4% slower in sweep mode. In default GC mode there was no relevant difference, as expected.

Allocate in lines in minor GCs. In the initial implementation I was allocating on lines only in major GCs, because I needed the mark stack, which was only available in that kind of collection. With this change, I create the mark stack in all GCs and use allocation on lines everywhere. The results for the default mode are insignificant, and the code got 9.0% faster in sweep mode, using the same memory.

Selection. I've made a selection of the presumably better patches: "Don't check for swept blocks in -DS", "Immix: allocate and free memory on lines", "Choose between allocating in line or blocks in todo_block_full", "Improvements in sweep()" and "Allocate in lines in minor GCs". This is an attempt to reach the best possible set of patches, to see how much it improves on the original code. Comparing the default strategy of this selection with the original code, it got 3.7% slower. Comparing the two sweep modes, it got 3.9% slower. Comparing the original default with the final sweep, it got 4.2% slower and uses 4.7% more memory. There's a lot of room for improvement, and I'm willing to hear suggestions about what I could change in the code to achieve it.
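To make the line-or-block decision described above concrete, here is an illustrative sketch of the shape of that logic. This is not the actual GHC code: the types are simplified, and the helpers pop_free_line_group() and alloc_block() are invented for the example.

/* Illustration only -- not the GHC RTS.  When the current allocation
 * area fills up, prefer a group of free lines if one is available;
 * otherwise fall back to a whole fresh block. */
typedef unsigned long StgWord;
typedef StgWord *StgPtr;

typedef struct { StgPtr start, end; } line_group;
typedef struct { StgPtr start; } block;
typedef struct { StgPtr todo_free, todo_lim; } workspace;

#define BLOCK_SIZE_W 512                      /* example value, in words */

extern line_group *pop_free_line_group(void); /* invented helper */
extern block *alloc_block(void);              /* invented helper */

static void refill_workspace(workspace *ws)
{
    line_group *lg = pop_free_line_group();
    if (lg != NULL) {
        /* Recycle the free lines found by sweep. */
        ws->todo_free = lg->start;
        ws->todo_lim  = lg->end;
    } else {
        /* No free lines left: take a whole block. */
        block *bd = alloc_block();
        ws->todo_free = bd->start;
        ws->todo_lim  = bd->start + BLOCK_SIZE_W;
    }
}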
The results are available. Every time I had doubts about a comparison, I ran both versions again to check; this is why there are some backup files. The data presented here is not always from the most recent run of each measurement, but from the runs that I thought were most similar in conditions.

7 July 2010

Marco Túlio Gontijo e Silva: Immix on GHC Summer of Code weekly report #9

My project. This post assumes that the reader has read my last post. I'm posting this weekly report earlier in the week because there is already too much to tell. I've found the reason behind the segfault I had been hunting for so long. The last problem, which is the only one I know exactly when it was fixed, because the programs started working, was related to allocating twice in the same region of memory. This happened because at each major GC the list of free line groups is generated again, but my old code was still allocating in the same line group as in the previous collection. So the last part of that line group, which had not yet been used, could become part of a line group generated in the new collection, and it would then be used for allocation twice: once in the allocation of the current line group, and again when the new line group started being allocated.

The implementation of the allocation of memory in lines is not very complicated, but it has some details that need attention; these were the cause of most of the trouble in recent weeks, and they still need improvement. Initially I was allocating one object per line, just to see if it would work. As it didn't, I kept refining the approach until I could find the problem. The next attempt was to set ws->todo_free and ws->todo_lim in alloc_for_copy() in rts/sm/Evac.c. I think this is not ideal, because I didn't want the code to become too inconsistent with the way memory was allocated using these pointers before my changes. So I created new variables, line_free and line_lim, at first in the gen_workspace ws, the same place where todo_free and todo_lim live, but because of the problem I described in the previous paragraph I moved them to the generation. I'm still not sure where to place these pointers; this is something that can be improved.

Another problem that took me a long time to understand is that an object needs to be scavenged after being allocated. When it was allocated at todo_free, it was scavenged by scavenge_block() in rts/sm/Scav.c, because the block it lives in, todo_bd, is scavenged by that function. As I didn't want the whole block containing the free line group to be scavenged again, I didn't want to send it to this function. So I thought about creating a way to scavenge only part of a block, that is, the space in the free line group that was allocated. This is still a valid idea, but I noticed that it was easier to use the mark stack: I mark the object that is allocated in the line and push it onto the mark stack. The main problem with this approach is that it's only possible to allocate in lines during major GCs, since only in that kind of GC is the mark stack active. This is certainly the place where I can make the most improvement. There is a patch with these changes, and another one for the sanity checking explained in the last post. I'm now benchmarking these changes with nofib, to see how much they affect performance.
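As an illustration of the allocation step that these pointers enable, a bump allocator over one free line group looks roughly like the sketch below. This is my own simplification, not the actual GHC code:

/* Illustration only -- bump allocation inside one free line group.
 * line_free and line_lim mirror the pointers described above. */
#include <stddef.h>

typedef unsigned long StgWord;
typedef StgWord *StgPtr;

typedef struct {
    StgPtr line_free; /* next free word in the current line group   */
    StgPtr line_lim;  /* one past the end of the current line group */
} line_alloc;

/* Returns room for size words, or NULL when this line group is
 * exhausted and the caller must fetch the next group from the free
 * list (which is rebuilt at each major GC, as described above). */
static StgPtr alloc_in_lines(line_alloc *la, size_t size)
{
    if (la->line_free + size > la->line_lim)
        return NULL;
    StgPtr p = la->line_free;
    la->line_free += size;
    return p;
}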

4 June 2010

Marco Túlio Gontijo e Silva: Summer of Code weekly report #4

My project. I'm publishing my report earlier this week, because there was a lot to talk about. This week I started making changes to the code in the direction of what I want to do. I haven't started on a final implementation, but I'm studying how what I want to do will affect the rest of the garbage collection (GC). I noticed the code in rts/sm/Sweep.c was simple and similar to what I'm planning to do, so I started by changing how it works. Sweeping in the Glasgow Haskell Compiler (GHC) is done with a bitmap which contains a bit for each word in a memory block; a bit is set to 1 when there's an object starting at the mapped word and 0 otherwise. When there's a block with no objects starting in it, that is, all bits of the bitmap are set to 0, the block is freed.
        if (resid == 0)
        {
            freed++;
            gen->n_old_blocks--;
            if (prev == NULL) {
                gen->old_blocks = next;
            } else {
                prev->link = next;
            }
            freeGroup(bd);
        }
The bits are analyzed in groups of BITS_IN(W_), where BITS_IN(W_) is the number of bits in a word.
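As a standalone illustration of this one-bit-per-word scheme (my own sketch, not GHC's code), setting and testing the mark bit for word index w of a block could look like this:

#include <stdint.h>

#define BITS_IN_WORD (8 * sizeof(uintptr_t)) /* stand-in for BITS_IN(W_) */

/* Record that an object starts at word index w of the block. */
static void mark_word(uintptr_t *bitmap, unsigned w)
{
    bitmap[w / BITS_IN_WORD] |= (uintptr_t)1 << (w % BITS_IN_WORD);
}

/* Test whether an object starts at word index w. */
static int is_marked(const uintptr_t *bitmap, unsigned w)
{
    return (bitmap[w / BITS_IN_WORD] >> (w % BITS_IN_WORD)) & 1;
}

In GHC's sweep, the loop over these groups simply counts the bitmap words that are not completely zero: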
        for (i = 0; i < BLOCK_SIZE_W / BITS_IN(W_); i++)
        {
            if (bd->u.bitmap[i] != 0) resid++;
        }
If more than 1/4 of the groups are completely set to 0, the block is considered fragmented.
            if (resid < (BLOCK_SIZE_W * 3) / (BITS_IN(W_) * 4))
            {
                fragd++;
                bd->flags |= BF_FRAGMENTED;
            }
Immix, the GC algorithm I plan to implement in GHC, divides the blocks of memory into lines. My initial plan was to identify free lines. I decided to fix the size of a line at BITS_IN(W_) words, because a line then maps to exactly one word in the bitmap, and the code was already analyzing the bitmap in groups of BITS_IN(W_) words. This was very easy with the current code.
            if (bd->u.bitmap[i] != 0) resid++;
            else printf("DEBUG: line_found(%p)\n", bd->start + BITS_IN(W_) * i);
This worked, and showed some free lines. I'm sure there are other ways of logging in GHC, but printf was the simplest one I could think of. I measured the occurrence of free lines using the bernouilli program from the NoFib benchmark suite, calling it with 500 +RTS -w, to make it use sweep. In 782 calls to GarbageCollect(), sweep() was called 171 times, for 41704 blocks to be swept, and found 230461 free lines. This gives us about 5.5 free lines per block, out of the 8 lines in each block (on 64-bit systems). The problem is that the bitmap is marked only at the start of each allocated object, so even in a line where all bits are 0 we can't assume the line is completely free, because there may be an object starting in a previous line that is using the space of this line. Checking only the previous line doesn't work either, because a big object can span several lines. What we can do here is a variation of conservative marking, as proposed in the Immix paper: check only the previous line, and work only with objects smaller than a line. To make sure I was working only with objects smaller than a line, I had to mark the blocks that contain medium objects and avoid them when seeking free lines. The block flags are defined in includes/rts/storage/Block.h, so I included another flag in this file, BF_MEDIUM.
/* Block contains objects evacuated during this GC */
#define BF_EVACUATED 1
/* Block is a large object */
#define BF_LARGE     2
/* Block is pinned */
#define BF_PINNED    4
/* Block is to be marked, not copied */
#define BF_MARKED    8
/* Block is free, and on the free list  (TODO: is this used?) */
#define BF_FREE      16
/* Block is executable */
#define BF_EXEC	     32
/* Block contains only a small amount of live data */
#define BF_FRAGMENTED 64
/* we know about this block (for finding leaks) */
#define BF_KNOWN     128
/* Block contains objects larger than a line */
#define BF_MEDIUM    256
The GHC GC is generational, that is, objects are allocated in a generation and, after a while, the ones that are still in use are moved to the next generation. The idea assumes that younger objects are more likely to die, so few objects get moved to the next generation. Sweep and Immix work only on the last generation, so, to mark blocks with medium objects, we have to check the size of the objects that are moved to the next generation. This is done in the copy_tag function of rts/sm/Evac.c. I inserted code that checks the object size and marks the block when the object is bigger than BITS_IN(W_).
STATIC_INLINE GNUC_ATTR_HOT void
copy_tag(StgClosure **p, const StgInfoTable *info,
         StgClosure *src, nat size, generation *gen, StgWord tag)
{
    StgPtr to, from;
    nat i;

    to = alloc_for_copy(size,gen);
    if (size > 8) {
        Bdescr(to)->flags |= BF_MEDIUM;
    }
So I updated the code in rts/sm/Sweep.c to inspect for free lines only in blocks without the BF_MEDIUM mark.
            if (bd->u.bitmap[i] != 0) resid++;
            else if (!(bd->flags & BF_MEDIUM)) {
                printf("DEBUG: line_found(%p)\n", bd->start + BITS_IN(W_) * i);
            }
This also worked. Now, in the 32012 blocks there were 189015 free lines, found in the same number of GCs, making about 5.9 free lines per block. We considered only blocks with small objects, but we didn't ignore the first line of each group of free lines. This can be achieved by checking whether the previous line was also free.
            if (bd->u.bitmap[i] != 0) resid++;
            else if (!(bd->flags & BF_MEDIUM) && i > 0 && bd->u.bitmap[i - 1] == 0) {
                printf("DEBUG: line_found(%p)\n", bd->start + BITS_IN(W_) * i);
            }
Now, from the 32239 blocks, 165547 free lines were found, giving 5.1 free lines per block. But there are more things to improve. If the whole block is free, we want to free it instead of marking its lines as free, so it's better to mark the lines only after we know that the block is not completely free. So I left the code that checks the bitmap as it was, and included a line check only for blocks that are not completely free. At this point, I also restricted the fragmentation test to blocks with medium objects, because in blocks of small objects we plan to allocate in free lines, so fragmentation is not a (big) issue.
            if (resid < (BLOCK_SIZE_W * 3) / (BITS_IN(W_) * 4) &&
                (bd->flags & BF_MEDIUM))
            {
                fragd++;
                printf("DEBUG: BF_FRAGMENTED\n");
                bd->flags |= BF_FRAGMENTED;
            }
            else if (!(bd->flags & BF_MEDIUM))
            {
                for (i = 1; i < BLOCK_SIZE_W / BITS_IN(W_); i++)
                {
                    if (bd->u.bitmap[i] == 0 && bd->u.bitmap[i - 1] == 0) {
                        printf("DEBUG: line_found(%p)\n", bd->start + BITS_IN(W_) * i);
                    }
                }
            }
The total amount of blocks increased dramatically: the blocks that became fragmented and were not swept again made a huge difference. From the 345143 blocks, 1633268 free lines were found, or about 4.7 free lines per block. 9434 blocks were free, so, counting only the remaining blocks, we have about 4.9 free lines per block. Something we'll need later is a way to access these lines. The simplest way I could think of to achieve this is to build a list of lines, in which the first word of each free line is a pointer to the next free line, and the first word of the last free line is 0. It's useful to keep reporting the lines to stdout, so that we can then follow the list and check whether we visit the same lines.
                    if (bd->u.bitmap[i] == 0 && bd->u.bitmap[i - 1] == 0)
                    {
                        StgPtr start = bd->start + BITS_IN(W_) * i;
                        printf("DEBUG: line_found(%p)\n", start);
                        if (line_first == NULL) {
                            line_first = start;
                        }
                        if (line_last != NULL) {
                            *line_last = (StgWord) start;
                        }
                        line_last = start;
                        *line_last = 0;
                    }
                }
            }
        }
    }
    for (line_last = line_first; line_last; line_last = (StgPtr) *line_last)
    {
        fprintf(stderr, "DEBUG: line_found(%p)\n", line_last);
    }
I printed the insertion of the lines into the list to stdout, and the walk over the list to stderr, so that it'd be easy to diff them. There was no difference between the lists. There are other improvements that could be made, like using a list of groups of free lines, but I think it'll be better to think about this after studying how allocation in the free lines will be done. That's what I'm turning to now.
There are some minor things I learned that I thought were worth blogging. The current GHC uses three strategies for collecting the last generation: copying, mark-compact and mark-sweep. Copying is the default until the memory reaches 30% of the maximum heap size; after that, mark-compact is used. Sweep can be chosen by a runtime system (RTS) flag, -w. To always use mark-compact, the flag is -c. I've been submitting small patches to the cvs-ghc mailing list, mostly about outdated comments. Most of them were accepted, except for one which contained a lot of commentary and indeed was not completely correct. I corrected it and resent it to the list, but the message is waiting for approval because its header matched a filter rule; I believe this is because I replied to the message generated by darcs. There's a very useful GHC option, especially for testing the compiler, when you need to rebuild the source even though nothing in it has changed: it's -fforce-recomp, and it only makes sense when used with --make.
