Gunnar Wolf: The author has been doctored.



This post is an unpublished review for "ChatGPT is bullshit".

As people around the world come to understand how LLMs behave, more and more of them wonder why these models hallucinate and what can be done to reduce it. This provocatively named article by Michael Townsen Hicks, James Humphries and Joe Slater is an excellent primer for better understanding how LLMs work and what to expect from them.

As humans carrying out our relations using our language as the main tool, we are easily in awe at the apparent ease with which ChatGPT (the first widely available, and to this day probably the best known, LLM-based automated chatbot) simulates human-like understanding, and at how it helps us easily carry out even daunting data aggregation tasks. It is common for people to ask ChatGPT for an answer and, if it gets part of the answer wrong, to justify it by stating that it's just a hallucination. Hicks et al. invite us to switch from that characterization to a more correct one: LLMs are bullshitting. This term is formally presented by Frankfurt [1]. To bullshit is not the same as to lie, because lying requires knowing (and wanting to cover) the truth. A bullshitter does not necessarily know the truth; they just have to provide a compelling description, regardless of whether it is aligned with the truth.

After introducing Frankfurt's ideas, the authors explain the fundamental ideas behind LLM-based chatbots such as ChatGPT. A Generative Pre-trained Transformer (GPT) has as its only goal to produce human-like text, which it does mainly by mapping the input to a high-dimensional abstract vector representation and probabilistically emitting the next token (word), iterating over the text produced so far. Clearly, a GPT's task is not to seek truth or to convey useful information: they are built to provide a normal-seeming response to the prompts provided by their user. Core data are not queried to find optimal solutions for the user's requests, but are generated on the requested topic, attempting to mimic the style of the document set it was trained on.

Erroneous data emitted by an LLM is thus not comparable with what a person could hallucinate, but appears because the model has no understanding of truth; in a way, this is very fitting with the current state of the world, a time often termed the age of post-truth [2]. Requesting an LLM to provide truth in its answers is basically impossible, given the difference between intelligence and consciousness: following Harari's definitions [3], LLM systems, or any AI-based system, can be seen as intelligent, as they have the ability to attain goals in various, flexible ways, but they cannot be seen as conscious, as they have no ability to experience subjectivity. That is, the LLM is, by definition, bullshitting its way toward an answer: its goal is to provide an answer, not to interpret the world in a trustworthy way.

The authors close their article with a plea for literature on the topic to adopt the more correct "bullshit" term instead of the vacuous, anthropomorphizing "hallucination". Of course, given that the word is already loaded with a negative meaning, it is an unlikely request.

This is a great article that mixes together Computer Science and Philosophy, and can shed some light on a topic that is hard to grasp for many users.

[1] Frankfurt, Harry (2005). On Bullshit. Princeton University Press.
[2] Zoglauer, Thomas (2023). Constructed truths: truth and knowledge in a post-truth world. Springer.
[3] Harari, Yuval Noah (2023). Nexus: A Brief History of Information Networks From the Stone Age to AI. Random House.
This post is a review for Computing Reviews for "The science of detecting LLM-generated text", an article published in Communications of the ACM.

While artificial intelligence (AI) applications for natural language processing (NLP) are no longer something new or unexpected, nobody can deny the revolution and hype that started, in late 2022, with the announcement of the first public version of ChatGPT. By then, synthetic translation was well established and regularly used, many chatbots had started attending users' requests on different websites, voice recognition personal assistants such as Alexa and Siri had been widely deployed, and complaints of news sites filling their space with AI-generated articles were already commonplace. However, the ease of prompting ChatGPT or other large language models (LLMs) and getting extensive answers (its text generation quality is so high that it is often hard to discern whether a given text was written by an LLM or by a human) has sparked significant concern in many different fields. This article was written to present and compare the current approaches to detecting human or LLM authorship in texts.

The article presents several different ways LLM-generated text can be detected. The first, and main, taxonomy followed by the authors is whether the detection can be done aided by the LLM's own functions ("white-box detection") or only by evaluating the generated text via a public application programming interface (API) ("black-box detection").

For black-box detection, the authors suggest training a classifier to discern the origin of a given text. Although this works at first, this task is doomed from its onset to be highly vulnerable to new LLMs generating text that will not follow the same patterns, and thus will probably evade recognition. The authors report that human evaluators find human-authored text to be more emotional and less objective, and to use grammar to indicate the tone of the sentiment that should be used when reading the text, a trait that has not been picked up by LLMs yet. Human-authored text also tends to have higher sentence-level coherence, with less term repetition in a given paragraph. The frequency distribution for more and less common words is much more homogeneous in LLM-generated texts than in human-written ones.

White-box detection includes strategies whereby the LLMs will cooperate in identifying themselves in ways that are not obvious to the casual reader. This can include watermarking, be it rule based or neural based; in both cases, the process becomes a form of steganography, as the involvement of an LLM is explicitly hidden and spread throughout the full generated text, aiming at low detectability and high recoverability even when parts of the text are edited.

The article closes by listing the authors' concerns about all of the above-mentioned technologies. Detecting an LLM, be it with or without the collaboration of the LLM's designers, is more of an art than a science, and methods deemed robust today will not last forever. We also cannot assume that LLMs will continue to be dominated by the same core players; LLM technology has been deeply studied, and good LLM engines are available as free/open-source software, so users needing to do so can readily modify their behavior.

This article presents itself as merely a survey of methods available today, while also acknowledging the rapid progress in the field. It is timely and interesting, and easy to follow for the informed reader coming from a different subfield.
drupal7 package, and by April 2013 I became its primary maintainer. I kept the drupal7 package up to date in Debian until 2018; the supported build methods for Drupal 8 are not compatible with Debian (mainly, bundling third-party libraries and updating them without coordination with the rest of the ecosystem), so towards the end of 2016, I announced I would not package Drupal 8 for Debian.
By March 2016, we migrated our main page to Drupal 7. By then, we already had several other sites for our academic projects, but my narrative follows our main Web site. I did manage to migrate several Drupal 6 (D6) sites to Drupal 7 (D7); it was quite an involved process, never transparent to the user, and we did get backlash from many of our users over long downtimes (or partial downtimes, with sites only half-available). For our main site, we took the opportunity to do a complete redesign and deployed a fully new site.

You might note that March 2016 is after the release of D8 (November 2015). I don't recall many of the specifics for this decision, but if I'm not mistaken, building the new site was a several-months-long process, not only for the technical work of setting it up, but for the legwork of getting all of the needed information from the different areas that need to be represented in the Institute. Not only that: Drupal sites often include tens of contributed themes and modules; the technological shift the project underwent between its 7 and 8 releases was too deep, and modules took a long time (if at all; many themes and modules were outright dumped) to become available for the new release.

Naturally, the Drupal Foundation wanted to evolve and deprecate the old codebase. But the pain of migrating from D7 to D8 is too big, and many sites have remained under version 7. Eight years after D8's release, almost 40% of Drupal installs are for version 7, and a similar proportion runs a currently supported release (10 or 11). And while the Drupal Foundation did a great job at providing very-long-term support for D7, I understand the burden is becoming too much, so close to a year ago (and after pushing the D7 end-of-life date back several times), they finally announced support will end this upcoming January 5.
Using the D2B Migrate module, I found it is quite easy to migrate a live site from Drupal 7 to Backdrop.
I generated a mysqldump output of the site's database, and it got me close to 3GB of data. And given that D2B_migrate is meant to work via a Web interface (my playbook works around it by using a client I wrote with Perl's WWW::Mechanize), I repeatedly stumbled on PHP's maximum POST size, maximum upload size, and maximum memory size…
I asked for help in Backdrop's Zulip chat site, and my attention was drawn away from fixing PHP toward something more obvious: why is the database so large? So I took a quick look at the database (or rather: my first look was at the database server's filesystem usage). MariaDB stores each table as a separate file on disk, so I looked for the nine largest tables:
# ls -lhS | head
total 3.8G
-rw-rw---- 1 mysql mysql 2.4G Dec 10 12:09 accesslog.ibd
-rw-rw---- 1 mysql mysql 224M Dec 2 16:43 search_index.ibd
-rw-rw---- 1 mysql mysql 220M Dec 10 12:09 watchdog.ibd
-rw-rw---- 1 mysql mysql 148M Dec 6 14:45 cache_field.ibd
-rw-rw---- 1 mysql mysql 92M Dec 9 05:08 aggregator_item.ibd
-rw-rw---- 1 mysql mysql 80M Dec 10 12:15 cache_path.ibd
-rw-rw---- 1 mysql mysql 72M Dec 2 16:39 search_dataset.ibd
-rw-rw---- 1 mysql mysql 68M Dec 2 13:16 field_revision_field_idea_principal_articulo.ibd
-rw-rw---- 1 mysql mysql 60M Dec 9 13:19 cache_menu.ibd
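As an aside (and not something I actually ran back then), the same ranking can be obtained from MariaDB itself via information_schema instead of from the filesystem. Here is a minimal sketch of that alternative, assuming the mysql client can connect without further options and that the Drupal database is named drupal (both of these are placeholders):
#!/usr/bin/python3
# Sketch: ask MariaDB which tables are largest, via information_schema,
# instead of looking at the .ibd files on disk.
import subprocess

QUERY = """
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
  FROM information_schema.tables
 WHERE table_schema = 'drupal'
 ORDER BY data_length + index_length DESC
 LIMIT 9;
"""

# -B gives tab-separated batch output; -e runs the given statement
result = subprocess.run(['mysql', '-B', '-e', QUERY],
                        capture_output=True, text=True, check=True)
print(result.stdout)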
D2B_migrate works with a mysqldump output, and given that mysqldump locks each table before starting to dump it and unlocks it after its job is done, I can just do the following:
$ perl -e '$output = 1; while (<>) { $output = 0 if /^LOCK TABLES `(accesslog|search_index|watchdog|cache_field|cache_path)`/; $output = 1 if /^UNLOCK TABLES/; print if $output }' < /tmp/d7_backup.sql > /tmp/d7_backup.eviscerated.sql; ls -hl /tmp/d7_backup.sql /tmp/d7_backup.eviscerated.sql
-rw-rw-r-- 1 gwolf gwolf 216M Dec 10 12:22 /tmp/d7_backup.eviscerated.sql
-rw------- 1 gwolf gwolf 2.1G Dec 6 18:14 /tmp/d7_backup.sql
D2B_migrate is happy to take it. And I'm a big step closer to ending my reliance on (this bit of) legacy code for my highly visible sites…
This post is a review for Computing Reviews for "Why academics under-share research data: A social relational theory", an article published in the Journal of the Association for Information Science and Technology.

As an academic, I have cheered for and welcomed the open access (OA) mandates that, slowly but steadily, have been accepted in one way or another throughout academia. It is now often accepted that public funds mean public research. Many of our universities or funding bodies will demand that, with varying intensities: sometimes they demand research to be published in an OA venue, sometimes a mandate will only prefer it. Lately, some journals and funding bodies have expanded this mandate toward open science, requiring not only research outputs (that is, articles and books) to be published openly but for the data backing the results to be made public as well. As a person who has been involved with free software promotion since the mid 1990s, it was natural for me to join the OA movement and to celebrate when various universities adopt such mandates.

Now, what happens after a university or funding body adopts such a mandate? Many individual academics cheer, as it is the right thing to do. However, the authors observe that this is not really followed thoroughly by academics. What can be observed, rather, is the slow pace or foot-dragging of academics when they are compelled to comply with OA mandates, or even an outright refusal to do so. If OA and open science are close to the ethos of academia, why aren't more academics enthusiastically sharing the data used for their research? This paper finds a subversive practice embodied in the refusal to comply with such mandates, and explores a hypothesis based on Karl Marx's productive worker theory and Pierre Bourdieu's ideas of symbolic capital.

The paper explains that academics, as productive workers, become targets for exploitation: given that it is not only the academics' sharing ethos at play, but also private industry's push for data collection and industry-aligned research, they adapt to technological changes and jump through all kinds of hurdles to create more products, in a result that can be understood as a neoliberal productivity measurement strategy. Neoliberalism assumes that mechanisms that produce more profit for academic institutions will result in better research; it also leads to the disempowerment of academics as a class, although they are rewarded as individuals due to the specific value they produce. The authors continue by explaining how open science mandates seem to ignore the historical ways of collaboration in different scientific fields, and by exploring different angles of how and why data can be seen as under-shared, failing to comply with different aspects of said mandates.

This paper, built on the social sciences tradition, is clearly a controversial work that can spark interesting discussions. While it does not specifically touch on computing, it is relevant to Computing Reviews readers due to the relatively high percentage of academics among us.
python-fuse module (Stavros' passthrough filesystem, Dave's filesystem based upon, and further explaining, Stavros', and several others) explaining how to provide basic functionality. I found a particularly useful presentation by Matteo Bertozzi, presented ~15 years ago at PyCon4. But none of those is, IMO, followable enough by itself. Also, most of them are very old (maybe the world is telling me something that I refuse to understand?).

And of course, there isn't a single interface to work from. In Python only, we can find python-fuse, Pyfuse, Fusepy… Where to start from?

So I set out to try and help.
Over the past couple of weeks, I have been slowly working on my own version, and
presenting it as a progressive set of tasks, adding filesystem calls, and
being careful to thoroughly document what I write (but maybe my documentation
ends up obfuscating the intent? I hope not; read on, I've provided some remediation).
I registered a GitLab project for a hand-holding guide to writing FUSE-based filesystems in Python. This is a project where I present several working FUSE filesystem implementations, some of them RAM-based, some passthrough-based; I intend to also add filesystems backed by pseudo-block-devices (for implementations such as my FIUnamFS).

So far, I have added five stepwise pieces, starting from the barest possible empty filesystem, and adding system calls (and functionality) until reaching (so far) either a read-write filesystem in RAM with basic stat() support or a read-only passthrough filesystem.
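To give an idea of what the very first step looks like, here is a minimal sketch of an "empty", read-only filesystem written against the fusepy interface (an illustration, not the code from the guide): it only answers getattr() and readdir() for the mount root.
#!/usr/bin/python3
# Minimal sketch of an "empty" read-only FUSE filesystem with fusepy:
# it only implements getattr() and readdir(), exposing a root directory
# with no entries. The mountpoint is taken from the command line.
import errno
import stat
import sys

from fuse import FUSE, FuseOSError, Operations

class EmptyFS(Operations):
    def getattr(self, path, fh=None):
        if path == '/':
            # The root is a world-readable directory with no contents
            return {'st_mode': stat.S_IFDIR | 0o755, 'st_nlink': 2}
        # Anything else simply does not exist
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        # Only the mandatory "." and ".." entries
        return ['.', '..']

if __name__ == '__main__':
    # Usage: ./emptyfs.py /mnt/point
    FUSE(EmptyFS(), sys.argv[1], foreground=True)
Mounting it on an empty directory and listing it should simply show an empty directory; each following step in the guide layers more system calls on top of a skeleton of this shape.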
I think providing fun or useful examples is also a good way to get students to use what I'm teaching, so I've added some ideas I've had: a DNS filesystem, an on-the-fly Markdown-compiling filesystem, an unzip filesystem, and an uncomment filesystem. They all provide something that could be seen as useful, in a way that's easy to teach, in just some tens of lines. And, in case my comments/documentation are too long to read, uncommentfs will happily strip all comments and whitespace automatically!
So I will be delivering my talk tomorrow (2024.10.10, 18:30 GMT-6) at ChiPy (virtually). I am also presenting this talk at Jornadas Regionales de Software Libre in Santa Fe, Argentina, next week (virtually as well), and in November, in person, at nerdear.la, which will be held in Mexico City for the first time.

Of course, I will also share this project with my students in the next couple of weeks, and I hope it manages to lure them into implementing FUSE in Python. At some point, I shall report!
Update: After delivering my ChiPy talk, I have uploaded it to YouTube: A hand-holding guide to writing FUSE-based filesystems in Python. And after presenting at Jornadas Regionales, I can also offer the video in Spanish: Aprendiendo y enseñando a escribir sistemas de archivo en espacio de usuario con FUSE y Python (Learning and teaching how to write user-space filesystems with FUSE and Python).
This post is a review for Computing Reviews for "50 years of queries", an article published in Communications of the ACM.

The relational model is probably the one innovation that brought computers to the mainstream for business users. This article by Donald Chamberlin, creator of one of the first query languages (which evolved into the ubiquitous SQL), presents its history as a commemoration of the 50th anniversary of his publication of said query language.

The article begins by giving background on information processing before the advent of today's database management systems: with systems storing and processing information based on sequential-only magnetic tapes in the 1950s, adopting a record-based, fixed-format filing system was far from natural. The late 1960s and early 1970s saw many fundamental advances, among which one of the best known is E. F. Codd's relational model. The first five pages (out of 12) present the evolution of the data management community up to the 1974 SIGFIDET conference. This conference was so important in the eyes of the author that, in his words, it is the event that "starts the clock" on 50 years of relational databases.

The second part of the article tells about the growth of the structured English query language (SEQUEL), eventually renamed SQL, including the importance of its standardization and its presence in commercial products as the dominant database language since the late 1970s. Chamberlin presents short histories of the various implementations, many of which remain dominant names today, namely Oracle, Informix, and DB2. Entering the 1990s, open-source communities introduced MySQL, PostgreSQL, and SQLite.

The final part of the article presents controversies and criticisms related to SQL and the relational database model as a whole. Chamberlin presents the main points of controversy throughout the years: 1) the SQL language lacks orthogonality; 2) SQL tables, unlike formal relations, might contain null values; and 3) SQL tables, unlike formal relations, may contain duplicate rows. He explains the issues and tradeoffs that guided the language design as it unfolded. Finally, a section presents several points that explain how SQL and the relational model have remained, for 50 years, a winning concept, as well as some thoughts regarding the NoSQL movement that gained traction in the 2010s.

This article is written with clear language and structure, making it easy and pleasant to read. It does not drive a technical point, but instead is a recap of half a century of developments in one of the fields most important to the commercial development of computing, written by one of the greatest authorities on the topic.
This post is a review for Computing Reviews for "Free and open source software and other market failures", an article published in Communications of the ACM.

Understanding the free and open-source software (FOSS) movement has, since its beginning, implied crossing many disciplinary boundaries. This article describes FOSS's history, explaining its undeniable success throughout the 1990s, and why the movement today feels in a way as if it were on autopilot, lacking the steam it once had.

The author presents several examples of different industries where, as happened with FOSS in computing, fundamental innovations came about not because the leading companies of each field were attentive to customers' needs, but to a certain degree despite them not even considering those needs, typically due to the hubris that comes from being a market leader. Kamp exemplifies his hypothesis by presenting the messy landscape of the commercial, mutually incompatible Unix systems of the 1980s. Different companies had set out to implement their particular flavor of "open Unix computers," but with clear examples of vendor lock-in techniques. He speculates that, if we had been able to buy a reasonably priced and solid Unix for our 32-bit PCs, "nobody would be running FreeBSD or Linux today, except possibly as an obscure hobby." He states that the FOSS movement was born out of the utter market failure of the different Unix vendors.

The focus of the article then shifts to the FOSS movement itself: 25 years ago, as FOSS systems slowly gained acceptance and then adoption in the "serious market" and at the center of the dot-com boom of the early 2000s, Linux user groups (LUGs) with tens of thousands of members bloomed throughout the world; knowing this history, why have all but a few of them vanished into oblivion? Kamp suggests that the strength and vitality that LUGs had ultimately reflected the anger that prompted technical users to take the situation into their own hands and fix it; once the software industry was forced to change, the strongly cohesive FOSS movement diluted. "The frustrations and anger of [information technology, IT] in 2024," Kamp writes, "are entirely different from those of 1991." As an example, the author closes by citing the difficulty of maintaining (despite having the resources to do so) an aging legacy codebase that needs to continue working year after year.
gwolf@gwolf.org).
$ sha256sum dc24_fprs.txt
11daadc0e435cb32f734307b091905d4236cdf82e3b84f43cde80ef1816370a5 dc24_fprs.txt
/etc/get_weather, which currently reads:
# Home, Mexico City
LAT=19.3364
LONG=-99.1819
# # Home, Paraná, Argentina
# LAT=-31.7208
# LONG=-60.5317
# # PKNU, Busan, South Korea
# LAT=35.1339
#LONG=129.1055
APPID=SomeLongRandomStringIAmNotSharing
/usr/local/bin/get_weather, which fetches the current weather and the forecast, and stores them as /run/weather.json and /run/forecast.json:
#!/usr/bin/bash
CONF_FILE=/etc/get_weather
if [ -e "$CONF_FILE" ]; then
. "$CONF_FILE"
else
echo "Configuration file $CONF_FILE not found"
exit 1
fi
if [ -z "$LAT" -o -z "$LONG" -o -z "$APPID" ]; then
echo "Configuration file must declare latitude (LAT), longitude (LONG) "
echo "and app ID (APPID)."
exit 1
fi
CURRENT=/run/weather.json
FORECAST=/run/forecast.json
wget -q "https://api.openweathermap.org/data/2.5/weather?lat=$ LAT &lon=$ LONG &units=metric&appid=$ APPID " -O "$ CURRENT "
wget -q "https://api.openweathermap.org/data/2.5/forecast?lat=$ LAT &lon=$ LONG &units=metric&appid=$ APPID " -O "$ FORECAST "
/etc/systemd/system/get_weather.service:
[Unit]
Description=Get the current weather
[Service]
Type=oneshot
ExecStart=/usr/local/bin/get_weather
/etc/systemd/system/get_weather.timer:
[Unit]
Description=Get the current weather every 15 minutes
[Timer]
OnCalendar=*:00/15:00
Unit=get_weather.service
[Install]
WantedBy=multi-user.target
"custom/weather"
module in the desired position of my
~/.config/waybar/waybar.config
, and define it as:
"custom/weather":
"exec": "while true;do /home/gwolf/bin/parse_weather.rb;sleep 10; done",
"return-type": "json",
,
#!/usr/bin/ruby
require 'json'
Sources = {:weather => '/run/weather.json',
           :forecast => '/run/forecast.json'}
Icons = {'01d' => ' ', # d day
         '01n' => ' ', # n night
         '02d' => ' ',
         '02n' => ' ',
         '03d' => ' ',
         '03n' => ' ',
         '04d' => ' ',
         '04n' => ' ',
         '09d' => ' ',
         '10n' => ' ',
         '10d' => ' ',
         '13d' => ' ',
         '50d' => ' '}
ret = {'text' => nil, 'tooltip' => nil, 'class' => 'weather', 'percentage' => 100}
# Current weather report: Main text of the module
begin
weather = JSON.parse(open(Sources[:weather],'r').read)
loc_name = weather['name']
icon = Icons[weather['weather'][0]['icon']] || ' ' + weather['weather'][0]['icon'] + weather['weather'][0]['main']
temp = weather['main']['temp']
sens = weather['main']['feels_like']
hum = weather['main']['humidity']
wind_vel = weather['wind']['speed']
wind_dir = weather['wind']['deg']
portions = {}
portions[:loc] = loc_name
portions[:temp] = '%s %2.2f C (%2.2f)' % [icon, temp, sens]
portions[:hum] = ' %2d%%' % hum
portions[:wind] = ' %2.2fm/s %d ' % [wind_vel, wind_dir]
ret['text'] = [:loc, :temp, :hum, :wind].map {|p| portions[p]}.join(' ')
rescue => err
ret['text'] = 'Could not process weather file (%s %s: %s)' % [Sources[:weather], err.class, err.to_s]
end
# Weather prevision for the following hours/days
begin
cast = []
forecast = JSON.parse(open(Sources[:forecast], 'r').read)
min = ''
max = ''
day=Time.now.strftime('%Y.%m.%d')
by_day = {}
forecast['list'].each_with_index do |f, i|
by_day[day] ||= []
time = Time.at(f['dt'])
time_lbl = '%02d:%02d' % [time.hour, time.min]
icon = Icons[f['weather'][0]['icon']] || ' ' + f['weather'][0]['icon'] + f['weather'][0]['main']
by_day[day] << f['main']['temp']
if time.hour == 0
min = '%2.2f' % by_day[day].min
max = '%2.2f' % by_day[day].max
cast << ' min: <b>%s C</b> max: <b>%s C</b>' % [min, max]
day = time.strftime('%Y.%m.%d')
cast << ' <b>%04d.%02d.%02d</b> ' %
[time.year, time.month, time.day]
end
cast << '%s %2.2f C %2d%% %s %s' % [time_lbl,
f['main']['temp'],
f['main']['humidity'],
icon,
f['weather'][0]['description']
]
end
cast << ' min: <b>%s C</b> max: <b>%s C</b>' % [min, max]
ret['tooltip'] = cast.join("\n")
rescue => err
ret['tooltip'] = 'Could not process forecast file (%s %s: %s)' % [Sources[:forecast], err.class, err.to_s]
end
# Print out the result for Waybar to process
puts ret.to_json
From: Hermine Wolf <hwolf850@gmail.com>
To: me, obviously
Date: Mon, 15 Jul 2024 22:18:58 -0700
Subject: Make sure that your manuscript gets indexed and showcased in the prestigious Scopus database soon.
Message-ID: <CAEZZb3XCXSc_YOeR7KtnoSK4i3OhD=FH7u+A5xSMsYvhQZojQA@mail.gmail.com>
This message has visual elements included. If they don't display, please
update your email preferences.
*Dear Esteemed Author,*
Upon careful examination of your recent research articles available online,
we are excited to invite you to submit your latest work to our esteemed
journal, '*WULFENIA*'. Renowned for upholding high standards of excellence
in peer-reviewed academic research spanning various fields, our journal is
committed to promoting innovative ideas and driving advancements in
theoretical and applied sciences, engineering, natural sciences, and social
sciences. 'WULFENIA' takes pride in its impressive 5-year impact factor of
*1.000* and is highly respected in prestigious databases including the
Science Citation Index Expanded (ISI Thomson Reuters), Index Copernicus,
Elsevier BIOBASE, and BIOSIS Previews.
*Wulfenia submission page:*
[image: research--check.png][image: scrutiny-table-chat.png][image:
exchange-check.png][image: interaction.png]
.
Please don't reply to this email
We sincerely value your consideration of 'WULFENIA' as a platform to
present your scholarly work. We eagerly anticipate receiving your valuable
contributions.
*Best regards,*
Professor Dr. Vienna S. Franz
noreply-findmydevice@google.com) notifying me that they would unconditionally enable the "Find my device" functionality I have been repeatedly marking as unwanted on my Android phone.

The mail goes on to explain that this functionality works even when the device is disconnected, by Bluetooth signals (aha, so "turn off Bluetooth" will no longer turn off Bluetooth? Hmmm…)

Of course, the mail hand-waves that only I can know the location of my device, that "Google cannot see or use it for other ends". First, should we trust this blanket statement? Second, does the fact they don't do it now mean they won't ever? Not even if law enforcement requires them to? The devices will be generating this information whether we want it or not, so it's just a matter of opening the required window.
This post is a review for Computing Reviews for "How computers make books: from graphics rendering, search algorithms, and functional programming to indexing and typesetting", a book published by Manning.

If we look at the age-old process of creating books, how many different areas can a computer help us with? And how can each of them be used to teach computer science (CS) fundamentals to a nontechnical audience? This is the premise of John Whitington's enticing book, and the result is quite amazing. The book immediately drew my attention when looking at the titles available for review. After all, my initiation into computing as a kid was learning the LaTeX typesetting system while my father worked on his first book on scientific language and typography [1].

Whitington picks 11 different technical aspects of book production, from how dots of ink are transferred to a white page and how they are made into controllable, recognizable shapes, all the way to forming beautiful typefaces and the nuances of properly addressing white space to present aesthetically pleasing paragraphs, building it all into specific formats aimed at different ends. But if we dig beyond just the chapter titles, we will find a very interesting book on CS that, without ever using technical language or notation, presents aspects as varied as anti-aliasing; vector and raster images; character sets such as ASCII and Unicode; an introduction to programming; input methods for different writing systems; efficient encoding (compression) methods, both for text and images, lossless and lossy; recursion; and dithering methods. To my absolute surprise, while the author thankfully spared the reader the syntax usually associated with LISP-related languages, the programming examples clearly stem from the LISP school, presenting solutions based on tail recursion. Of course, it is no match for Donald Knuth's classic book on this same topic [2], but it could very well be a primer for readers to approach it.

The book is light and easy to read, and keeps a very informal, nontechnical tone throughout. My only complaint relates to reading it in PDF format; the topic of this book, and the care with which the images were provided by the author, warrant high resolution. The included images are not only decorative but an integral part of the book. Maybe this is specific to my review copy, but all of the raster images were in very low resolution.

This book is quite different from what readers may usually expect, as it introduces several significant topics in the field. CS professors will enjoy it, of course, but also readers with a humanities background, students new to the field, or even those who are just interested in learning a bit more.
This post is a review for Computing Reviews for "Hacks, leaks and revelations: The art of analyzing hacked and leaked data", a book published by No Starch Press.

Imagine you've come across a trove of files documenting a serious deed and you feel the need to blow the whistle. Or maybe you are an investigative journalist and this whistleblower trusts you and wants to give you said data. Or maybe you are a technical person, trusted by said journalist to help them do things right, not only to help them avoid being exposed while leaking the information, but also to assist them in analyzing the contents of the dataset. This book will be a great aid for all of the above tasks.

The author, Micah Lee, is both a journalist and a computer security engineer. The book is written entirely from his experience handling important datasets, and is organized in a very logical and sound way. Lee organized the 14 chapters in five parts. The first part, in my opinion the most vital to transmitting the book's message, begins by talking about the care that must be taken when handling a sensitive dataset: how to store it, how to communicate it to others, and sometimes even what to redact (exclude) so the information retains its strength but does not endanger others (or yourself). The first two chapters introduce several tools for encrypting information and keeping communication anonymous, without getting too deep into details and keeping the text aimed at a mostly nontechnical audience.

Something that really sets this book apart from others like it is that Lee's aim is not only to tell stories about the hacks and leaks he has worked with, or to present the technical details on how he analyzed them, but to teach readers how to do the work. From Part 2 onward the book adopts a tutorial style, teaching the reader numerous tools for obtaining and digging information out of huge and very timely datasets. Lee guides the reader through various data breaches, all of them leaked within the last five years: BlueLeaks, Oath Keepers email dumps, Heritage Foundation, Parler, Epik, and Cadence Health. He guides us through a tutorial on using the command line (mostly targeted at Linux, but considering macOS and Windows as well), running Docker containers, learning the basics of Python, parsing and filtering structured data, writing small web applications for getting at the right bits of data, and working with structured query language (SQL) databases.

The book does an excellent job of fulfilling its very ambitious aims, and this is even more impressive given the wide range of professional profiles it is written for; that being said, I do have a couple of critiques. First, the book is ideologically loaded: the datasets all relate to the alt-right movement that has gained strength in the last decade. Lee takes the reader through many instances of COVID deniers, rioters for Donald Trump during the January 2021 attempted coup, attacks against Black Lives Matter activists, and other extremism research; thus this book could alienate right-wing researchers, who might also be involved in handling important whistleblowing cases. Second, given the breadth of the topic and my 30-plus years of programming experience, I was very interested in the first part of each chapter but less so in the tutorial part. I suppose a journalist reading through the same text might find the sections about the importance of data handling and source protection to be similarly introductory. This is unavoidable, of course, given the nature of this work.
However, while Micah Lee is an excellent example of a journalist with the appropriate technical know-how to process the types of material he presents as examples, expecting any one person to become a professional in both fields is asking too much.

All in all, this book is excellent. The writing style is informal and easy to read, the examples are engaging, and the analysis is very good. It will certainly teach you something, no matter your background, and it might very well complement your professional skills.
{santiago,debacle,eamanu,dererk,gwolf}@debian.org. My main contact to kickstart the organization was Martín Bayo. Martín was for many years the leader of the Technical Degree on Free Software at Universidad Nacional del Litoral, where I was also a teacher for several years. Together with Leo Martínez, also a teacher at the tecnicatura, they put us in contact with Guillermo and Gabriela, from the APUL non-teaching-staff union of said university.
Hour | Title (Spanish) | Title (English) | Presented by |
---|---|---|---|
10:00-10:25 | Introducción al Software Libre | Introduction to Free Software | Martín Bayo |
10:30-10:55 | Debian y su comunidad | Debian and its community | Emmanuel Arias |
11:00-11:25 | ¿Por qué sigo contribuyendo a Debian después de 20 años? | Why am I still contributing to Debian after 20 years? | Santiago Ruano |
11:30-11:55 | Mi identidad y el proyecto Debian: ¿Qué es el llavero OpenPGP y por qué? | My identity and the Debian project: What is the OpenPGP keyring and why? | Gunnar Wolf |
12:00-13:00 | Explorando las masculinidades en el contexto del Software Libre | Exploring masculinities in the context of Free Software | Gora Ortiz Fuentes - José Francisco Ferro |
13:00-14:30 | Lunch | | |
14:30-14:55 | Debian para el día a día | Debian for our every day | Leonardo Martínez |
15:00-15:25 | Debian en las Raspberry Pi | Debian on the Raspberry Pi | Gunnar Wolf |
15:30-15:55 | Device Trees | Device Trees | Lisandro Damián Nicanor Pérez Meyer (video conference) |
16:00-16:25 | Python en Debian | Python in Debian | Emmanuel Arias |
16:30-16:55 | Debian y XMPP en la medición de viento para la energía eólica | Debian and XMPP in wind measurement for wind energy | Martin Borgert |