I wrote this blog post with Kaylea Champion and a version of this post was originally posted on the Community Data Science Collective blog. Critical software we all rely on can silently crumble away beneath us. Unfortunately, we often don t find out software infrastructure is in poor condition until it is too late. Over the last year or so, I have been supporting Kaylea Champion on a project my group announced earlier to measure software underproduction a term we use to describe software that is low in quality but high in importance. Underproduction reflects an important type of risk in widely used free/libre open source software (FLOSS) because participants often choose their own projects and tasks. Because FLOSS contributors work as volunteers and choose what they work on, important projects aren t always the ones to which FLOSS developers devote the most attention. Even when developers want to work on important projects, relative neglect among important projects is often difficult for FLOSS contributors to see. Given all this, what can we do to detect problems in FLOSS infrastructure before major failures occur? Kaylea Champion and I recently published a paper laying out our new method for measuring underproduction at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021 that we believe provides one important answer to this question.
For more details, check out the full paper which is available as a freely accessible preprint.
This project was supported by the Ford/Sloan Digital Infrastructure Initiative. Wm Salt Hale of the Community Data Science Collective and Debian Developers Paul Wise and Don Armstrong provided valuable assistance in accessing and interpreting Debian bug data. Ren Just generously provided insight and feedback on the manuscript.
Paper Citation: Kaylea Champion and Benjamin Mako Hill. 2021. Underproduction: An Approach for Measuring Risk in Open Source Software. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). IEEE.
Contact Kaylea Champion (firstname.lastname@example.org) with any questions or if you are interested in following up.
The Wikipedia Adventure was designed to address issues that new editors frequently encountered while learning how to contribute to Wikipedia. It is structured into different missions that guide users through various aspects of participation on Wikipedia, including how to communicate with other editors, how to cite sources, and how to ensure that edits present a neutral point of view. The sequence of the missions gives newbies an overview of what they need to know instead of having to figure everything out themselves. Additionally, the theme and tone of the tutorial sought to engage new users, rather than just redirecting them to the troves of policy pages.
Those who play the tutorial receive automated badges on their user page for every mission they complete. This signals to veteran editors that the user is acting in good-faith by attempting to learn the norms of Wikipedia.
This blog post, and the open access paper that it describes, is a collaborative project with Sneha Narayan, Jake Orlowitz, Jonathan Morgan, and Aaron Shaw. Financial support came from the US National Science Foundation (grants IIS-1617129 and IIS-1617468), Northwestern University, and the University of Washington. We also published all the data and code necessary to reproduce our analysis in a repository in the Harvard Dataverse. Sneha posted the material in this blog post over on the Community Data Science Collective Blog.
At one point the follower blocks, it said I have slightly more followers than I do. And, that was kind of confusing when I was trying to make the project. [ ] I pulled up a second [browser] tab and compared the [data from Scratch Community Blocks and the data in my profile]. Third, children discussed the hidden assumptions and decisions that drive the construction of metrics. For example, the number of views received for each project in Scratch is counted using an algorithm that tries to minimize the impact of gaming the system (similar to, for example, Youtube). As children started to build programs with data, they started to uncover and speculate about the decisions behind metrics. For example, they guessed that the view count might only include unique views and that view counts may include users who do not have accounts on the website. Fourth, children building projects with Scratch Community Blocks realized that an algorithm driven by social data may cause certain users to be excluded. For example, a 13-year-old expressed concern that the system could be used to exclude users with few social connections saying:
I think this was a great idea! I am just a bit worried that people will make these projects and take it the wrong way, saying that followers are the most important thing in on Scratch. Kids conversations around Scratch Community Blocks are good news for educators who are starting to think about how to engage young learners in thinking critically about the implications of data. Although no kid using Scratch Community Blocks discussed each of the five literacies described above, the themes reflect starting points for educators designing ways to engage kids in thinking critically about data. Our work shows that if children are given opportunities to actively engage and build with social and behavioral data, they might not only learn how to do data analysis, but also reflect on its implications.
This blog post was written with Charlie Kiene. The paper and work this post describes is collaborative work with Charlie Kiene and Andr s Monroy-Hern ndez. The paper was published in the Proceedings of CHI 2016 and is released as open access so anyone can read the entire paper here. A version of this post was published on the Community Data Science Collective blog.
As children use digital media to learn and socialize, others are collecting and analyzing data about these activities. In school and at play, these children find that they are the subjects of data science. As believers in the power of data analysis, we believe that this approach falls short of data science s potential to promote innovation, learning, and power.
Motivated by this fact, we have been working over the last three years as part of a team at the MIT Media Lab and the University of Washington to design and build a system that attempts to support an alternative vision: children as data scientists. The system we have built is described in a new paper Scratch Community Blocks: Supporting Children as Data Scientists that will be published in the proceedings of CHI 2017.
Our system is built on top of Scratch, a visual, block-based programming language designed for children and youth. Scratch is also an online community with over 15 million registered members who share their Scratch projects, remix each others work, have conversations, provide feedback, bookmark or love projects they like, follow other users, and more. Over the last decade, researchers including us have used the Scratch online community s database to study the youth using Scratch. With Scratch Community Blocks, we attempt to put the power to programmatically analyze these data into the hands of the users themselves.
To do so, our new system adds a set of new programming primitives (blocks) to Scratch so that users can access public data from the Scratch website from inside Scratch. Blocks in the new system gives users access to project and user metadata, information about social interaction, and data about what types of code are used in projects. The full palette of blocks to access different categories of data is shown below.
I was trying to think of something that somebody hadn t done yet, and I didn t see that. And also I really like to do art on Scratch and that was a good opportunity to use that and mix the two [art and data] together.We also found at least some evidence that the system supported self-reflection with data. For example, after seeing a project that showed its viewers a visualization of their past coding vocabulary, a 15-year-old realized that he does not do much programming with the pen-related primitives in Scratch, and wrote in a comment, epic! looks like we need to use more pen blocks. :D.
Adults worry a lot these days. Especially, they worry about how to make other people learn more about computers. They want to make us all computer-literate. Literacy means both reading and writing, but most books and courses about computers only tell you about writing programs. Worse, they only tell about commands and instructions and programming-language grammar rules. They hardly ever give examples. But real languages are more than words and grammar rules. There s also literature what people use the language for. No one ever learns a language from being told its grammar rules. We always start with stories about things that interest us.In a new paper titled Remixing as a pathway to Computational Thinking that was recently published at the ACM Conference on Computer Supported Collaborative Work and Social Computing (CSCW) conference, we used a series of quantitative measures of online behavior to try to uncover evidence that might support the theory that remixing in Scratch is positively associated with learning. Of course, because Scratch is an informal environment with no set path for users, no lesson plan, and no quizzes, measuring learning is an open problem. In our study, we built on two different approaches to measure learning in Scratch. The first approach considers the number of distinct types of programming blocks available in Scratch that a user has used over her lifetime in Scratch (there are 120 in total) something that can be thought of as a block repertoire or vocabulary. This measure has been used to model informal learning in Scratch in an earlier study. Using this approach, we hypothesized that users who remix more will have a faster rate of growth for their code vocabulary. Controlling for a number of factors (e.g. age of user, the general level of activity) we found evidence of a small, but positive relationship between the number of remixes a user has shared and her block vocabulary as measured by the unique blocks she used in her non-remix projects. Intriguingly, we also found a strong association between the number of downloads by a user and her vocabulary growth. One interpretation is that this learning might also be associated with less active forms of appropriation, like the process of reading source code described by Minksy. The second approach we used considered specific concepts in programming, such as loops, or event-handling. To measure this, we utilized a mapping of Scratch blocks to key programming concepts found in this paper by Karen Brennan and Mitchel Resnick. For example, in the image below are all the Scratch blocks mapped to the concept of loop . We looked at six concepts in total (conditionals, data, events, loops, operators, and parallelism). In each case, we hypothesized that if someone has had never used a given concept before, they would be more likely to use that concept after encountering it while remixing an existing project. Using this second approach, we found that users who had never used a concept were more likely to do so if they had been exposed to the concept through remixing. Although some concepts were more widely used than others, we found a positive relationship between concept use and exposure through remixing for each of the six concepts. We found that this relationship was true even if we ignored obvious examples of cutting and pasting of blocks of code. In all of these models, we found what we believe is evidence of learning through remixing. Of course, there are many limitations in this work. What we found are all positive correlations we do not know if these relationships are causal. Moreover, our measures do not really tell us whether someone has understood the usage of a given block or programming concept.However, even with these limitations, we are excited by the results of our work, and we plan to build on what we have. Our next steps include developing and utilizing better measures of learning, as well as looking at other methods of appropriation like viewing the source code of a project.
This blog post and the paper it describes are collaborative work with Sayamindu Dasgupta, Andr s Monroy-Hern ndez, and William Hale. The paper is released as open access so anyone can read the entire paper here. This blog post was also posted on Sayamindu Dasgupta s blog and on Medium by the MIT Media Lab.
Unfortunately, a series of recent legal rulings have forced us to suspend our campaign. In 2015, Time Warner s copyright claim to Happy Birthday was declared invalid. In 2016, a settlement was announced that calls for a judge to officially declare that the song is in the public domain. This is horrible news for the future of music. It is horrible news for anybody who cares that creators, their heirs, etc., are fairly remunerated when their work is performed. What incentive will there be for anybody to pen the next Happy Birthday knowing that less than a century after their deaths their estates and the large multinational companies that buy their estates might not be able to reap the financial rewards from their hard work and creativity? We are currently planning a campaign to push for a retroactive extension of copyright law to place Happy Birthday, and other works, back into the private domain where they belong! We believe this is a winnable fight. After all, copyright has been retroactively extended before! Stay tuned! In the meantime, we ll keep this page here for historical purposes.
Copyrighteous Benjamin Mako Hill (2016-02-11)
WHEN: Wednesday, January 13 at 6:30-9:30 p.m.
WHERE: Communications Building (CMU) 120, University of Washington
We invite you to celebrate the life and activism efforts of Aaron Swartz, hosted by UW Communication professor Benjamin Mako Hill. The event is next week and will consist of a short book reading, a screening of a documentary about Aaron s life, and a Q&A with Mako who knew Aaron well details are below. No RSVP required; we hope you can join us.
Aaron Swartz was a programming prodigy, entrepreneur, and information activist who contributed to the core Internet protocol RSS and co-founded Reddit, among other groundbreaking work. However, it was his efforts in social justice and political organizing combined with his aggressive approach to promoting increased access to information that entangled him in a two-year legal nightmare that ended with the taking of his own life at the age of 26.
January 11, 2016 marks the third anniversary of his death. Join us two days later for a reading from a new posthumous collection of Swartz s writing published by New Press, a showing of The Internet s Own Boy (a documentary about his life), and a Q&A with UW Communication professor Benjamin Mako Hill a former roommate and friend of Swartz and a contributor to and co-editor of the first section of the new book. If you re not in Seattle, there are events with similar programs being organized in Atlanta, Chicago, Dallas, New York, and San Francisco. All of these other events will be on Monday January 11 and registration is required for all of them. I will be speaking at the event in San Francisco.