Thoughts on the Damore Manifesto

I’ve shared a few articles on Facebook regarding the now infamous “manifesto” (available in full here) written by James Damore.  But I’m (finally) writing my own response to it because being black makes me part of a group even more poorly represented in computer science (to say nothing of other STEM fields) than women (though black women are even less represented in STEM fields).

One of my many disagreements with Damore’s work (beyond its muddled and poorly written argument) is how heavily it leans on citations of very old studies. Even if such old studies were relevant today, more current and relevant data debunks the citations Damore uses. To cite just two examples:

Per these statistics, women are not underrepresented at the undergraduate level in these technical fields and only slightly underrepresented once they enter the workforce.  So how is it that we get to the point where women are so significantly underrepresented in tech?  Multiple recent studies suggest that factors such as isolation, hostile male-dominated work environments, ineffective executive feedback, and a lack of effective sponsors lead women to leave science, engineering and technology fields at double the rate of their male counterparts.  So despite Damore’s protestations, women are earning entry-level STEM degrees at roughly the same rate as men and are pushed out.

Particularly in the case of computing, the idea that women are somehow biologically less-suited for software development is proven laughably false by simply looking at the history of the field.  Before computers were electro-mechanical machines, they were actually human beings–often women. The movie Hidden Figures dramatized the role of black women in the early successes of the manned space program, but many women were key to advances in computing both before and after that time.  Women authored foundational work in computerized algebra, wrote the first compiler, were key to the creation of Smalltalk (one of the first object-oriented programming languages), helped pioneer information retrieval and natural language processing, and much more.

My second major issue with the paper is its intellectual dishonesty.  The Business Insider piece I linked earlier covers the logical fallacy at the core of Damore’s argument very well.  This brilliant piece by Dr. Cynthia Lee (computer science lecturer at Stanford) does it even better and finally touches directly on the topic I’m headed to next: race.  Dr. Lee notes quite insightfully that Damore’s citations on biological differences don’t extend to summarizing race and IQ studies as an explanation for the lack of black software engineers (either at Google or industry-wide).  I think this was a conscious omission that enabled at least some in the press who you might expect to know better (David Brooks being one prominent example) to defend this memo to the point of saying the CEO should resign.

It is also notable that though Damore claims to “value diversity and inclusion”, he objects to every means Google has in place to foster them.  His objections to programs that are race- or gender-specific struck a particular nerve with me as a University of Maryland graduate who was attending the school when the federal courts ruled the Benjamin Banneker Scholarship could no longer be exclusively for black students.  The University of Maryland had a long history of discrimination against black students (most famously including Thurgood Marshall).  The courts ruled this way despite the specific history of the school (which kept blacks out of the law school until 1935 and the rest of the university until 1954).  In light of that history, it should not be a surprise that you wouldn’t need an entire hand to count the number of black graduates from the School of Computer, Mathematical and Physical Sciences in the winter of 1996 when I graduated.  There were only 2 or 3 black students, and I was one of them (and I’m not certain the numbers would have improved much with a spring graduation).

It is rather telling how seldom preferences like legacy admissions at elite universities (or the preferential treatment of the children of large donors) are singled out for the level of scrutiny and attack that affirmative action receives.  Damore and others of his ilk who attack such programs never consider how the K-12 education system of the United States, funded by property taxes, locks in the advantages of those who can afford to live in wealthy neighborhoods (and the disadvantages of those who live in poor neighborhoods) as a possible cause for the disparities in educational outcomes.

My third issue with Damore’s memo is the assertion that Google’s hiring practices effectively lower the bar for “diversity” candidates.  I can say from my personal experience with at least parts of the interviewing processes at Google (as well as other major names in technology like Facebook and Amazon) that the bar to even get past the first round, much less be hired, is extremely high.  They were, without question, the most challenging interviews of my career to date (19 years and counting). A related issue with representation (particularly of blacks and Hispanics) at major companies like these is the recruitment pipeline.  Companies (and people who were computer science undergrads with me who happen to be white) often argue that schools aren’t producing enough black and Hispanic computer science graduates.  But very recent data from the Department of Education seems to indicate that there are more such graduates than companies acknowledge. Furthermore, these companies all recruit from the same small pool of exclusive colleges and universities despite the much larger number of schools that turn out high-quality computer science graduates on an annual basis (which may explain the multitude of social media apps coming out of Silicon Valley instead of applications that might meaningfully serve a broader demographic).

Finally, as Yonatan Zunger said quite eloquently, Damore appears to not understand engineering.  Nothing of consequence involving software (or a combination of software and hardware) can be built successfully without collaboration.  The larger the project or product, the more necessary collaboration is.  Even the software engineering course that all University of Maryland computer science students take before they graduate requires you to work with a team to successfully complete the course.  Working effectively with others has been vital for every system I’ve been part of delivering, either as a developer, systems analyst, dev lead or manager.

As long as I have worked in the IT industry, regardless of the size of the company, it is still notable when I’m not the only black person on a technology staff.  It is even rarer to see someone who looks like me in a technical leadership or management role (and I’ve been in those roles myself a mere 6 of my 19 years of working).  Damore and others would have us believe that this is somehow the just and natural order of things when nothing could be further from the truth.  If “at-will employment” means anything at all, it appears that Google was within its rights to terminate Damore’s employment if certain elements of his memo violated the company code of conduct.  Whether or not Damore should have been fired will no doubt continue to be debated.  But from my perspective, the ideas in his memo are fairly easily disproven.

Entity Framework Code First to a New Database (Revised Again)

As part of hunting for a new employer (an unfortunate necessity due to layoffs), I’ve been re-acquainting myself with the .NET stack after a couple of years building and managing teams of J2EE developers.  MSDN has a handy article on Entity Framework Code First, but the last update was about a year ago and some of the information hasn’t aged so well.

The first 3 steps in the article went as planned (I’m using Visual Studio 2017 Community Edition).  But once I got to step 4, neither of the suggested locations of the database worked per the instructions.  A quick look in App.config revealed what I was missing:

Once I provided the following value for the server name:

(localdb)\mssqllocaldb

the databases I could connect to revealed themselves and I was able to inspect the schema.  Steps 5-7 worked without modification as well.  My implementation of the sample diverged slightly from the original in that I refactored the five classes out of Program.cs into separate files.  This didn’t change how the program operated at all–it just made for a simpler Program.cs file.  The code is available on GitHub.
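If you ever lose track of which LocalDB instance and database EF Code First actually created, one option is to ask the context itself at runtime.  Here’s a minimal sketch, assuming the walkthrough’s BloggingContext class:

using (var db = new BloggingContext())
{
    // The connection EF is using tells you which server and database
    // to look for in SQL Server Object Explorer.
    Console.WriteLine(db.Database.Connection.DataSource); // e.g. (localdb)\mssqllocaldb
    Console.WriteLine(db.Database.Connection.Database);   // the database EF generated
}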

Podcast Episodes Worth Hearing

Since I transitioned from a .NET development role into a management role 2 years ago, I haven’t spent as much time as I used to listening to podcasts like Hanselminutes and .NET Rocks.  My commute took longer than usual today though, so I listened to two Hanselminutes episodes from December 2016.  Both were excellent, so I’m thinking about how to apply what I’ve heard to directing an agile team on my current project.

In Hanselminutes episode 556, Scott Hanselman interviews Amir Rajan.  While the term polyglot programmer is hardly new, Rajan’s opinions on what programming languages to try next based on the language you know best were quite interesting.  While my current project is J2EE-based, between the web interface and test automation tools, there are plenty of additional languages that my team and others have to work in (including JavaScript, Ruby, Groovy, and Python).

Hanselminutes episode 559 was an interview with Angie Jones.  I found this episode particularly useful because the teams working on my current project include multiple automation engineers.  Her idea to include automation in the definition of done is an excellent one.  I’ll definitely be sharing her slide deck on this topic with my team and others.

Best Practices for Software Testing

I originally wrote the following as an internal corporate blog post to guide a pair of business analysts responsible for writing and unit testing business rules. The advice below applies pretty well to software testing in general.

80/20 Rule

80% of your test scenarios should cover failure cases, with the other 20% covering success cases.  Too much of testing (unit testing or otherwise) seems to cover the happy path.  A 4:1 ratio of failure case tests to success case tests will result in more durable software.

Boundary/Range Testing

Given a range of valid values for an input, the following tests are strongly recommended:

  • Test of behavior at minimum value in range
  • Test of behavior at maximum value in range
  • Tests outside of valid value range
    • Below minimum value
    • Above maximum value
  • Test of behavior within the range

The tests above roughly conform to the 80/20 rule, and apply to numeric values, dates, and times.
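As a concrete illustration, here’s a minimal sketch of boundary/range tests written with XUnit.  The AgeValidator class is hypothetical and exists only to give the tests something to exercise:

using Xunit;

public static class AgeValidator
{
    // Hypothetical validator: valid ages are 0 through 120 inclusive.
    public static bool IsValid(int age)
    {
        return age >= 0 && age <= 120;
    }
}

public class AgeValidatorTests
{
    [Theory]
    [InlineData(0, true)]     // behavior at the minimum value in the range
    [InlineData(120, true)]   // behavior at the maximum value in the range
    [InlineData(-1, false)]   // below the minimum value
    [InlineData(121, false)]  // above the maximum value
    [InlineData(35, true)]    // behavior within the range
    public void IsValid_ReturnsExpectedResult(int age, bool expected)
    {
        Assert.Equal(expected, AgeValidator.IsValid(age));
    }
}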

Date/Time Testing

Beyond the boundary/range testing described above, the testing of dates creates a need to test how code handles different orderings of those values relative to each other.  For example, if a method has a start and end date as inputs, you should test to make sure that the code responds with some sort of error if the start date is later than the end date.  If a method has start and end times as inputs for the same day, the code should respond with an error if the start time is later than the end time.  Testing of date or date/time-sensitive code must include an abstraction to represent the current date and time as a value (or values) you choose, rather than the current system date and time.  Otherwise, you’ll have no way to test code that should only be executed years in the future.
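One common way to get that abstraction is a small clock interface that production code uses instead of DateTime.Now.  The sketch below uses illustrative names (IClock, FixedClock, Subscription) rather than anything from a specific library:

using System;

public interface IClock
{
    DateTime Now { get; }
}

public class SystemClock : IClock
{
    public DateTime Now
    {
        get { return DateTime.Now; }
    }
}

// Test double: returns whatever "now" the test chooses.
public class FixedClock : IClock
{
    private readonly DateTime _now;

    public FixedClock(DateTime now)
    {
        _now = now;
    }

    public DateTime Now
    {
        get { return _now; }
    }
}

public class Subscription
{
    public DateTime ExpiresOn { get; set; }

    // Depends on IClock rather than DateTime.Now, so a test can supply any date.
    public bool IsExpired(IClock clock)
    {
        return clock.Now > ExpiresOn;
    }
}

In a test, new FixedClock(new DateTime(2030, 1, 1)) stands in for “now”, so code that shouldn’t execute until years in the future can still be exercised today.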

Boolean Testing

Given that a boolean value is either true or false, testing code that takes a boolean as an input seems quite simple.  But if a method has multiple inputs that can be true or false, testing that the right behavior occurs for every possible combination of those values becomes less trivial.  Combine that with the possibility of a null value, or multiple null values being provided (as described in the next section) and comprehensive testing of a method with boolean inputs becomes even harder.
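A data-driven test helps keep those combinations manageable.  Here’s a sketch using XUnit’s Theory and InlineData attributes; the ShippingCalculator class is hypothetical, and only a sample of the possible combinations (including a null for the nullable flag) is shown:

using Xunit;

public static class ShippingCalculator
{
    // Hypothetical rule: shipping is free for members, coupon holders,
    // or during a promotional period (null means "unknown", treated as false).
    public static bool ShipsFree(bool isMember, bool hasCoupon, bool? isPromoPeriod)
    {
        return isMember || hasCoupon || (isPromoPeriod ?? false);
    }
}

public class ShippingCalculatorTests
{
    [Theory]
    [InlineData(true, true, true, true)]
    [InlineData(true, false, false, true)]
    [InlineData(false, true, false, true)]
    [InlineData(false, false, true, true)]
    [InlineData(false, false, false, false)]
    [InlineData(false, false, null, false)]   // null flag treated as "not in promo period"
    public void ShipsFree_CoversInputCombinations(bool isMember, bool hasCoupon, bool? isPromoPeriod, bool expected)
    {
        Assert.Equal(expected, ShippingCalculator.ShipsFree(isMember, hasCoupon, isPromoPeriod));
    }
}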

Null Testing

It is very important to test how a method behaves when it receives null values instead of valid data.  The method under test should fail in a graceful way instead of crashing or displaying cryptic error messages to the user.
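For example, a test can assert that a method rejects a null argument with a clear exception rather than letting a NullReferenceException surface later.  A minimal sketch (the NameFormatter class is hypothetical):

using System;
using Xunit;

public static class NameFormatter
{
    // Fails fast with a clear exception instead of crashing later.
    public static string Format(string first, string last)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (last == null) throw new ArgumentNullException("last");
        return last + ", " + first;
    }
}

public class NameFormatterTests
{
    [Fact]
    public void Format_NullFirstName_ThrowsArgumentNullException()
    {
        Assert.Throws<ArgumentNullException>(() => NameFormatter.Format(null, "Smith"));
    }

    [Fact]
    public void Format_NullLastName_ThrowsArgumentNullException()
    {
        Assert.Throws<ArgumentNullException>(() => NameFormatter.Format("Ada", null));
    }
}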

Arrange-Act-Assert

Arrange-Act-Assert is the organizing principle to follow when developing unit tests.  Arrange refers to the work your test should do first in order to set up any necessary data, creation of supporting objects, etc.  Act refers to executing the scenario you wish to test.  Assert refers to verifying that the outcome you expect is the same as the actual outcome.  A test should have just one assert.  The rationale for this relates to the Single Responsibility Principle.  That principle states that a class should have one, and only one, reason to change.  As I apply it to testing, a unit test should test only one thing so that the reason for failure is clear if and when it fails as a result of subsequent code changes.  This approach implies a large number of small, targeted tests, the majority of which should cover failure scenarios per the 80/20 Rule defined earlier.
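Here’s a small sketch of a test organized along Arrange-Act-Assert lines, with a single assert; the ShoppingCart class is hypothetical and included only so the example is complete:

using System.Collections.Generic;
using Xunit;

// Hypothetical class under test.
public class ShoppingCart
{
    private readonly List<string> _items = new List<string>();

    public int ItemCount
    {
        get { return _items.Count; }
    }

    public void AddItem(string sku)
    {
        _items.Add(sku);
    }
}

public class ShoppingCartTests
{
    [Fact]
    public void AddItem_IncreasesItemCount()
    {
        // Arrange: set up the data and objects the scenario needs.
        var cart = new ShoppingCart();

        // Act: execute the single scenario under test.
        cart.AddItem("SKU-123");

        // Assert: one assertion, so a failure points at exactly one behavior.
        Assert.Equal(1, cart.ItemCount);
    }
}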

Test-First Development & Refactoring

This approach to development is best explained visually by this diagram.  The key thing to understand is that a failing test must be written before the code that makes the test pass.  This approach ensures that the test is good enough to catch any failures introduced by subsequent code changes.  It applies not just to new development, but to refactoring as well.  This means that if you plan to make a change that you know will result in broken tests, break the tests first.  That way, when your changes are complete, the tests will be green again and you’ll know your work is done.  You can find an excellent blog post on the subject of test-driven development by Bob Martin here.

Other Resources

I first learned about Arrange-Act-Assert for unit test organization from reading The Art of Unit Testing by Roy Osherove.  He’s on Twitter as @RoyOsherove.  While it’s not just about testing, Clean Code (by Bob Martin) is one of those books you should own and read regularly if you make your living writing software.

Software Development Roles: Lead versus Manager

I’ve held the title of development lead and development manager at different points in my technology career. With the benefit of hindsight, one of the roles advertised and titled as the latter was actually the former. One key difference between the two roles boils down to how much of your time you spend writing code. If you spend half or more of your time writing code, you’re a lead, even if your business cards have “manager” somewhere in the title. If you spend significantly less than half your time writing code, then the “manager” in your title is true to your role. When I compare my experience at the two organizations, the one that treats development lead and development manager as distinct roles with different responsibilities has not only been a better work environment for me personally, but has been more successful at consistently delivering software that works as advertised.

A company can have any number of motivations for giving management responsibilities to lead developers. The organization may believe that a single person can be effective both in managing people and in delivering production code. They may have a corporate culture where only a minimal amount of management is needed and developers are self-directed. Perhaps their implementation of a flat organizational structure means that developers take on multiple tasks beyond development (not uncommon in startup environments). If a reasonably-sized and established company gives lead and management responsibilities to an individual developer or developers however, it is also possible that there are budgetary motivations for that decision. The budgetary motivation doesn’t make a company bad (they’re in business to make money after all). It is a factor worth considering when deciding whether or not a company is good for you and your career goals.

Being a good lead developer is hard. In addition to consistently delivering high-quality code, you need to be a good example and mentor to less-senior developers. A good lead developer is a skilled troubleshooter (and guide to other team members in the resolution of technical problems). Depending on the organization, they may hold significant responsibility for application architecture. Being a good development manager is also hard. Beyond the reporting tasks that are part of every management role, they’re often responsible for removing any obstacles that are slowing or preventing the development team from doing work. They also structure work and assign it in a way that contributes to timely delivery of functionality. The best development managers go beyond annual reviews and play an active role in the professional growth of the developers on their team. Placing the responsibility for these two challenging roles on a single person creates a role that is incredibly demanding and stressful. Unless you are superhuman, sooner or later your code quality, your effectiveness as a manager, or both will suffer. That outcome isn’t good for you, your direct reports, or the company you work for.

So, if you’re in the market for a new career opportunity, understand what you’re looking for. If a development lead position is what you want, scrutinize the job description. Ask the sort of questions that will make clear that a role being offered is truly a development lead position. If you desire a development management position, look at the job description. If hands-on development is half the role or more, it’s really a development lead position. If you’re indeed superhuman (or feel the experience is too valuable to pass up), go for it. Just be aware of the size of the challenge you’re taking on and the distinct possibility of burnout. If you’re already in a job that was advertised as a management position but is actually a lead position, learn to delegate. This will prove especially challenging if you’re a skilled enough developer to have landed a lead role, but allowing individual team members to take on larger roles in development will create the bandwidth you need to spend time on the management aspects of your job. Finally, if you’re an employer staffing up a new development team or re-organizing existing technology staff, ensure the job descriptions for development lead and development manager are separate. Whatever your software product, the end result will be better if you take this approach.

Security Breaches and Two-Factor Authentication

It seems the news has been rife with stories of security breaches lately.  As a past and present federal contractor, the OPM breach impacted me directly.  That and one other breach impacted my current client.  The lessons I took from these and earlier breaches were:

  1. Use a password manager
  2. Enable 2-factor authentication wherever it’s offered

To implement lesson 1, I use 1Password.  It runs on every platform I use (Mac OS X, iOS and Windows), and has browser plug-ins for the browsers I use most (Chrome, Safari, IE).  Using the passwords 1Password generates means I no longer commit the cardinal security sin of reusing passwords across multiple sites.  Another nice feature specific to 1Password is Watchtower.  If a site where you have a username and password is compromised, the software will indicate that site is vulnerable so you know to change your password.  1Password even has a feature to flag sites with the Heartbleed vulnerability.

The availability of two-factor authentication has been growing (somewhat unevenly, but any growth is good), but it wasn’t until I responded to a tweet from @felixsalmon asking about two-factor authentication that I discovered how loosely some people define two-factor authentication.  According to this New York Times interactive piece, most U.S. banks offer two-factor authentication.  That statement can only be true if “two-factor” is defined as “any item in addition to a password”.  By that loose standard, most banks do offer two-factor authentication because the majority of them will prompt you for an additional piece of “out of wallet” information if you attempt to log in from a device with an IP address they don’t recognize.  Such out-of-wallet information could be a parent’s middle name, your favorite food, the name of your first pet, or some other piece of information that only you know.  While it’s better than nothing, I don’t consider it true two-factor authentication because:

  1. Out-of-wallet information has to be stored
  2. The out-of-wallet information might be stored in plain-text
  3. Even if out-of-wallet information is stored hashed, hashed & salted, or encrypted with one bank, there’s no guarantee that’s true everywhere the information is stored (credit bureaus, health insurers, other financial institutions you have relationships with, etc)

One of the things that seems clear after the Get Transcript breach at the IRS is that the thieves had access to their victims’ out-of-wallet information, whether because they purchased it, stole it, or found it on their victims’ social media profiles.

True two-factor authentication requires a time-limited, randomly-generated piece of additional information that must be provided along with a username and password to gain access to a system.  Authentication applications like the ones provided by Google or Authy provide a token (a 6-digit number) that is valid for 30-60 seconds.  Some systems provide this token via SMS so a specific application isn’t required.  By this measure, the number of banks and financial institutions that support it is quite a bit smaller.  One of the other responses to the @felixsalmon tweet was this helpful URL: https://twofactorauth.org/.  The list covers a lot of ground, including domain registrars and cryptocurrencies, but might not cover the specific companies and financial institutions you work with.  In my case, the only financial institution I currently work with that offers true two-factor authentication is my credit union–Tower Federal Credit Union.  Hopefully every financial institution and company that holds our personal information will follow suit soon.
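For the curious, the 6-digit codes those authenticator apps display are typically time-based one-time passwords (TOTP, RFC 6238).  Here’s a rough sketch of the idea in C#; the class and method names are mine, and a real implementation would also handle clock drift, base32-encoded secrets, and constant-time comparison:

using System;
using System.Security.Cryptography;

public static class TotpSketch
{
    // Derive a 6-digit code from a shared secret and the current 30-second time step.
    public static string GenerateCode(byte[] sharedSecret, DateTime utcNow)
    {
        long step = (long)(utcNow - new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc)).TotalSeconds / 30;

        byte[] counter = BitConverter.GetBytes(step);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(counter); // the counter is hashed big-endian
        }

        using (var hmac = new HMACSHA1(sharedSecret))
        {
            byte[] hash = hmac.ComputeHash(counter);

            // Dynamic truncation (RFC 4226): pick 4 bytes at an offset taken from the last nibble.
            int offset = hash[hash.Length - 1] & 0x0F;
            int binary = ((hash[offset] & 0x7F) << 24)
                       | (hash[offset + 1] << 16)
                       | (hash[offset + 2] << 8)
                       | hash[offset + 3];

            return (binary % 1000000).ToString("D6");
        }
    }
}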

Pseudo-random Sampling and .NET

One of the requirements I received for my current application was to select five percent of entities generated by another process for further review by an actual person. The requirement wasn’t quite a request for a simple random sample (since the process generates entities one at a time instead of in batches), so the code I had to write needed to give each entity generated a five percent chance of being selected for further review.  In .NET, anything involving percentage chances means using the Random class in some way.  Because the class doesn’t generate truly random numbers (it generates pseudo-random numbers), additional work is needed to make the outcomes more random.

The first part of my approach to making the outcomes more random was to simplify the five percent aspect of the requirement to a yes or no decision, where “yes” meant treat the entity normally and “no” meant select the entity for further review.  I modeled this as a collection of 100 boolean values, 95 true and 5 false.  I ended up using a for-loop to populate the boolean list with the 95 true values.  Another option I considered was using Enumerable.Repeat (described in great detail in this post), but apparently that operation is quite a bit slower.  I could have used Enumerable.Range instead, and may investigate the possibility later to see what advantages or disadvantages there are in performance and code clarity.

Having created the list of decisions, I needed to randomize their order.  To accomplish this, I used LINQ to sort the list by the value of newly-generated GUIDs:

decisions.OrderBy(d => Guid.NewGuid()) //decisions is a list of bool

With a randomly-ordered list of decisions, the final step was to select a decision from a random location in the list.  For that, I turned to a Jon Skeet post that provided a helper class (see the end of that post) for retrieving a thread-safe instance of Random to use for generating a pseudo-random value within the range of possible decisions.  The resulting code is as follows:

return decisions.OrderBy(d => Guid.NewGuid()).ToArray()[RandomProvider.GetThreadRandom().Next(100)]; //decisions is a list of bool

I used LINQPad to test my code and over multiple executions, I got between 3 and 6 “no” results.
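Putting the pieces described above together, a self-contained sketch might look like the following.  RandomProvider follows the thread-safe pattern from the Jon Skeet post; the other names are illustrative rather than the exact code from my project:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Thread-safe access to Random instances, one per thread.
public static class RandomProvider
{
    private static int _seed = Environment.TickCount;

    private static readonly ThreadLocal<Random> _threadRandom =
        new ThreadLocal<Random>(() => new Random(Interlocked.Increment(ref _seed)));

    public static Random GetThreadRandom()
    {
        return _threadRandom.Value;
    }
}

public static class ReviewSampler
{
    // Returns true to treat the entity normally (95%) or false to select it
    // for further review (5%).
    public static bool Decide()
    {
        var decisions = new List<bool>();
        for (int i = 0; i < 95; i++) { decisions.Add(true); }
        for (int i = 0; i < 5; i++) { decisions.Add(false); }

        // Shuffle by sorting on newly generated GUIDs, then pick a pseudo-random index.
        return decisions.OrderBy(d => Guid.NewGuid())
                        .ToArray()[RandomProvider.GetThreadRandom().Next(100)];
    }
}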

Complex Object Model Binding in ASP.NET MVC

In the weeks since my last post, I’ve been doing more client-side work and re-acquainting myself with ASP.NET MVC model binding.  The default model binder in ASP.NET MVC works extremely well.  In the applications I’ve worked on over the past 2 1/2 years, there have been maybe a couple of instances where the default model binder didn’t work the way I needed. The problems I’ve encountered with model binding lately have had more to do with read-only scenarios where certain data still needs to be posted back.  In the Razor template, I’ll have something like the following:

@Html.LabelFor(m => m.Entity.Person, "Person: ")
@Html.DisplayFor(m => m.Entity.Person.Name)
@Html.HiddenFor(m => m.Entity.Person.Name)

Nothing is wrong with the approach above if Name is a primitive (e.g. a string).  But if I forgot that Name was a complex type (as I did on one occasion), the end result was that no name was persisted to our datastore (RavenDB), which meant that there was no data to bring back when the entity was retrieved.  The correct approach for leveraging the default model binder in such cases is this:

@Html.LabelFor(m => m.Entity.Person, "Person: ")
@Html.DisplayFor(m => m.Entity.Person.Name)
@Html.HiddenFor(m => m.Entity.Person.Name.FirstName)
@Html.HiddenFor(m => m.Entity.Person.Name.LastName)
@Html.HiddenFor(m => m.Entity.Person.Name.Id)

Since FirstName, LastName and Id are all primitives, the default model binder handles them appropriately and the data is persisted on postback.
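For context, the Name type assumed in the markup above would be a complex type along these lines (illustrative only), which is why each of its primitive properties needs its own hidden field to survive the round trip on postback:

// Illustrative shape of the complex type referenced above.
public class Name
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Person
{
    public Name Name { get; set; }
}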

XUnit: Beyond the Fact Attribute (Part 2)

One thing I initially missed from NUnit when I moved to XUnit (besides built-in support for NUnit in tools like TeamCity) is attributes like SetUp and TestFixtureSetUp, which let you mark a method containing the variable assignments (or any other logic) that need to run before each test or before all the tests in a test fixture.  When I first adopted test-driven development as a work practice, I felt those attributes made things easier.  But the authors of NUnit eventually came to a different conclusion about them, and implemented XUnit without those attributes as a result.

Rather than define attributes for per-test and per-fixture setup, the authors of XUnit recommend using a no-arg constructor where you’d use SetUp and IUseFixture<T> where you’d use TestFixtureSetUp or TestFixtureTearDown.  While this took me some time to get used to, leveraging the interface made it easier to handle the external dependencies of code I needed to implement and test.  One technique I’ve adopted to give myself additional flexibility in my test implementations is to add an extension point to the implementation of the SetFixture method.
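Here’s a sketch of what that looks like.  The names match the description below, but the details are illustrative rather than the exact code from my project:

using StructureMap;
using Xunit;

public class UnitTestFixture
{
    public IContainer Container { get; set; }
}

public abstract class UnitTestBase : IUseFixture<UnitTestFixture>
{
    protected UnitTestFixture Fixture { get; private set; }

    public void SetFixture(UnitTestFixture fixture)
    {
        Fixture = fixture;
        Fixture.Container = new Container();

        // Extension point: derived test classes override this when they need
        // additional pre-test setup (extra mappings, mocks, etc.).
        AdditionalFixtureConfiguration();
    }

    protected virtual void AdditionalFixtureConfiguration()
    {
    }
}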

In this example, the extension point is a method named AdditionalFixtureConfiguration.  Calling it inside SetFixture ensures it runs for each test class derived from UnitTestBase.  Making the method virtual and keeping it free of implementation means that I only need to override it if I need additional pre-test setup for particular test scenarios.  Because we use StructureMap as our IOC container, the equivalent of the UnitTestFixture class in my example has a public property of type StructureMap.IContainer.  The AdditionalFixtureConfiguration method provides a natural home for any setup code needed to configure additional mappings between interfaces and implementations, set up mocks, and even inject mocks into the container if a concrete implementation isn’t available (or isn’t needed).

While this is the implementation I’ve chosen, there are certainly other ways to accomplish the same thing.  Instead of defining AdditionalFixtureConfiguration in UnitTestBase, I could define it in UnitTestFixture instead and call it in every SetFixture implementation (or not, if that customization wasn’t needed).  I prefer having the abstract class because it makes for simpler actual test code.

When Third-Party Dependencies Attack

Last week provided our office with an inconvenient lesson in what can happen when third-party dependencies break in unanticipated ways.  PostSharp is a key third-party dependency in the line of business web application we sell.  On the morning of May 20, our continuous integration server (we use TeamCity) began indicating a build failure with the following message:

  • PostSharp.3.1.34\tools\PostSharp.targets(313, 5): error MSB6006: “postsharp.4.0-x86.exe” exited with code -199.

The changed file was a Razor template file–nothing at all to do with PostSharp.  Only one person on our development team was experiencing this error on their local machine, but the end result–not being able to compile the solution locally–pretty much eliminated the possibility of that person being productive for the rest of the day.  As the day progressed, the CI server began showing exactly the same error in other branches–even with no changes to code.  It wasn’t until the next day that we received the explanation (and a resolution).

Reading the entire explanation is worthwhile, but the key reason for the failure is this:

“we … assumed that all failures would be in the form of a managed exceptions. We did not anticipate that the library would terminate the process.”

The fail-safe code that PostSharp implemented around a third-party licensing component assumed all failures would be managed exceptions (which they could catch and deal with accordingly).  Instead, this third-party component simply terminated the process.  The end result–any of their customers using the latest version of PostSharp couldn’t compile any solution that included it.  There’s no way of knowing for sure how many hours of productivity (and how much money) were lost as a result of this component, but the amounts were probably significant.  To his credit, the CEO apologized, and his development team removed the offending dependency and sacrificed the feature that depended on it.

There are many lessons to be learned (or re-learned) from what we experienced with PostSharp, but I’ll talk about three. First, if a third-party dependency is critical to your application and has a licensing option that includes support, it is best to pay the money so that you have recourse if and when there’s an issue. On the Microsoft stack, this is proving increasingly costly as more third-party .NET libraries and tools raise their prices (there are quite a few formerly free .NET tools that have been purchased by companies and re-released as rather costly closed-source software).

Second, whether or not there are licensing costs, it’s a good idea to have more than one option for critical third-party dependencies. In the case of aspect-oriented programming on .NET, there are a number of alternatives to PostSharp. The vendor is even confident enough to list them on their website. So if licensing costs are a significant enough concern, it may be better to choose an open-source option that is less convenient but gives you the ability to customize it than a paid option which doesn’t (and yokes you to a specific vendor).

Third, it may make sense to avoid taking on a third-party dependency altogether. When it comes to the Microsoft stack, it’s likely that Microsoft offers a framework or API with at least some of the capabilities you need for your solution. In the case of AOP, Microsoft offers Unity to support those capabilities. Especially if you’re only considering the free tier of a third-party dependency in a space where Microsoft offers a product, and that free tier isn’t a significant improvement over the Microsoft option, it may be best to stick with the Microsoft option.