Bug in git svn clone, failing at r1000

There’s an interesting bug I ran across tonight while converting a large SVN repository (containing over 4100 revisions and 52 400 objects) to Git, using git svn clone. The clone failed at r1000 and left me with an incomplete repository.

I did some searching and came across several articles on StackOverflow. The one that is still open in my browser is this one, which sums it up quite well:

Is it possible to clone a git repository which has more than one revisions? We tried to do it, and after the 1000′th commit it does a GC and exits leaving the clone in a unusable state.

While I didn’t have duplication in my .config file per the above question, I did have to run git svn fetch manually, several more times, to get the rest of the SVN revisions down properly.

My whole aim of converting is fairly obvious: Git is faster and uses less space than SVN, doesn’t have those crazy .svn folders everywhere, not to mention its superior branch and merge functionality. For comparative purposes, I was able to go from 176MB to 86.5MB (a saving of 51%) by converting this single repository.

The original source control used on this project was Visual SourceSafe, which I converted to SVN a couple of years ago using VSS2SVN. The VSS repository was over 1GB in size, so I’ve effectively recovered over 91% of my storage, just by using proper source control.

ShutOff 2000 progress report

I got a new laptop in March, which includes an SSD and 16GB of RAM, and it was enough inspiration (over and above the virtual machines for SQL Server that I’ve been playing with) to work some more on ShutOff 2000.

I’ve made astounding progress on it, considering the lethargic approach the C# version has had since 2002. Ten years? Ouch …

The latest version of the VB6 edition (v2.8.6) was released on 31 March 2011, just over a year ago. After that, I worked on version 2.8.7, which included some neat new features for the software updater code. However, this code remains unreleased to the public because, well, 3.0.0 (the C# version) is almost done!

In the last few weeks, I’ve completely refactored some chunks of code that were badly designed in the first place (As Jeff Atwood says, the worst code you’ve looked at is your own). Also, instead of using the Windows Registry to store settings, I’m going with a SQL Compact Edition database file. I picked this over SQLite for the easier coding (though I prefer the idea of SQLite and may change my mind in a later version).

My opinion on the piracy problem is pragmatic, as long-time readers will already know. When the Core group cracked ShutOff 2000 in the early part of the last decade, I was honoured that they took the time. In fact, their key generator had a better UI than my own and I took some design tips from it.

In any event, I think the C# version is ready to go in the next couple of weeks. I need to do some testing still, and then there’s the small matter of how to package and release it. I’ve toyed with the idea of an automated registration system, which is very different to the manual process I currently have in place. For every new registration, I manually generate a registration code and email the purchaser directly. Madness, but there you are.

The latest beta is available from the information page.

The other thing I’ve been playing with a lot more is Git for source control. I decided that I didn’t like SVN after all, and I already use Git for managing my website source control on the Mac, so it was a no-brainer to use it on Windows too.

To sum up this rambling post, this is one more step in the process of sorting out my life. I’m getting on top of things I’ve left too long. Spring cleaning, I suppose.

Permanent Hazard

This Easter weekend, we went to Los Angeles to sort out our permanent residence for Canada.

Yes, I realise how silly that sounds, but there’s a good reason.

When you live in Canada, you’re supposed to send off your PR application to somewhere in the US (usually Buffalo, NY). In our case, we did this, but they ran into delays due to the massive volume, and forwarded our application to Los Angeles.

Now the fun part: in order to have our passports sent back to us, we had to send them in with a USPS pre-paid envelope of certain size, to get the passports back. We could also have organised for a courier to collect the passports.

As it turns out (we visited many websites and contacted people to send us envelopes from the US – thanks, Dan and Aidan for the offer!), it was less stressful for us to travel there directly, because the passports would be stamped on the same day.

Off we went, and while there, did some touring of Hollywood, including Universal Studios and the Key Club, where we saw Sam Sparro live (and Adam Lambert dancing on a table).

All in all, a most enlightening experience.

Adventures in @BlobEater, sp_WhoIsActive and DBCC CHECKDB

The latest version of sp_WhoIsActive (and apparently last for SQL Server 2005/2008, according to Adam Machanic), was released a week ago.

Yesterday, out of curiosity piqued by my recent three-week SQLskills training, I decided to run sp_WhoIsActive while doing a DBCC CHECKDB on a moderately large database (mainly because it takes longer, so it’s easier to poll). The version was SQL Server 2008 R2 with Service Pack 1 (10.50.2500).

Referring to my notes of how CHECKDB works, I was excited to find sp_WhoIsActive capturing one of the queries that runs during the fact-checking phase:

DECLARE @BlobEater VARBINARY(8000)

SELECT @BlobEater = CheckIndex (ROWSET_COLUMN_FACT_BLOB)
FROM { IRowset 0x30FAC98600000000 }
GROUP BY ROWSET_COLUMN_FACT_KEY
>> WITH ORDER BY
ROWSET_COLUMN_FACT_KEY,
ROWSET_COLUMN_SLOT_ID,
ROWSET_COLUMN_COMBINED_ID,
ROWSET_COLUMN_FACT_BLOB
OPTION (ORDER GROUP)

I thought this was cool. Knowing how CHECKDB checks that the relationships are consistent in theory is one thing; seeing it in practice is something better. I’ve often found myself preferring an example to demonstrate a theory, to improve my understanding of that theory. Hopefully this will be helpful to you as well.

I also love the name “BlobEater”.

EDIT: I also ran across this query during the Update Statistics phase of a maintenance plan:

SELECT StatMan([SC0], [LC0])
FROM (SELECT TOP 100 PERCENT SUBSTRING ([object], 1, 100)++substring([object],
case when datalength([object])<=200 then 101 else datalength([object])-99 end,
datalength([object])) AS [SC0],
datalength([object]) AS [LC0]
FROM [dbo].[activitylog] WITH (READUNCOMMITTED)
ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL

My laptop is crying

I’ve finally got all four virtual machines required (per Jonathan’s blog series), up and running on my laptop, and it is taking serious strain. I’m inclined, in fact, to change all the machines from 2 vCPUs to a single vCPU each, because that’s what’s killing the machine.

Then again, I never considered until February 2012 that I’d be running Windows Server 2008 R2 Enterprise Edition on four virtual machines, simultaneously, on an Asus Core 2 Duo (T7250) 2GHz laptop with 4GB RAM, purchased in February 2008.

The computer is under so much strain that I’ve offloaded the browser to my Mac, to free up some RAM.

Wish me luck with the SQL Server installation …

EDIT: Changing the CPUs to 1 virtual CPU, and dropping the cap from 100% to 40% has made all the difference to the performance. Wonderful!

And he’s back …

A week ago, I had just returned from a three-week sojourn in sunny Tampa, where I spent most of the time indoors, far away from the sun, relearning everything about SQL Server. Those three weeks are best summarised in my previous blog entry, where I described Immersion Event 1.

The result was that I wanted to dive deep into the product. I wanted to spend hours reading through transaction logs, my new books, the training material, and blogs of the people I met. Most of all, I wanted to really focus on getting the MCM.

After catching my breath, and reading email (over 1000 emails were summarily “filed”), and various other things related to home life, I felt that motivation to get my MCM slipping away. Real life, “normal”, day to day activities are holding my attention more and more. For instance, our permanent residence letter arrived while I was away, and we will fly to LA in a couple of weeks to hand in our passports (why LA? It’s a very long story).

At the clinic, I had to set up a new firewall which has been sitting in a box for a couple of months. We are also hiring new staff, so I had to get involved there too. And at home, I’ve started capturing the hundreds of receipts for our annual tax return.

However, it wasn’t all bad. I’ve managed to get three out of four virtual machines up and running on my antique laptop to start playing with SQL failover clustering. And I’m slowly making plans in my head for tackling the four preliminary exams that are required before I can do the MCM knowledge exam.

So my point is, I need to maintain my focus. It is way too easy to be distracted. So next week, in between capturing receipts and so on, I’ll do some practise exams. I’ll play with clustering. I’ll revise the clinic’s disaster recovery strategy. Anything to keep my eye on the prize. I want a SQL Server MCM, and I’m going to get it.

Remove duplicate indexes in SQL Server 2000

With permission from Kimberly Tripp, the creator of the fantastic duplicate index finder for SQL Server 2005 and 2008, I have ported this duplicate index finder to SQL Server 2000.

I am fortunate in many respects, in that SQL Server 2000 does not support included columns, nor disabled indexes, so the scripts themselves were straightforward.

The duplicate finder is based on a modified sp_helpindex (the standard stored procedure in SQL Server 2000 for providing information on indexes), using the general layout of the sp_SQLskills_SQL2008_helpindex script.

My original approach was to do a line-by-line conversion. As I pointed out in a previous post, the bitwise operators in the system tables are not for the faint-hearted, especially since all the hard work has already been done in the built-in stuff.

With that in mind, I threw away all my code and started afresh on Monday. It took about two hours in total, with some testing here and there, leveraging the existing sp_helpindex, and this evening I presented the greatly simplified scripts at the Immersion Event I’m attending (Week 2) in Tampa.

Bitwise operators aside, the challenge came in with the duplicate finder portion, which in Kimberly’s script relies a lot on NVARCHAR(MAX). Using the same method Paul Randal did for coming up with the index fragmentation thresholds in Books Online, I figured that 1200 characters should be sufficient to carry the values in the column, index and DROP fields respectively.

I believe Kimberly will make the scripts available on her blog at some stage, but you can grab them from here. I’ve zipped both scripts (sp_SQLskills_SQL2000_helpindex and sp_SQLskills_SQL2000_finddupes) into one zip file, which takes up a whole 5.30KB.

My brain hurts

Kimberly Tripp has a very neat series of stored procedures on her blog which are rewrites of sp_helpindex.

Unfortunately for those of us with customers still using SQL Server 2000, these scripts are useless, because they are based on DMVs and catalog views built into SQL Server 2005 and higher.

With her permission, I have begun porting the scripts to SQL Server 2000. My biggest challenge so far has been trying to identify whether a column has a unique constraint that is not the primary key. I spent over an hour going through the various system tables (and Google), and came across this piece of code, courtesy of a user on dbForums called “fadace”:

SELECT object_name(id), name
FROM sysindexes
WHERE status & 2 = 2

Combining this into an EXISTS clause has solved my problem, but this is just one example of why our jobs on SQL Server 2000 are more complex to solve, given the advances in SQL Server 2005 and higher.

For reference, my new code looks like this:

SELECT @nonclus_uniq = CAST(status & 2 AS bit)
FROM sysindexes
WHERE id = @object_id AND indid = @index_id

My first week of three with SQLskills Immersion Events

There are words that can accurately describe how I’m feeling right now. These would include exhausted, drained, tired, sleepy, somnolent, and soporific.

Then there are other words that can describe my state of mind, which would also be accurate: inspired, grateful, excited, awed, amazed, and invigorated.

The latter have resulted in the former, and while I take the weekend to catch up and relax, I need to gush a little.

Without question, this last week has been the single best week of training I’ve had in fifteen years. Not since 1997 have I felt this close to the material, this excited about learning, and wanting to spend every night working with what I’ve learnt during the day.

In fact, I’ve been so invested in this training that I forgot about many things that are happening elsewhere in my life. “Immersion Event” is more than just a name. I think Paul and Kimberly have produced a startlingly brilliant course that, without preparation, would have taken me off-guard. The many hours I spent (and continue to spend) watching MCM videos and reading books have paid off. The effort I made to play with SQL Server outside of my comfort zone is also starting to pay off.

As I revise the information we’ve gone through this past week, I wonder on the one hand whether I can absorb any more information, and on the other hand whether there is more information I can find to supplement what I have learnt.

I realised this week that I had large gaps in my knowledge. I knew this going in, but I underestimated those gaps. To be so immersed in something that it becomes a thing of beauty is a wonderful, tiring experience. And there are two more weeks to go!

It certainly didn’t hurt winning an MCM knowledge exam voucher in the process, because now I am even more driven to achieve what I set out to in 1997: to be the best I can be at whatever I set my mind to.

Thank you, SQL Skills. Thank you, Paul and Kimberly. And of course my wonderful “Spouse A” for making this possible.