Mon, 21 May 2007
Playing with the package file
For Meikes thesis, we needed a csv-File from the Debian-Package file, so I wrote a small script to do that. You can get it from my repository (or via websvn). (Not yet perfect, but worked well enough for us; patches welcome.)
If you just need the csv to copy the data in a database (like she
did), don't try to guess, how long a textfield (e.g. for depends or the
long description) should be; we tried and failed several times
(Sure... 2000 characters should be enough for a long
description.
). We finally used the following database (We used
PostgreSQL, should work similar on other database systems): CREATE TABLE packages
( package character varying (75) NOT NULL, source character varying (75),
priority character varying(10) NOT NULL, section character varying(20)
NOT NULL, installedsize integer NOT NULL, maintainer character
varying(150) NOT NULL, architecture character varying(4) NOT NULL,
version character varying(40) NOT NULL, depends character varying(5000),
conflicts character varying(5000), recommends character varying(5000),
suggests character varying(5000), enhances character varying(500),
predepends character varying(500), provides character varying(500),
replaces character varying(500), buildessential character varying(3),
essential character varying(3), filename character varying(200) NOT NULL,
md5sum character(33) NOT NULL, origin character varying(100), sha1
character(41) NOT NULL, sha256 character(65) NOT NULL, size integer NOT
NULL, tag character varying(1000), task character varying(400),
description text NOT NULL, longdescription character varying(30000),
CONSTRAINT pk PRIMARY KEY (package) );
.
Now that you've created a table, you need to fill it. Easiest method
is to use the COPY
statement, as follows: copy packages
(package, source, priority, section, installedsize, maintainer, version,
depends, conflicts, recommends, suggests, enhances, predepends, replaces,
buildessential, essential, filename, md5sum, origin, sha1, sha256, size,
tag, task, description, longdescription) FROM '/path/to/your/csv-file'
DELIMETER ';' CSV;
.
You should now have a nice database to play with :)
Some examples... Which are the packages with the longest longdescription?
debian_packages=> select package, length(longdescription) as length from packages order by length desc limit 5; package | length ------------------------+-------- texlive-latex-extra | 25337 texlive-fonts-extra | 5719 emacs-goodies-el | 4502 xbase-clients | 4403 postgresql-contrib-7.4 | 4223 (5 rows)
Or... Which maintainer have the most packages?
debian_packages=> select maintainer, count(*) as anzahl from packages group by maintainer order by anzahl desc limit 5; maintainer | anzahl ------------------------------------------------------------------+-------- Debian Qt/KDE Maintainers <debian-qt-kde@lists.debian.org> | 465 Debian QA Group <packages@qa.debian.org> | 458 Debian X Strike Force <debian-x@lists.debian.org> | 274 Debian Perl Group <pkg-perl-maintainers@lists.alioth.debian.org> | 261 Daniel Baumann <daniel@debian.org> | 255 (5 rows)
Or... Is there a package, whose sha256-sum contains Meikes birthday?
debian_packages=> select package, sha256 from packages where sha256 like '%261081%'; package | sha256 ------------------------------+------------------------------------------------------------------- libobject-realize-later-perl | 5188126108146fc0b1328473125687fb388bd8d87f460e335952dbb1b66aa3d1 (1 row)
And another goodie (We needed to alter the database several times, to make this package fit into it): Which is the package with the longest name?
debian_packages=> select package, length(package) from packages order by length(package) desc limit 1; package | length ---------------------------------------------------------+-------- libmaypole-plugin-authentication-usersessioncookie-perl | 55 (1 row)
A lot of fun... and a lot of interesting facts to be discovered...
postet at 13:09 into [Debian] permanent link