Name: Anonymous 2011-10-26 8:00
Recently I downloaded the visual novel "Kara no Shoujo". It wasn't as exciting as I had hoped so I thought: "I'll just extract the pngs and mp3s and ignore the story."
And there actually is an unpacker for the xp3 format (http://www.insani.org/tools/). However, xp3 is one of those formats that differ from game to game, so it did not work on the Kara no Shoujo archives.
But I didn't give up yet. Here is a relevant part of the unpacker:
# Read header and index structure
assert_string(arcfile,'XP3\x0D\x0A \x0A\x1A\x8B\x67\x01',ERROR_ABORT)
indexoffset = read_unsigned(arcfile,LONG_LENGTH)
assert (indexoffset < filesize)
arcfile.seek(indexoffset)
assert_string(arcfile,'\x01',ERROR_WARNING)
compsize = read_unsigned(arcfile,LONG_LENGTH)
origsize = read_unsigned(arcfile,LONG_LENGTH)
assert (indexoffset+compsize+17 == filesize)
uncompressed = arcfile.read(compsize).decode('zlib')
assert (len(uncompressed) == origsize)
indexbuffer = StringIO(uncompressed)
The code fails at "assert(indexoffset+compsize+17 == filesize)". The first bytes of the file are:
00000000 58 50 33 0d 0a 20 0a 1a 8b 67 01 17 00 00 00 00 |XP3.. ...g......|
00000010 00 00 00 01 00 00 00 80 00 00 00 00 00 00 00 00 |................|
00000020 95 d1 d8 32 00 00 00 00 89 50 4e 47 0a 1a 0a 00 |...2.....PNG....|
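Read the way the unpacker reads it, indexoffset is 0x17, compsize is 0 and origsize is 0x32d8d195. The byte at 0x17 is 0x80, not the 0x01 the unpacker expects. LONG_LENGTH means 8 bytes btw.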
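To double-check those numbers, here is a minimal sketch that decodes the dumped bytes the same way the unpacker does. It only uses the 40 bytes from the hex dump; the variable names mirror the unpacker's, and `marker` is just my name for the single byte at indexoffset.

```python
import struct

# First 0x28 bytes of the archive, copied from the hex dump above.
head = bytes.fromhex(
    "58 50 33 0d 0a 20 0a 1a 8b 67 01 17 00 00 00 00 "
    "00 00 00 01 00 00 00 80 00 00 00 00 00 00 00 00 "
    "95 d1 d8 32 00 00 00 00"
)

# 11-byte signature that the unpacker checks with assert_string.
MAGIC = b"XP3\r\n \n\x1a\x8b\x67\x01"
assert head[:11] == MAGIC

# 8-byte little-endian offset right after the signature.
(indexoffset,) = struct.unpack_from("<Q", head, 11)

# At indexoffset: one marker byte, then two 8-byte little-endian values.
marker = head[indexoffset]
compsize, origsize = struct.unpack_from("<QQ", head, indexoffset + 1)

print(hex(indexoffset), hex(marker), compsize, hex(origsize))
# prints: 0x17 0x80 0 0x32d8d195
```

"<Q" is struct's format for an unsigned 8-byte little-endian integer, which is what read_unsigned(arcfile, LONG_LENGTH) appears to do here.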
The file size is 0x32da1715, slightly more than the number at offset 0x20, which is 0x32d8d195 (stored little-endian, so the bytes appear in reverse order). It would help to find the index, but it sits in a zlib-compressed area, so grepping for the string 'File' (which appears in the index) doesn't help. I wrote this:
import sys, zlib
assert len(sys.argv) == 3
start = int(sys.argv[2], 16)  # start offset, given in hex
e = open(sys.argv[1], "rb")
for possible_offset in range(11000):
    e.seek(start + possible_offset, 0)
    try:
        # try to inflate a zlib stream that begins at this position
        u = zlib.decompress(e.read(100000))
        print((possible_offset, len(u)))
        #print(u)
    except zlib.error:
        pass
$ python find_data.py karanoshojo.xp3 0
(257, 4)
(343, 1197)
(951, 215)
(1090, 2831)
(1182, 3144)
(3868, 2260)
(3932, 3142)
(4024, 3144)
(6710, 11220)
(7085, 330)
(7239, 2878)
(10107, 38199)
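The same brute-force idea can be sketched on an in-memory buffer with zlib.decompressobj, which reports through its eof attribute whether a complete stream was inflated. This is only an illustration: find_zlib_streams is a hypothetical helper, the test blob is synthetic and not taken from the archive, and re-slicing the buffer at every offset is far too slow for an 800MB file.

```python
import zlib

def find_zlib_streams(buf, limit=None):
    """Return (offset, inflated_length) for every offset in buf
    at which a complete zlib stream starts."""
    hits = []
    end = len(buf) if limit is None else min(limit, len(buf))
    for off in range(end):
        d = zlib.decompressobj()
        try:
            out = d.decompress(buf[off:])
        except zlib.error:
            continue  # not a valid zlib stream at this offset
        if d.eof:  # stream ended cleanly; trailing bytes are ignored
            hits.append((off, len(out)))
    return hits

# Synthetic example: 17 filler bytes, one zlib stream, then unrelated bytes.
payload = b"File" * 50
blob = b"\x00" * 17 + zlib.compress(payload) + b"trailing bytes"
print(find_zlib_streams(blob))
```

On this blob the result should contain (17, 200), i.e. a 200-byte payload starting 17 bytes in.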
I also searched far beyond 10107 but didn't find anything. I couldn't scan the whole 800MB of course. Then I tried:
$ python find_data.py karanoshojo.xp3 0x32d8d195
(17, 526522)
So the compressed index starts at 0x32d8d195+17, but the unpacker expects it right after the 17-byte record at indexoffset, i.e. at 0x28, where this archive already has a PNG. I added
arcfile.seek(origsize+17)
uncompressed = arcfile.read().decode('zlib')
and deactivated a few failing asserts, and the extraction worked perfectly. Backgrounds, sounds, ... everything was there except for the text.
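For reference, a rough Python 3 sketch of the same workaround. The layout it follows (the second offset stored 9 bytes after indexoffset, and the zlib data starting 17 bytes after that second offset) is inferred from this one archive only, so treat it as an assumption and not as a description of the xp3 format. read_relocated_index is a hypothetical name, and the test archive below is synthetic.

```python
import io
import struct
import zlib

MAGIC = b"XP3\r\n \n\x1a\x8b\x67\x01"

def read_relocated_index(arc):
    """Follow the second offset and inflate the index found 17 bytes after it.
    Assumption: layout as observed in this single archive."""
    arc.seek(len(MAGIC))
    (first_offset,) = struct.unpack("<Q", arc.read(8))
    # Skip the marker byte and the zero 8-byte field (0x17 + 9 = 0x20 here).
    arc.seek(first_offset + 1 + 8)
    (second_offset,) = struct.unpack("<Q", arc.read(8))
    # The zlib stream was found 17 bytes past that offset; read to end of file.
    arc.seek(second_offset + 17)
    return zlib.decompress(arc.read())

# Build a tiny synthetic archive with the same shape as the hex dump.
index = b"File" + b"\x00" * 32
body = b"\x89PNG fake file data"
header = (MAGIC + struct.pack("<Q", 0x17) + b"\x01\x00\x00\x00"
          + b"\x80" + struct.pack("<Q", 0))
second_offset = len(header) + 8 + len(body)
archive = (header + struct.pack("<Q", second_offset) + body
           + b"\x00" * 17 + zlib.compress(index))

assert read_relocated_index(io.BytesIO(archive)) == index
```

The 17 skipped bytes are treated as opaque filler here, since their contents were not inspected.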
Does anyone know why the text is missing? I scanned the exe for zipped content too, but nothing.