Adium

Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#8692 closed defect (fixed)

During start up invalid memory is accessed, crashes with EXC_BAD_ACCESS

Reported by: tgos Owned by: evands
Milestone: Adium 1.2.2 Component: Adium Core
Version: Severity: normal
Keywords: Cc:
Patch Status:

Description

I just started Adium from within GDB with set env DYLD_INSERT_LIBRARIES /usr/lib/libgmalloc.B.dylib set and it crashes during launch.

2007-12-28 14:26:29.370 Adium[6448] *** hostReachabilityChangedCallback got flags: -r----- 

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x74067fd8
0x03273b0a in adium_source_remove (tag=10) at Adium/Plugins/Purple Service/adiumPurpleEventloop.m:131
131                     if (sourceInfo->timer_tag == tag) {
(gdb) 

sourceInfo itself is not NULL here, but it also does not seem to point towards valid memory.

(gdb) print sourceInfo
$1 = (struct SourceInfo *) 0x74067fcc
(gdb) print *sourceInfo
Cannot access memory at address 0x74067fcc

Backtrace is

#0  0x03273b0a in adium_source_remove (tag=10) at Adium/Plugins/Purple Service/adiumPurpleEventloop.m:131
#1  0x06d13926 in purple_input_remove ()
#2  0x06d2a2f5 in purple_proxy_connect_data_disconnect ()
#3  0x06d2a444 in purple_proxy_connect_data_connected ()
#4  0x06d2a57a in socket_ready_cb ()
#5  0x0327434b in socketCallback (s=0x74069fa8, callbackType=kCFSocketWriteCallBack, address=0x0, data=0x0, infoVoid=0x74067fcc) at Adium/Plugins/Purple Service/adiumPurpleEventloop.m:345
#6  0x90842f8d in __CFSocketDoCallback ()
#7  0x90842cb4 in __CFSocketPerformV0 ()
#8  0x9082cfe2 in CFRunLoopRunSpecific ()
#9  0x9082ca56 in CFRunLoopRunInMode ()
#10 0x92df0878 in RunCurrentEventLoopInMode ()
#11 0x92deff82 in ReceiveNextEventCommon ()
#12 0x92defdd9 in BlockUntilNextEventMatchingListInMode ()
#13 0x93276485 in _DPSNextEvent ()
#14 0x93276076 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#15 0x9326fdfb in -[NSApplication run] ()
#16 0x93263d4f in NSApplicationMain ()
#17 0x000037ee in main (argc=1, argv=0xbffff5c4) at Adium/Source/main.m:3

SVN Revision is 22048

sourceInfo is retrieved with that call

struct SourceInfo *sourceInfo = (struct SourceInfo*)
	[[sourceInfoDict objectForKey:[NSNumber numberWithUnsignedInt:tag]] pointerValue];

Change History (21)

comment:1 Changed 12 years ago by tgos

Some more GDB debugging

(gdb) po sourceInfoDict
<NSCFDictionary 0x2b1fafd4>{
11 = <cc5f1374 >;
12 = <cc8fea7d >;
14 = <cc6ffd7d >;
8 = <cc6f3b6a >;
9 = <cc3ff773 >;
10 = <cc7f0674 >;
}

Assuming that all the cc-entries are memory pointers, none of them is valid, they all fail if I try to access them. Under some strang condition, it must be possible to free the SourceInfo struct without removing it from the dictionary, but I fail to see how this can happen. According to the code, it is only freed if it never was added or after it got removed. And once it is freed, the socket is also always invalidated and released, hence the socket context can't keep a reference to it any longer either.

comment:2 Changed 12 years ago by tgos

Okay, it is reproducible on each start. Just wanted to let you know.

comment:3 Changed 12 years ago by tgos

Okay, it's accessed after being free'd. Somehow it's added twice to the dictionary with different tags (once 10 and once 17) and the 17 case it free'd, now it comes back with tag 10 and then crashes on access.

comment:4 Changed 12 years ago by tgos

Okay, the following is just a guess from what I found out:

The issue could be that the retain count of socket never reaches 1 before it is passed to CFRelease, hence the socket keeps existing and will be returned next time you ask for a socket for the same file descriptor. In that case this socket will still have the same old context value.

comment:5 Changed 12 years ago by tgos

The first time the socket is created

	CFSocketRef socket = CFSocketCreateWithNative(kCFAllocatorDefault,
	 fd,
	 (kCFSocketReadCallBack | kCFSocketWriteCallBack),
	 socketCallback,									 &context);
												  
	NSLog(@"Created a new socket, retain count is %u", CFGetRetainCount(socket));

it already has a retain count of 2, so a SocketObject for that socket has been created before.

comment:6 Changed 12 years ago by David Smith

Milestone: Adium X 1.2

Setting to 1.2 so this doesn't get lost; nice investigative work :D

comment:7 Changed 12 years ago by tgos

I think it could be related to Bonjour messaging (will disable it and see if that helps). It looks like the following happens:

  • A CFSocket Object is created for a native File Descriptor.
  • The method is aware, that such an object might have been created before (and can be returned by the method, which also increases its retain count by one), so if it was created before, it will dump the SourceInfo object it has created a couple of lines above, release the object once to keep retain count low and instead take the SourceInfo struct that is in the CFSocket Object's context.
  • If certain conditions are set, it will release the CFSocket object (assuming it goes away) and then free the SourceInfo struct in memory (not needed any longer).

So far this works well, but it won't if one condition is true: If the first time the socket is created it already has a retain count higher than 1. In that case the CFRelease on the CFSocket object will not destroy it, however the SourceInfo struct it points to is still free'd. Now next time a CFSocket Object shall be created for that File Descriptor, the system will return the old one that has not been released before. This is no problem. The problem is that this old one has still the old socket context set (the one that has already been free'd). Now the method does as described above, it dumps the SourceInfo and takes the one from the CFSocket returned, however, this one has already been free'd.

Accesing it without Malloc Debug enabled might work nicely, or it might crash or it might show any other unpredictable result (too many factors are influencing it). With Malloc Debug set, it will crash for sure, revealing the problem.

There are only two other places where CFSockets are created in the code. One place is for incoming sockets and async sockets and the other place is within the Bonjour code.

Also the fix is not as trivial as it seems. The probem with the fix is, if the CFSocket is really returned from elsewhere, it's unsure whether it's a good idea to just overwrite it's context info with the local SourceInfo struct, since some other function might be relying on some other context info type being there.

comment:8 Changed 12 years ago by Evan Schoenberg

But could the system return the same file descriptor for use by Bonjour and by libpurple?

comment:9 Changed 12 years ago by Evan Schoenberg

Does this remain the case in 1.2b7 (or with HEAD, your choice)?

comment:10 Changed 12 years ago by tgos

Disabling Bonjour made no difference. And no, it's not the same socket ID ever used before. The strange thing is that the CFSocketRef has a retain-count of 2 after it was freshly created for the socket for the first time. It should have one of 1 I guess. In the callback function it has a retain count of 3, which is okay, because it is retained by the runloop source. So it won't go away while being released in the callback function. However, since it gets invalidated there, it should be released after the function, but it's not. In the function it has a count of 3, it is released once there, so it has a count of 2. After the callback function went through and the source is invalidated, it gets a count of 1; but it won't be released unless it gets a count of 0. Next time a CFSocket is created for this socket, the same socket being created before will be retained and returned. So the same socket keeps hanging around forever.

I wonder why the newly created object has a retain count of 2, that makes no sense. I fonud a mail thread on Apple's mailing list, where someone reported the same thing, that his newly created CFSocket has a retain count of 2. I'm close to believing that this is a bug of MacOS X (maybe fixed under Leopard, I have only Tiger for testing).

comment:11 Changed 12 years ago by Evan Schoenberg

So do you see this crash when running without libgmalloc? (I can't get Adium to load at all using libgmalloc because the Security framework is unbearably slow when libgmalloc is loaded).

The retain count could be 2 because internal tracking manages all valid sockets so that it can return the existing instance if one exists when creating a CFSocket. As long as we have properly balanced retain/release cycles, we shouldn't have to worry about the absolute retain count.

comment:12 Changed 12 years ago by Evan Schoenberg

Milestone: Adium X 1.2Adium X 1.2.1

comment:13 Changed 12 years ago by tgos

Yes, it is extreme slow with libmalloc, I have to wait about 15 minutes till the crash and I'm on a Mac Pro ;) Also having at least 2 GB of RAM is recommend (as virtual memory is never returned to the system with libgmalloc loaded)

It is actually balanced, that is not the problem. The problem is that it has a context that is malloc'ed. Now if you release it, you also free the context (manually), but since the object never goes away, the next time you create an object for the same socket, you receive the old object again, just with the retain count being increased. Now you try to set a new context, but this fails (as is documented in the comments, just have a look), thus you keep using the old one... but the old one has been free'd before, the memory is not valid anymore and if it does not crash without libgmalloc, it's plain luck.

comment:14 Changed 12 years ago by Evan Schoenberg

Milestone: Adium X 1.2.1Adium X 1.2.2

comment:15 Changed 12 years ago by Evan Schoenberg

It seems like if we create a reference counting system for SourceInfo and tell the CFSocketContext or CFRunLoopTimerContext to make use of retain/release methods for it, this should be avoided as an even remote possibility.

comment:16 Changed 12 years ago by Evan Schoenberg

Unfortunately, the retain and release functions passed as context are never called... I don't know why.

comment:17 Changed 12 years ago by Evan Schoenberg

Milestone: Adium X 1.2.2Adium X 1.3
Owner: changed from nobody to Evan Schoenberg
Status: newassigned

Well, never called if we define random functions; works okay using CFRetain and CFRelease.

comment:18 Changed 12 years ago by Evan Schoenberg

Resolution: fixed
Status: assignedclosed

(In [22436]) Use a NSObject subclass rather than a malloc'd struct for SourceInfo, allowing real reference counting to take place easily. Pass CFRetain/CFRelease in the context information so that as the SourceInfo is passed around it's properly managed. This means we don't have to do manual tracking of the memory... and should fix any problems with the socket being retained elsewhere such that we free the SourceInfo before it should be freed.

Fixes #8692, so far as I can tell; thanks to tgos for detailed bug reporting and investigating of this issue which could cause seemingly random crashes. tgos, I can no longer reproduce the crash after this change; please let us know if that's true for you, as well.

comment:19 Changed 12 years ago by tgos

Hi evands,

Thanks a lot for the fix, this works great (good work!) and the solution is even very sound IMHO (using an object here instead of a C struct is a very good code change and having the system auto-release the object for you seems like a perfect solution). I really hope this solves some of the crash issues some people experienced (and that are impossible to debug, since the stack trace does not reveal a lot in case of memory corruption). So consider the fix as confirmed, the bug can stay closed.

Unfortunately I found another reproducibale crasher when using Malloc Debug, not related to this problem. I'm just analyzing it in the debugger and will file another bug in the near future.

Keep up the good work!

comment:20 Changed 12 years ago by Evan Schoenberg

(In [22447]) Merged [22436]: Use a NSObject subclass rather than a malloc'd struct for SourceInfo, allowing real reference counting to take place easily. Pass CFRetain/CFRelease in the context information so that as the SourceInfo is passed around it's properly managed. This means we don't have to do manual tracking of the memory... and should fix any problems with the socket being retained elsewhere such that we free the SourceInfo before it should be freed.

Fixes #8692

comment:21 Changed 12 years ago by Evan Schoenberg

Milestone: Adium X 1.3Adium X 1.2.2
Note: See TracTickets for help on using tickets.