Investigation

Running sssd with full debug output: sssd -d 0x0fffff -i, gives lots of output, and I suspect the following snippet of containing hints as to the cause of the problem:

(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ad_set_search_bases] (0x0100): Search base not set. SSSD will attempt to discover it later, when connecting to the LDAP server.
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sdap_id_setup_tasks] (0x0400): Setting up cleanup task for mydomain.com
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [be_ptask_create] (0x0400): Periodic task [Cleanup of mydomain.com] was created
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [be_ptask_schedule] (0x0400): Task [Cleanup of mydomain.com]: scheduling task 10 seconds from now [1390897344]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sysdb_idmap_get_mappings] (0x4000): cn=id_mappings,cn=mydomain.com,cn=sysdb
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ldb] (0x4000): Added timed event "ltdb_callback": 0x17ac750

(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ldb] (0x4000): Added timed event "ltdb_timeout": 0x17ac810

(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ldb] (0x4000): Running timer event 0x17ac750 "ltdb_callback"

(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ldb] (0x4000): Destroying timer event 0x17ac810 "ltdb_timeout"

(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [ldb] (0x4000): Ending timer event 0x17ac750 "ltdb_callback"
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sdap_idmap_init] (0x0100): Initializing [1] domains for ID-mapping
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sdap_idmap_add_domain] (0x0020): Could not add domain [mydomain.com] to the map: [11]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sdap_idmap_init] (0x0020): Could not add domain [mydomain.com][S-1-5-21-4050310001-4028088272-1611817660][3791] to ID map: [Input/output error]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [load_backend_module] (0x0010): Error (5) in module (ad) initialization (sssm_ad_id_init)!
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [be_process_init] (0x0010): fatal error initializing data providers
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [be_ptask_destructor] (0x0400): Terminating periodic task [Cleanup of mydomain.com]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [sbus_remove_watch] (0x2000): 0x1793f90/0x1795410
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [remove_socket_symlink] (0x4000): The symlink points to [/var/lib/sss/pipes/private/sbus-dp_mydomain.com.11985]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [remove_socket_symlink] (0x4000): The path including our pid is [/var/lib/sss/pipes/private/sbus-dp_mydomain.com.11985]
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [remove_socket_symlink] (0x4000): Removed the symlink
(Tue Jan 28 09:22:14 2014) [sssd[be[mydomain.com]]] [main] (0x0010): Could not initialize backend [5]
(Tue Jan 28 09:22:14 2014) [sssd] [sbus_remove_watch] (0x2000): 0xe8af50/0xe84a30
(Tue Jan 28 09:22:14 2014) [sssd] [sbus_remove_watch] (0x2000): 0xe8af50/0xe800e0
(Tue Jan 28 09:22:14 2014) [sssd] [sss_child_handler] (0x2000): waitpid failed [10]: No child processes
(Tue Jan 28 09:22:14 2014) [sssd] [mt_svc_exit_handler] (0x0040): Child [mydomain.com] exited with code [3]
(Tue Jan 28 09:22:14 2014) [sssd] [sbus_dispatch] (0x4000): dbus conn: 0xe8a7c0
(Tue Jan 28 09:22:14 2014) [sssd] [sbus_dispatch] (0x0080): Connection is not open for dispatching.

So my guess is that there is a function load_backend_module that calls ssm_ad_id_init, which fails because sdap_idmap_add_domain cannot add my domain to the id mapping.

Let's fetch the source: for i in sssd sssd-ad sssd-ldap ; do apt-get source $i ; done and look at it. In particular, look at src/providers/ldap/sdap_idmap.c.

The function sdap_idmap_init returns Input/output error, which is strerror's description of EIO. That value was set by sdap_idmap_add_domain:

errno_t
sdap_idmap_add_domain(struct sdap_idmap_ctx *idmap_ctx,
                      const char *dom_name,
                      const char *dom_sid,
                      id_t slice)
{
    errno_t ret;
    struct sss_idmap_range range;
    enum idmap_error_code err;
    id_t idmap_upper;
    bool external_mapping = true;

    ret = sss_idmap_ctx_get_upper(idmap_ctx->map, &idmap_upper);
    if (ret != IDMAP_SUCCESS) {
        DEBUG(SSSDBG_CRIT_FAILURE,
              ("Failed to get upper bound of available ID range.\n"));
        ret = EIO;
        goto done;
    }

    if (dp_opt_get_bool(idmap_ctx->id_ctx->opts->basic, SDAP_ID_MAPPING)) {
        external_mapping = false;
        ret = sss_idmap_calculate_range(idmap_ctx->map, dom_sid, &slice, &range);
        if (ret != IDMAP_SUCCESS) {
            DEBUG(SSSDBG_CRIT_FAILURE,
                  ("Failed to calculate range for domain [%s]: [%d]\n", dom_name,
                   ret));
            ret = EIO;
            goto done;
        }
        DEBUG(SSSDBG_TRACE_LIBS,
              ("Adding domain [%s] as slice [%"SPRIid"]\n", dom_sid, slice));

        if (range.max > idmap_upper) {
            /* This should never happen */
            DEBUG(SSSDBG_CRIT_FAILURE,
                  ("BUG: Range maximum exceeds the global maximum: "
                   "%u > %"SPRIid"\n", range.max, idmap_upper));
            ret = EINVAL;
            goto done;
        }
    } else {
        ret = sdap_idmap_get_configured_external_range(idmap_ctx, &range);
        if (ret != EOK) {
            DEBUG(SSSDBG_OP_FAILURE,
                  ("sdap_idmap_get_configured_external_range failed.\n"));
            return ret;
        }
    }

    /* Add this domain to the map */
    err = sss_idmap_add_domain_ex(idmap_ctx->map, dom_name, dom_sid, &range,
                                  NULL, 0, external_mapping);
    if (err != IDMAP_SUCCESS) {
        DEBUG(SSSDBG_CRIT_FAILURE,
              ("Could not add domain [%s] to the map: [%d]\n",
               dom_name, err));
        ret = EIO;
        goto done;
    }

    /* If algorithmic mapping is used add this domain to the SYSDB cache so it
     * will survive reboot */
    if (!external_mapping) {
        ret = sysdb_idmap_store_mapping(idmap_ctx->id_ctx->be->domain->sysdb,
                                        idmap_ctx->id_ctx->be->domain,
                                        dom_name, dom_sid,
                                        slice);
        if (ret != EOK) {
            DEBUG(SSSDBG_OP_FAILURE, ("sysdb_idmap_store_mapping failed.\n"));
            goto done;
        }
    }

done:
    return ret;
}
    

So it returns EIO in three possible cases. From the logging, the third is the matter: sss_idmap_add_domain_ex returns failure. It returns [11], of enum idmap_error_code, defined in src/lib/idmap/sss_idmap.h. That would be IDMAP_COLLISION. IDMAP_COLLISION is only returned from dom_check_collision, which returns that value for either of four reasons.

So we add some debug output (don't forget to #include util.h), change the changelog, install the build dependencies, and dpkg-buildpackage... Then install it: dpkg -i ../*.deb. That's a bit ugly: forgot to disable sssd, and now it hangs during update. But Ctrl-C and the installation half finishes. And we do get our output: IDMAP Collision because name and SID are inconsistent. Further debug output tells us that the domains being compared are mydomain.com and mydomain.com. Apparently, their SIDs are different. The one being added is actually set, the other, already existing one, is NULL.

Installing all the debs on every try gets hairy though. The library we're modifying is in libsss_idmap.so.0.3.0, which is in package libsss-idmap0_1.11.3-3_amd64.deb.

Now we know where in the code the problem surfaces. But not what the problem is. Should those SIDs be identical? Should only one exist? https://fedorahosted.org/sssd/wiki/DesignDocs