January 21, 2026
5 minutes

AI in IT

I used Cursor as my copilot to help me deliver a complex multi-vendor unstructured data migration project. Here's what I learned about AI-assisted troubleshooting and systems administration—and why it changes the math on what a single engineer can deliver.

The Engagement

This was a complex unstructured data migration and infrastructure project for a well-known organization, one of hundreds of migrations I have helped deliver in my career. The environment:

- A high-performance parallel filesystem with native ILM and multi-protocol NAS subsystems (the kind you see in broadcast media, sports production, post-production houses)

- 1+ PB of archive data tiered to cloud storage

- SMB access for editing workstations

- Active/Active controllers

Before kickoff, I voiced my concern to all parties engaged on this endeavour about the hidden risks in these types of environments, but I failed to convince most of them of the difficulty and complexity we would encounter. This project, it turns out, was by far the most difficult migration I have ever worked on. Nearly a decade of legacy techops debt—acquired from the initial deployment, through various tech refreshes, archive storage vendor migrations, product feature evolution with data in place, and edge-case bugs—left me a very difficult minefield to navigate.

Over 90 days, I worked through dozens of distinct problems—ranging from quick command references to multi-day root cause investigations. Cursor was involved in almost every one. Below are detailed accounts of two hard problems I encountered as well as general takeaways from the engagement.

Hard Problem #1: Named Streams, xattrs, and atime auditing

Symptoms

A third-party scan tool was crawling through a 6.5 million file directory at 600 files per second. The SCSI device backing the metadata archive subsystem was pegged at 100% utilization. Nothing was failing. No errors. Just... extremely slow.

SCSI device: 100% utilization
Scan speed: 600 files/second (expected: 6,000+)
Metadata archive backlog: 14,000+ bundles pending

The system wasn't broken per se, but it was struggling. Traversing the filesystem was laggy and maintenance operations such as backups were failing.

The assumption was simple: *it's just metadata operations—reading file attributes, building an inventory. Should be fast.*

My gut told me that, in this environment, that assumption was wrong and the system was not behaving as intended. Metadata operations should be nothing more than fast, clean `stat()` calls: no logging, no auditing, no state-tracking overhead. Operations like `open()`, by contrast, can trigger atime updates that cascade through the system when you're scanning tens or hundreds of millions of files, but our scans shouldn't have been opening any files.
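
As a quick illustration of the distinction, here is a minimal sketch using GNU coreutils on a scratch file (the path is arbitrary, and atime behaviour also depends on mount options such as `relatime`):

# stat() alone is a metadata read and should leave atime untouched;
# actually reading the file's data can update it.
f=/tmp/atime-demo.$$                # hypothetical scratch file
echo "hello" > "$f"
stat -c 'atime before:     %x' "$f"
stat "$f" > /dev/null               # metadata-only access
stat -c 'atime after stat: %x' "$f"
cat "$f" > /dev/null                # open + read of file data
stat -c 'atime after read: %x' "$f"
rm -f "$f"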

Investigation

Finding the Activity

I started where I always start—looking at what the system was actually doing. Using `cvadmin stats`, I identified the metadata archive database as the I/O hotspot. The files getting hammered weren't user data; they were internal tracking databases.

But *why* would a metadata archive be so active during a metadata-only scan?

I know metadata operations in this filesystem to be fast—very fast, especially compared to other scale-out filesystems in the space. Crawling the filesystem for changes in files, directories, and their metadata should not cause filesystem access logs to grow, nor should it make mdarchive behave the way it was. My years of working with StorNext have made me keenly aware of features and best practices that changed significantly between the time this system went into production and now: named streams in the filesystem, which can be turned on or off dynamically, and the Samba services that ship with StorNext, whose tunables can interact with those features in various configurable ways. I knew where to look; I just didn't know what I was looking for.

The Hidden `._*` Files

I enabled FSM debug tracing and watched the log fill with entries like this:

metadb_dump_insert_file: name '._PGR_5373.JPG'
metadb_dump_insert_file: name '._DSC_0452.JPG'
metadb_dump_insert_file: name '._20230516_JFV_104.JPG'

The trace showed hundreds of these entries every second, all `._*` files—AppleDouble sidecars, not a single regular file. I knew `._*` files to be the resource-fork artifacts that macOS creates on non-HFS filesystems, but I did not know that Samba could also use these sidecars to store xattr metadata.
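
To put rough numbers on what the trace was showing, a couple of throwaway one-liners were enough (the `fsm_trace.log` filename is a stand-in for wherever the FSM debug output landed):

# How many of the metadb inserts are AppleDouble sidecars vs. total inserts?
grep -c "metadb_dump_insert_file: name '\._" fsm_trace.log
grep -c "metadb_dump_insert_file" fsm_trace.log

# Which sidecar names are being churned the most?
grep -o "name '\._[^']*'" fsm_trace.log | sort | uniq -c | sort -rn | head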

I dug deeper with other debug flags enabled until the system told me exactly which state changes it was struggling to log:

mdupdate_inode_trans: ATIME-ONLY tp/0x7f922c816e40 inode/0x6780004ef3e4e3

`ATIME-ONLY`: Access time updates. The scan wasn't just reading file metadata attributes—it had to be opening files. Given the `._*` metadb entries and atime updates logged, I was certain the system was not configured correctly.

I looked at the global and share-level Samba configuration parameters and confirmed that Samba was configured to hide `._*` files from SMB clients. So something else in the Samba configuration had to be causing `smbd`, on its own and without any interaction from SMB clients, to open `._*` files. I used Cursor to help me learn about and parse the various Samba VFS modules and parameters to get to the root cause.

Samba and StorNext Settings and Parameters

Samba's configuration for this share should have, on the surface, prevented any `._*` file activity:

# Global settings
fruit:veto_appledouble = yes

# Share-level settings
veto files = /.StorNext/.__radf/._*/
delete veto files = yes
vfs objects = fruit streams_xattr snnas_radf override_masks acl_xattr snfs fileid
fruit:resource = stream
fruit:metadata = stream
fruit:encoding = native

This configuration says:

- **`veto files`**: Hide `._*` files from SMB clients—they can't see or request them
- **`fruit:veto_appledouble`**: Don't even create AppleDouble files
- **`fruit:resource = stream`**: Store macOS resource forks in NTFS-style named streams, not `._*` files
- **`fruit:metadata = stream`**: Store macOS metadata in named streams, not `._*` files

SMB clients couldn't see the `._*` files. The scan tool couldn't request them. So who was accessing them?

Root Cause Identified

Thanks to my nearly two decades of working with StorNext, I was familiar with the Named Streams feature, a setting that can be turned on or off for any filesystem. I asked Cursor to help me cross-reference the Samba share configuration with the underlying filesystem configuration, and here I found a mismatch:

| System | Setting | Value |
|--------|---------|-------|
| Filesystem | `namedStreams` | **false** |
| Samba | `fruit:resource` | **stream** |
| Samba | `fruit:metadata` | **stream** |

The Samba `fruit` VFS module was configured to store macOS resource forks in NTFS-style named streams. But the underlying filesystem had named streams disabled.

When `fruit` can't use streams, Cursor told me, it falls back to AppleDouble `._*` files.

So every benign `stat()` call from the scan tool triggered Samba to check for a corresponding `._*` file, and if it found one, it would open it. That check updated the file's access time. The access time update generated a metadata archive bundle. The bundle got written to the SCSI device pegged at 100% utilization.
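
In hindsight, the mismatch is cheap to check for. A rough sketch of the cross-check, assuming typical locations (`testparm` ships with Samba; the `/usr/cvfs/config/*.cfgx` path and element name are how this particular system exposed the filesystem setting and may differ by release):

# What Samba thinks it should do with resource forks and metadata
testparm -s 2>/dev/null | grep -Ei 'fruit:(resource|metadata|veto_appledouble)'

# What the filesystem actually has enabled
grep -i namedstreams /usr/cvfs/config/*.cfgx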

The Fix

Cursor recommended several options: remove the `fruit` VFS module from the Samba share, enable `namedStreams` on the filesystem, or switch the scan tool to NFS.

I chose something else entirely.

Changing how the system handled named streams and xattrs dynamically—on a project nearly completed—would have introduced massive hidden risk: Would the metadata stored in `._*` files be lost? What remediation would be necessary to preserve it? How long would that process take? I knew enough to know the risk was too high for any of the paths Cursor recommended. I chose instead to work with the vendor directly on my own ideas.

By now, I was certain the mdarchive backlog was due to atime updates. So how would I prevent them from happening? Because this is a managed volume, I couldn't disable atime. Managed volumes also require mdarchive to be enabled, so disabling it wasn't an option either.

However, I could disable `snaudit`—managed volumes don't require that it be on. I disabled snaudit and that did it—the system stopped updating the mdarchive table during our scans. I could see the mdarchive "pending bundles" count finally start decreasing.

But I still had a backlog problem: 124,000+ bundles already in the queue, and the SCSI devices pegged at 100% for over 48 hours after the last scan completed.

Though how to clear this queue is undocumented, Cursor gathered enough insight from the man pages to suggest an mdarchive rebuild. I ran this by support, and though they were certain this would not help me, I tried the rebuild anyway, and it worked.

snadmin (QUANTUM) > mdarchive status
Status:  Updating (0 bundles pending apply)

The rebuild cleared the backlog entirely—something support did not know it would do.

Takeaways

This mismatch in configuration parameters had been silently accumulating `._*` files for years. The system "worked"—until my expert intuition said *this shouldn't be this slow*.

Without Cursor, this would have been left unresolved and shrugged off as "it's just been like that." Instead, I traced it to root cause in about two hours.

Hard Problem #2: Recovering 16K Lost Files

Database and Inode State Mismatch

Separate from the performance issue, my retrieve scripts uncovered over 16K files in a state I had not seen before—the Storage Manager database entries mapping these files to objects had been cleared and `fsfileinfo` reported `RMINFO` state for copy 1 of these files:

Last Modification: 27-feb-2020 13:00:04
  Owner:             unknown            Location:        GONE (RMINFO DONE)
  Group:             games              Existing Copies: 0
  Access:            644                Target Copies:   1
                                        Expired Copies:  0
  Target Stub:       0 (KB)             Existing Stub:   0 (KB)
  File size:         3,408              Store:           MINTIME
  Affinity:          n/a                Reloc:           MINTIME
  Class:             archive_to_cos     Trunc:           MINTIME
  Alt Store Copy:    Disabled           Clean DB Info:   NO
  Media:      None
  Checksum:   N
  Encryption: N
  Object Ids: N
FS0000 21 0576485600 fsfileinfo completed: Command Successful.

Without the location information, Storage Manager did not know where in the bucket to look for these files, and we thought these files were lost. After checking with the customer, I got the green light to clear (remove) all 16K+ files in this state.

I moved on to other directories of the filesystem and found more files in the same state.

For whatever reason, rather than move forward with the easy choice of deleting the files, I chose to dig deeper and found that I could use non-standard `dm_*` command-line utilities to inspect copy 1 locations stored in the filesystem inode—separate and distinct from the database.

The `dm_info` utility showed me inode-stored file-to-object mappings that did not exist in the database:

copy: 1
 offset: 0 length: 870513 bytes: 55
   medium: B2B19A66-52EC-4047-8862-4B56AC71CCF7
   seg_time: 1563128410 [Sun Jul 14 13:20:10 2019]
   add_date: 1692236803 [Wed Aug 16 20:46:43 2023]
   oneup: 6404761
   version: 1
   seg_uuid: 01E4CA42-E7BB-4588-A854-CDC498480B35

In the example above, the `medium` UUID corresponds to an archive destination (in this case a bucket), and the `seg_uuid` corresponds to the object ID mapped to that file. For the files I spot-checked, I found these objects did in fact exist in the bucket and were, therefore, recoverable.
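
The spot checks themselves were simple once the mappings were visible. A hedged sketch of what they looked like, assuming the object key can be derived from the `seg_uuid` reported by `dm_info` (the bucket name and key layout below are placeholders; the real layout came from inspecting a few known-good objects first):

# spotcheck.tsv: two tab-separated columns, file path and seg_uuid
while IFS=$'\t' read -r path seg_uuid; do
  if aws s3api head-object \
       --endpoint-url https://s3.mystorage.com \
       --bucket archive-bucket \
       --key "$seg_uuid" > /dev/null 2>&1; then
    printf 'OK\t%s\n' "$path"
  else
    printf 'MISSING\t%s\n' "$path"
  fi
done < spotcheck.tsv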

For the files I had yet to delete, Cursor suggested that running `fsretrieve` with the `-c 1` flag would likely cause the command to ignore the missing database entries and instead use the inode-stored metadata, and it did. I used Cursor to quickly create some scripts to run this specific command on the remaining files affected by this issue.
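
The scripts themselves were small; the care went into handling the paths. A minimal sketch of the loop (the list and log filenames are illustrative; `fsretrieve -c 1` is the call described above):

# affected_files.txt: one absolute path per line, exactly as reported
while IFS= read -r path; do
  if fsretrieve -c 1 "$path"; then
    printf 'RETRIEVED\t%s\n' "$path" >> retrieve.log
  else
    printf 'FAILED\t%s\n' "$path" >> retrieve.log
  fi
done < affected_files.txt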

For the previously deleted files, however, recovery was far from straightforward. While StorNext and Storage Manager both have their own recovery utilities, `snrecover` and `fsrecover` respectively, in this case I could not use either. Because this was a managed volume, `snrecover` would not recover the inode metadata from available backups, and because the database was missing archive copy information about these files, `fsrecover` also failed.

Nonetheless, I was certain most if not all files had been archived to S3 storage, and I had a list of all previously deleted files. Thanks to support, I was able to extract the inode metadata of these files from backups, and that gave me a manifest of files and object IDs to recover.

This retrieve, however, was also far from straightforward. The system's retrieve CLI tools and APIs don't present any clean way to drive a restore from a manifest, so I decided to do away with the built-in functionality entirely and use the AWS CLI directly, with whatever Cursor recommended for parallelism. Cursor helped me do this every step of the way.

Working with Cursor for complex scripting

Like most creative environments, file and directory paths in this system were filled with spaces and special characters that are very hard to get right in scripting. Knowing this, I made sure to instruct Cursor to help me get this right with up-front planning and iterative testing.

Here is an example of paths I had to work with:

/stornext/Archive/'10 Season Work/Major Sport$$$ Event in '10/._PGR_5373.NEF

Spaces, apostrophes, and special characters—all from years of organic directory and file naming by users who never imagined their paths would have to survive scripting or automation.

Iterative Testing

My first attempt failed on apostrophes.

parallel -j 50 'aws s3 cp {1} {2}' :::: manifest.tsv

This became a 4-5 round debugging session with Cursor.

1. **Basic quoting** → Failed on apostrophes

2. **`bash -c` with positional params** → Failed on macOS zsh (different quoting semantics)

3. **Added `--quote` flag** → Still broke on spaces in `dirname`

4. **`bash -lc` with proper escaping** → Success

Each failure produced cryptic errors. Cursor helped me parse AWS CLI debug output, identify where shell expansion was breaking, and iterate until we had a command that handled every edge case:

# manifest.tsv: column 1 is the source (an s3:// URL), column 2 the local
# destination path. {2//} is GNU parallel's dirname-of-column-2, so the
# destination directory tree is recreated before each copy.
parallel --colsep '\t' -j 50 --bar --no-run-if-empty --quote \
 bash -lc "mkdir -p {2//} && aws s3 cp \
   --endpoint-url https://s3.mystorage.com \
   {1} {2}" \
 :::: manifest.tsv

It wasn't pretty, but it worked on all files with paths that would make any programmer pull their hair out.

I recovered about 2TB of data that, in prior projects without AI, I would have made someone else's problem. Unfortunately for the customer, over 400 of the files we attempted to recover were gone: the consultants previously hired to assist in the migration from one archive bucket to another had not caught this issue and never migrated those files before the legacy system was decommissioned.

Sure, this work was technically out of scope, but I knew I could do it and that it wouldn't cost me more than a couple of days of work at most.

AI enabled me to deliver a higher quality of service in less time.

Quick Wins with AI

Not every problem needs a deep investigation. Across 24 sessions, many questions were answered in under 5 minutes:

- **"How do I failover the primary metadata controller?"** → Yeah, it had been a while for me since I had to do this, so I made sure to double check.

- **"What does this S3 error mean?"** → `errno 104 ECONNRESET` = network hiccup, retry will likely succeed. But `No controllers found` = configuration problem, needs investigation.

- **"Give me an awk command to extract paths from error logs"** → One-liner delivered, tested, done.

- **"Gratuitous ARP not being received"** → Systematic checklist: LACP hash mode, Dynamic ARP Inspection on the switch, bonding mode, kernel sysctl settings.

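For flavour, the awk request produced something along these lines (the log format here is a made-up stand-in; the real one was vendor-specific):

# Pull the quoted path out of lines like:
#   2026-01-12 03:14:07 ERROR retrieve failed for "/stornext/Archive/some dir/file.mov" (errno 5)
awk -F'"' '/ERROR/ {print $2}' retrieve_errors.log | sort -u > failed_paths.txt
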
The pattern is clear: when you have domain knowledge but need specific command syntax or error interpretation, AI delivers instant leverage. If you know what the question is, AI can most likely get you an answer.

I found myself coming to Cursor for help with any roadblock where I didn't have an immediate answer—such as needing additional users and access keys on buckets for my custom retrieval scripts. Cursor also helped up front with document parsing, developing Python tools to convert the vendor-supplied man pages and documentation from readily available PDFs into markdown files that are easier to grep and tokenize in Cursor.
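
The conversion tooling Cursor built was in Python; a rough bash equivalent of the same idea, using `pdftotext` from poppler-utils (plain text rather than true markdown, which is usually good enough for grepping), would look like this:

# Convert every vendor PDF into a greppable text file
mkdir -p docs/txt
for pdf in docs/pdf/*.pdf; do
  pdftotext -layout "$pdf" "docs/txt/$(basename "${pdf%.pdf}").txt"
done
grep -ril "mdarchive" docs/txt/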

Engineering Leadership Insights

Four observations from this engagement:

AI compresses diagnostic cycles, not decision-making

Every fix required human judgment about risk, timing, and organizational impact. AI provided options and explained trade-offs. I made the calls.

The metadata archive fix, for example, had multiple possible solutions. Cursor laid them out. I chose based on factors it couldn't know—project status, objectives, a nuanced understanding of risk, and long-term architecture direction.

Documentation is king

The filesystem's man pages run 70+ pages. The administration guide is hundreds more. Cursor searched, cross-referenced, and surfaced relevant commands in seconds.

This is the actual value of "AI reading your docs"—not replacing the docs, but using them to answer complex questions that you would otherwise need a senior engineer to answer.

Legacy debt surfaces under load

The `._*` file problem existed for years. The `namedStreams` mismatch was there from day one. Normal operations masked the issue; a scan revealed it.

AI helped me trace a symptom to a root cause that predated the current team. That's not something you can do by asking the people who built it—they're gone.

AI makes it possible to deliver 100% of the work

This is the insight I keep coming back to.

Many of the issues I used AI to troubleshoot would have, in the past, been ignored, explained away, covered up, or made someone else's problem—the customer's, the vendor's, the next consultant's. Because finishing that last 1% of any project can be just as hard as the first 99%.

AI's value here is clear: a team of one full-stack, cross-domain expert can deliver a complex multi-vendor, multi-technology project faster and better than a team of many domain experts without AI.

The 16K files are a clear example of the 1% problem. Without AI, my cross-domain expertise, and commitment to excellence, those files would likely have been deemed lost and explained away.

The Bottom Line

The value of AI in infrastructure engineering isn't writing scripts. It's having a tireless collaborator who can:

- Parse vendor documentation at search speed

- Hold context across multi-hour debugging sessions

- Generate and iterate on solutions without ego

- Answer complex questions not explicitly addressed in administration guides

Many enterprise IT teams are still struggling with how and where to use AI, often turning to vendor X, Y, or Z for an AI-powered SaaS tool that solves some niche problem. I strongly believe most of those initiatives are doomed from the start, and that institutions should instead empower and encourage individual contributors to bring these tools and ideas to the table.

For me, that means Cursor or similar CLI-powered tools such as Claude Code, Codex, Open Code, or Gemini CLI are now part of every engagement—not as a replacement for expertise, but as an amplifier of it.

*Jose is the founder of nobul.tech, where he works on experiments, ideas, tools, and thoughts that don't need a pitch deck.*

So what do we do at nobul.tech?

We build smart infrastructure — not big.

We abstract complexity — not create it.

We say no to things that look good in slides but die in production.

Sometimes, the smartest solution is not to build at all.
Sometimes, it’s just… naming things properly. Or rethinking the problem entirely.

This space — Thinking Out Loud — is where I explore those decisions.
Not polished case studies. Not Medium-friendly self-congratulatory threads.
Just raw insights from building and breaking.

Stay sharp. Stay skeptical.
And please — build less, but better.