DMA SD card reads on the PIC32MZ

Why read from SD card in DMA mode at all?

Warning: This post is going to be long because it's a complex topic and I've included lots of code in it.

Turns out the "next time" from last post was today, the same day. An entire day spent on writing about DMA and airing my ignorance online. Yay!

For the last several weeks / months / eternities I've been working on getting the DMA module to work with the SPI peripheral so that I can read from the SD card using DMA. My initial motivation for doing this was that when I had a few (i.e. too many) ISRs in my main code the SPI module would sometimes seem to get confused at all these interruptions and just stop working, crashing my program. However, DMA has also resulted in a nice large speed boost to SD reading, which is very useful. I've been working on this for ages in my spare time and I still don't understand all of it but today I'm going to go over my code and my findings. It works in my MP3 player and in large block transfers but I can't get over the feeling of mistrust I have for it so YMMV.

Reading from an SD card using DMA

Before we even get to using DMA, let's remind ourselves how the SD card works in SPI mode. For block reads, there is single block read mode and multi block read mode. Single block read mode sends a read command to the SD card for every block it wants to read. Multi block read sends a command once and then reads however many blocks it wants, so it results in much faster transfer times than single block read does. Don't forget that before any of the steps in the following flow chart, the SD card needs to be sent the multi block read command (CMD18). This code can be found in mmcpic32_dma.c in the disk_read() function. Once we've sent this command (along with the sector number, dummy CRC etc.), the actual reading of the data works like this:

PIC32MZ - SPI SD multi block read flowchart

(Shout-out to the website https://www.draw.io for providing a way to make flowcharts easily online, though with my drawing skills maybe they don't want people to know I used their site :))

So as you can (hopefully) see there are three phases to each 512-byte sector read:

  • Send out 0xFF via SPI until the SD card replies with 0xFE
  • Send out 0xFF and receive a data byte for each of the 512 bytes in the sector
  • Send out two 0xFF bytes and receive the 2-byte CRC to finish the sector read

After that, the process repeats until you have read as many sectors as are required. So how can we adapt this to make use of DMA? Well, let's think about what we need to do:

  • Send 0xFF to the SD card over SPI
  • Receive the data the SD sends us over SPI

In both of these cases, the PIC32MZ is the master and provides the clock signal to the SD card. This means the SD card cannot do anything unless we send it some data first. As you can see, there are two types of transactions here: sending the data and receiving the data. Assuming I'm using SPI Channel 2, without using DMA we would do this to send 0xFF and receive a byte of information from the SD card:

    SPI2BUF = 0xFF;
    while (SPI2STATbits.SPIRBE);
    data = SPI2BUF;
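Before any DMA enters the picture, here is a minimal host-side sketch of the polling version of the three phases listed above, so the byte flow is easy to follow. It is not the driver from mmcpic32_dma.c: `mock_card`, `mock_build_stream()`, `spi_xfer()` and `read_sectors_polling()` are hypothetical names, and `spi_xfer()` stands in for the three-line SPI2BUF exchange by replaying a prepared byte stream (a few 0xFF busy bytes, the 0xFE token, 512 data bytes and a 2-byte CRC per sector).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock: the bytes the card will shift out after CMD18, in order */
typedef struct {
    const uint8_t *stream;
    size_t len;
    size_t pos;
} mock_card;

/* Build [sectors] sectors, each preceded by [busy] 0xFF bytes.
   Sector s carries data bytes (uint8_t)(s * 7 + i) and a dummy CRC. */
static size_t mock_build_stream(uint8_t *out, int sectors, int busy)
{
    size_t n = 0;
    for (int s = 0; s < sectors; s++) {
        for (int b = 0; b < busy; b++)
            out[n++] = 0xFF;                 /* card not ready yet */
        out[n++] = 0xFE;                     /* start block token */
        for (int i = 0; i < 512; i++)
            out[n++] = (uint8_t)(s * 7 + i); /* sector data */
        out[n++] = 0xAA;                     /* CRC high byte (dummy) */
        out[n++] = 0xBB;                     /* CRC low byte (dummy) */
    }
    return n;
}

/* One full-duplex exchange: stands in for SPI2BUF = out; wait; in = SPI2BUF */
static uint8_t spi_xfer(mock_card *card, uint8_t out)
{
    (void)out;                               /* the card ignores our 0xFF filler */
    return card->pos < card->len ? card->stream[card->pos++] : 0xFF;
}

/* Polling multi block read. Returns the number of sectors read; stops early
   if the 0xFE token does not arrive within [max_wait] filler bytes. */
static int read_sectors_polling(mock_card *card, uint8_t *buff,
                                int sectors, unsigned max_wait)
{
    for (int s = 0; s < sectors; s++) {
        unsigned waited = 0;
        /* Phase 1: send 0xFF until the card answers with 0xFE */
        while (spi_xfer(card, 0xFF) != 0xFE) {
            if (++waited >= max_wait)
                return s;
        }
        /* Phase 2: one 0xFF out, one data byte in, 512 times */
        for (int i = 0; i < 512; i++)
            *buff++ = spi_xfer(card, 0xFF);
        /* Phase 3: two more 0xFF bytes clock in the CRC, which we discard */
        (void)spi_xfer(card, 0xFF);
        (void)spi_xfer(card, 0xFF);
    }
    return sectors;
}
```

The mock skips the CMD18 command that starts the transfer and the CMD12 stop command a real card expects at the end.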

The overall flow of the program

OK, so one sending transaction and one receiving transaction means we will need to use two DMA channels. As I discussed last time, DMA channels need to be triggered by an Interrupt Request (IRQ), so what shall we choose? Again, let's think about the flow of this program:

  • Send 0xFF to SD card
  • Receive data in response

The SPI peripheral has both a transfer done (TX) and receive done (RX) IRQ that it generates, so this is perfect. I'm going to choose my two DMA channels as follows:

  • DMA Channel 0 is in charge of receiving data from the SPI buffer
  • DMA Channel 1 is in charge of sending data to the SPI buffer

This means that DMA Channel 0's start IRQ (SIRQ) will be SPI Channel 2's Receive Done IRQ (_SPI2_RX_VECTOR) and the source of DMA Channel 1's SIRQ will be SPI Channel 2's Transfer Done IRQ (_SPI2_TX_VECTOR). The _SPI2_RX_VECTOR is triggered whenever the SPI2 channel has finished receiving a byte of data, and the _SPI2_TX_VECTOR triggers whenever the SPI2 channel has finished sending a byte of data.
This additionally means that the source address for DMA Channel 0 is the SPI buffer SPI2BUF, because we are reading from it, and the destination address of DMA Channel 1 is also SPI2BUF, because we are sending to it.
We are going to send 1 byte at a time (so cell size is 1), because we are using SPI in 8-bit mode. Perhaps we could get even more speed gains in 32-bit mode but we're fast enough for the moment.
We want to generate an interrupt when the transfer is done and I've chosen to use Interrupt Priority 4, Sub-priority 1.
Finally, I've chosen to abort DMA transfers whenever there's an error on SPI channel 2.

Before we move on, let's clarify what the heck we've been talking about and see how this is going to operate:

PIC32MZ - SPI SD DMA Flow

As should hopefully be clear thanks to that fantastic image, DMA Channel 0 and DMA Channel 1 are not talking to each other at all. DMA Channel 1 sends as many bytes to SPI Buffer 2 (SPI2BUF) as we tell it to until it's done, and DMA Channel 0 receives as many bytes as we tell it to until it's done. This can be tricky to understand, so it bears further thought. When I send 0xFF to the SD card, what does it send in response? Due to the nature of SPI, it is sending the response to the last byte I sent it. Referring to the flow-chart above, when I'm waiting for the 0xFE token, I'm actually doing this:

    while (token != 0xFE)
    {
        SPI2BUF = 0xFF;
        while (SPI2STATbits.SPIRBE);
        token = SPI2BUF;
    }

When that's done, the SD card has already internally queued up the first byte of data to send to me; it just has no way of sending it until I clock it out. The next time I send it a 0xFF, it will send me that queued-up reply at the same time as I send it the 0xFF. What this means for my DMA approach is that when I send it 0xFF, the reply it sends me will be an answer to the previous 0xFF. And then, because I've sent it 0xFF again, it will have another byte of data prepared for me and will be waiting to send it. It will only be able to send me that data when I send it another 0xFF.

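Secondly, looking at my fantastic picture of DMA data flow right above this, it becomes clear that both DMA channels are going to work through the single SPI2 buffer (SPI2BUF), independently of each other. This means Enhanced Buffer Mode (SPI2CONbits.ENHBUF = 1), which gives the SPI module its transmit and receive FIFOs, must be enabled for this to work at all. OK, theory out of the way, for now.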
The happy news is that none of the above information ever changes, so we can set that all up once at the beginning of the program and never have to set it up again. I have done this in a function called SPI_DMA_init(), here's the code for it:

void SPI_DMA_init(void)
{
    DCH0SSA = virt_to_phys((void*)&SPI2BUF); // Source address
    DCH0ECONbits.CHSIRQ = _SPI2_RX_VECTOR;   // Trigger cell transfer event on SPI2 Receive IRQ
    DCH0ECONbits.CHAIRQ = _SPI2_FAULT_VECTOR;// Abort on SPI 2 error

    DCH0ECONbits.SIRQEN = 1;                 // Enable cell transfer event on IRQ
    DCH0ECONbits.AIRQEN = 1;                 // Enable channel abort on IRQ
    DCH0CONCLR = 1 << 4;                     // CHAEN = 0, channel turns itself off when the block is done
    DCH0CONSET = 3;                          // CHPRI<1:0> = 3, set channel priority to 3

    DCH0SSIZ = 1;                            // Source size is 1 byte (SPI2BUF)
    DCH0CSIZ = 1;                            // Transfer 1 byte at a time

    DCH1DSA = virt_to_phys((void*)&SPI2BUF); // Destination address
    DCH1ECONbits.CHSIRQ = _SPI2_TX_VECTOR;   // Trigger cell transfer event on SPI2 Transmit IRQ    
    DCH1ECONbits.CHAIRQ = _SPI2_FAULT_VECTOR;// Abort on SPI 2 error

    DCH1ECONbits.SIRQEN = 1;                 // Enable cell transfer event on IRQ
    DCH1ECONbits.AIRQEN = 1;                 // Enable channel abort on IRQ
    DCH1CONCLR = 1 << 4;                     // CHAEN = 0, channel turns itself off when the block is done
    DCH1CONSET = 2;                          // CHPRI<1:0> = 2, set channel priority to 2

    DCH1CSIZ = 1;                            // Cell size
    DCH1DSIZ = 1;                            // Destination size

    IPC33CLR = 0b11111 << 16;                // Clear DMA0IP and DMA0IS bits
    IPC33SET = 0b10001 << 16;                // DMA channel 0: Interrupt Priority 4, Interrupt Sub-priority 1

    DMACONSET = 0x8000;                      // Enable DMA module if it hasn't been
}

You may be wondering why the source size of channel 0 and the destination size of channel 1 are set to 1. That end of each transfer is the single SPI2BUF register. They could be set to the size of the actual transfer, but the DMA module uses whichever of the source and destination sizes is bigger anyway, so setting them to 1 once here saves us repeating those two lines of code for every transfer. As you'll see later, there's plenty more code to come.
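A rough model of that size rule, written as host-side C: the block length is the larger of the source and destination sizes, and each start IRQ moves one cell. The helper names are hypothetical and this only models the behaviour as described in the paragraph above; it is not Microchip code.

```c
#include <assert.h>

/* Block length: the larger of the source and destination sizes */
static unsigned dma_block_bytes(unsigned ssiz, unsigned dsiz)
{
    return ssiz > dsiz ? ssiz : dsiz;
}

/* Start-IRQ events (cell transfers) needed to finish one block,
   rounding up in case the block is not a whole number of cells */
static unsigned dma_cell_events(unsigned ssiz, unsigned dsiz, unsigned csiz)
{
    unsigned block = dma_block_bytes(ssiz, dsiz);
    return (block + csiz - 1) / csiz;
}
```

With DCH0SSIZ = 1 and DCH0DSIZ = 512, channel 0 needs 512 SPI2 receive IRQs before its block is complete, so the size of 1 on the SPI2BUF side costs nothing.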

Enough chat, plz give me teh codes tx

For the sake of a simpler explanation, I have made my multi block read function into a mini state machine. As a result, the code is fairly long and not as optimised as it could be but still yields good results. It looks like this:

static int rcvr_datablock_multiple(BYTE *buff, INT btr)
{
    unsigned char READ_DONE = 0;
    unsigned char READ_ERROR = 0;
    int DMA_sectors_left;
    char DMA_stage;
    int DMA_read_size;
    int DMA_bytes_left;
    BYTE crc[2];

    // Divide by 512 to get the number of 512-byte sectors to read
    DMA_sectors_left = btr >> 9;

    // Initialise the state machine
    READ_DONE = 0;
    DMA_BUSY = 0;
    DMA_stage = 0;
    READ_ERROR = 0;

    if (btr >= 512)
        DMA_read_size = 512; // Reading 512 bytes (one sector) at a time
    else
        DMA_read_size = btr; // Reading less than 512 bytes at a time

    // How many bytes do we need to read each time?
    DMA_bytes_left = btr;

    // Disable the SDO pin
    SPICONbits.DISSDO = 1;

    // Set the SDO pin to 1 so it will always output 0b11111111 (0xFF)
    SDO_PIN = 1;

    // Start waiting for the 0xFE token that precedes each sector read
    SPI_DMA_wait_token(buff, MAX_TOKEN_WAIT_BYTES);

    while (!READ_DONE)
    {
        while (DMA_BUSY)
        {
            // Could place a callback routine in here to do something while DMA is busy but haven't gotten around to that yet
        };

        switch (DMA_stage)
        {
            case 0: // Finished waiting for 0xFE token
            {
                if (DCH1SPTR > MAX_TOKEN_WAIT_BYTES) 
                {
                    // 0xFE was not found in [MAX_TOKEN_WAIT_BYTES] bytes, give up
                    READ_DONE = 1;
                    READ_ERROR = 1;
                }
                else
                {
                    if (DMA_bytes_left > DMA_read_size)
                    {
                        SPI_DMA_read(buff, DMA_read_size);
                        DMA_bytes_left -= DMA_read_size;
                    }
                    else
                    {
                        SPI_DMA_read(buff, DMA_bytes_left);
                        DMA_bytes_left = 0;
                    }

                    DMA_stage = 1;
                }
                break;
            }
            case 1: // Finished reading data
            {
                buff += DMA_read_size; // Increment buffer position by number of bytes read

                // Read the CRC data now
                SPI_DMA_read(crc, 2);        

                DMA_stage = 2;
                break;
            }
            case 2: // Finished reading CRC
            {
                DMA_sectors_left--;

                if (DMA_sectors_left > 0)
                {
                    // Restart the process
                    DMA_stage = 0;
                    SPI_DMA_wait_token(buff, MAX_TOKEN_WAIT_BYTES);
                }
                else
                {
                    READ_DONE = 1;
                }

                break;
            }
        }            
    }

    // Turn off DMA channels 0 and 1 only; other channels may still be in use
    DCH0CONCLR = 1 << 7;
    DCH1CONCLR = 1 << 7;
    SPICONbits.DISSDO = 0;

    if (READ_ERROR)
        return 0;
    else
        return 1;                       
}

OK, I realise that's pretty long so let's break it up into sections.

Initialising all the variables used and starting the wait for the 0xFE token

// Divide by 512 to get the number of 512-byte sectors to read
DMA_sectors_left = btr >> 9;

// Initialise the state machine
READ_DONE = 0;
DMA_BUSY = 0;
DMA_stage = 0;
READ_ERROR = 0;

if (btr >= 512)
    DMA_read_size = 512; // Reading 512 bytes (one sector) at a time
else
    DMA_read_size = btr; // Reading less than 512 bytes at a time

// How many bytes do we need to read each time?
DMA_bytes_left = btr;

// Disable the SDO pin
SPICONbits.DISSDO = 1;

// Set the SDO pin to 1 so it will always output 0b11111111 (0xFF)
SDO_PIN = 1;

// Start waiting for the 0xFE token that precedes each sector read
SPI_DMA_wait_token(buff, MAX_TOKEN_WAIT_BYTES);

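The one trick I used here, which I got from a Microchip forum post ages ago, is this: we need to output a constant 0xFF value, which is 0b11111111 in binary, so we can just disable the SDO pin of the SPI peripheral (by setting DISSDO to 1) and set the value of the port pin to 1, and it'll output nothing but 1's. Because the pin no longer follows the SPI shift register, it doesn't matter what DMA Channel 1 writes into SPI2BUF; its only job is to make the SPI module generate the clock. Pretty neat trick.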

When I call SPI_DMA_wait_token() I decided to give it a maximum number of bytes to wait for, in my program it's 8192. The number of bytes it takes to receive 0xFE is not set and seems to differ between SD cards. Let's take a look at the code in that function:

// SPI_DMA_wait_token sends out 0xFF and waits for the 0xFE token to come in. It will send a maximum of [num_bytes] bytes
void SPI_DMA_wait_token(unsigned char *buffer, unsigned int num_bytes)
{
    DCH0CONCLR = 1 << 7;            // Disable DMA Channel 0
    DCH1CONCLR = 1 << 7;            // Disable DMA Channel 1

    DCH0DSA = virt_to_phys(buffer); // Destination address
    DCH0CONCLR = 1 << 4;            // CHAEN = 0
    DCH0CONSET = 3;                 // CHPRI<1:0> = 3
    DCH0INTCLR = 0xFF00FF;          // Clear all DMA Channel 0 interrupt enables and flags
    DCH0INTSET = 0x90000;           // Enable CHBCIE (bit 19) and CHERIE (bit 16) for DMA channel 0
    DCH0DAT = 0xFE;                 // Wait for 0xFE token
    DCH0ECONSET = 1 << 5;           // PATEN is enabled
    DCH0CONbits.CHPATLEN = 0;       // 8bit pattern

    DCH0DSIZ = num_bytes;           // Destination size is [num_bytes]

    DCH1SSIZ = num_bytes;           // Source size is [num_bytes]

    DCH1INTCLR = 0xFF00FF;          // Clear all DMA Channel 1 interrupt enables and flags
    DCH1SSA = virt_to_phys(buffer); // Source address

    IFS4CLR = 0b11 << 14;           // Clear SPI2RXIF and SPI2TXIF
    IFS4CLR = 1 << 6;               // Clear DMA0IF
    IEC4SET = 1 << 6;               // Set DMA0IE

    DCH0CONSET = 1 << 7;            // Enable DMA Channel 0
    DCH1CONSET = 1 << 7;            // Enable DMA Channel 1
    DCH1ECONSET = 1 << 7;           // Set CFORCE on

    DMA_BUSY = 1;                   // DMA_BUSY flag set to 1 indicating active transfer
}

As always, before we configure anything, turn it off. In this case, clearing the CHEN bit of DCH0CON and DCH1CON does this fine. We do not want to disable the entire DMA module while we do this because we have no idea what the other 6 channels are doing.
The code is as discussed in my previous post, including the pattern matching, which is looking for the 8-bit value 0xFE. As we have already set up most of the registers in SPI_DMA_init(), we don't need to set them up again. I routinely clear all the interrupt enables and flags in both DCH0INT and DCH1INT to avoid any potential problems they may cause. I am using the Channel Block Transfer Complete (CHBC) interrupt of DMA Channel 0 to tell me when the transfer is finished. The DMA_BUSY flag is my own internal flag that I wait on, rather than hammering the DMA module's status bits and thus slowing down the transfer.
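To see what pattern matching does to the token wait, here is a host-side model of DMA Channel 0 inside SPI_DMA_wait_token(). It is a sketch under assumptions: `dma_pattern_copy()` is a hypothetical name, and I'm assuming the matching 0xFE byte is itself written to the destination before the channel stops. It copies received bytes until the pattern shows up or the destination size runs out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of an 8-bit pattern-match transfer (CHPATLEN = 0, PATEN = 1).
   Copies rx bytes to dst one cell at a time until either the pattern byte
   has been transferred or dsiz bytes have been moved. Returns the byte count
   and sets *matched to 1 if the pattern ended the transfer. */
static size_t dma_pattern_copy(const uint8_t *rx, size_t rx_len,
                               uint8_t *dst, size_t dsiz,
                               uint8_t pattern, int *matched)
{
    size_t n = 0;
    *matched = 0;
    while (n < dsiz && n < rx_len) {
        uint8_t b = rx[n];
        dst[n++] = b;
        if (b == pattern) {      /* e.g. the 0xFE start token */
            *matched = 1;
            break;
        }
    }
    return n;
}
```

A transfer that ends with `matched == 0` after using the whole destination size is the "token never arrived" case that the state machine turns into READ_ERROR.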

Once the 0xFE token is found, or MAX_TOKEN_WAIT_BYTES is exceeded, we will get to the next stage of the state machine.

Starting a 512-byte sector read

case 0: // Finished waiting for 0xFE token
{
    if (DCH1SPTR > MAX_TOKEN_WAIT_BYTES) 
    {
        // 0xFE was not found in [MAX_TOKEN_WAIT_BYTES] bytes, give up
        READ_DONE = 1;
        READ_ERROR = 1;
    }
    else
    {
        if (DMA_bytes_left > DMA_read_size)
        {
            SPI_DMA_read(buff, DMA_read_size);
            DMA_bytes_left -= DMA_read_size;
        }
        else
        {
            SPI_DMA_read(buff, DMA_bytes_left);
            DMA_bytes_left = 0;
        }

        DMA_stage = 1;
    }
    break;
}

The first thing I do here is check the DMA Channel 1 Source Pointer to see how many bytes it actually sent before receiving 0xFE. If this exceeds MAX_TOKEN_WAIT_BYTES, I abort the transfer. If not, I check to see how many bytes I need to read and then call SPI_DMA_read(). Let's take a look at the code behind it:

// SPI_DMA_read sends out 0xFF and reads in the returned data into [buffer], for a total of [num_bytes] byte transfers
void SPI_DMA_read(unsigned char *buffer, unsigned int num_bytes)
{
    DCH0CONCLR = 1 << 7;            // Disable DMA Channel 0
    DCH1CONCLR = 1 << 7;            // Disable DMA Channel 1

    DCH0DSA = virt_to_phys(buffer); // Destination address
    DCH0INTCLR = 0xFF00FF;          // All flag and ints off
    DCH0INTSET = 0x80000;           // CHBCIE = 1
    DCH0DAT = 0xFFFF;
    DCH0ECONCLR = 1 << 5;           // PATEN is disabled

    DCH0DSIZ = num_bytes;           // Destination size is [num_bytes]

    DCH1SSIZ = num_bytes;           // Source size    

    DCH1INTCLR = 0xFF00FF;          // Clear all DMA Channel interrupt enables and flags
    DCH1SSA = virt_to_phys(buffer); // Source address

    IFS4CLR = 0b11 << 14;           // Clear SPI2RXIF and SPI2TXIF
    IFS4CLR = 1 << 6;               // Clear DMA0IF
    IEC4SET = 1 << 6;               // Set DMA0IE

    DCH0CONSET = 1 << 7;            // Enable DMA Channel 0
    DCH1CONSET = 1 << 7;            // Enable DMA Channel 1
    DCH1ECONSET = 1 << 7;           // Set CFORCE on

    DMA_BUSY = 1;
}

This code is almost exactly the same as SPI_DMA_wait_token(). The main difference is that it disables pattern matching. They could easily be combined into one function, but I've chosen to keep them separate for clarity, as this is already a long and complicated subject.
Once the transfer is done, DMA_BUSY is again cleared and we move on to the next stage of the state machine, reading the 2-byte CRC.

Reading the CRC

case 1: // Finished reading data
{
    buff += DMA_read_size; // Increment buffer position by number of bytes read

    // Read the CRC data now
    SPI_DMA_read(crc, 2);        

    DMA_stage = 2;
    break;
}

Nothing much to say here. The CRC is 2 bytes of data and it must be read before either finishing the transfer or waiting for the 0xFE token again.
Note: This could easily be combined into the last SPI_DMA_read() before reading CRC, but I've chosen not to do this for clarity.

Restarting the state machine if needed

case 2: // Finished reading CRC
{
    DMA_sectors_left--;

    if (DMA_sectors_left > 0)
    {
        // Restart the process
        DMA_stage = 0;
        SPI_DMA_wait_token(buff, MAX_TOKEN_WAIT_BYTES);
    }
    else
    {
        READ_DONE = 1;
    }

    break;
}

OK, we got the CRC. Do we have any more sectors left to read? If so, restart the wait for the 0xFE token. In multi block reads, the first wait for the 0xFE token can require hundreds or even thousands of 0xFF bytes to be sent while the SD card gets ready, but subsequent waits usually require only a few. This is part of the reason multi block reads are much faster than single block ones.
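To put rough numbers on this, here is an idealised estimate of the best-case throughput for one sector of a multi block read at a given SPI clock. It assumes back-to-back bytes with no gaps, counts only the bytes clocked per sector (the 0xFF filler before the token, the token itself, 512 data bytes and 2 CRC bytes) and ignores command overhead. `sd_multiblock_bytes_per_sec()` is a hypothetical helper, and its output is a calculation, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Best-case bytes of sector data per second, given the SPI clock and the
   number of 0xFF filler bytes sent before the 0xFE token arrives */
static uint64_t sd_multiblock_bytes_per_sec(uint64_t f_sck_hz, unsigned wait_bytes)
{
    uint64_t bytes_clocked = (uint64_t)wait_bytes + 1 + 512 + 2; /* filler + token + data + CRC */
    return (512u * f_sck_hz) / (8u * bytes_clocked);             /* 8 clocks per byte */
}
```

At 50MHz with no filler this comes out at 6,213,592 bytes/s. A sector that needs 1000 filler bytes takes about three times as long to clock out, which is why paying the long wait only once per multi block read pays off.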

Cleaning up

// Re-enable SDO
SPICONbits.DISSDO = 0;

if (READ_ERROR)
    return 0;
else
    return 1;                       

Don't forget to re-enable SDO or the SPI port is not going to work and you're going to spend hours debugging your code :)

As I've mentioned multiple times before, this code isn't perfect and it's still under development. It seems to be working so far but I wouldn't trust it in anything you truly care about. Again, the SD card SPI specification allows for a maximum of 25MHz and my code is running at 50MHz, so if you experience issues, that's the first place I'd look (set SPIBRG to 1 to get 25MHz).
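If you'd rather compute the divisor than guess it, the PIC32 SPI clock follows FSCK = FPB / (2 * (SPIxBRG + 1)). Here's a small helper sketch; the 100MHz peripheral bus clock in the checks is an assumption about the setup (it's the value for which SPIBRG = 1 gives 25MHz), and `spi_sck_hz()` / `spi_brg_for()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* SCK frequency produced by a given BRG value */
static uint32_t spi_sck_hz(uint32_t fpb_hz, uint32_t brg)
{
    return fpb_hz / (2u * (brg + 1u));
}

/* Smallest BRG whose SCK does not exceed max_sck_hz */
static uint32_t spi_brg_for(uint32_t fpb_hz, uint32_t max_sck_hz)
{
    uint32_t div = 2u * max_sck_hz;
    uint32_t brg = (fpb_hz + div - 1u) / div;   /* ceil(fpb / (2 * fsck)) */
    return brg > 0u ? brg - 1u : 0u;
}
```

Rounding the divider up means the result errs on the slow side, which is what you want when staying inside the card's limit.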

Here's the code

Tags: code, DMA, SD, SPI

Direct Memory Access on the PIC32MZ

What is Direct Memory Access (DMA)?

Direct Memory Access is a way for the CPU to offload the work of data transfers, to or from a peripheral, to a separate module that can take care of the transfer in the background and let the CPU know when it's done.
DMA is, in my opinion, one of the most powerful things found on microcontrollers and a big differentiator between them. But how does it work and why would I want to use it?

In my LCD example I was sending an image of Tux to the LCD. What if I now want to expand that to read frames from an SD card and blit them to the LCD? As per my example, I'd do this:

while (!f_eof(&file))
{       
    // Read frame from disk
    f_read(&file, frame, FRAME_SIZE, &bytes_read); 

    // Set LCD window position and size
    LCD_set_address(0,0,FRAME_WIDTH - 1,FRAME_HEIGHT - 1);
    LCD_write_command(0x2C);
    PMADDR = 1;         

    // Send the pixel data to the LCD
    for (cnt = 0; cnt < FRAME_SIZE; cnt++)
    {
        PMP_wait();
        PMDIN = frame[cnt];
    }
}

OK, that works fine but it also means the CPU is occupied 100% of the time in that for loop, just for sending pixels to the LCD. The Tux image was 210 x 248 pixels big, which is fairly big. What kind of frame rate could we expect from such an approach?
210 x 248 x 2 = 104,160 bytes per frame. If I want to do 30 frames per second, that translates to 3,124,800 bytes per second that I need to read from the SD card and send to the LCD. That might be possible, but just barely.

Let's expand this example to an actual LCD sized frame, 320 x 240 at 30 fps. This is 320 x 240 x 2 x 30 = 4,608,000 bytes per second to read and send to the LCD. Currently, even running the SD card at 50MHz I only get about 3.3MB/s reading, so this would be impossible. How could we speed this up? Well, the slow part is the SD code, the writing to the LCD is actually quite fast. So what I want is some way to spend less CPU time sending data to the LCD and devote more CPU time to reading from the SD card. In effect, I want some way to send the data to the LCD that doesn't involve me waiting around in a for loop. Well, that's what the DMA module can do for us. It can read and write from peripherals or ports in the background without using any CPU time, which means we are free to do other tasks, like reading from an SD card, while it is busy.
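The arithmetic above is easy to redo for other frame sizes. A tiny sketch (hypothetical helper names) that computes the required bytes per second and, for a given read speed, the best whole-frame rate you could hope for if reading from the SD card were the only cost:

```c
#include <assert.h>

/* Bytes per second needed to stream w x h frames of bpp bytes/pixel at fps */
static unsigned long video_bytes_per_sec(unsigned long w, unsigned long h,
                                         unsigned long bpp, unsigned long fps)
{
    return w * h * bpp * fps;
}

/* Whole frames per second achievable at a given read rate, ignoring LCD time */
static unsigned long max_fps(unsigned long read_bytes_per_sec, unsigned long w,
                             unsigned long h, unsigned long bpp)
{
    return read_bytes_per_sec / (w * h * bpp);
}
```

At about 3.3MB/s, a full 320 x 240 frame tops out at 21 fps before a single pixel has been sent to the LCD.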

Let's take a look at the official block diagram of the DMA controller:

PIC32MZ - DMA module block diagram

From the diagram you can see that the CPU and the DMA module are separate. The CPU can give the DMA module an instruction, like "Send the data in the frame array to the PMP module", and then the DMA module will start doing that immediately. This instruction to the DMA module only takes a few lines of code to set up, and therefore is much, much faster than having to run through an entire for loop. Again, it also happens entirely in the background, without the CPU's involvement, which leaves us free to do whatever we want while it's busy.

To summarise: One of the biggest advantages of DMA is it frees up the CPU to do other work while large data transfers are happening.

Using DMA on the PIC32MZ

The PIC32MZ has eight of these DMA channels, and each of these can transfer up to 64kB at a time. It runs directly off of the System Clock (SYSCLK). There are also advanced features like chaining channels together, pattern matching and CRC generation. Today we're going to look at how to set up DMA transfers and use pattern matching too.
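Since a single transfer tops out at 64kB (65,536 bytes), anything bigger, like a 320 x 240 x 2 = 153,600-byte frame, has to be split into several transfers. A minimal sketch of that bookkeeping, assuming the 65,536-byte limit; `dma_chunk_count()` and `dma_chunk_size()` are hypothetical helpers. In a real driver each chunk would be started (for example from the DMA interrupt handler) with the source address advanced by index * 65,536 bytes.

```c
#include <assert.h>

#define DMA_MAX_BLOCK 65536ul   /* per-transfer limit quoted in the text */

/* How many DMA transfers a buffer of [total] bytes needs */
static unsigned long dma_chunk_count(unsigned long total)
{
    return (total + DMA_MAX_BLOCK - 1) / DMA_MAX_BLOCK;
}

/* Size of transfer number [index] (0-based), or 0 past the end */
static unsigned long dma_chunk_size(unsigned long total, unsigned long index)
{
    unsigned long start = index * DMA_MAX_BLOCK;
    if (start >= total)
        return 0;
    unsigned long left = total - start;
    return left < DMA_MAX_BLOCK ? left : DMA_MAX_BLOCK;
}
```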

For starters, let's see what information we need to give the DMA controller:

  • The address of the source of the data
  • The size of the source of the data, in bytes
  • The cell size (how much data to transfer each time), in bytes
  • The address of the destination of the data
  • The size of the destination of the data, in bytes
  • The source of the "clock signal" or interrupt to tell it to move the data (covered later)

In theory, it's very simple but this is the PIC32MZ. It takes your cute "theory" and laughs at it before ripping out your heart and laughing at you. There are many things the documentation either doesn't mention or describes very poorly. The most important of them is this:

Any buffers you use **MUST be declared coherent or nothing will work**

Coherent? The PIC32MZ's CPU has a data cache: to get better speed, it keeps its own copy of recently used memory and doesn't always write changes straight back to RAM. The DMA module, on the other hand, reads and writes RAM directly and never sees the cache, so the CPU and the DMA module can end up seeing different values for the same memory location. In DMA, this would lead to disaster. Coherent memory is accessed without the cache, so everything reads and writes the memory directly. This means it's slower but more reliable.

If you look in any Harmony example that uses DMA, they declare their buffers like this:

unsigned short APP_MAKE_BUFFER_DMA_READY buffer[1024];

It turns out that APP_MAKE_BUFFER_DMA_READY is a friendly way of saying:

unsigned short __attribute__ ((coherent, aligned(16)))

Which tells the compiler to assign the array in coherent memory. So, where before you had to declare your buffer like this:

unsigned short read_buffer[1024];

You now need to declare it like this:

unsigned short __attribute__ ((coherent, aligned(16))) read_buffer[1024];

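It looks confusing but it's not a huge change. Note that the 16 in aligned(16) is an alignment in bytes, not the size of the data type in bits: it makes the buffer start on a 16-byte boundary, and you can keep aligned(16) for an array of unsigned char too.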
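You can check the byte-alignment point with a desktop GCC or Clang. This sketch places the `aligned` attribute after the array name and leaves out `coherent`, which only exists in XC32; `is_aligned()` is a hypothetical helper. Both arrays start on a 16-byte boundary regardless of element type, and the attribute doesn't change their size.

```c
#include <assert.h>
#include <stdint.h>

/* 'coherent' omitted: it only exists in XC32. aligned(16) = 16-byte boundary. */
static unsigned char  byte_buffer[1024]  __attribute__((aligned(16)));
static unsigned short short_buffer[1024] __attribute__((aligned(16)));

/* Nonzero if p sits on an n-byte boundary */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p % n) == 0;
}
```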

If you prefer using heap memory and malloc() and free(), the coherent memory versions of those are __pic32_alloc_coherent() and __pic32_free_coherent().
Remember though that if you use heap memory, you need to specify a heap size in the XC32 linker (xc32-ld) options or it will not work.

OK, enough theory for now, let's take a look at some code to send a 16-bit buffer to the PMP:


volatile int DMA_DONE_FLAG = 0;

void LCD_blit(unsigned short *buffer, int num_bytes)
{
    DCH0CONbits.CHEN = 0;                        // Turn off this channel
    DCH0SSA = virt_to_phys(buffer);              // Move the data from the [buffer] array
    DCH0DSA = virt_to_phys((const void*)&PMDIN); // Move the data to the PMDIN register
    DCH0SSIZ = num_bytes;                        // Move num_bytes bytes of data in total
    DCH0CSIZ = 2;                                // Move 2 bytes at a time
    DCH0DSIZ = 2;                                // Destination size is 2 bytes
    DCH0ECON = 0;                                // Clear the DMA configuration settings
    DCH0ECONbits.CHSIRQ = _PMP_VECTOR;           // Move data on PMP interrupt
    DCH0ECONbits.CHAIRQ = _PMP_ERROR_VECTOR;     // Abort on PMP error
    DCH0ECONbits.SIRQEN = 1;                     // Enable Start IRQ
    DCH0ECONbits.AIRQEN = 1;                     // Enable Abort IRQ
    DCH0CONbits.CHPRI = 3;                       // The priority of this channel is 3 (highest)
    DMACONSET = 0x8000;                          // Turn the DMA module on
    DCH0CONbits.CHEN = 1;                        // Turn this channel on now

    IPC33bits.DMA0IP = 3;                        // Set DMA 0 interrupt priority to 3
    IPC33bits.DMA0IS = 1;                        // Set DMA 0 interrupt sub-priority to 1
    IFS4bits.PMPIF = 0;                          // Clear the PMP interrupt flag
    IFS4bits.DMA0IF = 0;                         // Clear the DMA channel 0 interrupt flag
    IEC4bits.DMA0IE = 1;                         // Enable the DMA 0 interrupt
    DCH0INTbits.CHBCIE = 1;                      // Enable the Channel Block Transfer Complete (CHBC) Interrupt
    DCH0ECONbits.CFORCE = 1;                     // Force the start of the transfer now
}

// Interrupt handler
void __attribute__((vector(_DMA0_VECTOR), interrupt(IPL3SRS), nomips16)) DMA0_handler(void)
{
    DCH0INTCLR = 0xFF;                           // Clear the channel's own flags (CHBCIF etc.)
    IFS4bits.DMA0IF = 0;                         // Clear the DMA channel 0 interrupt flag
    IEC4bits.DMA0IE = 0;                         // Disable the DMA 0 interrupt
    DMA_DONE_FLAG = 1;                           // DMA transfer is done
}

Important: Before continuing, I want to mention again that this can transfer a maximum of 65,536 bytes. This means it cannot transfer an entire 320 x 240 x 2 = 153,600-byte frame of data at one time. That can be accomplished by DMA chaining or interrupt handling, neither of which I am going into today.

There are a few new things here. First of all, what is virt_to_phys? Then what's this IRQ-related stuff? Well, virt_to_phys is the name I copied from the datasheet. Let's take a look at what it does:

extern __inline__ unsigned int __attribute__((always_inline)) virt_to_phys(const void* p) 
{ 
 return (int)p<0?((int)p&0x1fffffffL):(unsigned int)((unsigned char*)p+0x40000000L); 
}

Easy, right? Seriously though, what it's doing is converting the virtual memory address of something to a physical memory address because the DMA module works with physical addresses.


Virtual vs Physical memory. To put it very simply, the PIC32 takes the physical memory and maps it into segments (like KSEG0, KSEG1, etc), some of which are cacheable and some of which are not. It uses something called Fixed Mapping Translation (FMT) to translate these addresses to the actual physical memory location when they are used. The DMA module requires the actual physical address of the memory used, so we need to translate the PIC32's virtual memory address into a physical address, which is what virt_to_phys() does.
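Because virt_to_phys() is plain arithmetic, you can try it on a PC. This sketch (hypothetical name `virt_to_phys_addr()`) takes 32-bit addresses as integers instead of pointers but performs the same two cases as the function above: an address with the top bit set (KSEG0/KSEG1, which is what `(int)p < 0` tests) keeps only its low 29 bits, and anything else has 0x40000000 added. Note how a cached KSEG0 address and an uncached KSEG1 address land on the same physical location, which is why the same RAM can be reached both with and without the cache.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as virt_to_phys(), on 32-bit address values */
static uint32_t virt_to_phys_addr(uint32_t va)
{
    if (va & 0x80000000u)            /* (int)p < 0: KSEG0/KSEG1 address */
        return va & 0x1FFFFFFFu;     /* strip the segment bits */
    return va + 0x40000000u;         /* KUSEG address */
}
```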

The next thing you'll notice is that we have to supply a "source" interrupt for the DMA transfers. If you remember from the LCD example, in PMP_init() I had this line:

    PMMODEbits.IRQM = 1;    // IRQ at the end of the Read/Write cycle

This means that after each PMP transfer is completed, an interrupt will be generated. We do not need to write the Interrupt Service Routine (ISR) for this; it's all handled internally and the DMA module will intercept the interrupt and clear the interrupt flag for us each time.
There is also the option to abort the DMA transfer if the PMP error interrupt is generated, that's what _PMP_ERROR_VECTOR is doing.
Next, we can see that each DMA channel has a priority, just like interrupts did. This priority is also important in DMA chaining.

A word on interrupts. First, why have I changed to using the Shadow Register Set (IPLnSRS) instead of saving the registers in software (IPLnSOFT)? Simply put, it's faster because it means the PIC32 doesn't have to save the contents of all the many registers to memory before it calls the interrupt service routine (ISR). Before using this feature, it needs to be enabled, usually somewhere after set_performance_mode() in your main() function like this:

PRISS = 0x76543210; // Assign shadow register sets to interrupt priorities 1 through 7

When the DMA transfer is done, it can generate an interrupt to let us know it's done. Knowing what we know about ISRs and how they take valuable time, you may be tempted to do this:

while (DCH0INTbits.CHSDIF == 0);    // Wait for DMA transfer to finish

However, this would be a big mistake. In the DMA datasheet in a code example they say:
" continuously polling the DMA controller in a tight loop would affect the performance of the DMA transfer "
You could check the flag, wait a few microseconds and check again but I prefer to use the interrupt approach as, in theory, it could result in better turn-around times. There are 8 different kinds of interrupts that can be generated which makes the DMA module very flexible.


Can't we do something to shorten that horrendous ISR declaration? Turns out yes, we can. Somewhere in your code, you can define:
#pragma interrupt DMA0_handler IPL3SRS vector _DMA0_VECTOR
and then later in your code, for the actual ISR function, you can just say:
void DMA0_handler(void)
Which is quite a nice change from that mess up above. All a matter of personal preference, really.

The last thing I want to take a look at today is a really cool ability of the DMA module called "Pattern Matching". This is a way to abort a DMA transfer upon receiving a certain byte / word. This pattern can be either 8 or 16 bits.
This is very useful in reading from SD cards, because before we read a block we have to output 0xFF until the SD card returns the 0xFE token to tell us it's ready to give us the data. You can set up a pattern match like this:

DCH0ECONbits.PATEN = 1;     // Enable abort on pattern match
DCH0CONbits.CHPATLEN = 0;   // 8-bit pattern
DCH0DAT = 0xFE;             // Pattern to match: the 0xFE data start token

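In software terms, the pattern match saves us a loop like the one below: keep clocking 0xFF out and stop as soon as the 0xFE token comes back. This is a host-side sketch, not DMA code; responses is a hypothetical buffer standing in for the bytes the card sends back, and find_data_token() is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

#define SD_DATA_TOKEN 0xFEu   /* start-block token sent before each data block on a read */

/* Return the index of the first 0xFE token in the card's responses,
 * or -1 if it never arrives. Bytes before it are normally 0xFF (busy). */
static long find_data_token(const uint8_t *responses, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (responses[i] == SD_DATA_TOKEN)
            return (long)i;
    }
    return -1;
}
```

With pattern matching enabled, the DMA channel does this comparison itself and aborts on the token, so the CPU never has to run the loop.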
Right, that's long enough for one day. Next time I'll write about how you can use two DMA channels to read from an SD card.

Tags: code, DMA

Working on DMA

What I've been doing and progress on DMA

These last few weeks I've been working on a project for work, which is basically putting the DAC-based MP3 player onto a PIC32MX250F256B chip, attaching some serial flash memory and having it connect to the PC via USB in vendor mode, which meant I had to write an app to do that too. Harmony works fine for this and you can still open up their examples and modify them to get them to work. I'm really not a fan of how Harmony splits everything up into its own little files and then needs the Harmony library too; it makes working on the project on multiple machines (which I do) a pain. Personal preference, I guess.

Anyway, I'm back to trying to get SD card reads to work in DMA mode. With my previous code, the maximum speed I saw was about 3302kB/s, even with huge sustained block reads of 64kB. I'm happy to report that I'm currently getting a bit over 5000kB/s with 64kB sustained block reads. Please note: I am not currently using DMA mode to get background reads working; I am doing this solely for the speed increase it provides.

I know the SD card spec's default speed mode tops out at 25MHz, but everything seems to work fine at 50MHz. I'm not putting my code into hospital equipment, it's all hobbyist / personal stuff, so I just want speed. There are some reliability issues that I'm working on but I hope to upload the code in a few days. Even if it doesn't work, I'll upload it and someone smarter than me can hopefully show me where I've gone wrong. Until then, good luck with the PIC32!

Update (2019-01-07): Here are some results from my tests, comparing the original SD SPI code that I got 8 years ago, my newer 32-bit enhanced buffers code and my new 8-bit enhanced buffers DMA code. The results are fantastic at bigger block sizes!

PIC32MZ - SPI SD transfer speed comparison

HERE BE DRAGONS

There may well still be issues with this code!

I'm uploading this code for anyone who cares to test but be warned. You have to make the following changes to your code:

  • Add the line PRISS = 0x76543210; after you've set up your TRIS registers, to enable the shadow register sets I use in the DMA interrupt
  • Set SCK as an input (in my case, SPI2 needs TRISG6 set to 1). I have no idea why this is but it refuses to work without it and caused untold hours of debugging!
  • Change your read buffer declaration from unsigned char read_buffer[8192]; to __attribute__((coherent)) unsigned char read_buffer[8192]; so that the buffer lives in uncached memory and the CPU doesn't read stale cached data after the DMA has written to it
  • Pray

This code is provided as is and you should fully expect it to have issues. I'm providing it because it works for me, on my own board and in my own programs and I thought someone might like to get their hands on it while I slowly write the post on DMA. Good luck! Now to write that post on DMA...

Here's the code

Tags: blah, DMA

I2C on the PIC32MZ

What is I2C?

I2C is short for Inter-Integrated Circuit, and people pronounce it I-squared-C or I-two-C. It's a way of connecting multiple ICs using just two wires, called the Serial Data Line (SDA) and the Serial Clock Line (SCL). Before I even begin, there is something very important to be aware of. The lines are open-drain (aka open-collector). This means that devices on the I2C bus can only pull the lines low or leave them open/floating.

This means that both SDA and SCL need a pullup resistor or they will not be able to set the lines high.
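One way to picture this is as a "wired-AND": the line is high only if every device lets go of it and the pullup is there to pull it up. Here's a tiny host-side model of one line (SDA or SCL); line_is_high() is a made-up name and the model treats a floating line as "not a valid high":

```c
#include <stdbool.h>
#include <stddef.h>

/* Model of one open-drain line. pulls_low[i] is true if device i is
 * pulling the line low, false if it has released it. */
static bool line_is_high(const bool *pulls_low, size_t n_devices, bool has_pullup)
{
    for (size_t i = 0; i < n_devices; i++) {
        if (pulls_low[i])
            return false;      /* any device pulling low wins */
    }
    return has_pullup;         /* all released: high only thanks to the pullup */
}
```

Without the pullup, no device can ever produce a high, which is exactly why I2C doesn't work at all without those resistors.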

To further drive this home, this is what I mean:

PIC32MZ - I2C pullups

I've personally seen the value of these resistors be anywhere from 1.8k to 47k with many people recommending 4.7k or 10k. The important thing is that they are there.

I repeat, I2C will not work at all without these pullup resistors. Got it? Good, because that really wasted a lot of my time the first time I tried to use I2C. OK, on with the show.

An overview of the I2C protocol

Both SDA and SCL are bidirectional signals; they can be driven by either the master or a slave. Compare this to SPI, which needs three wires (MISO, MOSI and CLK) just for communication plus an additional Chip Select for each device, and you can see why this can be beneficial in systems with multiple devices. Of course, squeezing communication with lots of devices onto only two wires makes I2C both more complex and slower than SPI, but it's very useful nonetheless.

Today we are going to take a look at the most common situation we hobbyists find ourselves in, namely one master device with multiple slave devices. The master is responsible for initiating communication and providing the clock signals. For the purposes of today's article, the PIC32MZ is going to be the master device.

<TL;DR>
What if the clock signal is too fast for the slave device to handle, or it needs extra time to process data? In the I2C protocol there is a way for slave devices to force the master to wait, called clock stretching: the slave holds SCL low until it's ready. In master mode, the PIC32 detects and handles this automatically, so we don't need to worry about it. If the PIC32 is acting as a slave and we want it to stretch the clock, we can enable that with the STREN bit in I2CxCON.
</TL;DR>

So how does it all work? Well, all devices are connected to the same SDA and SCL wires, forming what is known as a bus. Each device on the bus has its own unique address. Slave devices are constantly listening out for this address to be broadcast.

Once a special signal called a start signal is seen on the bus, slave devices wake up and listen. The first data sent is always the address of the slave device we want to talk to. Once a slave device sees its address on the bus, it replies to the master and communication begins. The rest of the devices then ignore the following transfers. Today let's assume that everything is set up nicely and no devices have the same addresses or anything strange like that. I will also be assuming that the devices used have 7-bit addresses. I have read that I2C supports 10-bit devices but I've never seen one myself and I've been through tons of eBay and Aliexpress modules.

So when we write an I2C slave's address to the bus, how does it know whether we want to write to it or read from it? Well, assuming we're using 7-bit addresses it looks like this:

PIC32MZ - I2C Address

So there you can see the 7 bits of the address and two other bits. The first, labelled RW, is the Read/Write bit. This tells the device we're talking to whether we want to read from it (RW set to 1) or write to it (RW set to 0). The second, labelled ACK, is the Acknowledge bit. It is sent by whichever device is receiving the byte (the slave, in the case of the address byte) to tell the sender that the byte arrived and it is ready to continue. If the receiver does not send the ACK bit, the sender will assume it is not ready to receive more data.

Something to bear in mind, therefore, is that it takes 9 SCL clock pulses to send an 8-bit byte of data because the Acknowledge bit needs to be sent/received too and the master is responsible for generating that clock pulse. Also note that for data bytes there is no Read/Write bit, this is all set up initially when sending out the address. This means that if you want to write some data and then immediately read some data again you will need to send another start signal and another address byte with the read/write bit set.
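That 9th clock adds up. A quick sketch of the arithmetic (function names are my own), ignoring start, stop and repeated-start conditions:

```c
/* SCL clock pulses needed for n bytes: 8 data bits + 1 ACK/NACK bit each. */
static unsigned long i2c_clocks_for_bytes(unsigned long n_bytes)
{
    return 9ul * n_bytes;
}

/* Upper bound on bytes per second at a given SCL frequency,
 * ignoring start/stop overhead. */
static double i2c_max_bytes_per_second(double scl_hz)
{
    return scl_hz / 9.0;
}
```

For example, a three-byte register write (address, register, value) takes 27 clocks, which at 100kHz is 270µs before you count the start and stop conditions, and the bus can never move more than about 11.1kB/s at that speed.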

There are several kinds of signals that you need to be aware of when using I2C:

  • Start - Tells the slave devices on the bus to start listening.
  • Stop - Ends communication, telling slave devices they can go back into idle mode. To restart communication a new start signal must be sent.
  • Acknowledge (ACK) - Used by devices receiving data to acknowledge they have received the data and are ready for more data.
  • Not Acknowledge (NACK) - Not really a signal of its own, just a way to skip the acknowledge signal. This could be because either no more data is wanted or the device is not ready to receive more data.
  • Write - Writes a byte to the I2C bus. For this to be a write, the Read/Write bit must be set to low.
  • Read - Read a byte from the I2C bus. For this to work, the Read/Write bit must be set to high.

There is one more called the repeated start. This is used when we want to carry on talking to the same slave device, for example to switch from writing to reading, without sending a stop and giving up the bus first.

<TL;DR>

Although the PIC32MZ hardware handles the signals electrically, let's take a look at what those signals mean in terms of setting SDA and SCL high and low. Bear in mind that, due to the pullup resistors that you did not forget about, the default values of SCL and SDA are high.

  • Start - SCL remains high and SDA changes from high to low.
  • Stop - SCL remains high but SDA changes from low to high.
  • Repeated start - Electrically the same as start
  • Acknowledge - SDA is set low while SCL sends a clock pulse
  • Not acknowledge - SDA is set high while SCL sends a clock pulse

</TL;DR>
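Those rules are enough to decode start and stop conditions from two consecutive samples of the bus, much like a logic analyser's I2C decoder does. A host-side sketch (names are mine); ordinary data bits only change SDA while SCL is low, so they never match:

```c
#include <stdbool.h>

typedef enum { I2C_EV_NONE, I2C_EV_START, I2C_EV_STOP } i2c_event;

/* Classify the change between two samples of (SCL, SDA).
 * A repeated start is reported as I2C_EV_START, since it looks the same. */
static i2c_event classify_edge(bool scl_prev, bool sda_prev, bool scl_now, bool sda_now)
{
    if (scl_prev && scl_now) {
        if (sda_prev && !sda_now)
            return I2C_EV_START;   /* SDA falls while SCL stays high */
        if (!sda_prev && sda_now)
            return I2C_EV_STOP;    /* SDA rises while SCL stays high */
    }
    return I2C_EV_NONE;
}
```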

Order of operations

Let's take a look at two examples. In the first, a byte (8 bits) of data is being written to a slave that has an I2C address of 0x68. The slave's internal register number is 0x56 and we want to set it to a value of 0x23.
The order of operations would be:

  • Send start signal
  • Write slave address with Read/Write bit set to 0 (Send 0x68 << 1 = 0xD0)
  • Receive ACK
  • Write register address (0x56)
  • Receive ACK
  • Write data value (0x23)
  • Send stop signal

That second line is going to be confusing. Take a look again at how the 9-bit number is made up above. We are actually only sending 8 bits, the ACK is automatic so let's look at the first 8 bits. The upper 7 bits contain the slave's address and the least significant bit is the R/W bit. This means that 7-bit I2C addresses need to be shifted left by 1 when writing them to the I2C bus.
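In code, building that first byte from a 7-bit address looks something like this (i2c_address_byte() is a hypothetical helper, not part of any Microchip library):

```c
#include <stdint.h>

/* First byte of an I2C transfer: 7-bit address in bits 7..1,
 * Read/Write bit in bit 0 (1 = read, 0 = write). */
static uint8_t i2c_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}
```

For the slave at 0x68 this gives 0xD0 for a write and 0xD1 for a read.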

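For the second example, 8 bits of data will be read from the same slave, register number 0x75. Because we first have to write the register number and then read from it, this needs a repeated start.
The order of operations would be:

  • Send start signal
  • Write slave address with Read/Write bit set to 0 (Send 0x68 << 1 = 0xD0)
  • Receive ACK
  • Write register address (0x75)
  • Receive ACK
  • Send repeated start signal
  • Write slave address with Read/Write bit set to 1 (Send 0x68 << 1 | 1 = 0xD1)
  • Receive ACK
  • Receive data byte (8 bits)
  • Send NACK
  • Send stop signal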

Note the address byte with the Read/Write bit set (0xD1). If we want to set the least significant bit, we can either add 1 to the shifted address or OR it with 1, which I prefer because it looks more confusing.

Also, why am I sending a NACK instead of an ACK? This is because I only want to receive one 8-bit number and end the transaction. If I send an ACK, the slave device will start sending me more data.
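The whole register read can be written down as a list of steps from the master's point of view. This is a host-side sketch with made-up type and function names; it mirrors what MPU9250_read() further down does, and leaves the slave's ACKs implicit:

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_START, EV_RESTART, EV_WRITE, EV_READ, EV_NACK, EV_STOP } ev_kind;
typedef struct { ev_kind kind; uint8_t byte; } bus_event;

/* Fill out[] with the master's steps for reading one register from a
 * 7-bit slave. Returns the number of steps written (always 8). */
static size_t build_register_read(uint8_t addr7, uint8_t reg, bus_event out[8])
{
    size_t n = 0;
    out[n++] = (bus_event){EV_START, 0};
    out[n++] = (bus_event){EV_WRITE, (uint8_t)(addr7 << 1)};         /* R/W = 0 */
    out[n++] = (bus_event){EV_WRITE, reg};                           /* register */
    out[n++] = (bus_event){EV_RESTART, 0};
    out[n++] = (bus_event){EV_WRITE, (uint8_t)((addr7 << 1) | 1u)};  /* R/W = 1 */
    out[n++] = (bus_event){EV_READ, 0};
    out[n++] = (bus_event){EV_NACK, 0};   /* only one byte wanted: NACK it */
    out[n++] = (bus_event){EV_STOP, 0};
    return n;
}
```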

The PIC32MZ I2C hardware and registers

The PIC32MZ I2C hardware is generally very easy to use. There is no Peripheral Pin Select (PPS) stuff to worry about; the pins are hardwired. The 144-pin device I use has five I2C peripherals. Today I will be looking at using the first one, namely I2C1. This means the two physical pins I will connect to are SDA1 (on port RA15) and SCL1 (on port RA14).

Although I've made I2C seem incredibly difficult, it's very easy to use on the PIC32MZ. The following registers are used:

  • I2C1BRG - I2C1 Baud Rate Generator Register - Used for setting the speed of the I2C peripheral
  • I2C1CON - I2C1 Control Register - Used to set up I2C
  • I2C1TRN - I2C1 Transmit Data Register - Contains data we want to send onto the I2C bus
  • I2C1RCV - I2C1 Receive Data Register - Contains data received from the I2C bus
  • I2C1STAT - I2C1 Status Register - Contains the status of the I2C peripheral

Setting the speed of the I2C peripheral

OK, before we can jump straight into the code, let's take a look at that I2C1BRG register. According to the product page for the PIC32MZ, the I2C peripheral can run from 100kHz to 1MHz, and I often run it at 400kHz. However, as with all things PIC32MZ, there are surprises hidden in the documents. In this case, the surprise is in the "PIC32MZ Embedded Connectivity with Floating Point Unit (EF) Family Silicon Errata and Data Sheet Clarification".

PIC32MZ - I2C pullups

In short, Microchip thinks that "software" solutions for their terrible I2C peripheral are acceptable. By software, they mean implementing the entire I2C protocol yourself on a port you want. Thankfully, 100kHz seems to work fine on I2C1. Please note, I2C3 straight up doesn't work according to the datasheet. I repeat, I2C3 does not work at all, do not use it!

So today, we're going to run at 100kHz because nobody has the time to debug impossible-to-find errors later on. Here's the formula for setting I2C1BRG to 100kHz:

PIC32MZ - I2C - BRG Formula

This formula is a bit harder than the SPI formula, but not too much so:

  • TPGD is a propagation delay, defined in the PIC32MZ EF datasheet as 104ns.
  • FSCK is the speed we want, so 100kHz for my example.
  • PBCLK is the speed of the peripheral bus clock. I2C uses Peripheral Bus Clock 2 (PBCLK2), which I've set to 100MHz.
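Written out in text, the formula (as I read it, the usual PIC32 form) and the numbers above plugged in for 100kHz look like this. I2C1BRG only holds an integer, so the result gets truncated:

```latex
\mathrm{I2CxBRG} = \left( \frac{1}{2 F_{SCK}} - T_{PGD} \right) \times \mathrm{PBCLK} - 2

\left( \frac{1}{2 \times 100\,\mathrm{kHz}} - 104\,\mathrm{ns} \right) \times 100\,\mathrm{MHz} - 2
  = (5000\,\mathrm{ns} - 104\,\mathrm{ns}) \times 0.1\,\mathrm{ns}^{-1} - 2
  = 489.6 - 2 = 487.6 \;\Rightarrow\; 487
```

Note that the 2 is subtracted after multiplying by PBCLK, not from PBCLK itself.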

So let's take a look at the code to do that:

// I2C_init() initialises I2C1 at a frequency of [frequency]Hz
void I2C_init(double frequency)
{
    double BRG;

    I2C1CON = 0;            // Turn off I2C1 module
    I2C1CONbits.DISSLW = 1; // Disable slew rate for 100kHz

    // I2C1BRG = ((1 / (2 * FSCK)) - TPGD) * PBCLK2 - 2, with TPGD = 104ns
    // and PBCLK2 = SYS_FREQ / 2 (100MHz here)
    BRG = (1 / (2 * frequency)) - 0.000000104;
    BRG = BRG * (SYS_FREQ / 2) - 2;

    I2C1BRG = (int)BRG;     // Set baud rate (truncated to an integer)
    I2C1CONbits.ON = 1;     // Turn on I2C1 module
}

For stuff like this I always use double precision floating point numbers instead of setting registers directly. This makes debugging easier when stuff doesn't work later on :)
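Because it's all plain doubles, the same arithmetic can be checked on a PC before it goes anywhere near the chip. A host-side sketch; i2c_brg() and i2c_actual_scl() are my own names, pbclk_hz is PBCLK2 and tpgd_s the 104ns propagation delay:

```c
/* BRG value for a target SCL frequency; truncates like the (int) cast. */
static unsigned i2c_brg(double scl_hz, double pbclk_hz, double tpgd_s)
{
    double brg = ((1.0 / (2.0 * scl_hz)) - tpgd_s) * pbclk_hz - 2.0;
    return (unsigned)brg;
}

/* Invert the formula: the SCL frequency a given BRG value actually produces. */
static double i2c_actual_scl(unsigned brg, double pbclk_hz, double tpgd_s)
{
    return 1.0 / (2.0 * (((double)brg + 2.0) / pbclk_hz + tpgd_s));
}
```

With PBCLK2 at 100MHz this gives 487 for 100kHz and 112 for 400kHz. Truncation always rounds BRG down, so the real clock comes out slightly fast: about 100.1kHz for BRG = 487.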

<TL;DR>

Slew rate? For bus speeds of 400kHz, the I2C specification requires that we have slew rate (or rate of change of output) control. For 100kHz, this should be disabled by setting DISSLW to 1.

</TL;DR>

Start, stop, restart, ACK and NACK

All of these are contained in one register, I2C1CON:

  • I2C1CONbits.SEN - Start Condition Enable bit
  • I2C1CONbits.PEN - Stop Condition Enable bit
  • I2C1CONbits.RSEN - Restart (Repeated start) Condition Enable bit
  • I2C1CONbits.ACKDT - Acknowledge Data bit. Set to 0 to ACK and 1 for NACK.
  • I2C1CONbits.ACKEN - Acknowledge Sequence Enable bit
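These are the lowest bits of I2C1CON, which is where the 0x1F mask in I2C_wait_for_idle() below comes from. A host-side sketch with my own macro names (the real device headers provide the I2C1CONbits fields instead):

```c
#include <stdbool.h>
#include <stdint.h>

#define I2CCON_SEN    (1u << 0)  /* Start condition enable              */
#define I2CCON_RSEN   (1u << 1)  /* Repeated start condition enable     */
#define I2CCON_PEN    (1u << 2)  /* Stop condition enable               */
#define I2CCON_RCEN   (1u << 3)  /* Receive enable                      */
#define I2CCON_ACKEN  (1u << 4)  /* Acknowledge sequence enable         */
#define I2CCON_ACKDT  (1u << 5)  /* Acknowledge data: 0 = ACK, 1 = NACK */

/* True while any sequence the hardware clears by itself is still running. */
static bool i2c_con_busy(uint32_t con)
{
    return (con & (I2CCON_SEN | I2CCON_RSEN | I2CCON_PEN |
                   I2CCON_RCEN | I2CCON_ACKEN)) != 0;
}
```

ACKDT is deliberately left out of the mask: it's a setting, not a sequence in progress.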

There's a bunch of code here but it's fairly straightforward. I've separated them all into their own functions for readability:

// I2C_wait_for_idle() waits until the I2C peripheral is no longer doing anything
void I2C_wait_for_idle(void)
{
    while (I2C1CON & 0x1F);         // Wait until none of these is in progress:
                                    //   ACKEN - acknowledge sequence
                                    //   RCEN  - receive sequence
                                    //   PEN   - stop condition
                                    //   RSEN  - repeated start condition
                                    //   SEN   - start condition
    while (I2C1STATbits.TRSTAT);    // Wait until master transmit is no longer in progress
}

// I2C_start() sends a start condition  
void I2C_start()
{
    I2C_wait_for_idle();
    I2C1CONbits.SEN = 1;
    while (I2C1CONbits.SEN == 1);
}

// I2C_stop() sends a stop condition  
void I2C_stop()
{
    I2C_wait_for_idle();
    I2C1CONbits.PEN = 1;
}

// I2C_restart() sends a repeated start/restart condition
void I2C_restart()
{
    I2C_wait_for_idle();
    I2C1CONbits.RSEN = 1;
    while (I2C1CONbits.RSEN == 1);
}

// I2C_ack() sends an ACK condition
void I2C_ack(void)
{
    I2C_wait_for_idle();
    I2C1CONbits.ACKDT = 0; // Set hardware to send ACK bit
    I2C1CONbits.ACKEN = 1; // Send ACK bit, will be automatically cleared by hardware when sent  
    while(I2C1CONbits.ACKEN); // Wait until ACKEN bit is cleared, meaning ACK bit has been sent
}

// I2C_nack() sends a NACK condition
void I2C_nack(void) // Acknowledge Data bit
{
    I2C_wait_for_idle();
    I2C1CONbits.ACKDT = 1; // Set hardware to send NACK bit
    I2C1CONbits.ACKEN = 1; // Send NACK bit, will be automatically cleared by hardware when sent  
    while(I2C1CONbits.ACKEN); // Wait until ACKEN bit is cleared, meaning NACK bit has been sent
}

Writing data to the I2C bus

This is simply a matter of writing to the I2C1TRN register, waiting for the transmit buffer to empty by checking the Transmit Buffer Full (TBF) flag, and then waiting for the slave device to acknowledge receipt.

// Writes one byte (an address byte or a data byte) to the I2C bus.
// Set wait_ack to non-zero to wait for the slave's ACK, or 0 to skip ACK checking
void I2C_write(unsigned char data, char wait_ack)
{
    I2C1TRN = data;                     // Load the byte into the transmit register
    while (I2C1STATbits.TBF == 1);      // Wait until transmit buffer is empty
    I2C_wait_for_idle();                // Wait until I2C bus is idle
    if (wait_ack) while (I2C1STATbits.ACKSTAT == 1); // Wait until ACK is received
}

Reading data from the I2C bus

For reading, we set RCEN to tell the I2C module to receive a byte, wait for the hardware to clear RCEN and for the Receive Buffer Full (RBF) flag to be set, read the byte from I2C1RCV and then send either an ACK or a NACK.

// value receives the byte read from the bus; set ack_nack to 0 to send an ACK or anything else to send a NACK
void I2C_read(unsigned char *value, char ack_nack)
{
    I2C1CONbits.RCEN = 1;               // Receive enable
    while (I2C1CONbits.RCEN);           // Wait until RCEN is cleared (automatic)  
    while (!I2C1STATbits.RBF);          // Wait until Receive Buffer is Full (RBF flag)  
    *value = I2C1RCV;                   // Retrieve value from I2C1RCV

    if (!ack_nack)                      // Do we need to send an ACK or a NACK?  
        I2C_ack();                      // Send ACK  
    else
        I2C_nack();                     // Send NACK  
}

Actually using all this stuff in the real world

Theory is fun and all, but let's look at an example of using this with an actual slave device. I have an MPU-9250 (9 degrees of freedom) module. It's a very complex module and I'm not going into it deeply today. However, as a test of whether or not I2C is working, it'll do. Let's look in the datasheet for the MPU-9250 to find the address:

PIC32MZ - I2C - MPU9250 AD0

OK good, so I connect AD0 to ground and then the address will be 0x68. Excellent. Let's see what registers I can read from to make sure I2C is working:

PIC32MZ - I2C - MPU9250 WHOAMI

Excellent again. Register 117 (0x75) should return 0x68. Or 0x71...? Or in my case, 0x73. I think it must be a revision number or clones use something different. Point is, it's consistent and I've tested it with multiple modules to make sure it works and isn't random :) Let's see what the MPU9250's datasheet says about reading and writing:

PIC32MZ - I2C - MPU9250 Read
PIC32MZ - I2C - MPU9250 Write
PIC32MZ - I2C - MPU9250 Legend

OK, nothing too hard there. Let's see how the code for that looks:

#define MPU9250_ADDRESS 0x68            // The address of MPU9250 when the AD0 pin is connected to ground
#define MPU9250_WHOAMI  0x75            // Will return a set value based on device, in my case 0x73

// Write byte value to register at reg_address
void MPU9250_write(unsigned char reg_address, unsigned char value)
{
    I2C_start();                        /* Send start condition */  
    I2C_write(MPU9250_ADDRESS << 1, 1); /* Send MPU9250's address, read/write bit not set (AD + W) */
    I2C_write(reg_address, 1);          /* Send the register address (RA) */  
    I2C_write(value, 1);                /* Send the value to set it to */  
    I2C_stop();                         /* Send stop condition */  
}

// Read a byte from register at reg_address and return in *value
void MPU9250_read(unsigned char reg_address, unsigned char *value)
{
    I2C_start();                        /* Send start condition */  
    I2C_write(MPU9250_ADDRESS << 1, 1); /* Send MPU9250's address, read/write bit not set (AD + W) */
    I2C_write(reg_address, 1);          /* Send the register address (RA) */
    I2C_restart();                      /* Send repeated start condition */
    I2C_write((MPU9250_ADDRESS << 1) | 1, 1); /* Send MPU9250's address, read/write bit set (AD + R) */
    I2C_read(value, 1);                 /* Read value from the I2C bus, then send NACK */
    I2C_stop();                         /* Send stop condition */  
}

int main(void)
{
    unsigned char value;

    // Set performance to ultra rad
    set_performance_mode();

    // Moved all the ANSEL, TRIS and LAT settings to their own function
    setup_ports();        

    // Enable multi-vectored interrupts mode
    INTCONbits.MVEC = 1;

    // No need to set up PPS, I2C hardware is fixed to certain pins. SCL1 = RA14, SDA1 = RA15

    // Initialise I2C1 at 100kHz
    I2C_init(100000);

    while (1)
    {
        /* Read the value at register 0x75, the MPU9250's WHOAMI register. Should return 0x68, 0x71 or 0x73 depending on version. */  
        MPU9250_read(MPU9250_WHOAMI, &value);

        /* Wait 10ms before trying again so as not to overwhelm the MPU9250 or the PIC32MZ's I2C peripheral */  
        delay_ms(10);
    }
}
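Rather than only checking the result on the oscilloscope, the loop could also check the value in code. A minimal sketch; mpu_whoami_ok() is a hypothetical helper, and the accepted values are just the ones mentioned above (0x68, 0x71 and the 0x73 my modules return):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value is one of the WHOAMI responses seen on these modules.
 * Anything else suggests a wiring, pullup or address problem. */
static bool mpu_whoami_ok(uint8_t value)
{
    switch (value) {
    case 0x68:
    case 0x71:
    case 0x73:
        return true;
    default:
        return false;
    }
}
```

Inside the while loop, something like if (!mpu_whoami_ok(value)) could then light an LED or set a breakpoint.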

So I ran that, and here's what I got on my oscilloscope:

PIC32MZ - I2C - Full wave

Cyan is SDA and magenta is SCL. Just looking at that provides no real understanding of what's happening so I created another Photoshop monsterpiece for my dear nonexistent readers. Click on it to see it in its full glory:

PIC32MZ - I2C - Wave with captions

Yep, looks like the code. Phew. With that, another long-winded post is over. Good luck!

Here's the code

Tags: code, I2C