Pre-release 1.4 is now available for testing (Updated 23/11/20)

rxliuli · 8 November 2020 13:46

Here is a simple conversion method; I believe Python should be able to do something similar.

rxliuli/joplin-api/blob/e146e3098de8302d4216840eb9aa8fc26bc95ea6/src/util/PageUtil.ts#L20


      
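          /**
           * Maximum page size
           * @private
           */
          private static readonly MaxLimit = 100
          
          /**
           * Fetch the data of all pages in a loop
           * Request the maximum page size each time to keep the number of requests as low as possible
           */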
          static async pageToAllList<
            F extends (
              pageParam: PageParam<any> & Record<string, any>,
            ) => Promise<PageRes<any>>
          >(
            fn: F,
            pageParam?: Omit<Parameters<F>[0], 'cursor' | 'limit'>,
          ): Promise<PageResValueType<ReturnType<F>>[]> {
            let cursor: string | undefined
            const list: PageResValueType<ReturnType<F>>[] = []
            do {

Usage example


rxliuli/joplin-api/blob/e146e3098de8302d4216840eb9aa8fc26bc95ea6/test/util/PageUtil.test.ts#L4


      
          import { PageUtil } from '../../src/util/PageUtil'
          import { folderApi, noteApi } from '../../src'
          
          describe('test PageUtil', function () {
            it('test getting all folders', async function () {
              const folderList = await PageUtil.pageToAllList(folderApi.list)
              console.log('folderList.length: ', folderList.length)
              expect(folderList.length).toBeGreaterThan(0)
            })
            it('test getting all notes', async function () {
              const noteList = await PageUtil.pageToAllList(noteApi.list, {
                fields: ['id'],
              })
              console.log('noteList.length: ', noteList.length, noteList)

Although the pagination added in v1.4.* is a bit sudden, I don't think it causes any problem for fetching large amounts of data (though I still don't agree with using paging for the /notes/${id}/tags and /notes/${id}/resources APIs...)

foxmask · 8 November 2020 13:49

Maybe, but I don't understand TS at all.

rxliuli · 8 November 2020 13:52

To put it simply, this is a higher-order function. You pass it the original paged query function and the required query parameters; it then calls that function in a loop until there is no next page (here, when the cursor is empty) and returns the list data accumulated along the way.
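
For readers who don't read TypeScript generics, the loop described above can be reduced to a few lines. This is only a minimal sketch, not the joplin-api implementation: `fetchAll`, `fakeList` and the simplified `PageParam`/`PageRes` shapes are assumptions made for illustration, and the fake endpoint uses a plain offset as its cursor.

```typescript
// Hypothetical, simplified shapes; the real joplin-api types may differ.
interface PageParam {
  cursor?: string
  limit?: number
}
interface PageRes<T> {
  items: T[]
  cursor?: string
}

// Calls `fn` repeatedly, feeding each returned cursor into the next call,
// until a response comes back without a cursor.
async function fetchAll<T>(
  fn: (param: PageParam) => Promise<PageRes<T>>,
  limit = 100,
): Promise<T[]> {
  const list: T[] = []
  let cursor: string | undefined
  do {
    const res = await fn({ cursor, limit })
    list.push(...res.items)
    cursor = res.cursor
  } while (cursor)
  return list
}

// In-memory stand-in for a paged endpoint (illustration only):
// the cursor is simply the offset of the next page.
function fakeList(data: number[]) {
  return async ({ cursor, limit = 100 }: PageParam): Promise<PageRes<number>> => {
    const start = cursor ? Number(cursor) : 0
    const end = start + limit
    return {
      items: data.slice(start, end),
      cursor: end < data.length ? String(end) : undefined,
    }
  }
}
```

The same do/while shape carries over to Python or any other language: keep requesting with the last cursor until none is returned.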

laurent · 8 November 2020 15:10

I can understand this is a bit annoying, I also don't like having to update perfectly working apps just because an API changed, but in this case it's hard to avoid.

If I allow retrieving everything in one go, people will test their plugins with their own notes, maybe a few hundred of them, not realising that some users have 100,000 notes or more. Not only is their plugin likely to break but it might also freeze or crash the main app. So it's best to develop external apps or plugins with pagination in mind.

rxliuli's method would work and I've also put a pseudo algorithm there to retrieve everything if that can help: https://joplinapp.org/api/references/rest_api/#pagination

foxmask · 8 November 2020 16:58

I understand, but how do you know which page you are on? If you only know the current cursor, how do you know the previous / next one?

laurent · 8 November 2020 17:50

You can't, so for now you could probably just retrieve everything and do the pagination yourself like you were previously doing.
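
Once the full list has been retrieved, "doing the pagination yourself" can be as small as slicing the array. The following is a minimal sketch under that assumption; `getPage` and `pageCount` are hypothetical helper names, not part of any Joplin API.

```typescript
// Returns the items of a 1-based page number from an already-retrieved list.
function getPage<T>(all: readonly T[], page: number, pageSize: number): T[] {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be an integer >= 1')
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1')
  }
  const start = (page - 1) * pageSize
  return all.slice(start, start + pageSize)
}

// Number of pages needed to show `total` items.
function pageCount(total: number, pageSize: number): number {
  return Math.ceil(total / pageSize)
}
```

With the whole list in memory, previous and next pages are simply `page - 1` and `page + 1`, at the cost of loading everything up front.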

rxliuli · 8 November 2020 23:46

The cursor design does not support fetching a specific page; it is best suited to infinite scrolling.
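
As an illustration of why a cursor fits infinite scrolling, here is a hedged sketch of a forward-only "load more" helper. `createScroller` and the `fetchPage` callback are hypothetical names; the only assumption about the API is that each response carries `items` and an optional `cursor`.

```typescript
// Assumed response shape: a batch of items plus an optional cursor.
interface PageRes<T> {
  items: T[]
  cursor?: string
}

// Wraps a cursor-based fetch function in a forward-only loader.
// It can only move to the next page, never jump to page N.
// Assumes loadMore() is not called again before the previous call resolves.
function createScroller<T>(
  fetchPage: (cursor?: string) => Promise<PageRes<T>>,
) {
  let cursor: string | undefined
  let done = false
  return {
    // True once a response has come back without a cursor.
    isDone: (): boolean => done,
    // Fetches the next batch, or returns an empty list when exhausted.
    async loadMore(): Promise<T[]> {
      if (done) return []
      const res = await fetchPage(cursor)
      cursor = res.cursor
      done = !cursor
      return res.items
    },
  }
}
```

A UI would call `loadMore()` each time the user reaches the bottom of the list and stop once `isDone()` returns true.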

rxliuli · 9 November 2020 14:52

This cursor looks very strange now: it not only contains the last item for comparison, it even contains the query parameters, which makes the fields value I set ineffective @laurent

Reproducible example:


rxliuli/joplin-api/blob/09d92d7eb4c8dcae0f1df685fd58f288a2b80259/test/api/NoteApi.test.ts#L89


      
            const createFolderRes = await folderApi.create({
              title: 'Test folder 2',
              parent_id: '',
            })
            const res = await noteApi.update({
              id: data.noteId,
              parent_id: createFolderRes.id,
            })
            expect(res.parent_id).toBe(createFolderRes.id)
          })
          it('test getting all notes', async () => {
            const time1 = Date.now()
            const res1 = await PageUtil.pageToAllList(noteApi.list)
            const time2 = Date.now()
            const res2 = await PageUtil.pageToAllListForParallel(noteApi.list, {
              fields: ['id', 'title', 'parent_id'],
            })
            const time3 = Date.now()
            console.log(res1.length, res2.length)
            expect(res1).toEqual(res2)
            console.log('time diff: ', time2 - time1, time3 - time2)

Note: if performance cannot be improved through concurrent requests, I can't think of a way to quickly get the list of all notes on the client... (I have a scenario where I need all the note data to display some kind of chart, such as a note relationship diagram.)

https://rxliuli.com/joplin-charts/

laurent · 9 November 2020 15:15

I don't understand the issue. Do you have an example query that fails?

rxliuli · 9 November 2020 15:21

sample graph

Note that you only need to pass the cursor to the next request, as it will continue the fetching process using the same parameters you initially provided.

I think it is unreasonable for the cursor to include query parameters, especially query parameters that do not affect the number of returned results. Generally speaking, it should just be the last id of the current page, right? Why not?

This is an example of a failing query: when you run this unit test, it will fail (the number of notes must be at least 101).

You can modify the token here for testing: joplin-api/test/util/setupTestEnv.ts at dc44a50f4d222c27789dfcdec2ab66ebd07150ae · rxliuli/joplin-api · GitHub

laurent · 9 November 2020 15:29

Do you mean that it adds extra parameters like updated_time and so on?

rxliuli · 9 November 2020 15:31

No, I mean the title and parent_id fields are missing from the results

rxliuli · 9 November 2020 15:38

In general, I hope to improve the efficiency of fetching data through concurrent calls, so I made an attempt, but the current API cannot support this. I believe @foxmask would also like a simple way to improve the performance of fetching the full data.


rxliuli/joplin-api/blob/09d92d7eb4c8dcae0f1df685fd58f288a2b80259/src/util/PageUtil.ts#L80


    console.log('cursorList: ', cursorList)
    cursorList.unshift(undefined)
    const callback = asyncLimiting(async (cursor: string | undefined) => {
      const res = await fn({
        ...pageParam,
        cursor,
        limit: this.MaxLimit,
      })
      return res.items
    }, options?.limit || 10)
    return new AsyncArray(cursorList).parallel().flatMap(callback)
  }
}
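
The `asyncLimiting` and `AsyncArray` helpers used in this excerpt are not shown in the thread. As a rough idea of what a concurrency cap involves, here is a minimal sketch of a bounded parallel map; `mapWithLimit` is a hypothetical name and is not claimed to match how those helpers are implemented.

```typescript
// Runs `worker` over `inputs` with at most `limit` calls in flight at once.
// Results are returned in input order, regardless of completion order.
async function mapWithLimit<T, R>(
  inputs: readonly T[],
  limit: number,
  worker: (input: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(inputs.length)
  let next = 0

  // Each runner repeatedly claims the next unprocessed index.
  // JavaScript is single-threaded, so `next++` cannot race.
  async function run(): Promise<void> {
    while (next < inputs.length) {
      const index = next++
      results[index] = await worker(inputs[index])
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, inputs.length))
  await Promise.all(Array.from({ length: runnerCount }, () => run()))
  return results
}
```

Applied to a list of cursors, this would fetch up to `limit` pages at a time while keeping the pages in their original order.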

laurent · 9 November 2020 15:44

OK, there's probably a bug there then. What API endpoint is it? Just /notes?

rxliuli · 9 November 2020 15:46

I'm not sure, but I guess all paging queries have this problem. The core reason is that the cursor contains the query parameters (fields, limit, and more...).

laurent · 9 November 2020 16:29

No, it should return the fields you've requested in the initial query. Any field you specify when you do a subsequent cursor query will be ignored, as documented:

Note that you only need to pass the cursor to the next request, as it will continue the fetching process using the same parameters you initially provided.

https://joplinapp.org/api/references/rest_api/#pagination
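
To make the documented behaviour concrete, here is a toy in-memory model of it. This is not Joplin's actual implementation or cursor format: `makeMockList` is hypothetical, and encoding the cursor as JSON holding the offset, limit and fields is purely an assumption for illustration. It only shows the consequence described above: once a cursor is passed, the parameters of the initial query win.

```typescript
type Note = { id: string; title: string; parent_id: string }
type Query = { fields?: (keyof Note)[]; cursor?: string; limit?: number }
type Page = { items: Partial<Note>[]; cursor?: string }
type CursorState = { offset: number; limit: number; fields: (keyof Note)[] }

// Toy endpoint: the cursor pins the parameters of the initial query.
function makeMockList(notes: Note[]) {
  return ({ fields, cursor, limit = 100 }: Query): Page => {
    // When a cursor is present, `fields` and `limit` from this call are ignored.
    const state: CursorState = cursor
      ? JSON.parse(cursor)
      : { offset: 0, limit, fields: fields ?? ['id'] }
    const end = state.offset + state.limit
    const items = notes.slice(state.offset, end).map((note) => {
      const picked: Partial<Note> = {}
      for (const field of state.fields) picked[field] = note[field]
      return picked
    })
    const nextState: CursorState = { ...state, offset: end }
    return {
      items,
      cursor: end < notes.length ? JSON.stringify(nextState) : undefined,
    }
  }
}
```

In this model, cursors collected with `fields: ['id']` keep returning only `id` even when a later call passes `fields: ['id', 'title', 'parent_id']`, which matches the missing `title` and `parent_id` reported earlier in the thread.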

laurent · 9 November 2020 17:23

I can't replicate this field issue although I've added a test for good measure. If there's an issue, I'll need curl calls or something I can use to replicate without installing a whole application.

rxliuli · 9 November 2020 23:20

This is the problem; I think this is the wrong design.

A cursor that carries the earlier query's parameters ad hoc is simply a clever trick...

laurent · 10 November 2020 00:29

This is the right approach to iterate over a feed, and there’s info on the doc on how to retrieve all the data like before. Field param is also working so at this point I still don’t know what the issue is.

Best would be to provide a minimal reproducible example if you think there’s a problem: https://stackoverflow.com/help/minimal-reproducible-example

rxliuli · 10 November 2020 00:42

In short, I plan to get the list of all cursors first, and then fetch the real data concurrently (currently 10 requests at a time).

import { PageParam, PageRes } from '../modal/PageData'
import { asyncLimiting } from './asyncLimiting'
import { AsyncArray } from './AsyncArray'

type PageResValueType<T extends Promise<PageRes<any>>> = T extends Promise<
  PageRes<infer U>
>
  ? U
  : never

export class PageUtil {
  /**
   * Maximum page size
   * @private
   */
  private static readonly MaxLimit = 100

  /**
   * Retrieve all paged data in a loop (concurrently; the concurrency is currently set to 10)
   * Suitable for fetching large amounts of data in one go, for example when all notes are needed to display a chart
   * Requests the maximum page size each time to minimize the number of requests
   */
  static async pageToAllListForParallel<
    F extends (
      pageParam: PageParam<any> & Record<string, any>,
    ) => Promise<PageRes<any>>
  >(
    fn: F,
    pageParam?: Omit<Parameters<F>[0], 'cursor' | 'limit'>,
    options?: { limit: number },
  ): Promise<PageResValueType<ReturnType<F>>[]> {
    let cursor: string | undefined
    const cursorList: (string | undefined)[] = []
    do {
      // noinspection JSUnusedAssignment
      const res = await fn({
        ...pageParam,
        fields: ['id'],
        cursor,
        limit: this.MaxLimit,
      })
      cursor = res.cursor
      if (res.cursor) {
        cursorList.push(res.cursor!)
      }
    } while (cursor)
    console.log('cursorList: ', cursorList)
    cursorList.unshift(undefined)
    const callback = asyncLimiting(async (cursor: string | undefined) => {
      const res = await fn({
        ...pageParam,
        cursor,
        limit: this.MaxLimit,
      })
      return res.items
    }, options?.limit || 10)
    return new AsyncArray(cursorList).parallel().flatMap(callback)
  }
}

    it('test and get all notes concurrently', async () => {
      const res = await PageUtil.pageToAllListForParallel(noteApi.list, {
        fields: ['id', 'title', 'parent_id'],
      })
      console.log('first page and second page: ', res[0], res[100])
    })

Why not try to run this unit test?


rxliuli/joplin-api/blob/8429d266fd7b64336f39906802506a36eea78941/test/api/NoteApi.test.ts#L102


      const time2 = Date.now()
      const res2 = await PageUtil.pageToAllListForParallel(noteApi.list, {
        fields: ['id', 'title', 'parent_id'],
      })
      const time3 = Date.now()
      console.log(res1.length, res2.length)
      expect(res1).toEqual(res2)
      console.log('time diff: ', time2 - time1, time3 - time2)
    })
    it('test and get all notes concurrently', async () => {
      const res = await PageUtil.pageToAllListForParallel(noteApi.list, {
        fields: ['id', 'title', 'parent_id'],
      })
      console.log('first page and second page: ', res[0], res[100])
    })
  })
})
