Command Line Tool

kirsche

Usage:

kirsche [OPTIONS] COMMAND [ARGS]...

Options:

  --help  Show this message and exit.

connections

Establish connections between the papers in a list, given a list of DOIs, a bib file, or a previously downloaded metadata file.

If no source_metadata_path is provided, the metadata is downloaded using the parameters given in paper_id or source_bib_file. If source_metadata_path is provided, the connections are established from that metadata file.

About Metadata Sources

Download paper data from service providers (e.g., SemanticScholar).

There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using --paper_id or -p, or load a bib file using --source_bib_file or -bib.

To save the data with connections, provide a path to a file using --connected_papers_path or -t.

:param paper_id: Paper DOI, optional, can be multiple
:param source_bib_file: Bib file path, optional
:param source_metadata_path: Path to a data file or folder with paper metadata, optional
:param connected_papers_path: Path to save the enhanced data file with connections calculated
:param sleep_time: Sleep time between requests, defaults to 1 sec.

Usage:

kirsche connections [OPTIONS]

Options:

  -p, --paper_id TEXT             Paper ID
  -bib, --source_bib_file PATH    Bib file path
  -meta, --source_metadata_path PATH
                                  path to data file/folder with paper metadata
  -t, --connected_papers_path PATH
                                  path to save enhanced data file
  -st, --sleep_time INTEGER       Sleep time between requests
  --help                          Show this message and exit.
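
As a minimal sketch of how these options combine, the helper below assembles an argument list for `kirsche connections` from Python. The function name `build_connections_argv` and the DOIs are hypothetical placeholders; only the option names come from the listing above. It uses the long option names and repeats `--paper_id` once per DOI, since that option accepts multiple values.

```python
import shlex
from typing import Iterable, List, Optional


def build_connections_argv(
    paper_ids: Iterable[str] = (),
    bib_file: Optional[str] = None,
    metadata_path: Optional[str] = None,
    connected_papers_path: Optional[str] = None,
    sleep_time: int = 1,
) -> List[str]:
    """Assemble an argv list for `kirsche connections` (illustrative helper)."""
    argv = ["kirsche", "connections"]
    # --paper_id may be given several times, once per DOI.
    for doi in paper_ids:
        argv += ["--paper_id", doi]
    if bib_file:
        argv += ["--source_bib_file", bib_file]
    if metadata_path:
        argv += ["--source_metadata_path", metadata_path]
    if connected_papers_path:
        argv += ["--connected_papers_path", connected_papers_path]
    argv += ["--sleep_time", str(sleep_time)]
    return argv


# Placeholder DOIs, not real papers.
argv = build_connections_argv(
    paper_ids=["10.1000/example.1", "10.1000/example.2"],
    connected_papers_path="connected.json",
)
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine where kirsche is installed.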

connections-from-metadata

Establish connections between the papers in a list.

:param source_metadata_path: path to data file with paper metadata
:param connected_papers_path: path to save enhanced data file

Usage:

kirsche connections-from-metadata [OPTIONS]

Options:

  -s, --source_metadata_path PATH
                                  path to data file/folder with paper metadata
  -t, --connected_papers_path PATH
                                  path to save enhanced data file(s)
  --help                          Show this message and exit.

metadata

Download paper data from service providers (e.g., SemanticScholar).

There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using --paper_id or -p, or load a bib file using --source_bib_file or -bib.

To save the downloaded data, provide a path to a file using --target_metadata_path or -t.

:param paper_id: Paper DOI, optional, can be multiple
:param source_bib_file: Bib file path, optional
:param target_metadata_path: Target data file path, optional
:param sleep_time: Sleep time between requests, defaults to 1 sec.

Usage:

kirsche metadata [OPTIONS]

Options:

  -p, --paper_id TEXT             Paper ID
  -bib, --source_bib_file PATH    Bib file path
  -t, --target_metadata_path PATH
                                  Target data file path
  -sleep, --sleep_time INTEGER    Sleep time between requests
  --help                          Show this message and exit.
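
The `--sleep_time` option spaces out requests to the metadata service. The following is only a sketch of that idea, not kirsche's actual implementation: `download_all` and `fake_fetch` are hypothetical names, and the fetch and sleep functions are injected so the loop can be exercised without network access or real waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


def download_all(
    dois: Iterable[str],
    fetch: Callable[[str], Dict],
    sleep_time: float = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Fetch one record per DOI, pausing between consecutive requests."""
    records = []
    for i, doi in enumerate(dois):
        if i > 0:
            # Pause only between requests, not before the first one.
            sleep(sleep_time)
        records.append(fetch(doi))
    return records


def fake_fetch(doi: str) -> Dict:
    # Stand-in for a real request to a metadata service.
    return {"doi": doi}


pauses: List[float] = []
records = download_all(
    ["10.1000/a", "10.1000/b", "10.1000/c"],  # placeholder DOIs
    fake_fetch,
    sleep_time=1,
    sleep=pauses.append,  # record the pauses instead of sleeping
)
print(len(records), pauses)
```

A larger sleep time trades total runtime for a lower request rate, which can matter when a service limits how often it may be queried.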

visualization

Visualize the connections between the papers.

Usage:

kirsche visualization [OPTIONS]

Options:

  -p, --source_paper_id TEXT      Source: Paper ID
  -bib, --source_bib_file PATH    Source: Bib file path
  -meta, --source_metadata_path PATH
                                  Source: path to data file/folder with paper
                                  metadata
  -conn, --source_connected_papers_path PATH
                                  Source: path to save enhanced data
                                  file/folder
  --title TEXT                    title of the chart
  -t, --target_html_path PATH     Target: path to html file  [required]
  -sleep, --sleep_time INTEGER    Sleep time between requests
  --help                          Show this message and exit.
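
The three commands can plausibly be chained through files: download metadata, connect it, then render the saved connections. The sketch below builds one argument list per step. The helper `build_pipeline`, the file names, and the DOI are hypothetical, and it assumes the file written by one step is accepted as input by the next, which the option descriptions suggest but this page does not state outright.

```python
import shlex
from typing import List, Sequence


def build_pipeline(
    dois: Sequence[str],
    metadata_path: str,
    connected_path: str,
    html_path: str,
    title: str = "Kirsche: Paper Graph",
) -> List[List[str]]:
    """Return one argv list per step: download, connect, render."""
    download = ["kirsche", "metadata"]
    for doi in dois:
        download += ["--paper_id", doi]
    download += ["--target_metadata_path", metadata_path]

    # The metadata file written by the first step is the input here.
    connect = [
        "kirsche", "connections-from-metadata",
        "--source_metadata_path", metadata_path,
        "--connected_papers_path", connected_path,
    ]
    # The connected-papers file written by the second step is the input here.
    render = [
        "kirsche", "visualization",
        "--source_connected_papers_path", connected_path,
        "--title", title,
        "--target_html_path", html_path,
    ]
    return [download, connect, render]


steps = build_pipeline(
    ["10.1000/example.1"],  # placeholder DOI
    "metadata.json",
    "connected.json",
    "graph.html",
)
for step in steps:
    print(shlex.join(step))
```

Keeping the intermediate files means the download does not have to be repeated when only the chart title or output path changes.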

Helpers

connections(paper_id, source_bib_file, source_metadata_path, connected_papers_path, sleep_time)

Establish connections between the papers in a list, given a list of DOIs, a bib file, or a previously downloaded metadata file.

If no source_metadata_path is provided, the metadata is downloaded using the parameters given in paper_id or source_bib_file. If source_metadata_path is provided, the connections are established from that metadata file.

About Metadata Sources

Download paper data from service providers (e.g., SemanticScholar).

There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using --paper_id or -p, or load a bib file using --source_bib_file or -bib.

To save the data with connections, provide a path to a file using --connected_papers_path or -t.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| paper_id | Union[str, Iterable] | Paper DOI, optional, can be multiple | required |
| source_bib_file | Path | Bib file path, optional | required |
| source_metadata_path | Path | Path to a data file or folder with paper metadata, optional | required |
| connected_papers_path | Union[str, Path] | Path to save the enhanced data file with connections calculated | required |
| sleep_time | int | Sleep time between requests, defaults to 1 sec. | required |

Source code in kirsche/command.py
@kirsche.command()
@click.option("--paper_id", "-p", help="Paper ID", multiple=True)
@click.option("--source_bib_file", "-bib", type=click.Path(exists=True), help="Bib file path")
@click.option(
    "--source_metadata_path",
    "-meta",
    type=click.Path(exists=True),
    help="path to data file/folder with paper metadata",
)
@click.option(
    "--connected_papers_path",
    "-t",
    type=click.Path(exists=False),
    help="path to save enhanced data file",
)
@click.option("--sleep_time", "-st", default=1, help="Sleep time between requests")
def connections(
    paper_id: Union[str, Iterable],
    source_bib_file: Path,
    source_metadata_path: Path,
    connected_papers_path: Union[str, Path],
    sleep_time: int,
):
    """Establish connections between the papers in a list, given a list of DOIs, a bib file, or a previously downloaded metadata file.

    If no `source_metadata_path` is provided, the metadata is downloaded using the parameters given in `paper_id` or `source_bib_file`.
    If `source_metadata_path` is provided, the connections are established from that metadata file.

    ## About Metadata Sources

    Download paper data from service providers (e.g., SemanticScholar).

    There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using `--paper_id` or `-p`, or load a bib file using `--source_bib_file` or `-bib`.

    To save the data with connections, provide a path to a file using `--connected_papers_path` or `-t`.


    :param paper_id: Paper DOI, optional, can be multiple
    :param source_bib_file: Bib file path, optional
    :param source_metadata_path: Path to a data file or folder with paper metadata, optional
    :param connected_papers_path: Path to save the enhanced data file with connections calculated
    :param sleep_time: Sleep time between requests, defaults to 1 sec.
    """
    if isinstance(connected_papers_path, str):
        connected_papers_path = Path(connected_papers_path)

    click.secho(f"Retrieving paper metadata...")
    if not source_metadata_path:
        if source_bib_file:
            logger.debug(f"Using bib file: {source_bib_file}")

        # Guard against a missing -t option, where connected_papers_path is None
        if connected_papers_path and connected_papers_path.exists():
            existing_connected_papers = load_batch_json(connected_papers_path)
        else:
            existing_connected_papers = []

        records = _metadata(
            paper_id,
            source_bib_file,
            None,
            sleep_time,
            existing_records=existing_connected_papers,
        )
    else:
        records = load_batch_json(source_metadata_path)
    click.secho(f"  Retrieved {len(records)} records.")

    click.secho(f"Connecting papers...")
    connected_papers = append_connections(records)
    click.secho(f"  Connected papers...")

    # Filter out unnecessary keys in the dictionary
    click.secho(f"Filtering and saving data...")
    connected_papers = save_connected_papers(connected_papers, target=connected_papers_path)
    click.secho(f"  Done...")

    if not connected_papers_path:
        click.secho(f"No saving path specified, printing simplified data view...")
        dv = DataViews(connected_papers)
        click.echo(dv.json_simple)

connections_from_metadata(source_metadata_path, connected_papers_path)

Establish connections between the papers in a list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| source_metadata_path | Path | path to data file with paper metadata | required |
| connected_papers_path | Path | path to save enhanced data file | required |

Source code in kirsche/command.py
@kirsche.command()
@click.option(
    "--source_metadata_path",
    "-s",
    type=click.Path(exists=True),
    help="path to data file/folder with paper metadata",
)
@click.option(
    "--connected_papers_path",
    "-t",
    type=click.Path(exists=False),
    help="path to save enhanced data file(s)",
)
def connections_from_metadata(source_metadata_path: Path, connected_papers_path: Path):
    """Establish connections between the list of papers

    :param source_metadata_path: path to data file with paper metadata
    :param connected_papers_path: path to save enhanced data file
    """

    connected_papers = append_connections_for_file(source_metadata_path, connected_papers_path)

    return connected_papers

metadata(paper_id, source_bib_file, target_metadata_path, sleep_time)

Download paper data from service providers (e.g., SemanticScholar).

There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using --paper_id or -p, or load a bib file using --source_bib_file or -bib.

To save the downloaded data, provide a path to a file using --target_metadata_path or -t.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| paper_id | Union[str, Iterable] | Paper DOI, optional, can be multiple | required |
| source_bib_file | Path | Bib file path, optional | required |
| target_metadata_path | Path | Target data file path, optional | required |
| sleep_time | Optional[int] | Sleep time between requests, defaults to 1 sec. | required |

Source code in kirsche/command.py
@kirsche.command()
@click.option("--paper_id", "-p", help="Paper ID", multiple=True)
@click.option("--source_bib_file", "-bib", type=click.Path(exists=True), help="Bib file path")
@click.option(
    "--target_metadata_path",
    "-t",
    type=click.Path(exists=False),
    help="Target data file path",
)
@click.option("--sleep_time", "-sleep", default=1, help="Sleep time between requests")
def metadata(
    paper_id: Union[str, Iterable],
    source_bib_file: Path,
    target_metadata_path: Path,
    sleep_time: Optional[int],
):
    """Download paper data from service providers (e.g., SemanticScholar).

    There are two ways to provide the list of DOIs to retrieve: pass paper DOIs directly using `--paper_id` or `-p`, or load a bib file using `--source_bib_file` or `-bib`.

    To save the downloaded data, provide a path to a file using `--target_metadata_path` or `-t`.

    :param paper_id: Paper DOI, optional, can be multiple
    :param source_bib_file: Bib file path, optional
    :param target_metadata_path: Target data file path, optional
    :param sleep_time: Sleep time between requests, defaults to 1sec.
    """
    records = _metadata(paper_id, source_bib_file, target_metadata_path, sleep_time)

    return records

visualization(source_paper_id, source_bib_file, source_metadata_path, source_connected_papers_path, title, target_html_path, sleep_time)

Visualize the connections between the papers.

Source code in kirsche/command.py
@kirsche.command()
@click.option("--source_paper_id", "-p", required=False, help="Source: Paper ID", multiple=True)
@click.option(
    "--source_bib_file",
    "-bib",
    required=False,
    type=click.Path(exists=True),
    help="Source: Bib file path",
)
@click.option(
    "--source_metadata_path",
    "-meta",
    required=False,
    type=click.Path(exists=True),
    help="Source: path to data file/folder with paper metadata",
)
@click.option(
    "--source_connected_papers_path",
    "-conn",
    required=False,
    type=click.Path(exists=True),
    help="Source: path to save enhanced data file/folder",
)
@click.option("--title", default="Kirsche: Paper Graph", help="title of the chart")
@click.option(
    "--target_html_path",
    "-t",
    required=True,
    type=click.Path(exists=False),
    help="Target: path to html file",
)
@click.option("--sleep_time", "-sleep", default=1, help="Sleep time between requests")
def visualization(
    source_paper_id,
    source_bib_file,
    source_metadata_path,
    source_connected_papers_path,
    title,
    target_html_path,
    sleep_time,
):
    """Visualize the connections between the papers."""
    if source_connected_papers_path:
        connected_papers = load_batch_json(source_connected_papers_path)
    else:
        click.secho(f"Retrieving paper metadata...")
        if not source_metadata_path:
            if source_bib_file:
                logger.debug(f"Using bib file: {source_bib_file}")
            records = _metadata(source_paper_id, source_bib_file, None, sleep_time)
        else:
            records = load_batch_json(source_metadata_path)
        click.secho(f"  Retrieved {len(records)} records.")

        click.secho(f"Connecting papers...")
        connected_papers = append_connections(records)
        click.secho(f"  Connected papers...")

        # Filter out unnecessary keys in the dictionary
        click.secho(f"Filtering and saving data...")
        connected_papers = save_connected_papers(connected_papers)
        click.secho(f"  Done...")

    g = PaperGraph(connected_papers, title=title)
    nodes = g.nodes
    edges = g.edges

    click.secho(f"Saving html file...")
    visualize(nodes, edges, g.title, target_html_path)
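
The branch order in `visualization` implies a precedence among the source options: a connected-papers file wins over a metadata file, which wins over downloading from DOIs or a bib file. The sketch below restates that order as a small pure function. The name `pick_source` and the returned labels are hypothetical and exist only for illustration.

```python
from typing import Optional, Sequence


def pick_source(
    paper_ids: Sequence[str] = (),
    bib_file: Optional[str] = None,
    metadata_path: Optional[str] = None,
    connected_papers_path: Optional[str] = None,
) -> str:
    """Mirror the order in which `visualization` checks its source options."""
    if connected_papers_path:
        # Connections already computed: load them and skip everything else.
        return "load connected papers"
    if metadata_path:
        # Metadata already downloaded: load it, then compute connections.
        return "load metadata, then connect"
    # Otherwise download metadata for the DOIs or bib file, then connect.
    return "download metadata, then connect"


print(pick_source(paper_ids=["10.1000/example.1"]))
print(pick_source(metadata_path="metadata.json"))
print(pick_source(metadata_path="metadata.json", connected_papers_path="connected.json"))
```

In other words, when several source options are passed together, only the most processed one is used and the others are ignored.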