User Tools

Site Tools


pipeline:utilities:e2bdb
This version (2019/04/02 09:20) is a draft.
Approvals: 0/1

e2bdb

Berkeley Data Base Utilities: Examine and interact with a EMAN2DB directory.


Usage

Usage in command line

e2bdb.py  --cleanup  --force  --delete  --all  --long  --short  --filt='SUBSTRING'  --filtexclude='SUBSTRING'  --match='REGULAR_EXPRESSION'  --exclude='bdb_NAME'  --dump  --smalldump  --extractplots  --check  --merge  --makevstack='bdb:/Home/user/project#output'  --appendvstack='bdb:output'  --list='abc.txt'  --exlist='xyz.txt'  --restore='bdb:vstack'  --ppid=PPID  --checkctf  --step=STEP  --nocache  --verbose=LEVEL


Typical usage

e2bdb does not support MPI.

1. Clean up the database cache so files can safely be moved or accessed on another computer via NFS.

e2bdb.py --clean 


2. Delete (or at least empty) the named database(s).

e2bdb.py bdb:filename --delete 


3. Only include dictionary names containing the specified substring.

e2bdb.py . --makevstack=bdb:vstack --filt=abc 

A virtual stack will be generated, which contains all bdb files that have “abc” in the file name.

4. Only include dictionaries matching the provided Python regular expression.

e2bdb.py . --makevstack=bdb:vstack --match=abc 

A virtual stack will be generated, which the bdb file that is named “abc”.

5. List contents of an entire database.

e2bdb.py --dump refine_01#register 


6. Make a 'virtual' bdb image stack from one or more other stacks.

e2bdb.py bdb:db1 bdb:db2 --makevstack=bdb:db3 

This will combine images in db1 and db2 into db3. Please note that any change in metadata of db1 or db2, there won't be changes in db3. However, if there are changes in the image data, it will appear at db3.

In order to include all stack with a common root in the name (for example, db1, db2, db3) into one virtual stack called data, use

e2bdb.py . --makevstack=bdb:data --filt=db 


7. Append to or create a 'virtual' bdb image stack with its own metadata.

e2bdb.py bdb:db1 bdb:db2 --appendvstack=bdb:db3 

db1 and db2 virtual stack will be appended to db3, if it exists. If not, a new db3 will be created.

8. Creates a new virtual bdb image stack by extracting images whose IDs are listed in the specified ASCII file from existed virtual stacks.

e2bdb.py bdb:#db1 bdb:#db2 --makevstack=bdb:.#db3 --list=abc.txt 

Here, abc.txt is a file which contains the IDs of the selected images in each bdb file. A virtual stack db3 will be generated and contain the selected images of db1 and db2. If change makevstack to appendvstack, virtual stack of db1 and db2 will be appended to db3.

e2bdb.py bdb:#db1 bdb:#db2 --appendvstack=bdb:.#db3 --list=abc.txt 

'This option allows you to select some specific images which you would like to manipulate to generate a virtual stack.'

9. Write changes in the derived virtual stack back to the original stack.

e2bdb.py --restore=bdb:vstack 

vstack is a derived image stack. –restore will allow it to write the change back to the parent image stack. This option is very useful when an operation (such as alignment) is performed on a subset of data (extracted as a virtual fact) and then the results are to be merged with the parent stack.


Input

Main Parameters

--cleanup
Clean up database cache: Clean up the database cache so files can safely be moved or accessed on another computer via NFS. (default False)
--force
Force an action: Force an action that would normally fail due to failed checks. (default False)
--delete
Delete named database(s): Delete (or at least empty) the named database(s) (default False)
--all
List per-particle info: List per-particle info. (default False)
--long
Print details of databases: Print details of databases. (default False)
--short
Print database names: Print each database name in 'bdb:database' format for use. (default False)
--filt
Include dictionary names with specified substring: Only include dictionary names containing the specified substring. (default none)
--filtexclude
Exclude dictionary names with specified substring: Exclude dictionary names containing the specified substring. (default none)
--match
Include dictionaries matching regular expression: Only include dictionaries matching the provided Python regular expression. (default none)
--exclude
Database name of exclusion key list: The name of a database containing a list of exclusion keys. (default none)
--dump
Print entire database contents: List contents of an entire database. (default False)
--smalldump
Print summary of entire database contents: List contents of an entire database, but only list 2 items per dictionary to better see headers. (default False)
--extractplots
Extract plots as text files: If a database contains sets of plots, such as bdb:refine_xx#convergence.results, this will extract the plots as text files. (default False)
--check
Check for self-consistency and errors: Check for self-consistency and errors in the structure of specified databases. (default False)
--merge
Merge database contents: Merge the contents of bdb 2-N into bdb 1 (including bdb 1's contents). (default False)
--makevstack
Output virtual image stack: Make a 'virtual' bdb image stack with the specified name from one or more other stacks. (default none)
--appendvstack
Append to or create virtual image stack: Append to or create a virtual bdb image stack with its own metadata. (default none)
--list
Image selection file: Input selection text file containing a list of selected image IDs (or indexes of the data subset) to create a new virtual bdb image stack from an existed stack or virtual stack. (default none)
--exlist==none --step==0,1
--exlist
Image exclusion file: Input exclusion text file containing a list of excluded image IDs (or indexes of the data subset) to create a new virtual bdb image stacks from an existed stack or virtual stack. (default none)
--list==none --step==0,1
--restore
Write changes in virtual stack back to original stack: Write changes in the derived virtual stack back to the original stack. (default none)
--ppid
Set PID of parent process: Set the PID of the parent process, used for cross platform PPID. (default -1)
--checkctf
Verifies CTF entry: Verify that all images in the file contain CTF information, and give some basic statistics. (default False)
--step
Processes only subset: Specify <init>,<step>[,<max>]. Process only a subset of the input data. For example, 0,2 would process only the even numbered particles. (default 0,1)
--list==none --exlist==none


Advanced Parameters

--nocache
Do not use database cache: Do not use the database cache for this operation. Cache was permanently disabled as of August 2013. (default False)
--verbose
Verbose level [0-9]: Higner number means higher level of verboseness. (default 0)


Output

Output depends on the option settings.


Description

IN EMAN2, the usage of Berkeley DB file format has been increased since it is very convenient to manipulate the metadata separately from the image data, and it is faster that other formats, such as HDF. As it has this special structure, its files are stored in a folder named EMAN2DB, which cannot manually copy or remove files in it.

Note: It is important to emphasize that one cannot manually rename or edit the files in the EMAN2DB directory. Doing so can corrupt the entire database such that programs will no longer be able to access it properly. You can safely move the directory as a whole to a different location, but otherwise it should not be modified. The e2bdb program is using to interact with the EMAN2DB directory.

Virtual stack is a stack of metadata associated with the image data. When generate a virtual stack. Unlike doing this same task with e2proc2d.py bdb:.#db1 bdb:.#db3; e2proc2d.py bdb:.#db2 bdb:.#db3, it will only generate the metadata stack. When generate a virtual stack, in the metadata, image location will be recored as data_path, original image ID will be recored as data_n, and the original stack will be recored as data_source.

By using the –list option, you can generate a virtual stack which contains metadata information of selected images. The virtual stack can be manipulate just as the original bdb files. Changes in virtual stack won't appear on the original stack. You can use the –restore option to write the change back to the original stack. It is very convenient to do so, and this functionality is expected to be very useful if one would like to use virtual stack for iteration.


Method


Reference


Author / Maintainer

Steve Ludtke, Ran Lin


Keywords

Category 1:: APPLICATIONS


Files

programs/e2bdb.py


See also

Maturity

No Bugs known so far.


pipeline/utilities/e2bdb.txt · Last modified: 2019/04/02 09:20 by lusnig