You can use the built-in sorted() function (see "Built-in Functions" in the Python documentation):
from pathlib import Path

for subdir in sorted(Path('/some/path').iterdir()):
    print(subdir)
NOTE: As @NamitJuneja points out, this changes iterating over a generator to iterating over a list. Hence, if the directory contains a huge number of files, loading them all into memory (as a list) might cause problems.
On my Mac, iterdir() already yields the entries in sorted order, so the ordering looks system dependent (the pathlib documentation only promises that the children are yielded in arbitrary order). What OS are you using?
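Because iterdir() makes no ordering promise, it is safer to sort explicitly than to rely on whatever one OS happens to return. A minimal sketch, assuming you only want subdirectories (as the loop variable name subdir suggests) ordered by name; sorted_subdirs is a hypothetical helper name, not part of pathlib:

```python
import tempfile
from pathlib import Path


def sorted_subdirs(base):
    """Return the immediate subdirectories of `base`, sorted by name.

    Hypothetical helper: iterdir() yields entries in arbitrary order, so we
    sort explicitly. Note that sorted() still builds a full list in memory.
    """
    return sorted((p for p in Path(base).iterdir() if p.is_dir()), key=lambda p: p.name)


# Demo on a throwaway directory, created deliberately out of order.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gamma", "alpha", "beta"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "notes.txt").write_text("")  # a plain file, skipped by is_dir()
    names = [p.name for p in sorted_subdirs(tmp)]

print(names)  # ['alpha', 'beta', 'gamma']
```

Sorting on p.name makes the key explicit; comparing Path objects directly also works, but on Windows those comparisons are case-insensitive.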
You should include a docstring at the beginning of every function, class, and module you write. Docstrings let documentation tools identify what your code is supposed to do, and they help other readers understand how your code works. I see that you already have a couple for your functions, but stay consistent.
Parameter names should be descriptive enough to tell what should be passed. While l might be obvious to some programmers as a list, to others it might not. Since you're passing an iterable, renaming it to list_ (the trailing underscore avoids shadowing the built-in list) makes it more obvious what you're passing and accepting.
When you have a constant in your program, it should be UPPER_CASE to identify it as such.
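A small sketch pulling these three points together: a docstring, a descriptive parameter name, and an UPPER_CASE constant. FASTA_EXTENSION and count_fasta_names are hypothetical names chosen for illustration, not part of your code:

```python
# Module-level constant: UPPER_CASE marks it as not meant to be reassigned.
FASTA_EXTENSION = ".fasta"  # hypothetical value, for illustration only


def count_fasta_names(file_names):
    """Return how many names in the iterable `file_names` end with FASTA_EXTENSION.

    The name `file_names` tells the caller what to pass far better than `l` would.
    """
    return sum(1 for name in file_names if name.endswith(FASTA_EXTENSION))


print(count_fasta_names(["seq1.fasta", "notes.txt", "seq2.fasta"]))  # 2
print(count_fasta_names.__doc__ is not None)  # True: the docstring is attached
```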
You want as little code as possible in your program. So, instead of:
def subject_list_fastafiles():
    """ Method Docstring """
    subject_fastafiles = sorted_nicely([fastafile for fastafile in os.listdir(subject_path) if os.path.isfile(os.path.join(subject_path, fastafile))])
    return subject_fastafiles

def query_list_fastafiles():
    """ Method Docstring """
    query_fastafiles = sorted_nicely([fastafile for fastafile in os.listdir(query_path) if os.path.isfile(os.path.join(query_path, fastafile))])
    return query_fastafiles

def filter_files_ending_with_one(sorted_files):
    """ Method Docstring """
    files_end_with_one = [name for name in subject_fastafiles if name[-1].isdigit() and not name[-2].isdigit() == 1]
    return files_end_with_one
You can simply return the function call, instead of assigning it to a variable and returning the variable, like so:
def subject_list_fastafiles():
    """
    Method Docstring
    """
    return sorted_nicely([fastafile for fastafile in os.listdir(SUBJECT_PATH) if os.path.isfile(os.path.join(SUBJECT_PATH, fastafile))])

def query_list_fastafiles():
    """
    Method Docstring
    """
    return sorted_nicely([fastafile for fastafile in os.listdir(QUERY_PATH) if os.path.isfile(os.path.join(QUERY_PATH, fastafile))])

def filter_files_ending_with_one():
    """
    Return the subject files whose names end in a single digit.
    """
    return [name for name in SUBJECT_FASTAFILES if name[-1].isdigit() and not name[-2].isdigit()]
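Note what the filter condition actually keeps: names whose last character is a digit that is not preceded by another digit, so seq2 passes as well as seq1. (The == 1 in the original condition is redundant: not name[-2].isdigit() == 1 parses as not (name[-2].isdigit() == 1), which is just not name[-2].isdigit().) A quick check on made-up names, using a hypothetical helper that also guards against one-character names, where name[-2] would raise IndexError:

```python
def ends_with_single_digit(name):
    """Hypothetical helper: True if `name` ends in exactly one digit.

    Same test as the list comprehension above, plus a length check so that
    one-character names don't raise IndexError on name[-2].
    """
    return len(name) >= 2 and name[-1].isdigit() and not name[-2].isdigit()


names = ["seq1", "seq2", "seq11", "seq10", "seq", "7"]
print([n for n in names if ends_with_single_digit(n)])  # ['seq1', 'seq2']
```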
This is an excerpt from this fabulous StackOverflow answer.
When your script is run by passing it as a command to the Python interpreter,
python myscript.py
all of the code that is at indentation level 0 gets executed. Functions and classes that are defined are, well, defined, but none of their code gets run. Unlike other languages, there's no main() function that gets run automatically; the main() function is implicitly all the code at the top level.
In this case, the top-level code is an if block. __name__ is a built-in variable which evaluates to the name of the current module. However, if a module is being run directly (as in myscript.py above), then __name__ instead is set to the string "__main__". Thus, you can test whether your script is being run directly or being imported by something else by testing
if __name__ == "__main__":
    ...
If your script is being imported into another module, its various function and class definitions will be imported and its top-level code will be executed, but the code in the then-body of the if clause above won't get run, as the condition is not met.
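You can watch the guard at work without creating two files by hand. This sketch uses the standard library's runpy.run_path, which executes a file and returns its resulting globals; passing run_name="__main__" mimics python myscript.py, while any other name roughly approximates an import. The SOURCE text and the run_as helper are hypothetical, for illustration only:

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical module source with a guarded main() call.
SOURCE = '''
ran_main = False

def main():
    global ran_main
    ran_main = True

if __name__ == "__main__":
    main()
'''


def run_as(run_name):
    """Run SOURCE from a temporary myscript.py under `run_name`.

    Returns whether the code in the if __name__ == "__main__" block ran.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "myscript.py"
        script.write_text(SOURCE)
        namespace = runpy.run_path(str(script), run_name=run_name)
    return namespace["ran_main"]


print(run_as("__main__"))  # True: like running `python myscript.py`
print(run_as("myscript"))  # False: like `import myscript`
```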
Putting it all together, the complete program looks like this:

"""
Module Docstring (A description of your program goes here)
"""
import os
import re

SUBJECT_PATH = "/Users/catuf/Desktop/subject_fastafiles/"
QUERY_PATH = "/Users/catuf/Desktop/query_fastafiles"


def sorted_nicely(list_):
    """
    Sort the given iterable in the way that humans expect. https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
    """
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(list_, key=alphanum_key)


def subject_list_fastafiles():
    """
    Method Docstring
    """
    return sorted_nicely([fastafile for fastafile in os.listdir(SUBJECT_PATH) if os.path.isfile(os.path.join(SUBJECT_PATH, fastafile))])


def query_list_fastafiles():
    """
    Method Docstring
    """
    return sorted_nicely([fastafile for fastafile in os.listdir(QUERY_PATH) if os.path.isfile(os.path.join(QUERY_PATH, fastafile))])


def filter_files_ending_with_one():
    """
    Return the subject files whose names end in a single digit.
    """
    # SUBJECT_FASTAFILES is assigned in the __main__ block below.
    return [name for name in SUBJECT_FASTAFILES if name[-1].isdigit() and not name[-2].isdigit()]


if __name__ == '__main__':
    SUBJECT_FASTAFILES = subject_list_fastafiles()
    QUERY_FASTAFILES = query_list_fastafiles()
    SUBJECT_FILES_ENDING_WITH_ONE = filter_files_ending_with_one()
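To see why sorted_nicely is worth having over plain sorted(), compare the two on some made-up FASTA file names. The function below uses the same natural-sort key as the program, rewritten with def instead of assigned lambdas (PEP 8 prefers def for named functions); the file names are hypothetical:

```python
import re


def sorted_nicely(list_):
    """Sort the given iterable in the way that humans expect (natural sort)."""

    def convert(text):
        # Digit runs compare as integers, everything else as text.
        return int(text) if text.isdigit() else text

    def alphanum_key(key):
        # The capturing group keeps digit runs in the split result,
        # e.g. "seq10.fa" -> ["seq", "10", ".fa"] -> ["seq", 10, ".fa"].
        return [convert(c) for c in re.split('([0-9]+)', key)]

    return sorted(list_, key=alphanum_key)


names = ["seq10.fa", "seq2.fa", "seq1.fa"]
print(sorted(names))         # ['seq1.fa', 'seq10.fa', 'seq2.fa']
print(sorted_nicely(names))  # ['seq1.fa', 'seq2.fa', 'seq10.fa']
```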
You can use the natsort library (pip install natsort). It keeps the code simple, too. (This works; tested with natsort versions 5.5 and 7.1, the current version at the time of writing.)

from pathlib import Path
from natsort import natsorted

image_list = Path('./pages').glob('*.jpg')
image_list = natsorted(image_list, key=str)

# Or: convert the paths to strings, sort them naturally, then convert back to Path objects
image_list = [Path(p) for p in natsorted(str(p) for p in Path('./pages').glob('*.jpg'))]
You're reading your paths from a file with one path per line, but you didn't strip the newline from each string, so you're looking for paths that end in a newline character, \n. As your error message notes, the invalid path was:
'C:\\Test\\Project1\n' <-- a single backslash followed by n is the newline; the directory separators appear doubled only because the error message shows the escaped form of the string
Strip them off as you read, and you'll be fine:
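for line in infile:
    line = line.rstrip("\r\n")  # Removes all trailing carriage returns and newlines
    if line.startswith("R"):  # unlike line[0], this won't raise IndexError on a now-empty blank line
        ...  # rest of your original loop body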
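Here is the problem reproduced with an in-memory stand-in for your file (io.StringIO; the file contents are made up). Each line comes back with its line ending attached until you strip it, and after stripping, a blank line becomes an empty string, so skip it before indexing into it:

```python
import io

# Hypothetical file contents: one path per line, one Windows-style ending, one blank line.
infile = io.StringIO("C:\\Test\\Project1\nR:\\Data\\Run2\r\n\n")

first = infile.readline()
print(repr(first))  # 'C:\\Test\\Project1\n'  (the newline is part of the string)

infile.seek(0)
paths = []
for line in infile:
    line = line.rstrip("\r\n")  # drop any trailing \r and \n characters
    if not line:
        continue  # a blank line is now "", so skip it
    paths.append(line)

print(paths)  # ['C:\\Test\\Project1', 'R:\\Data\\Run2']
```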
Expansion of my comment: Why put the API to extra work parsing and testing against a filter pattern when you could just... not?
glob is better when you need to make use of the filtering feature and the filter is simple and string-based, as it simplifies the work. Sure, hand-writing simple matches (filtering iterdir with if path.name.endswith('.txt'): instead of using glob('*.txt')) might be more efficient than the regex-based pattern matching that glob hides, but it's generally not worth the trouble of reinventing the wheel, given that disk I/O is orders of magnitude slower.
But if you don't need the filtering functionality at all, don't use it. glob is gaining you nothing in terms of code simplicity or functionality, and it's hurting performance, so just use iterdir.
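A small comparison on a throwaway directory (the file names are made up). The first two approaches find the same .txt files, one letting glob do the matching and one filtering iterdir by hand; when no filter is needed, plain iterdir() is all you need:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("a.txt", "b.txt", "c.csv"):
        (base / name).write_text("")

    # Filtering needed: let glob do the pattern matching.
    via_glob = sorted(p.name for p in base.glob("*.txt"))

    # The hand-written equivalent: iterdir plus a string test.
    via_iterdir = sorted(p.name for p in base.iterdir() if p.name.endswith(".txt"))

    # No filtering needed: skip glob entirely.
    everything = sorted(p.name for p in base.iterdir())

print(via_glob)     # ['a.txt', 'b.txt']
print(via_iterdir)  # ['a.txt', 'b.txt']
print(everything)   # ['a.txt', 'b.txt', 'c.csv']
```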