
Custom Laravel artisan command processing a large nested file structure and dataset

I'm trying to construct a custom artisan command that cleans up my filesystem each day. There are a few steps to it, and I can't figure out a way to do it without running out of memory. There are thousands of folders and hundreds of thousands of users.

What I need to do is:

  1. Get all folders that are older than 50 days (based on the folder structure below)
  2. From those collected folders, take the user id (which is the nested folder name below) and check whether that user's data has been completed via $user->isCompleted()
  3. If that user's data is completed, delete the directory that stores that user's data

In my filesystem I have stored data for each user id, partitioned by day, so the filesystem looks like this:

data/2022-01-15/7482947
data/2022-01-15/7482946
data/2022-01-15/7482945
data/2022-01-16/2353234
data/2022-01-16/2353233
data/2022-01-16/2353232

The format is:

data/<date>/<user_id>

So far I have managed to return the folders that are older than 50 days using the code below, but I am unsure how to continue: getting the nested folders and then querying the DB.

use Carbon\Carbon;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\Str;

collect(Storage::directories('data/'))
  ->filter(function ($directory) {
    // 'data/2022-01-15' -> '2022-01-15'
    $directoryDate = Str::after($directory, '/');

    // keep only directories whose date is at least 50 days old
    return Carbon::parse($directoryDate)->lte(now()->subDays(50));
  });

Any help would be greatly appreciated.

Something to think about.

Whenever I run into a situation where I'm manipulating big data sets or doing memory-heavy work, I try to split the heaviest logic into jobs.

So, in this case, I'd make a top-level command that schedules multiple jobs, each of which checks a user and queries the DB.

This gives you the ability to scale, and when using something like Laravel Horizon, it gives you the option to run multiple tasks simultaneously!

When making the job, I'd want to make sure that each job is a unique instance of the logic I'm running, so I'd implement the ShouldBeUnique interface on the job and return the user id or the subdirectory from its uniqueId() method, as sketched below.

Hopefully, this gives you some ideas!
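To make that concrete, here is a minimal sketch of what such a job could look like. The class name CleanupUserDirectory is my own invention, and I'm assuming an App\Models\User model with the isCompleted() method the question mentions:

namespace App\Jobs;

use App\Models\User;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\Str;

class CleanupUserDirectory implements ShouldQueue, ShouldBeUnique
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(private string $directory)
    {
    }

    // Two jobs for the same directory can never sit in the queue at once
    public function uniqueId(): string
    {
        return $this->directory;
    }

    public function handle(): void
    {
        // 'data/<date>/<user_id>' -> '<user_id>'
        $userId = Str::afterLast($this->directory, '/');

        $user = User::find($userId);

        if ($user !== null && $user->isCompleted()) {
            Storage::deleteDirectory($this->directory);
        }
    }
}

Inside the top-level command, the loop over Storage::directories() then only has to dispatch one lightweight job per user directory:

CleanupUserDirectory::dispatch($directory);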

  1. Get all the folder names older than 50 days

  2. Get the user ids from those folder names

  3. Check in the DB whether each user is completed or not

  4. Delete only the completed users' folders, as in the snippet below:

     collect(Storage::directories('data/'))
       ->each(function ($directory) {
         $directoryDate = Str::after($directory, '/');

         // skip directories newer than 50 days
         if (! Carbon::parse($directoryDate)->lte(now()->subDays(50))) {
           return;
         }

         // get all user ids from the date folder
         $userIds = collect(Storage::directories('data/'.$directoryDate))
           ->map(function ($userDirectory) use ($directoryDate) {
             return Str::after($userDirectory, 'data/'.$directoryDate.'/');
           });

         // check which users are completed in a single query per folder
         // (instead of one query per user) to reduce read operations in the DB
         $completedUserIds = User::where('is_complete', 1)
           ->whereIn('id', $userIds->all())
           ->pluck('id');

         // delete the folders of the completed users
         foreach ($completedUserIds as $completedUserId) {
           Storage::deleteDirectory('data/'.$directoryDate.'/'.$completedUserId);
         }
       });

We fetch all the user folder ids and then check them in the DB; the query returns only the completed users, and once we have those ids, we delete their folders. You can run this logic from scheduled jobs.
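Since the question wants this to run each day, the logic above could live in an artisan command registered with the scheduler in app/Console/Kernel.php. The command name cleanup:user-data below is just a placeholder of mine:

use Illuminate\Console\Scheduling\Schedule;

protected function schedule(Schedule $schedule)
{
    // run the daily cleanup during low-traffic hours
    $schedule->command('cleanup:user-data')->dailyAt('03:00');
}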
