How to create a txt file with columns being the descending sub-directories in Linux?

My data follow this structure:

../data/study_ID/FF_Number/Exam_Number/date

The data directory contains 176 participants' sub-directories. The ID number is the participant's ID, and each of the following sub-directories represents some experimental number. I want to create a txt file with one line per participant and the following columns: study ID, FF_Number, Exam_Number and date.

However, it gets a bit more complicated, as I want to divide the participants into chunks of roughly 15-20 participants each for the subsequent analysis.

Any suggestions? Cheers.

Hmm, nobody?

You should redirect the output of the "find" command, consider the switches -type d and -maxdepth, and probably parse the result with sed, replacing "/" with spaces. Piping through the "cut" and "column -t" commands, and through "sort" and "uniq", may also be useful. Do the names, apart from FF and ID, contain spaces or special characters, e.g. related to the names of participants?

It should be possible to get the TXT with a one-liner and a few pipes.
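As a rough illustration of the find/sed idea above, here is a minimal sketch. It builds a tiny throwaway tree first; the directory names (study_001, FF_10, Exam_1 and so on) are placeholders I made up, not the real data. It assumes that no directory name contains whitespace, and that your find supports -mindepth and -maxdepth (GNU and BSD find both do). It prints one row per date directory, so a participant with several dates gets several rows.

```shell
#!/bin/sh
set -eu

# Hypothetical demo tree; replace "$root/data" with your real data directory.
root=$(mktemp -d)
mkdir -p "$root/data/study_001/FF_10/Exam_1/2012_01_01" \
         "$root/data/study_001/FF_10/Exam_1/2012_01_02" \
         "$root/data/study_002/FF_11/Exam_3/2012_01_05"

# Depth 4 below data/ is the date level: study/FF/Exam/date.
# Running find from inside data/ yields relative paths like ./study/FF/Exam/date,
# so stripping the leading "./" and turning each "/" into a tab gives four columns.
(cd "$root/data" && find . -mindepth 4 -maxdepth 4 -type d) \
  | sed 's|^\./||' \
  | tr '/' '\t' \
  | sort > "$root/table.txt"

cat "$root/table.txt"
```

If you prefer space-aligned columns over tabs, piping the result through "column -t" instead of relying on the tabs should work where that command is installed.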

You should try, and post the first results of your work on this :)

EDIT: Alright, I created a structure for myself with several thousand directories and subdirectories numbered by participant, by exam number etc., which looks like this (maybe it's not identical to what you have, but don't worry). Studies are numbered from 5 to 150, FF from 45 to 75, and dates from 2012_01_00 to 2012_01_30, which makes a really huge number of directories in total.

/Users/pwadas/bzz/data
/Users/pwadas/bzz/data/study_005
/Users/pwadas/bzz/data/study_005/05_Num
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_00
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_01
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_02
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_03
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_04
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_05
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_06
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_07
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_08
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_09
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_10
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_11
/Users/pwadas/bzz/data/study_005/05_Num/45_Exam/2012_01_12

Now, I want (quote) "txt file with one line per participants and the following columns: study ID, FF_number, Exam_Number and date."

So I use the following one-liner:

find /Users/pwadas/bzz/data -type d | head -n 5000 |cut -d'/' -f5-7  | uniq |while read line; do echo -n "$line: " && ls -d /Users/pwadas/bzz/$line/*Exam/* | perl -0pe 's/.*2012/2012/g;s/\n/ /g' && echo ; done  > out.txt

and here is the output (the first few lines of out.txt). The lines are very long, so I cut the output to the first 90 characters:

dtpwmbp:data pwadas$ cat out.txt |cut -c1-90
data: 
data/study_005: 
data/study_005/05_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/06_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/07_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
data/study_005/08_Num: 2012_01_00 2012_01_01 2012_01_02 2012_01_03 2012_01_04 2012_01_05 2
dtpwmbp:data pwadas$ 

I hope this helps you a little, and that you'll be able to modify it according to your needs and patterns; that seems to be all I can do :) You should analyze the one-liner, especially the "cut" command and the perl regex part, which removes newlines and the full directory name from the "ls" output. This is probably far from optimal, but beautifying is not the point here, I guess :) So, good luck :) PS. The "head" command limits the output to the first N lines; you'll probably want to leave out the "| head .. |" part.
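For the chunking part of the question, which the one-liner does not cover, one possible approach is to list the participant directories one per line and let "split" cut that list into files of at most 20 lines. This is only a sketch: the study_NNN names, the chunk_ prefix and the chunk size of 20 are assumptions for the demo, and it again assumes directory names without whitespace.

```shell
#!/bin/sh
set -eu

# Hypothetical demo: 176 empty participant directories, study_001 .. study_176.
root=$(mktemp -d)
i=1
while [ "$i" -le 176 ]; do
  mkdir -p "$root/data/$(printf 'study_%03d' "$i")"
  i=$((i + 1))
done

# One participant directory per line, sorted for a stable order.
(cd "$root/data" && find . -mindepth 1 -maxdepth 1 -type d) \
  | sed 's|^\./||' \
  | sort > "$root/participants.txt"

# Cut the list into files of at most 20 participants each
# (chunk_aa, chunk_ab, ...); the last file holds whatever is left over.
mkdir "$root/chunks"
split -l 20 "$root/participants.txt" "$root/chunks/chunk_"

# Show how many participants ended up in each chunk.
wc -l "$root/chunks"/chunk_*
```

With 176 participants and a chunk size of 20 this gives nine files: eight with 20 participants and one with 16. Each chunk file can then be fed to a "while read" loop like the one in the one-liner above.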
