Create sub-directories and files from a pandas dataframe
I have this dataframe:
import pandas as pd

data = {'user': [7, 7, 7, 7, 7, 7, 7, 11, 11, 11],
        'session_id': [15, 15, 15, 15, 31, 31, 31, 43, 43, 43],
        'logtime': ['2016-04-13 07:58:40', '2016-04-13 07:58:41', '2016-04-13 07:58:42',
                    '2016-04-13 07:58:43', '2016-04-01 20:29:37', '2016-04-01 20:29:42',
                    '2016-04-01 20:29:47', '2016-03-30 06:21:59', '2016-03-30 06:22:04',
                    '2016-03-30 06:22:09'],
        'lat': [41.1872084, 41.1870716, 41.1869719, 41.1868664, 41.1471521,
                41.1472466, 41.1473038, 41.2372125, 41.2371444, 41.2369725],
        'lon': [-8.6038931, -8.6037318, -8.6036908, -8.6036423, -8.5878757,
                -8.5874314, -8.586632, -8.6720773, -8.6721269, -8.6718833]}
d = pd.DataFrame(data)
d
   user  session_id              logtime        lat       lon
0     7          15  2016-04-13 07:58:40  41.187208 -8.603893
1     7          15  2016-04-13 07:58:41  41.187072 -8.603732
2     7          15  2016-04-13 07:58:42  41.186972 -8.603691
3     7          15  2016-04-13 07:58:43  41.186866 -8.603642
4     7          31  2016-04-01 20:29:37  41.147152 -8.587876
5     7          31  2016-04-01 20:29:42  41.147247 -8.587431
6     7          31  2016-04-01 20:29:47  41.147304 -8.586632
7    11          43  2016-03-30 06:21:59  41.237212 -8.672077
8    11          43  2016-03-30 06:22:04  41.237144 -8.672127
9    11          43  2016-03-30 06:22:09  41.236973 -8.671883
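I would like to:
Create a sub-directory for each user (in the current working directory).
In each user's sub-directory, create one CSV file for each of that user's sessions.
Write that session's logtime, lat and lon to each file (without the session ID), naming the files file1.csv, file2.csv, and so on.
Then move on to the next user, until all users are done.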
Expected output
The final directory structure and file contents should look like this (file contents shown):
Data/
├── 11
│   └── file1.csv
│         logtime,lat,lon
│         2016-03-30 06:21:59,41.2372125,-8.6720773
│         2016-03-30 06:22:04,41.2371444,-8.6721269
│         2016-03-30 06:22:09,41.2369725,-8.6718833
└── 7
    ├── file1.csv
    │     logtime,lat,lon
    │     2016-04-13 07:58:40,41.187208,-8.603893
    │     2016-04-13 07:58:41,41.187072,-8.603732
    │     2016-04-13 07:58:42,41.186972,-8.603691
    │     2016-04-13 07:58:43,41.186866,-8.603642
    └── file2.csv
          logtime,lat,lon
          2016-04-01 20:29:37,41.147152,-8.587876
          2016-04-01 20:29:42,41.147247,-8.587431
          2016-04-01 20:29:47,41.147304,-8.586632
This can be done with os.makedirs and groupby:
import os

# make the data folder if needed, change the path if needed
# (a relative path is resolved against the current working directory)
base_folder = 'Data'
os.makedirs(base_folder, exist_ok=True)

# d is the dataframe from the question
for (user_id, sess_id), data in d.groupby(['user', 'session_id']):
    user_folder = f'{base_folder}/{user_id}'
    os.makedirs(user_folder, exist_ok=True)
    filename = f'{user_folder}/file_{sess_id}.csv'
    # drop the grouping columns so only logtime, lat and lon are written
    data.drop(['user', 'session_id'], axis=1).to_csv(filename, index=False)
Note that this names each file after its session_id. If you want to name the files however you like, you can do two groupbys instead; something like this:
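for user_id, user_data in d.groupby('user'):
    user_folder = f'{base_folder}/{user_id}'
    os.makedirs(user_folder, exist_ok=True)
    # enumerate numbers this user's sessions 1, 2, ... in sorted session_id order
    for file_id, (sess_id, data) in enumerate(user_data.groupby('session_id'), start=1):
        filename = f'{user_folder}/file_{file_id}.csv'
        ...  # write data to filename as in the loop above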
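The nested groupby idea can be sketched end to end as follows. This is only a minimal illustration: it uses a shortened stand-in for the question's dataframe, writes under a temporary directory (an assumption, so nothing lands in a real Data folder), and uses the file1.csv naming from the question.

```python
import os
import tempfile

import pandas as pd

# Shortened stand-in for the question's dataframe (same columns, fewer rows)
df = pd.DataFrame({
    'user': [7, 7, 7, 11],
    'session_id': [15, 31, 31, 43],
    'logtime': ['2016-04-13 07:58:40', '2016-04-01 20:29:37',
                '2016-04-01 20:29:42', '2016-03-30 06:21:59'],
    'lat': [41.1872084, 41.1471521, 41.1472466, 41.2372125],
    'lon': [-8.6038931, -8.5878757, -8.5874314, -8.6720773],
})

# Assumption: a temporary directory is used instead of a real Data folder
base_folder = os.path.join(tempfile.mkdtemp(), 'Data')

written = []
for user_id, user_data in df.groupby('user'):
    user_folder = os.path.join(base_folder, str(user_id))
    os.makedirs(user_folder, exist_ok=True)
    # Number this user's sessions 1, 2, ... in sorted session_id order
    for file_id, (sess_id, data) in enumerate(user_data.groupby('session_id'), start=1):
        filename = os.path.join(user_folder, f'file{file_id}.csv')
        # Keep only the columns the question asks for
        data[['logtime', 'lat', 'lon']].to_csv(filename, index=False)
        written.append(os.path.relpath(filename, base_folder))

print(sorted(written))
```

Because groupby sorts its keys by default, file1.csv goes to the user's smallest session_id; pass sort=False to the inner groupby if order of appearance is preferred.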
Another possible solution:
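import os

# Create folders, assuming current working directory as root
for folder in d['user'].unique():
    os.makedirs(str(folder), exist_ok=True)

# Number each user's sessions (id = 1, 2, ...), then write one CSV per session.
# Note: this relies on groupby.apply passing the grouping columns to the
# function, which recent pandas releases deprecate.
(d.groupby('user', group_keys=False)
  .apply(lambda x: x.assign(id=x.groupby('session_id').ngroup() + 1))
  .groupby(['user', 'session_id'])
  .apply(lambda y: y.iloc[:, 2:(len(y.columns) - 1)]
         .to_csv(os.path.join(
             os.getcwd(), str(y['user'].unique()[0]),
             f'file{y.id.unique()[0]}.csv'), index=False)))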
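A variant of the same numbering idea that avoids groupby.apply might look like the sketch below. It is not the answer's code: it swaps ngroup for a dense rank of session_id within each user, uses a shortened stand-in dataframe, and writes into a temporary directory (an assumption standing in for the current working directory). The names root, file_no and paths are made up for the illustration.

```python
import os
import tempfile

import pandas as pd

# Shortened stand-in for the question's dataframe (same columns, fewer rows)
d = pd.DataFrame({
    'user': [7, 7, 7, 11],
    'session_id': [15, 31, 31, 43],
    'logtime': ['2016-04-13 07:58:40', '2016-04-01 20:29:37',
                '2016-04-01 20:29:42', '2016-03-30 06:21:59'],
    'lat': [41.1872084, 41.1471521, 41.1472466, 41.2372125],
    'lon': [-8.6038931, -8.5878757, -8.5874314, -8.6720773],
})

# Assumption: a temporary directory stands in for the current working directory
root = tempfile.mkdtemp()

# Dense rank of session_id within each user gives the file number 1, 2, ...
file_no = d.groupby('user')['session_id'].rank(method='dense').astype(int)

paths = []
for (user, n), part in d.assign(file_no=file_no).groupby(['user', 'file_no']):
    folder = os.path.join(root, str(user))
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f'file{n}.csv')
    # Select columns by name rather than by position
    part[['logtime', 'lat', 'lon']].to_csv(path, index=False)
    paths.append(path)
```

Selecting the output columns by name keeps the result independent of the dataframe's column order, which the positional iloc slice depends on.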