Splitting data into alternating groups in Python 2.7
day city temperature windspeed event
2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny
The above shows the .txt file.
My goal is to create 4 groups, as evenly sized as possible, each containing all the cities: every group should have 'new york', 'mumbai', and 'paris'.
Since there are 25 data rows, 3 groups will have 6 lines and 1 group will have 7 lines.
Since the data are already sorted by city, my idea is to read the text file line by line and append each line to one of 4 groups (G1-G4) in an alternating pattern: the 1st line goes to G1, the 2nd to G2, the 3rd to G3, the 4th to G4, the 5th back to G1, the 6th to G2, and so on. This ensures that every group contains all 3 cities.
Is it possible to code it this way?
Expected result:
G1: Row 1, Row 5, Row 9, ...
G2: Row 2, Row 6, Row 10, ...
G3: Row 3, Row 7, Row 11, ...
G4: Row 4, Row 8, Row 12, and so on.
Since your input is already sorted, you can split the string into a list of lines and then slice that list with a step of 4:
data = ''' 2017-01-01 new york 32 6 Rain
2017-01-02 new york 36 7 Sunny
2017-01-03 new york 28 12 Snow
2017-01-04 new york 33 7 Sunny
2017-01-05 new york 31 7 Rain
2017-01-06 new york 33 5 Sunny
2017-01-07 new york 27 12 Rain
2017-01-08 new york 23 7 Rain
2017-01-01 mumbai 90 5 Sunny
2017-01-02 mumbai 85 12 Fog
2017-01-03 mumbai 87 15 Fog
2017-01-04 mumbai 92 5 Rain
2017-01-05 mumbai 89 7 Sunny
2017-01-06 mumbai 80 10 Fog
2017-01-07 mumbai 85 9 Sunny
2017-01-08 mumbai 89 8 Rain
2017-01-01 paris 45 20 Sunny
2017-01-02 paris 50 13 Cloudy
2017-01-03 paris 54 8 Cloudy
2017-01-04 paris 42 10 Cloudy
2017-01-05 paris 43 20 Sunny
2017-01-06 paris 48 4 Cloudy
2017-01-07 paris 40 14 Rain
2017-01-08 paris 42 15 Cloudy
2017-01-09 paris 53 8 Sunny'''
# Strip stray whitespace around each line before grouping
lines = [line.strip() for line in data.splitlines()]
# lines[i::4] takes every 4th line, starting at offset i
groups = [lines[i::4] for i in range(4)]
for g in groups:
    print(g)
This outputs:
['2017-01-01 new york 32 6 Rain', '2017-01-05 new york 31 7 Rain', '2017-01-01 mumbai 90 5 Sunny', '2017-01-05 mumbai 89 7 Sunny', '2017-01-01 paris 45 20 Sunny', '2017-01-05 paris 43 20 Sunny', '2017-01-09 paris 53 8 Sunny']
['2017-01-02 new york 36 7 Sunny', '2017-01-06 new york 33 5 Sunny', '2017-01-02 mumbai 85 12 Fog', '2017-01-06 mumbai 80 10 Fog', '2017-01-02 paris 50 13 Cloudy', '2017-01-06 paris 48 4 Cloudy']
['2017-01-03 new york 28 12 Snow', '2017-01-07 new york 27 12 Rain', '2017-01-03 mumbai 87 15 Fog', '2017-01-07 mumbai 85 9 Sunny', '2017-01-03 paris 54 8 Cloudy', '2017-01-07 paris 40 14 Rain']
['2017-01-04 new york 33 7 Sunny', '2017-01-08 new york 23 7 Rain', '2017-01-04 mumbai 92 5 Rain', '2017-01-08 mumbai 89 8 Rain', '2017-01-04 paris 42 10 Cloudy', '2017-01-08 paris 42 15 Cloudy']
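The line-by-line approach described in the question also works. A minimal sketch, assuming the rows are available as an iterable of strings (a short list here) and that the first line is the header; the function name `round_robin` and the shortened `sample` list are made up for illustration:

```python
def round_robin(lines, n_groups=4):
    """Append each line to the next group in turn: G1, G2, G3, G4, G1, ..."""
    groups = [[] for _ in range(n_groups)]
    for i, line in enumerate(lines):
        # i % n_groups cycles through 0, 1, 2, 3, 0, 1, ...
        groups[i % n_groups].append(line.strip())
    return groups


sample = [
    "day city temperature windspeed event",  # header line
    "2017-01-01 new york 32 6 Rain",
    "2017-01-02 new york 36 7 Sunny",
    "2017-01-03 new york 28 12 Snow",
    "2017-01-04 new york 33 7 Sunny",
    "2017-01-05 new york 31 7 Rain",
]

# Skip the header before distributing the data rows.
G1, G2, G3, G4 = round_robin(sample[1:])
print(G1)
```

With a real file, an open file object could probably be passed in the same way after skipping the header with `next(f)`; the grouping should match what `lines[i::4]` produces.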
For ease of explanation, I keep only the row indices:
rows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
Then you can use slicing:
G1, G2, G3, G4 = [rows[i::4] for i in range(4)]
The results will be:
G1 == [1, 5, 9, 13, 17, 21, 25]
G2 == [2, 6, 10, 14, 18, 22]
G3 == [3, 7, 11, 15, 19, 23]
G4 == [4, 8, 12, 16, 20, 24]
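As a quick sanity check, the row indices can be mapped back to cities to confirm that every group contains all three. This sketch assumes the layout of the sample file (rows 1-8 new york, rows 9-16 mumbai, rows 17-25 paris); `city_of` is a helper invented for the illustration:

```python
rows = list(range(1, 26))  # row indices 1..25
groups = [rows[i::4] for i in range(4)]


def city_of(row):
    # Row ranges taken from the sample file in the question.
    if row <= 8:
        return 'new york'
    if row <= 16:
        return 'mumbai'
    return 'paris'


sizes = [len(g) for g in groups]
cities_per_group = [set(city_of(r) for r in g) for g in groups]
print(sizes)
print(all(len(c) == 3 for c in cities_per_group))
```

This works here because each city occupies at least four consecutive rows, so its rows cover every offset 0-3. A city with fewer than four rows could not appear in all four groups.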
You can use pandas and some math operations to replicate your groups.
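# df is assumed to be a DataFrame already loaded from the .txt file
# n = number of complete 1-4 cycles, r = number of leftover rows
n, r = df.shape[0] // 4, df.shape[0] % 4
df['group'] = [1,2,3,4]*n + [1,2,3,4][:r]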
day city temperature windspeed event group
0 2017-01-01 new york 32 6 Rain 1
1 2017-01-02 new york 36 7 Sunny 2
2 2017-01-03 new york 28 12 Snow 3
3 2017-01-04 new york 33 7 Sunny 4
4 2017-01-05 new york 31 7 Rain 1
5 2017-01-06 new york 33 5 Sunny 2
6 2017-01-07 new york 27 12 Rain 3
7 2017-01-08 new york 23 7 Rain 4
8 2017-01-01 mumbai 90 5 Sunny 1
9 2017-01-02 mumbai 85 12 Fog 2
10 2017-01-03 mumbai 87 15 Fog 3
11 2017-01-04 mumbai 92 5 Rain 4
12 2017-01-05 mumbai 89 7 Sunny 1
13 2017-01-06 mumbai 80 10 Fog 2
14 2017-01-07 mumbai 85 9 Sunny 3
15 2017-01-08 mumbai 89 8 Rain 4
16 2017-01-01 paris 45 20 Sunny 1
17 2017-01-02 paris 50 13 Cloudy 2
18 2017-01-03 paris 54 8 Cloudy 3
19 2017-01-04 paris 42 10 Cloudy 4
20 2017-01-05 paris 43 20 Sunny 1
21 2017-01-06 paris 48 4 Cloudy 2
22 2017-01-07 paris 40 14 Rain 3
23 2017-01-08 paris 42 15 Cloudy 4
24 2017-01-09 paris 53 8 Sunny 1
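The label arithmetic can be checked without pandas. A small sketch, assuming 25 data rows as in the sample file; it also suggests that the same labels can be obtained by taking each row position modulo 4 (the names `labels_mod` and `counts` are illustrative):

```python
n_rows = 25  # number of data rows in the sample file
n, r = divmod(n_rows, 4)  # n full cycles of 1..4, r leftover rows

labels = [1, 2, 3, 4] * n + [1, 2, 3, 4][:r]
# Equivalent formulation: row position modulo 4, shifted to start at 1.
labels_mod = [i % 4 + 1 for i in range(n_rows)]

# How many rows end up in each group.
counts = [labels.count(g) for g in (1, 2, 3, 4)]
print(labels == labels_mod)
print(counts)
```

The leftover rows always go to the lowest-numbered groups, which is why group 1 ends up with the extra row.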