Transpose the data in a column every nth row in PANDAS
For a research project, I need to process every individual's information from a website into an Excel file. I have copied and pasted everything I need from the website into a single column of an Excel file, and I loaded that file using pandas. However, I need to present each individual's information horizontally instead of vertically, as it is now. For example, this is what I have right now. I only have one column of unorganized data.
df= pd.read_csv("ior work.csv", encoding = "ISO-8859-1")
Data:
0 Andrew
1 School of Music
2 Music: Sound of the wind
3 Dr. Seuss
4 Dr.Sass
5 Michelle
6 School of Theatrics
7 Music: Voice
8 Dr. A
9 Dr. B
I want to transpose every 5 lines to organize the data into the format below; the labels are the column names.
Name School Music Mentor1 Mentor2
What is the most efficient way to do this?
If no data are missing, you can use numpy.reshape:
print (np.reshape(df.values,(2,5)))
[['Andrew' 'School of Music' 'Music: Sound of the wind' 'Dr. Seuss'
'Dr.Sass']
['Michelle' 'School of Theatrics' 'Music: Voice' 'Dr. A' 'Dr. B']]
print (pd.DataFrame(np.reshape(df.values,(2,5)),
columns=['Name','School','Music','Mentor1','Mentor2']))
Name School Music Mentor1 Mentor2
0 Andrew School of Music Music: Sound of the wind Dr. Seuss Dr.Sass
1 Michelle School of Theatrics Music: Voice Dr. A Dr. B
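For readers without the original CSV, the reshape above can be reproduced with a minimal self-contained sketch. The frame is built inline from the sample rows in the question; the column name `raw` is an assumption made for illustration and does not come from the original file.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the column name "raw" is hypothetical.
df = pd.DataFrame({"raw": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})

columns = ["Name", "School", "Music", "Mentor1", "Mentor2"]

# df.values has shape (10, 1). reshape fills the new array row by row,
# so each consecutive run of 5 values becomes one record.
wide = pd.DataFrame(np.reshape(df.values, (2, 5)), columns=columns)
print(wide)
```

Because the values are laid out row by row, this only gives sensible records when every individual contributes exactly 5 lines in the same order.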
A more general solution computes the number of rows of the new array by dividing the length of the original (df.shape[0]) by the number of columns:
# Integer division (//) is needed: in Python 3, / returns a float, which reshape rejects.
print (pd.DataFrame(np.reshape(df.values,(df.shape[0] // 5,5)),
columns=['Name','School','Music','Mentor1','Mentor2']))
Name School Music Mentor1 Mentor2
0 Andrew School of Music Music: Sound of the wind Dr. Seuss Dr.Sass
1 Michelle School of Theatrics Music: Voice Dr. A Dr. B
Thank you, piRSquared, for another solution, which passes -1 so that NumPy infers the number of rows:
print (pd.DataFrame(df.values.reshape(-1, 5),
columns=['Name','School','Music','Mentor1','Mentor2']))
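All of these approaches rest on the condition that no data are missing. The sketch below makes that condition explicit by checking that the row count is a multiple of the number of fields before reshaping. The helper name `stack_to_records` and the column name `raw` are hypothetical, introduced only for this illustration. NumPy would also raise an error on a mismatched length; the explicit check simply gives a clearer message.

```python
import pandas as pd

def stack_to_records(df, columns):
    """Turn a single-column frame into one row per record (hypothetical helper)."""
    n_fields = len(columns)
    values = df.iloc[:, 0].to_numpy()
    # Every record must contribute exactly n_fields lines, in the same order.
    if len(values) % n_fields != 0:
        raise ValueError(
            f"{len(values)} rows cannot be split into records of {n_fields} fields"
        )
    # -1 lets NumPy infer the number of records from the array length.
    return pd.DataFrame(values.reshape(-1, n_fields), columns=columns)

# Sample rows from the question; the column name "raw" is an assumption.
sample = pd.DataFrame({"raw": [
    "Andrew", "School of Music", "Music: Sound of the wind", "Dr. Seuss", "Dr.Sass",
    "Michelle", "School of Theatrics", "Music: Voice", "Dr. A", "Dr. B",
]})

result = stack_to_records(sample, ["Name", "School", "Music", "Mentor1", "Mentor2"])
print(result)
```

If an individual is missing a line (for example, only one mentor), the check fails and the gap has to be filled in the source data first, since a plain reshape cannot tell which field is absent.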