
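How can I manipulate a dataframe so that I can access every element of a list inside a cell and group the elements according to another column?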

This may be confusing, so here is a copy of the first 10 rows of the dataframe.

number  cap words
0   ['Ages', 'Online', 'Python', 'Coding', 'CoursesAdwwwcodetodaycoukLearn', 'Python', 'Live', 'Taught', 'Experts', 'Making', 'Coding', 'Fun', 'Courses', 'Summer', 'Weekly', 'EthosSimple', 'Low', 'Cost', 'PricingFAQAccess', 'Free', 'Content']
1   ['Become', 'Python', 'Programmer', 'Study', 'Python', 'Online', 'FreeAdwwwpythoninstituteorgLearn', 'Python', 'Become', 'Python', 'Certified', 'Take', 'Your', 'Career', 'Next', 'Level', 'Kostenfreie', 'Lernplattform', 'Tausende', 'Studenten', 'Lass', 'Dich', 'Highlights', 'Offering', 'SelfStudy', 'Courses', 'Free', 'Courses', 'Available', 'Flexible', 'DeadlinesResources', 'Free', 'Education', 'Platform', 'Get', 'Certification', 'About']
2   ['Python', 'For', 'Beginners', 'Pythonorgwwwpythonorg', 'Python', 'Its', 'NonProgrammers', 'Python', 'Programmers', 'Python', 'Frequently', 'Asked', 'Books']
3   ['People', 'Python', 'I', 'PythonIs', 'Python', 'Python']
4   ['PythonHighlevel', 'Created', 'Guido', 'Rossum', 'Pythons', 'WikipediaTyping', 'Duck', 'July', 'August', 'Guido', 'RossumOS', 'Linux', 'Windows', 'Vista', 'IDEsIDLEPyCharmMicrosoft', 'Visual', 'StudioSpyderEclipsePyDevPeople']
5   ['Welcome', 'PythonorgwwwpythonorgThe', 'Python', 'Programming', 'Language', 'Python', 'For', 'Beginners', 'Beginners', 'Guide', 'Python', 'Docs', 'Python', 'Books']
6   ['BeginnersGuide', 'Python', 'Wikiwikipythonorg', 'BeginnersGuide4', 'Jul', 'New', 'Python', 'This', 'Chinese']
7   ['Learn', 'Python', 'Codecademywwwcodecademycom', 'Python', 'By']
8   ['Python', 'Wikipediaenwikipediaorg', 'PythonprogramminglanguagePython', 'Created', 'Guido', 'Rossum', 'Pythons', 'History', 'Features', 'Syntax', 'Python', 'Developer', 'Python', 'Software', 'Foundation', 'Paradigm', 'Multiparadigm', 'Designed', 'Guido', 'Rossum', 'Typing', 'Duck']
9   ['Related']

I am trying to unpack every word from the list inside each cell and group all of them under the index number they share.

Like this:

    0
0   Ages
0   Online
0   Python
0   Coding
0   CoursesAdwwwcodetodaycoukLearn
0   Python
0   Live
0   Taught
0   Experts
0   Making
0   Coding
0   Fun
0   Courses
0   Summer
0   Weekly
0   EthosSimple
0   Low
0   Cost
0   PricingFAQAccess
0   Free
0   Content

Below that, 1 would hold the words 'Become', 'Python', 'Programmer', 'Study', 'Python', 'Online', and so on.

I hope this is clear.

Thanks
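The reshaping asked for here is a nested loop: every word in a row's list becomes its own row, tagged with that row's number. As a minimal sketch that does not rely on DataFrame.explode (handy on pandas versions older than 0.25), assuming a hypothetical two-row sample trimmed from the data above and an illustrative output column named word:

```python
import pandas as pd

# Hypothetical two-row sample, trimmed from the data in the question
df = pd.DataFrame({
    'number': [0, 1],
    'cap words': [['Ages', 'Online', 'Python'],
                  ['Become', 'Python', 'Programmer']],
})

# Nested comprehension: one (number, word) pair per list element,
# so every word stays tied to the number of the row it came from
pairs = [(n, w) for n, words in zip(df['number'], df['cap words']) for w in words]

# 'word' is an illustrative column name; 'number' becomes the repeated index
long_df = pd.DataFrame(pairs, columns=['number', 'word']).set_index('number')

print(long_df)
```

The index repeats 0 three times and then 1 three times, which is the layout sketched in the question.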

You can use explode (DataFrame.explode, added in pandas 0.25):

x = [['Ages', 'Online', 'Python', 'Coding', 'CoursesAdwwwcodetodaycoukLearn', 'Python', 'Live', 'Taught', 'Experts', 'Making', 'Coding', 'Fun', 'Courses', 'Summer', 'Weekly', 'EthosSimple', 'Low', 'Cost', 'PricingFAQAccess', 'Free', 'Content']
,['Become', 'Python', 'Programmer', 'Study', 'Python', 'Online', 'FreeAdwwwpythoninstituteorgLearn', 'Python', 'Become', 'Python', 'Certified', 'Take', 'Your', 'Career', 'Next', 'Level', 'Kostenfreie', 'Lernplattform', 'Tausende', 'Studenten', 'Lass', 'Dich', 'Highlights', 'Offering', 'SelfStudy', 'Courses', 'Free', 'Courses', 'Available', 'Flexible', 'DeadlinesResources', 'Free', 'Education', 'Platform', 'Get', 'Certification', 'About']
,['Python', 'For', 'Beginners', 'Pythonorgwwwpythonorg', 'Python', 'Its', 'NonProgrammers', 'Python', 'Programmers', 'Python', 'Frequently', 'Asked', 'Books']
,['People', 'Python', 'I', 'PythonIs', 'Python', 'Python']
,['PythonHighlevel', 'Created', 'Guido', 'Rossum', 'Pythons', 'WikipediaTyping', 'Duck', 'July', 'August', 'Guido', 'RossumOS', 'Linux', 'Windows', 'Vista', 'IDEsIDLEPyCharmMicrosoft', 'Visual', 'StudioSpyderEclipsePyDevPeople']
,['Welcome', 'PythonorgwwwpythonorgThe', 'Python', 'Programming', 'Language', 'Python', 'For', 'Beginners', 'Beginners', 'Guide', 'Python', 'Docs', 'Python', 'Books']
,['BeginnersGuide', 'Python', 'Wikiwikipythonorg', 'BeginnersGuide4', 'Jul', 'New', 'Python', 'This', 'Chinese']
,['Learn', 'Python', 'Codecademywwwcodecademycom', 'Python', 'By']
,['Python', 'Wikipediaenwikipediaorg', 'PythonprogramminglanguagePython', 'Created', 'Guido', 'Rossum', 'Pythons', 'History', 'Features', 'Syntax', 'Python', 'Developer', 'Python', 'Software', 'Foundation', 'Paradigm', 'Multiparadigm', 'Designed', 'Guido', 'Rossum', 'Typing', 'Duck']
,['Related']]

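import numpy as np
import pandas as pd

# One row per result: each 'cap words' cell holds a Python list of words
df = pd.DataFrame({
    'number': np.arange(10),
    'cap words': pd.Series(x)
})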

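# explode() gives every list element its own row and repeats that row's 'number';
# reset_index(drop=True) then renumbers the resulting rows from 0
df.explode('cap words').reset_index(drop=True)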

Out:

     number                       cap words
0         0                            Ages
1         0                          Online
2         0                          Python
3         0                          Coding
4         0  CoursesAdwwwcodetodaycoukLearn
..      ...                             ...
140       8                           Guido
141       8                          Rossum
142       8                          Typing
143       8                            Duck
144       9                         Related

[145 rows x 2 columns]
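The question's target layout keeps the original row number as a repeated index rather than a fresh 0..144 index. A minimal sketch, assuming a hypothetical two-row sample trimmed from the data above: exploding the Series without reset_index preserves the repeated labels, and groupby(level=0) can then regroup the words per number. The word-count step is only an illustration of regrouping, not part of the original answer.

```python
import pandas as pd

# Hypothetical two-row sample, trimmed from the question's data
df = pd.DataFrame({
    'number': [0, 1],
    'cap words': [['Ages', 'Online', 'Python', 'Python'],
                  ['Become', 'Python', 'Programmer']],
})

# Without reset_index(), explode() repeats each row's label for every
# list element, matching the "0, 0, 0, ..." layout shown in the question
words = df.set_index('number')['cap words'].explode()
print(words)

# The repeated label can be used to group the words again, e.g. to count
# how often each word occurs under each number
counts = words.groupby(level=0).value_counts()
print(counts)
```

Any other per-group aggregation (for example `words.groupby(level=0).agg(list)` to rebuild the lists) works the same way on the repeated index.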


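Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.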

 
Guangdong ICP Record No. 18138465  © 2020-2024 STACKOOM.COM