How can I split Pandas arrays into columns?
I'm trying to split array values into columns.
I've created a Google Colab notebook, and you can find my code here.
Here is a screenshot of the data (Hashtags):
Here is a representation of the data.
codes
1 [71020]
2 [77085]
3 [36415]
4 [99213, 99287]
5 [99233, 99233, 99233]
I want to split these arrays into different columns.
To something like this (screenshot - Hashtags split to columns):
Here is a representation of it.
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
I tried the following code, which I got from this Stack Overflow post, but it doesn't give the expected results:
df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist())
What am I doing wrong?
The reason is that the lists are still stored as strings in the hashtags column when you read them with read_csv. You can convert them while reading the data (the following code is taken from the Colab notebook):
import pandas as pd
from ast import literal_eval
url = "https://raw.githubusercontent.com/hashimputhiyakath/datasets/main/hashtags10.csv"
# Notice the added converter to turn strings into lists.
df = pd.read_csv(url, converters={'hashtags': literal_eval})
And then the solution you mentioned will work as expected.
df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist(), index=df.index).add_prefix('hashtag_')
print(df_hashtags_splitted.head(10))
hashtag_0 hashtag_1 hashtag_2 hashtag_3 hashtag_4 hashtag_5 hashtag_6 hashtag_7 hashtag_8 hashtag_9 hashtag_10 hashtag_11
0 longcovid covidhelp None None None None None None None None None None
1 mumbai covid hospitalbeds covidemergency mahacovid oxygenbed mumbaicovid covid19indiahelp covidhelp covidresources None None
2 kawahcoffeeshop coffeelover kawah costarica puravida heredia oxygen None None None None None
3 lucknow mumbai hyderabad delhi verified covidresources covidhelp covid19indiahelp None None None None
4 oxygen None None None None None None None None None None None
5 covid19indiahelp mahara None None None None None None None None None None
6 oxygen amadoda None None None None None None None None None None
7 plasmadonordelhi plasmamumbai covid19indiahelp covidhelp covidemergency2021 None None None None None None None
8 oxygen conservation wilding rewilding environment sustainability restorative agriculture wildlife biodiversity water wildswim
9 covid verified mumbai oxygen covidemergency2021 covid19 covidhelp covidresources None None None None
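If you want to check the diagnosis without downloading anything, here is a minimal sketch. It assumes a small inline CSV (made-up data mirroring the `codes` example from the question, with an `id` column added for the index) rather than the real file, and compares the cell type with and without the converter.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical inline CSV standing in for the real file. The list column is
# quoted because its values contain commas.
csv_text = '''id,codes
1,"[71020]"
2,"[77085]"
3,"[36415]"
4,"[99213, 99287]"
5,"[99233, 99233, 99233]"
'''

# Without a converter, each cell is a plain string such as "[99213, 99287]".
raw = pd.read_csv(io.StringIO(csv_text), index_col="id")
print(type(raw["codes"].iloc[0]))     # <class 'str'>

# With the converter, each cell is parsed into a real Python list.
parsed = pd.read_csv(
    io.StringIO(csv_text), index_col="id", converters={"codes": literal_eval}
)
print(type(parsed["codes"].iloc[0]))  # <class 'list'>

# Expanding the lists now yields one column per element; shorter rows are
# padded with missing values.
split = pd.DataFrame(parsed["codes"].tolist(), index=parsed.index)
split.columns = [f"code_{i + 1}" for i in range(split.shape[1])]
print(split)
```

Checking `type(df['hashtags'].iloc[0])` on your own data is usually the fastest way to tell whether a column holds lists or strings that merely look like lists.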
Alternatively, to convert the strings to lists after you read the CSV, you can do:
df['hashtags'] = df['hashtags'].map(literal_eval)
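One caveat: `literal_eval` raises an error if a cell is not a string, for example when a row has no hashtags at all. The sketch below is one possible way to handle that. It assumes a made-up frame with a missing cell, and `to_list` is a hypothetical helper, not part of pandas. It also replaces the padding values with empty strings so the result looks like the blank cells in the desired output.

```python
from ast import literal_eval

import pandas as pd

# Made-up frame imitating what read_csv returns without a converter:
# list-like strings, plus one missing cell.
df = pd.DataFrame({"hashtags": ["['oxygen']", "['covid', 'mumbai']", None]})


def to_list(value):
    """Parse a list-like string; treat any non-string value as an empty list."""
    return literal_eval(value) if isinstance(value, str) else []


df["hashtags"] = df["hashtags"].map(to_list)

# Expand to columns, then show blanks instead of None for the padded cells.
wide = (
    pd.DataFrame(df["hashtags"].tolist(), index=df.index)
    .add_prefix("hashtag_")
    .fillna("")
)
print(wide)
```

Whether an empty list is the right stand-in for a missing cell depends on your data; dropping those rows first is an equally reasonable choice.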