python pandas: split comma-separated column into new columns - one per value

I have a dataframe like this:

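import numpy as np
import pandas as pd

# Column 0 holds the user, column 1 a comma-separated list of events
data = np.array([["userA", "event2, event3"],
                 ["userB", "event3, event4"],
                 ["userC", "event2"]])

data = pd.DataFrame(data)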

        0         1
0   userA   "event2, event3"
1   userB   "event3, event4"
2   userC   "event2"

Now I would like to get a dataframe like this:

       0    event2      event3      event4
0   userA     1           1
1   userB                 1           1
2   userC     1

Can anybody help, please?

It seems you need str.get_dummies, followed by replace to turn the 0 values into empty strings:

df = data[[0]].join(data[1].str.get_dummies(', ').replace(0, ''))
print (df)
       0 event2 event3 event4
0  userA      1      1       
1  userB             1      1
2  userC      1              

Detail:

print (data[1].str.get_dummies(', '))
   event2  event3  event4
0       1       1       0
1       0       1       1
2       1       0       0
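Note that str.get_dummies(', ') splits on the exact separator, so it only works when every comma is followed by exactly one space. A minimal sketch of one way to handle inconsistent spacing, assuming the input may look like the hypothetical messy strings below (they are not from the question): normalize the separators with a regular expression first, then split on a bare comma.

```python
import pandas as pd

# Hypothetical input with inconsistent spacing around the commas
data = pd.DataFrame({0: ["userA", "userB", "userC"],
                     1: ["event2,event3", "event3 ,  event4", " event2"]})

# Collapse any whitespace around commas and trim both ends of each string
cleaned = data[1].str.replace(r"\s*,\s*", ",", regex=True).str.strip()

# One 0/1 indicator column per distinct event
dummies = cleaned.str.get_dummies(",")

result = data[[0]].join(dummies)
print(result)
```

If blanks are preferred over zeros, the same .replace(0, '') call from the answer can be applied to dummies before the join.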

If you have a lot of features (words), it makes sense to use sparse matrices so that memory is used much more efficiently:

In [120]: from sklearn.feature_extraction.text import CountVectorizer

In [121]: cvect = CountVectorizer()

In [122]: data = data.join(pd.SparseDataFrame(cvect.fit_transform(data.pop(1)),
                                              data.index,
                                              cvect.get_feature_names(),
                                              default_fill_value=0))

In [123]: data
Out[123]:
       0  event2  event3  event4
0  userA       1       1       0
1  userB       0       1       1
2  userC       1       0       0

In [124]: data.memory_usage()
Out[124]:
Index     80
0         24
event2    16
event3    16
event4     8
dtype: int64


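Notice: Technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.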
