How do I manipulate a pandas DataFrame with a column whose rows contain strings that need splitting?
I have a pandas DataFrame similar to table A and I would like to get table B. What is the easiest way to do this using pandas?
Thanks
table A (ColofInt holds strings of varying length that need to be parsed out):
ColA ColB ColofInt ColD
A B StrA;StrB;StrC; 1
A B StrD;StrB;StrC;StrD; 3
A B StrC;StrB; 2
A B StrB; 5
table B:
ColA  ColB  ColofInt1  ColofInt2  ColofInt3  ColofInt4  ColD
A     B     StrA       StrB       StrC                  1
A     B     StrD       StrB       StrC       StrD       3
A     B     StrC       StrB                             2
A     B     StrB                                        5
Assuming a file 'tableA.csv' containing the following:
ColA,ColB,ColofInt,ColD
A,B,StrA;StrB;StrC;,1
A,B,StrD;StrB;StrC;StrD;,3
A,B,StrC;StrB;,2
A,B,StrB;,5
Then:
import pandas as pd
tableA = pd.read_csv('tableA.csv')
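If you want to try this without creating a file on disk, here is a minimal sketch that reads the same CSV text from an in-memory string. The name `CSV_TEXT` is only an illustration and is not part of the original answer.

```python
import io

import pandas as pd

# Same contents as the 'tableA.csv' example, kept inline (CSV_TEXT is a made-up name)
CSV_TEXT = """ColA,ColB,ColofInt,ColD
A,B,StrA;StrB;StrC;,1
A,B,StrD;StrB;StrC;StrD;,3
A,B,StrC;StrB;,2
A,B,StrB;,5
"""

# read_csv accepts any file-like object, so StringIO stands in for the file
tableA = pd.read_csv(io.StringIO(CSV_TEXT))
print(tableA)
```

The rest of the steps work the same way on this `tableA`.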
This generates a DataFrame with your new columns:
# Split on ';' and drop the empty last piece left by the trailing ';'
data_aux = pd.DataFrame(list(tableA.ColofInt.str.split(';').apply(lambda x: x[:-1])))
# Rename the integer column labels 0, 1, 2, ... to ColofInt1, ColofInt2, ...
data_aux.columns = ['ColofInt' + str(e + 1) for e in data_aux.columns]
Here's 'data_aux':
ColofInt1 ColofInt2 ColofInt3 ColofInt4
0 StrA StrB StrC None
1 StrD StrB StrC StrD
2 StrC StrB None None
3 StrB None None None
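A shorter way to build the same helper frame, as a sketch: strip the trailing `;` and let `str.split` expand the pieces into columns directly with `expand=True`. This is a swapped-in variant, not the code from the answer, and the frame is rebuilt inline here only so the snippet runs on its own. Whether the missing cells show up as `None` or `NaN` may depend on your pandas version.

```python
import pandas as pd

# Rebuild the example input inline so the snippet is self-contained
tableA = pd.DataFrame({
    "ColA": ["A", "A", "A", "A"],
    "ColB": ["B", "B", "B", "B"],
    "ColofInt": ["StrA;StrB;StrC;", "StrD;StrB;StrC;StrD;", "StrC;StrB;", "StrB;"],
    "ColD": [1, 3, 2, 5],
})

# Remove the trailing ';' first so no empty last piece is produced,
# then split into one column per piece
data_aux = tableA["ColofInt"].str.rstrip(";").str.split(";", expand=True)

# Rename the positional columns 0, 1, 2, ... to ColofInt1, ColofInt2, ...
data_aux.columns = [f"ColofInt{i + 1}" for i in range(data_aux.shape[1])]
print(data_aux)
```

Shorter rows are padded with missing values, so every row ends up with as many columns as the longest string has pieces.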
And this joins the two DataFrames and drops the original column:
tableB = pd.concat([tableA, data_aux], axis=1).drop('ColofInt', axis=1)
Here's the resulting 'tableB':
ColA ColB ColD ColofInt1 ColofInt2 ColofInt3 ColofInt4
0 A B 1 StrA StrB StrC None
1 A B 3 StrD StrB StrC StrD
2 A B 2 StrC StrB None None
3 A B 5 StrB None None None
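Note that this result puts the new columns after ColD and leaves missing cells as None, while table B in the question keeps ColD last and shows blanks. If that matters, here is one possible sketch that puts the split columns where the original column was and fills the gaps with empty strings. The helper name `split_column` is hypothetical, and the use of `expand=True` is my substitution for the list-based split above.

```python
import pandas as pd


def split_column(df, col, sep=";"):
    """Hypothetical helper: replace `col` with numbered columns at the same position."""
    # Strip the trailing separator, then expand the pieces into columns
    parts = df[col].str.rstrip(sep).str.split(sep, expand=True)
    parts.columns = [f"{col}{i + 1}" for i in range(parts.shape[1])]
    # Keep the columns left and right of `col` around the new ones
    pos = df.columns.get_loc(col)
    return pd.concat([df.iloc[:, :pos], parts, df.iloc[:, pos + 1:]], axis=1)


tableA = pd.DataFrame({
    "ColA": ["A", "A", "A", "A"],
    "ColB": ["B", "B", "B", "B"],
    "ColofInt": ["StrA;StrB;StrC;", "StrD;StrB;StrC;StrD;", "StrC;StrB;", "StrB;"],
    "ColD": [1, 3, 2, 5],
})

# Fill the padded cells with empty strings to match the blanks in table B
tableB = split_column(tableA, "ColofInt").fillna("")
print(tableB)
```

Filling with empty strings is a display choice. If you plan to do further processing, leaving the missing values in place is usually easier to work with.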