Pandas, DataFrame: Splitting one column into multiple columns
I have the following DataFrame. I am wondering whether it is possible to break the data column into multiple columns. E.g., from this:
    ID  Date        data
    6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
    6   21/01/2014  B: 5, C: 5, D: 7
    6   02/04/2013  A: 4, D:7
    7   05/06/2014  C: 25
    7   12/08/2014  D: 20
    8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
    8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4

into this:

    ID  Date        data                          A   B   C   D   E  F
    6   21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8  15  8   5   5   0  0
    6   21/01/2014  B: 5, C: 5, D: 7              0   5   5   7   0  0
    6   02/04/2013  B: 4, D: 7, B: 6              0   10  0   7   0  0
    7   05/06/2014  C: 25                         0   0   25  0   0  0
    7   12/08/2014  D: 20                         0   0   0   20  0  0
    8   18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4  2   7   3   0   5  0
    8   21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  0   8   0   6   0  11
I have tried the approaches in "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.
EDIT

There is a bit of complexity: the data column can contain duplicate keys. For example, in the first row A is repeated, so its values are summed up under the A column (please see the second table).
Here is a function that converts the string to a dictionary and aggregates the values by key. After the conversion, it is easy to get the result with pd.Series:
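
    import re
    from collections import defaultdict

    import pandas as pd

    def str_to_dict(str1):
        # Pair the i-th capital letter with the i-th number in the string
        # and sum the values of repeated keys.
        d = defaultdict(int)
        for k, v in zip(re.findall(r'[A-Z]', str1), re.findall(r'\d+', str1)):
            d[k] += int(v)
        return d

    # 'dictionary' is the column that holds the strings to parse.
    pd.concat([df, df['dictionary'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)], axis=1)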
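
As a quick check, here is a minimal, self-contained sketch that applies the same idea to three rows of the question's sample data. It assumes the strings live in a column called `data`, as in the question's tables, and it pairs key and value in a single regular expression instead of zipping two `findall` calls; that is a small variation for illustration, not the exact code above.

```python
import re
from collections import defaultdict

import pandas as pd


def str_to_dict(s):
    """Sum the numbers that follow each capital-letter key in s."""
    totals = defaultdict(int)
    # Each match is a (key, value) tuple such as ('A', '7').
    for key, value in re.findall(r'([A-Z])\s*:\s*(\d+)', s):
        totals[key] += int(value)
    return dict(totals)


# Three rows copied from the question's first table.
df = pd.DataFrame({
    'ID': [6, 6, 7],
    'Date': ['21/05/2016', '21/01/2014', '05/06/2014'],
    'data': ['A: 7, B: 8, C: 5, D: 5, A: 8', 'B: 5, C: 5, D: 7', 'C: 25'],
})

# One column per key; keys missing from a row become 0.
parsed = df['data'].apply(str_to_dict).apply(pd.Series).fillna(0).astype(int)
result = pd.concat([df, parsed], axis=1)
print(result)
```

In the first row, A appears twice (7 and 8), so its A column should come out as 15, matching the second table in the question.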

    import pandas as pd

    df = pd.DataFrame([
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
    ], columns=['ID', 'dictionary'])

    def str2dict(s):
        # Split "k: v, k: v" into pairs, then each pair into key and value.
        split = s.strip().split(',')
        d = {}
        for pair in split:
            k, v = [_.strip() for _ in pair.split(':')]
            d[k] = v  # the value stays a string; a repeated key replaces the earlier one
        return d

    df.dictionary.apply(str2dict).apply(pd.Series)

Or:

    pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
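
To get the sums requested in the EDIT from this kind of split-based parser, one possible adjustment is to accumulate integer values per key. The sketch below assumes every pair has the form `key: value` with an integer value; the name `str2dict_sum` is hypothetical, and the two sample rows are the ID 8 rows from the question.

```python
from collections import Counter

import pandas as pd


def str2dict_sum(s):
    """Parse 'k: v, k: v' and add up the values that share a key."""
    totals = Counter()
    for pair in s.split(','):
        key, value = (part.strip() for part in pair.split(':'))
        totals[key] += int(value)
    return dict(totals)


df = pd.DataFrame({
    'ID': [8, 8],
    'dictionary': [
        'A: 2, B: 3, C: 3, E: 5, B: 4',
        'F: 6, B: 4, F: 5, D: 6, B: 4',
    ],
})

# Build the wide frame from a list of dicts; missing keys become 0.
wide = pd.DataFrame(df['dictionary'].apply(str2dict_sum).tolist(), index=df.index)
wide = wide.fillna(0).astype(int)
result = pd.concat([df, wide], axis=1)
print(result)
```

Building the frame from a list of dicts avoids calling `pd.Series` once per row, which tends to matter on larger frames; the `.apply(pd.Series)` form works as well.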