Merge two Python pandas data frames of different length but keep all rows in output data frame
I have a data frame like this, df1:

```
Id    name  Checksum
2001  A     e882
2002  B     2884
2002  C     ee12,ee84
2003  D     ee23
2004  E     ee42,ee43
2006  F     2884,2993,3884,3855
2006  G     344,122,288
```
I want output like this:

```
Id    name  Checksum
2001  A     e882
2002  B     2884
2002  C     ee12
2002  C     ee84
2003  D     ee23
2004  E     ee42
2004  E     ee43
2006  F     2884
2006  F     2993
2006  F     3884
2006  F     3855
2006  G     344
2006  G     122
2006  G     288
```
I want to create a new data frame like the one above, with one row per comma-separated checksum. How can I do this in Python pandas?
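To run the answers below, the sample frame can be rebuilt directly. This is a minimal sketch that assumes `Checksum` is stored as comma-separated strings (so `2884` stays text) and that the frame is named `df`, the name the answers use, rather than `df1`.

```python
import pandas as pd

# Sample data from the question; Checksum holds comma-separated strings
df = pd.DataFrame({
    'Id': [2001, 2002, 2002, 2003, 2004, 2006, 2006],
    'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Checksum': ['e882', '2884', 'ee12,ee84', 'ee23', 'ee42,ee43',
                 '2884,2993,3884,3855', '344,122,288'],
})
print(df)
```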
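Assuming the frame from the question is named `df`, you can use `str.split` to turn each `Checksum` string into a `list`, then `str.len` to get each list's length. Finally, build a new `DataFrame` with its constructor, using `numpy.repeat` to repeat the other columns and `numpy.concatenate` to flatten the lists: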
```python
# Split each comma-separated string into a list of checksums
s = df['Checksum'].str.split(',')
print(s)
```

```
0                      [e882]
1                      [2884]
2                [ee12, ee84]
3                      [ee23]
4                [ee42, ee43]
5    [2884, 2993, 3884, 3855]
6             [344, 122, 288]
Name: Checksum, dtype: object
```
```python
# Number of checksums in each row
l = s.str.len()
print(l)
```

```
0    1
1    1
2    2
3    1
4    2
5    4
6    3
Name: Checksum, dtype: int64
```
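The key step is that `numpy.repeat` accepts an array of counts, repeating each element by its own count, while `numpy.concatenate` flattens the lists in the same order, so both results have the same length. A toy sketch of that alignment, with hypothetical names `parts`, `ids` and `counts`:

```python
import numpy as np

# Toy version of the split result: one list of checksums per original row
parts = [['e882'], ['ee12', 'ee84'], ['344', '122', '288']]
ids = np.array([2001, 2002, 2006])

# Number of pieces per row, the same role as l = s.str.len()
counts = [len(p) for p in parts]

# Repeat each Id once per piece, and flatten the pieces in the same order
repeated_ids = np.repeat(ids, counts)
flat = np.concatenate(parts)

print(repeated_ids)  # [2001 2002 2002 2006 2006 2006]
print(flat)          # ['e882' 'ee12' 'ee84' '344' '122' '288']
```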
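```python
import numpy as np
import pandas as pd

# Repeat Id and name once per checksum, then flatten all checksum lists
cols = ['Id', 'name']
df = pd.DataFrame({x: np.repeat(df[x].values, l) for x in cols})
df['Checksum'] = np.concatenate(s)
# Keep the original column order (reindex_axis was removed in pandas 1.0)
df = df.reindex(columns=['Id', 'name', 'Checksum'])
print(df)
```

```
      Id  name  Checksum
0   2001     A      e882
1   2002     B      2884
2   2002     C      ee12
3   2002     C      ee84
4   2003     D      ee23
5   2004     E      ee42
6   2004     E      ee43
7   2006     F      2884
8   2006     F      2993
9   2006     F      3884
10  2006     F      3855
11  2006     G       344
12  2006     G       122
13  2006     G       288
```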
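If pandas 0.25 or later is available, `DataFrame.explode` performs the split-to-rows step in one call. This is a different technique from the `numpy.repeat` approach above, shown as a minimal sketch on the question's sample data; `out` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [2001, 2002, 2002, 2003, 2004, 2006, 2006],
    'name': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Checksum': ['e882', '2884', 'ee12,ee84', 'ee23', 'ee42,ee43',
                 '2884,2993,3884,3855', '344,122,288'],
})

# Turn each string into a list, then give every list element its own row.
# explode() keeps the original index labels, so the index is reset afterwards.
out = (df.assign(Checksum=df['Checksum'].str.split(','))
         .explode('Checksum')
         .reset_index(drop=True))
print(out)
```

`explode` repeats the other columns automatically, so there is no need to compute lengths or rebuild the frame by hand.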
Alternatively, flatten the lists with `itertools.chain.from_iterable` instead of `numpy.concatenate`:
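```python
from itertools import chain

import numpy as np
import pandas as pd

# df is the original frame from the question
s = df['Checksum'].str.split(',')
l = s.str.len()
cols = ['Id', 'name']
df = pd.DataFrame({x: np.repeat(df[x].values, l) for x in cols})
# chain.from_iterable yields the checksums of every list, in row order
df['Checksum'] = list(chain.from_iterable(s))
# Keep the original column order (reindex_axis was removed in pandas 1.0)
df = df.reindex(columns=['Id', 'name', 'Checksum'])
print(df)
```

```
      Id  name  Checksum
0   2001     A      e882
1   2002     B      2884
2   2002     C      ee12
3   2002     C      ee84
4   2003     D      ee23
5   2004     E      ee42
6   2004     E      ee43
7   2006     F      2884
8   2006     F      2993
9   2006     F      3884
10  2006     F      3855
11  2006     G       344
12  2006     G       122
13  2006     G       288
```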
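The same repeat-and-flatten idea can be wrapped so it works for any column and keeps every other column automatically. This is a sketch with a hypothetical helper name, `split_column_to_rows`, and it assumes the target column has no missing values (a `NaN` would make the length `NaN` and break `numpy.repeat`).

```python
from itertools import chain

import numpy as np
import pandas as pd


def split_column_to_rows(frame, column, sep=','):
    """Hypothetical helper: one output row per sep-separated piece of `column`.

    Every other column is repeated with np.repeat, as in the answer above.
    Assumes `column` contains strings only (no NaN).
    """
    parts = frame[column].str.split(sep)
    lengths = parts.str.len()
    other_cols = [c for c in frame.columns if c != column]
    out = pd.DataFrame({c: np.repeat(frame[c].values, lengths) for c in other_cols})
    out[column] = list(chain.from_iterable(parts))
    # Restore the input frame's column order
    return out[list(frame.columns)]


sample = pd.DataFrame({'Id': [2001, 2002], 'name': ['A', 'C'],
                       'Checksum': ['e882', 'ee12,ee84']})
result = split_column_to_rows(sample, 'Checksum')
print(result)
```

Iterating `parts` with `chain.from_iterable` uses the values in order regardless of the index labels, so the helper does not rely on a default integer index.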
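Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.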