How to split a string in a pandas DataFrame into bigrams that can then be exploded into new rows?
Try using zip, apply, and explode:
df.Name = df.Name.str.split()
df.Name.apply(lambda x: tuple(zip(x,x[1:]))).explode().map(lambda x: f"{x[0]} {x[1]}")
Alternatively, use a list comprehension:
df2 = pd.Series([ f"{a} {b}" for val in df.Name for (a,b) in (zip(val,val[1:]))])
0 John Doe
1 John Doe
1 Doe Mike
1 Mike Smith
2 John Doe
2 Doe Mike
2 Mike Smith
2 Smith Steve
2 Steve Johnson
3 Smith Mike
3 Mike J.
3 J. Doe
3 Doe Johnson
3 Johnson Steve
4 Steve J.
4 J. M
4 M Smith
Name: Name, dtype: object
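For anyone who wants to run the first approach end to end, here is a minimal self-contained sketch. The sample frame is an assumption: the Name values are reconstructed from the output above and the ID column is guessed to be 1 through 5. The dropna() step is an addition not present in the answer, so that a single-word name (which yields no bigrams) would not break the final join.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown above.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Name": [
        "John Doe",
        "John Doe Mike Smith",
        "John Doe Mike Smith Steve Johnson",
        "Smith Mike J. Doe Johnson Steve",
        "Steve J. M Smith",
    ],
})

# Split each name into a list of words (left in a separate variable
# so the original column stays untouched).
tokens = df["Name"].str.split()

# Pair every word with its successor: [(w0, w1), (w1, w2), ...].
pairs = tokens.apply(lambda words: list(zip(words, words[1:])))

# One row per pair; the original row index is repeated for each bigram.
# dropna() discards rows whose name had fewer than two words.
bigrams = pairs.explode().dropna().map(" ".join)

print(bigrams)
```

The repeated index values are what let you join the bigrams back to the original rows later if needed.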
Edit:
For the second part:
df2 = pd.DataFrame([ [idx+1, f"{a} {b}"] for idx,val in enumerate(df.Name) for (a,b) in (zip(val,val[1:]))], columns=['ID', 'Names'])
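# Note: this expects df['Name'] to still hold the original strings (not the lists produced by df.Name.str.split()).
bigrams = [[row_id, ' '.join(pair)] for row_id, name in zip(df['ID'].tolist(), df['Name'].tolist()) for pair in zip(name.split(" ")[:-1], name.split(" ")[1:])]
bigrams_df = pd.DataFrame(bigrams, columns = ['ID','Name'])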
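If you would rather keep the ID column without building the rows by hand, DataFrame.explode can do the same job. This is only a sketch of an alternative, not taken from the answers above; the helper name to_bigrams and the three sample rows (including the single-word name "Cher") are hypothetical and only there to show the edge case.

```python
import pandas as pd

# Hypothetical sample data; "Cher" has only one word and yields no bigrams.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Name": ["John Doe", "John Doe Mike Smith", "Cher"],
})

def to_bigrams(name):
    """Return the list of adjacent word pairs in name, joined by a space."""
    words = name.split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

bigrams_df = (
    df.assign(Name=df["Name"].map(to_bigrams))  # each cell becomes a list of bigrams
      .explode("Name")                          # one row per bigram, ID repeated
      .dropna(subset=["Name"])                  # empty lists explode to NaN; drop them
      .reset_index(drop=True)
)

print(bigrams_df)
```

Here the ID values come straight from the frame, so they stay correct even if the IDs are not simply 1, 2, 3, ... in row order.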