Swap the words in a column using pyspark or pandas

I have a dataframe in which I want to swap (reverse) the order of the words in a column, per ID.

Input data:

+----------+-----+-------------+
| date     | ID  | words       |
+----------+-----+-------------+
| 09-01-20 | xyz | pixel pearl |
| 09-01-20 | xyz | place order |
| 09-01-20 | xyz | current pro |
| 09-01-20 | xyz | order place |
| 09-01-20 | abc | hello there |
| 09-01-20 | abc | there hello |
| 09-01-20 | abc | very good   |
| 09-01-20 | abc | order place |

Below is the desired result. In the 4th row, 'order place' has been reversed to 'place order', because 'place order' already appears earlier within the same ID 'xyz'. The 1st and 3rd rows remain unchanged, since no reversed counterpart is present within ID 'xyz'. For ID 'abc', 'there hello' changes to 'hello there', but 'order place' is left as is, because 'place order' does not occur within 'abc'.

+----------+-----+-------------+
| date     | ID  | words       |
+----------+-----+-------------+
| 09-01-20 | xyz | pixel pearl |
| 09-01-20 | xyz | place order |
| 09-01-20 | xyz | current pro |
| 09-01-20 | xyz | place order |
| 09-01-20 | abc | hello there |
| 09-01-20 | abc | hello there |
| 09-01-20 | abc | very good   |
| 09-01-20 | abc | order place |
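The rule described above can be sketched in plain Python, independent of pandas or pyspark. This is only an illustration under stated assumptions: the function name `canonicalize` and the reduction of each row to an `(ID, words)` pair are hypothetical, and the row order is assumed to be the order shown in the tables.

```python
from typing import Dict, List, Tuple

def canonicalize(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Rewrite each phrase to the first phrasing of the same words seen for that ID.

    `rows` holds (ID, words) pairs in their original order (hypothetical layout).
    """
    first_seen: Dict[Tuple[str, Tuple[str, ...]], str] = {}
    result = []
    for row_id, words in rows:
        # Sorting the words makes the key independent of word order
        key = (row_id, tuple(sorted(words.split())))
        # setdefault stores the phrase only the first time this key appears
        result.append((row_id, first_seen.setdefault(key, words)))
    return result

rows = [
    ("xyz", "pixel pearl"), ("xyz", "place order"),
    ("xyz", "current pro"), ("xyz", "order place"),
    ("abc", "hello there"), ("abc", "there hello"),
    ("abc", "very good"), ("abc", "order place"),
]
for row_id, words in canonicalize(rows):
    print(row_id, words)
```

Because the ID is part of the key, 'order place' under 'abc' is never compared with 'place order' under 'xyz'.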

Thanks, much appreciated!

In pandas, try this:

# Key each row by its sorted words, then group by ID and that key
key = df['words'].apply(lambda x: ' '.join(sorted(x.split())))
# Within each group, keep the first spelling that occurred
df['words'] = df.groupby([df['ID'], key])['words'].transform('first')

Output:

       date   ID        words
0  09-01-20  xyz  pixel pearl
1  09-01-20  xyz  place order
2  09-01-20  xyz  current pro
3  09-01-20  xyz  place order
4  09-01-20  abc  hello there
5  09-01-20  abc  hello there
6  09-01-20  abc    very good
7  09-01-20  abc  order place

Details: first build an order-insensitive key by splitting each phrase and sorting its words, so that phrases made of the same words match no matter the order. Then group by 'ID' together with that key and, in each group, transform the value in 'words' to the first occurrence of 'words'. Grouping on the key alone would also rewrite the last 'abc' row to 'place order', because that spelling appears first under 'xyz'.
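For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the question's sample data and applies the groupby approach per ID. The helper name `word_key` is hypothetical. It uses `sorted` rather than `set` so that the key is deterministic, and it includes 'ID' in the grouping so that rows from different IDs do not influence each other.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "date": ["09-01-20"] * 8,
    "ID": ["xyz"] * 4 + ["abc"] * 4,
    "words": [
        "pixel pearl", "place order", "current pro", "order place",
        "hello there", "there hello", "very good", "order place",
    ],
})

def word_key(phrase: str) -> str:
    """Order-insensitive key: the phrase's words, sorted and re-joined."""
    return " ".join(sorted(phrase.split()))

# Series of keys aligned with df's index; renamed to avoid clashing with 'words'
key = df["words"].map(word_key).rename("word_key")

# Within each (ID, key) group, every row takes the group's first phrasing
df["words"] = df.groupby([df["ID"], key], sort=False)["words"].transform("first")

print(df)
```

Note that `transform('first')` relies on the dataframe's current row order to decide which phrasing counts as the first one, so the frame should be sorted as intended before this step.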
