How to replace the start and end of a column with certain characters in a Python DataFrame

I have a DataFrame that looks like this:

 clients_x                 clients_y              coords_x               coords_y 
7110001002                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
7110001002                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
7110001002                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[7110052941, 7110107795]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[7110052941, 7110107795]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[7110052941, 7110107795]  0700245857        -23.609,-46.6974       -23.7074,-46.569

I want every value in the clients_x column to start with "[" and end with "]". My expected output is:

 clients_x                 clients_y              coords_x               coords_y 
[7110001002]                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
[7110001002]                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
[7110001002]                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[7110052941, 7110107795]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[7110052941, 7110107795]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[7110052941, 7110107795]  0700245857        -23.609,-46.6974       -23.7074,-46.569

To do that, I first tried something like this:

df["clients_x"] = "[" + df["clients_x"] + "]"

However, this adds "[ ]" around every value, so the rows that already have brackets end up with them duplicated. The output is:

 clients_x                 clients_y              coords_x               coords_y 
[7110001002]                7100019838    -23.63013,-46.704887  -23.657433,-46.744095   
[7110001002]                7100021875    -23.63013,-46.704887    -23.7729,-46.591366   
[7110001002]                0700245857    -23.63013,-46.704887      -23.7074,-46.5698 
[[7110052941, 7110107795]]  7100019838        -23.609,-46.6974  -23.657433,-46.744095
[[7110052941, 7110107795]]  7100021875        -23.609,-46.6974    -23.7729,-46.591366
[[7110052941, 7110107795]]  0700245857        -23.609,-46.6974       -23.7074,-46.569

To avoid that, I tried the following code, which is meant to add "[ ]" only around the values in the clients_x column that start with a digit:

df['clients_x'] = df['clients_x'].mask(df['clients_x'].astype(str).str.startswith(r'^\d'), f'[{df.clients_x}]')

However, the output of this line is identical to my original DataFrame. If anyone has an idea how to approach this problem, I would really appreciate the help.

Use np.where:

import numpy as np

df['clients_x'] = np.where(df['clients_x'].str.startswith('['), df['clients_x'], '[' + df['clients_x'] + ']')

Or use Series.where, which returns a new Series rather than modifying df in place:

df['clients_x'].where(df['clients_x'].str.startswith('['), '[' + df['clients_x'] + ']')

Output:

0               [7110001002]
1               [7110001002]
2               [7110001002]
3    [7110052941,7110107795]
4    [7110052941,7110107795]
5    [7110052941,7110107795]
Name: clients_x, dtype: object
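
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question and assumes clients_x holds strings; the `wrap_in_brackets` helper name is mine and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; clients_x is assumed to hold strings.
df = pd.DataFrame({
    "clients_x": ["7110001002"] * 3 + ["[7110052941, 7110107795]"] * 3,
    "clients_y": ["7100019838", "7100021875", "0700245857"] * 2,
})


def wrap_in_brackets(col: pd.Series) -> pd.Series:
    """Wrap values in [ ] unless they already start with '['."""
    text = col.astype(str)  # guards against non-string values
    already_wrapped = text.str.startswith("[")
    wrapped = np.where(already_wrapped, text, "[" + text + "]")
    return pd.Series(wrapped, index=col.index, name=col.name)


df["clients_x"] = wrap_in_brackets(df["clients_x"])
print(df["clients_x"].tolist())
```

Calling the helper a second time should leave the column unchanged, because values that already start with "[" are passed through as they are.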

You need to use where, not mask (see the doc). where keeps the values for which the condition is True and replaces the rest:

df["clients_x"] = df.clients_x.where(
  df.clients_x.astype(str).str.startswith("["), 
  "[" + df.clients_x + "]"
)
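
A side note on why the attempt in the question left the frame unchanged: as far as I can tell, `str.startswith` does a literal prefix match and does not accept regular expressions, so `r'^\d'` never matches and the condition is False for every row. The f-string would also format the whole Series at once rather than each value. Below is a sketch of a `mask` version that should work, using `str.match` for the regex test. The two-row sample Series is made up from the question's data.

```python
import pandas as pd

s = pd.Series(["7110001002", "[7110052941, 7110107795]"], name="clients_x")

# startswith() compares literally, so the regex-looking pattern matches nothing.
print(s.str.startswith(r"^\d").tolist())  # [False, False]

# str.match() anchors a regex at the start of each string.
starts_with_digit = s.str.match(r"\d")

# mask() replaces values where the condition is True (where() does the opposite).
result = s.mask(starts_with_digit, "[" + s + "]")
print(result.tolist())
```

Whether you pick `where` with a "starts with [" test or `mask` with a "starts with a digit" test is mostly a matter of taste, as long as the replacement is built from the column itself and not from an f-string.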


 