Add a new column to a Pandas DataFrame, populated with the first word of another column in the same df

Question

I have a dataset of crimes reported by Gloucestershire Constabulary from 2011-16. It's a .csv file that I have imported into a Pandas dataframe. The data include a column stating the Lower Super Output Area (LSOA) in which the crime occurred, so for crimes in Tewkesbury, for instance, each record has the corresponding LSOA name, e.g. 'Tewkesbury 009D' or 'Tewkesbury 009E'.

I want to group these data by the town/city they relate to, e.g. 'Gloucester', 'Tewkesbury', ignoring the specific LSOAs within each conurbation. Ideally, I would append a new column to the dataframe, with just the place name copied across, and group on that. I am comfortable with how to do the grouping, just not with how to create the new column in the first place. Any advice on how to do this is gratefully received.

Answer 1

I am no Pandas expert, but I think you can use string slicing to strip off the last five characters (the space plus the four-character code). If I recall correctly, it supports regex too, so you can do a proper 'search' if required.

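import pandas as pd

# x is the original dataframe; lsoa is the column containing the LSOA names
new_col = x['lsoa'].str[:-5].rename('town')  # rename so the new column does not duplicate 'lsoa'
x = pd.concat([x, new_col], axis=1)          # concat returns a new dataframe, so assign it back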

The .str accessor can be used to apply string operations, such as slicing, to every value in the lsoa column of the dataframe.
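To illustrate the slicing and the regex option mentioned above, here is a minimal sketch. The sample rows and the column names town_slice and town_regex are made up for the example, and it assumes every LSOA name ends in a space followed by a code that starts with a digit.

```python
import pandas as pd

# Hypothetical sample data; the column name "lsoa" follows the answer above.
x = pd.DataFrame({"lsoa": ["Tewkesbury 009D", "Tewkesbury 009E", "Gloucester 001A"]})

# Fixed-width slice: drops the last five characters (a space plus a four-character code).
x["town_slice"] = x["lsoa"].str[:-5]

# Regex alternative: strip a trailing whitespace-plus-code suffix of any length.
x["town_regex"] = x["lsoa"].str.replace(r"\s+\d+\w*$", "", regex=True)

print(x)
```

The slice only works while every code is exactly four characters long; the regex version should be a little more forgiving if the code length ever varies.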

Answer 2

Something along these lines should work:

df['town'] = [x.split()[0] for x in df['LSOA']]
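A vectorised variant of the same idea may also be worth considering. The sketch below uses made-up sample rows (including one missing value) and assumes the place name is always the first whitespace-separated word, as in the question.

```python
import pandas as pd

# Hypothetical sample data, including a missing LSOA value.
df = pd.DataFrame({"LSOA": ["Tewkesbury 009D", "Tewkesbury 009E", "Gloucester 001A", None]})

# Split each value on whitespace and keep the first token.
# Missing values stay missing instead of raising an AttributeError,
# which the plain list comprehension would do on None.
df["town"] = df["LSOA"].str.split().str[0]

# Group on the new column; rows with a missing town are dropped by default.
counts = df.groupby("town").size()
print(counts)
```

Note that taking the first word would shorten any multi-word place name, so check the distinct values in the new column before relying on the grouping.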

Answer 3

You can use regex to extract the city name from the DataFrame and then join the result to the original DataFrame. If your initial DataFrame is df

df = pd.DataFrame([ 'Tewkesbury 009D', 'Tewkesbury 009E'], columns=['LSOA'])
In [2]: df
Out[2]: 
              LSOA
0  Tewkesbury 009D
1  Tewkesbury 009E

Then you can extract the city name, and optionally the LSOA code, into a new DataFrame df_new

df_new = df['LSOA'].str.extract(r'(\w*)\s(\d+\w*)', expand=True)  # raw string keeps the backslashes literal

In [10]: df_new
Out[10]: 
            0     1
0  Tewkesbury  009D
1  Tewkesbury  009E

If you want to discard the code and keep only the city name, remove the second capture group from the regex, giving r'(\w*)\s\d+\w*'. You can then join the result to the original DataFrame

In [11]: df.join(df_new)
Out[11]: 
              LSOA           0     1
0  Tewkesbury 009D  Tewkesbury  009D
1  Tewkesbury 009E  Tewkesbury  009E
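As a possible refinement, named capture groups give the extracted columns readable names instead of 0 and 1. This is a minimal sketch; the group names town and code and the variable names parts and result are chosen here for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(["Tewkesbury 009D", "Tewkesbury 009E"], columns=["LSOA"])

# Named groups (?P<name>...) become the column names of the extracted DataFrame.
parts = df["LSOA"].str.extract(r"(?P<town>\w+)\s(?P<code>\d+\w*)", expand=True)

# join returns a new DataFrame and leaves df unchanged, so keep the result.
result = df.join(parts)
print(result)
```

The town column can then be passed straight to groupby.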

3 answers

Answer 1: 2017-04-22 19:49:49
Answer 2: 2017-04-22 19:51:37
Answer 3: 2017-04-22 20:07:39
