Add new column to Pandas DataFrame and fill with first word from another column from same df

I have a dataset of crimes reported by Gloucestershire Constabulary from 2011-16. It's a .csv file that I have imported into a Pandas dataframe. The data include a column stating the Lower Super Output Area (LSOA) in which the crime occurred, so for crimes in Tewkesbury, for instance, each record has the corresponding LSOA name, e.g. 'Tewkesbury 009D' or 'Tewkesbury 009E'.

I want to group these data by the town/city they relate to, e.g. 'Gloucester', 'Tewkesbury', ignoring the specific LSOAs within each conurbation. Ideally, I would append a new column to the dataframe with just the place name copied across, and group on that. I am comfortable with how to do the grouping, just not with creating the new column in the first place. Any advice on how to do this is gratefully received.

I am no Pandas expert, but I think you can use string slicing to strip off the last five characters (the space plus the four-character LSOA code). The string methods support regex too, so you can do a proper 'search' if required.

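import pandas as pd

# x is the original dataframe; lsoa is the column containing the LSOA names
new_col = x.lsoa.str[:-5].rename('town')   # drop the trailing " 009D"-style code
x = pd.concat([x, new_col], axis=1)        # keep the result, not just compute it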

The .str accessor applies string operations, such as slicing, to every value in the lsoa column of the dataframe.
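A minimal, self-contained sketch of the slicing approach, assuming the column is called lsoa as in the answer above and using made-up sample rows. It assigns the slice directly as a new column, which avoids the concat step entirely:

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's LSOA column.
x = pd.DataFrame({"lsoa": ["Tewkesbury 009D", "Tewkesbury 009E", "Gloucester 001A"]})

# .str[:-5] slices every string element, dropping the last five characters:
# the separating space plus the four-character code (e.g. " 009D").
x["town"] = x["lsoa"].str[:-5]

print(x)
```

This relies on every code having exactly the same length; if that might not hold, the regex approach further down is more robust.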

Something along these lines should work:

df['town'] = [x.split()[0] for x in df['LSOA']]
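The same idea can be written with the vectorised .str methods instead of a Python list comprehension. The sketch below also assumes, purely for illustration, that some place names might contain spaces (the third sample row is hypothetical): taking the first word would then truncate the name, whereas splitting once from the right keeps everything before the final code.

```python
import pandas as pd

# Hypothetical sample; the third name contains spaces to show the edge case.
df = pd.DataFrame({"LSOA": ["Tewkesbury 009D", "Gloucester 001A", "Forest of Dean 002B"]})

# Vectorised equivalent of the list comprehension: first whitespace-separated word.
df["town_first_word"] = df["LSOA"].str.split().str[0]

# Split once from the right instead, so multi-word place names survive intact.
df["town"] = df["LSOA"].str.rsplit(n=1).str[0]

print(df)
```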

You can use a regex to extract the city name from the DataFrame and then join the result to the original DataFrame. If your initial DataFrame is df:

import pandas as pd
df = pd.DataFrame(['Tewkesbury 009D', 'Tewkesbury 009E'], columns=['LSOA'])
In [2]: df
Out[2]: 
              LSOA
0  Tewkesbury 009D
1  Tewkesbury 009E

Then you can extract the city name, and optionally the LSOA code, into a new DataFrame df_new:

df_new = df['LSOA'].str.extract(r'(\w*)\s(\d+\w*)', expand=True)

In [10]: df_new
Out[10]: 
            0     1
0  Tewkesbury  009D
1  Tewkesbury  009E

If you want to discard the code and keep just the city name, remove the second pair of parentheses from the regex: r'(\w*)\s\d+\w*'. Either way, you can then append the result to the original DataFrame:
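A short sketch of the single-group variant: with one capture group and expand=False, str.extract returns a Series, so it can be assigned straight to a new column (the column name town is an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame(["Tewkesbury 009D", "Tewkesbury 009E"], columns=["LSOA"])

# Raw string so Python does not treat \w, \s, \d as (invalid) string escapes.
# One capture group + expand=False -> a Series, assignable without a join.
df["town"] = df["LSOA"].str.extract(r"(\w*)\s\d+\w*", expand=False)

print(df)
```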

In [11]: df.join(df_new)
Out[11]: 
              LSOA           0     1
0  Tewkesbury 009D  Tewkesbury  009D
1  Tewkesbury 009E  Tewkesbury  009E
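The joined columns above are labelled 0 and 1. A sketch, using illustrative sample rows, of giving them readable names with named regex groups and then grouping on the new column, which is the end goal in the question (counting rows per town is just one example aggregation):

```python
import pandas as pd

df = pd.DataFrame(["Tewkesbury 009D", "Tewkesbury 009E", "Gloucester 001A"], columns=["LSOA"])

# Named groups become the column labels instead of the default 0 and 1.
parts = df["LSOA"].str.extract(r"(?P<town>\w+)\s(?P<code>\d+\w*)", expand=True)
df = df.join(parts)

# Group on the new column, e.g. count records per town.
counts = df.groupby("town").size()
print(counts)
```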


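Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original source. For any questions, contact yoyou2525@163.com.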

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM